Commit Graph

577 Commits

Author SHA1 Message Date
Amulyam24
d5a18173b9 virtcontainers: fix failing template test on ppc64le
If a file/directory doesn't exist, os.Stat() returns an
error. Assert the returned value with os.IsNotExist() to
prevent it from failing.

Fixes: #2920

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2021-11-18 15:37:40 +05:30
Manabu Sugimoto
3be50adab9 agent: Add support for Seccomp
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
bin
5f306330f4 virtcontainers: delete duplicated notify in watchHypervisor function
When hypervisor check failed, the notify function is called twice.

Fixes: #2901

Signed-off-by: bin <bin@hyper.sh>
2021-10-26 11:58:26 +08:00
James O. D. Hunt
ec3aa1694b Merge pull request #2844 from jongwu/unit_test
enable unit test on arm
2021-10-25 10:58:21 +01:00
David Gibson
a0825badf6 Merge pull request #2795 from dgibson/vfio-as-vfio
Allow VFIO devices to be used as VFIO devices in the container
2021-10-25 14:25:26 +11:00
David Gibson
34273da98f runtime/device: Allow VFIO devices to be presented to guest as VFIO devices
On a conventional (e.g. runc) container, passing in a VFIO group device,
/dev/vfio/NN, will result in the same VFIO group device being available
within the container.

With Kata, however, the VFIO device will be bound to the guest kernel's
driver (if it has one), possibly appearing as some other device (or a
network interface) within the guest.

This add a new `vfio_mode` option to alter this.  If set to "vfio" it will
instruct the agent to remap VFIO devices to the VFIO driver within the
guest as well, meaning they will appear as VFIO devices within the
container.

Unlike a runc container, the VFIO devices will have different names to the
host, since the names correspond to the IOMMU groups of the guest and those
can't be remapped with namespaces.

For now we keep 'guest-kernel' as the value in the default configuration
files, to maintain current Kata behaviour.  In future we should change this
to 'vfio' as the default.  That will make Kata's default behaviour more
closely resemble OCI specified behaviour.

fixes #693

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
68696e051d runtime: Add parameter to constrainGRPCSpec to control VFIO handling
Currently constrainGRPCSpec always removes VFIO devices from the OCI
container spec which will be used for the inner container.  For
upcoming support for VFIO devices in DPDK usecases we'll need to not
do that.

As a preliminary to that, add an extra parameter to the function to
control whether or not it will remove the VFIO devices from the spec.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
d9e2e9edb2 runtime: Rename constraintGRPCSpec to improve grammar
"constraint" is a noun, "constrain" is the associated verb, which makes
more sense in this context.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
57ab408576 runtime: Introduce "vfio_mode" config variable and annotation
In order to support DPDK workloads, we need to change the way VFIO devices
will be handled in Kata containers.  However, the current method, although
it is not remotely OCI compliant has real uses.  Therefore, introduce a new
runtime configuration field "vfio_mode" to control how VFIO devices will be
presented to the container.

We also add a new sandbox annotation -
io.katacontainers.config.runtime.vfio_mode - to override this on a
per-sandbox basis.

For now, the only allowed value is "guest-kernel" which refers to the
current behaviour where VFIO devices added to the container will be bound
to whatever driver in the VM kernel claims them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:29 +11:00
Jianyong Wu
1a96b8ba35 template: disable template unit test on arm
Template is broken on arm. here we disable the template unit test
temporarily.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00
Jianyong Wu
43b13a4a6d runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00
Manohar Castelino
52268d0ece hypervisor: Expose the hypervisor itself
Export the top level hypervisor type

s/hypervisor/Hypervisor

Fixes: #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:46:02 -07:00
Eric Ernst
a72bed5b34 hypervisor: update tests based on createSandbox->CreateVM change
Fixup a couple of broken tests.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
f434bcbf6c hypervisor: createSandbox is CreateVM
Last of a series of commits to export the top level
hypervisor generic methods.

s/createSandbox/CreateVM

Fixes #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
76f1ce9e30 hypervisor: startSandbox is StartVM
s/startSandbox/StartVM

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
fd24a695bf hypervisor: waitSandbox is waitVM
renaming...

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
a6385c8fde hypervisor: stopSandbox is StopVM
Renaming. There is no Sandbox specific logic except tracing.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
f989078cd2 hypervisor: resumeSandbox is ResumeVM
renaming...

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
73b4f27c46 hypervisor: saveSandbox is SaveVM
rename

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
7308610c41 hypervisor: pauseSandbox is nothing but PauseVM
renaming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
8f78e1cc19 hypervisor: The SandboxConsole is the VM's console
update naming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
4d47aeef2e hypervisor: Export generic interface methods
This is in preparation for creating a seperate hypervisor package.
Non functional change.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
6baf2586ee hypervisor: Minimal exports of generic hypervisor internal fields
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.

Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
GabyCT
03877f3479 Merge pull request #2872 from likebreath/1020/clh_v19.0
Upgrade to Cloud Hypervisor v19.0
2021-10-21 10:26:55 -05:00
James O. D. Hunt
09741272bc Merge pull request #2783 from likebreath/1001/clh_enable_seccomp
virtcontainers: clh: Enable the `seccomp` feature
2021-10-21 09:21:33 +01:00
Bo Chen
8030b6caf0 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v19.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-20 15:48:55 -07:00
Chelsea Mafrica
4ce2b14e60 Merge pull request #2817 from jodh-intel/clh+fc-agent-tracing
Enable agent tracing for hybrid VSOCK hypervisors
2021-10-18 22:01:52 -07:00
bin
76f16fd1a7 runtime: use containerd package instead of cri-containerd
cri-containerd project has been merged into containerd repo, and
we should not reference it any more in code and docs.

This commit will use containerd package instead of cri-containerd
package.

Fixes: #2791

Signed-off-by: bin <bin@hyper.sh>
2021-10-19 09:40:20 +08:00
James O. D. Hunt
41c49a7bf5 Merge pull request #2771 from fengwang666/debug-pid
runtime: update sandbox root dir cleanup behavior in rootless hypervisor
2021-10-18 17:47:47 +01:00
Christophe de Dinechin
bcffa26305 tracing: Fix typo in "package" tag name
The tracing tags for api.go contain `"packages"` as a tag name,
whereas all other tags contain `"package"`.

Fixes: #2847

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-10-15 14:48:00 +02:00
James O. D. Hunt
e61f5e2931 runtime: Show socket path in kata-env output
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.

The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.

This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.

Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 11:45:29 +01:00
James O. D. Hunt
321be0f794 tracing: Remove trace mode and trace type
Remove the `trace_mode` and `trace_type` agent tracing options as
decided in the Architecture Committee meeting.

See:

- https://github.com/kata-containers/kata-containers/pull/2062

Fixes: #2352.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 10:09:38 +01:00
Bo Chen
7b2bfd4eca virtcontainers: clh: Use 'quiet' as the default kernel parameter
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.

Fix: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-11 22:06:27 -07:00
Bo Chen
3e24e46c70 virtcontainers: clh: Turn-off serial and virtio-console by default
We will need to have console output from the guest only for debugging
purposes. As a result, we can turn-off both the serial and
virtio-console devices by default for better boot time.

Fixes: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-11 22:06:23 -07:00
Feng Wang
adc9e0baaf runtime: fix two bugs in rootless hypervisor
Update the sandbox dir clean up logic to be more appropriate
Add different seeds for randInt() method

Fixes #2770

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-10-08 15:52:42 -07:00
Bo Chen
51cbe14584 runtime: Add option "disable_seccomp" to config hypervisor.clh
This patch adds an option "disable_seccomp" to the config
hypervisor.clh, from which users can disable the `seccomp`
feature from Cloud Hypervisor when needed (for debugging purposes).

Fixes: #2782

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-08 15:10:30 -07:00
Bo Chen
98b7350a1b virtcontainers: clh: Enable the seccomp feature
This patch enables the `seccomp` feature from Cloud Hypervisor which
provides fine-grained allowed syscalls for each of its worker
threads. It brings important security benefits, while would increase
memory footprint.

Fixes: #2782

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-08 15:07:43 -07:00
Fupan Li
988eb95621 Merge pull request #2760 from liubin/fix/2759-optimize-code-for-managing-temp-users
runtime: optimize code for managing temp users for rootless mode
2021-10-08 13:49:14 +08:00
bin
bf8f582c1d runtime: optimize code for managing temp users for rootless mode
This commit does two chagnes:

- move code for managing temp users to rootless.go.
- use common function in qemu.go when shutdown the VM.

Fixes: #2759

Signed-off-by: bin <bin@hyper.sh>
2021-10-08 11:04:21 +08:00
Bin Liu
10ec4b133c Merge pull request #2742 from liubin/fix/2741-delete-file-code
Delete file virtcontainers-setup.sh
2021-10-07 11:54:47 +08:00
Jianyong Wu
7eac2ec786 protection: add confidential compute frame for arm
Even CCA, which is the confidential compute archtecture, has not been
ready, add a empty implementation to avoid static check error.

Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-06 15:53:36 +02:00
Jianyong Wu
8acfc154de check: fix typecheck failure in qemu_arm64_test.go
fix typecheck failure in qemu_arm64_test.go

Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-06 15:53:35 +02:00
Amulya Meka
5b02d54e23 virtcontainers: fix lint failure on ppc64le
Add nolint for arch specific code to exclude
from lint check.

Fixes: #2773

Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
2021-10-06 15:53:35 +02:00
Jakob Naucke
ff9728f032 virtcontainers: nolint guestProtection
Exclude from lint checking for it is ultimately only used in
architecture-specific code.

Fixes: #2273
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-06 15:53:35 +02:00
Samuel Ortiz
71ce6cfe9e runtime: Pass the route IP family to the agent
When updating the guest routing table, we should forward the IP family
information up to the guest.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:17 +02:00
Samuel Ortiz
99450bd1f7 agent: protos: Add a Family field to the Route payload
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:17 +02:00
Samuel Ortiz
f85fe70231 runtime: vendor: Bump the netlink package dependency
We need to be able to get the IP family from the netlink route meesages,
and the Route.Family field only got recently added to the netlink
package.

The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:01 +02:00
James O. D. Hunt
2ce8d4263c clh: Suppress hypervisor output to make guest output visible
Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor
debug is enabled. This is required since `Debug` level:

- Is overkill for debugging hypervisor failures.
- Effectively hides the output from the guest kernel and userland: CLH
  generates so much output that the output from the guest gets "lost in
  the noise" (experiments show that for each full CLH debug message, at most
  1 _byte_ of guest output is displayed).

Fixes: #2726.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-09-30 14:22:09 +01:00
bin
762922a521 runtime: delete func ConstraintsToVCPUs
ConstraintsToVCPUs is not used any more.

Fixes: #2741

Signed-off-by: bin <bin@hyper.sh>
2021-09-30 14:44:41 +08:00
bin
4f4854308a runtime: delete virtcontainers-setup.sh
This file is not used anymore.

Fixes: #2741

Signed-off-by: bin <bin@hyper.sh>
2021-09-30 14:44:30 +08:00