Commit Graph

51 Commits

Author SHA1 Message Date
Peng Tao
65670e6b0a Merge pull request #6699 from zvonkok/cold-plug-vfio
gpu: cold plug VFIO devices
2023-05-05 10:04:29 +08:00
Zvonko Kaiser
0fec2e6986 gpu: Add cold-plug test
Cold plug setting is now correctly decoded in toml

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 09:30:24 +00:00
Zvonko Kaiser
377ebc2ad1 gpu: Add configuration option for cold-plug VFIO
Users can set cold-plug="root-port" to cold plug a VFIO device in QEMU

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Eduardo Berrocal
9c38204f13 virtcontainers/persist: Improved test coverage 65% to 87.5%
Expanded tests on manager_test.go to cover more lines of code.

Fixes: #259

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-25 23:53:46 +00:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
Bin Liu
27b1bb5ed9 Merge pull request #4467 from egernst/device-pkg
device package cleanup/refactor
2022-06-27 14:40:53 +08:00
Eric Ernst
f97d9b45c8 runtime: device/persist: drop persist dependency from device pkgs
Rather than have device package depend on persist, let's define the
(almost duplicate) structures within device itself, and have the Kata
Container's persist pkg import these.

This'll help avoid unecessary dependencies within our core packages.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Liang Zhou
ef925d40ce runtime: enable sandbox feature on qemu
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.

The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.

Fixes: #2266

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-17 15:30:46 -07:00
David Gibson
1b931f4203 runtime: Allock mockfs storage to be placed in any directory
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs.  This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens).  If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.

So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting().  In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.

fixes #4140

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:47:59 +10:00
David Gibson
ef6d54a781 runtime: Let MockFSInit create a mock fs driver at any path
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs.  This change allows it to be initialized at any path
given as a parameter.  This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).

For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
5d8438e939 runtime: Move mockfs control global into mockfs.go
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing.  A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.

This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go.  This is slightly cleaner to begin with, and will
allow some further enhancements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
963d03ea8a runtime: Export StoragePathSuffix
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant.  We
duplicate this information in fc.go which also needs it.

Export it from fs.go instead, so it can be used in fc.go.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
1719a8b491 runtime: Don't abuse MockStorageRootPath() for factory tests
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory.  This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.

Instead use t.TempDir() which is for exactly this purpose.  As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
zhanghj
efa19c41eb device: use const strings for block-driver option instead of hard coding
Currently, the block driver option is specifed by hard coding, maybe it
is better to use const string variables instead of hard coded strings.
Another modification is to remove duplicate consts for virtio driver in
manager.go.

Fixes: #3321

Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
2022-03-14 09:20:43 +08:00
Samuel Ortiz
5e119e90e8 virtcontainers: Rename the Network structure fields and methods
We are converting the Network structure into an interface, so that
different host OSes can have different networking implementations for
Kata.
One step into that direction is to rename all the Network structure
fields and methods to something that is less Linux networking namespace
specific. This will make the Network interface naming consistent.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Julio Montes
49223e67af runtime: remove enable_swap option
`enable_swap` option was added long time ago to add
`-realtime mlock=off` to the QEMU's command line.
Kata now supports QEMU 6, `-realtime` option has been deprecated and
`mlock=on` is causing unexpected behaviors in kata.
This patch removes support for `enable_swap`, `-realtime` and `mlock=`
since they are causing bugs in kata.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-01-18 11:12:29 -06:00
bin
03546f75a6 runtime: change io/ioutil to io/os packages
Change io/ioutil to io/os packages because io/ioutil package
is deprecated from 1.16:

Discard => io.Discard
NopCloser => io.NopCloser
ReadAll => io.ReadAll
ReadDir => os.ReadDir
ReadFile => os.ReadFile
TempDir => os.MkdirTemp
TempFile => os.CreateTemp
WriteFile => os.WriteFile

Details: https://go.dev/doc/go1.16#ioutil

Fixes: #3265

Signed-off-by: bin <bin@hyper.sh>
2021-12-15 07:31:48 +08:00
Peng Tao
01b6ffc0a4 Merge pull request #3028 from egernst/hypervisor-hacking
Hypervisor cleanup, refactoring
2021-11-26 10:21:49 +08:00
bin
ddc68131df runtime: delete netmon
Netmon is not used anymore.

Fixes: #3112

Signed-off-by: bin <bin@hyper.sh>
2021-11-24 15:08:18 +08:00
Eric Ernst
4c2883f7e2 vc: hypervisor: remove dependency on persist API
Today the hypervisor code in vc relies on persist pkg for two things:
1. To get the VM/run store path on the host filesystem,
2. For type definition of the Load/Save functions of the hypervisor
   interface.

For (1), we can simply remove the store interface from the hypervisor
config and replace it with just the path, since this is all we really
need. When we create a NewHypervisor structure, outside of the
hypervisor, we can populate this path.

For (2), rather than have the persist pkg define the structure, let's
let the hypervisor code (soon to be pkg) define the structure. persist
API already needs to call into hypervisor anyway; let's allow us to
define the structure.

We'll probably want to look at following similar pattern for other parts
of vc that we want to make independent of the persist API.

In doing this, we started an initial hypervisors pkg, to hold these
types (avoid a circular dependency between virtcontainers and persist
pkg). Next step will be to remove all other dependencies and move the
hypervisor specific code into this pkg, and out of virtcontaienrs.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
1e7cb4bc3a macvlan: drop bridged part of name
The fact that we need to "bridge" the endpoint is a bit irrelevant. To
be consistent with the rest of the endpoints, let's just call this
"macvlan"

Fixes: #3050

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-16 16:44:29 -08:00
bin
09f7962ff1 runtime: merge virtcontainers/pkg/types into virtcontainers/types
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.

Fixes: #3031

Signed-off-by: bin <bin@hyper.sh>
2021-11-12 15:06:39 +08:00
Feng Wang
1cfe59304d runtime: Run QEMU using a non-root user/group
A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.

Fixes #2444

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-17 11:28:44 -07:00
Samuel Ortiz
9bed2ade0f virtcontainers: Convert to the new cgroups package API
The new API is based on containerd's cgroups package.
With that conversion we can simpligy the virtcontainers sandbox code and
also uniformize our cgroups external API dependency. We now only depend
on containerd/cgroups for everything cgroups related.

Depends-on: github.com/kata-containers/tests#3805
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-14 07:09:34 +02:00
Binbin Zhang
e2a9e78c9e virtcontainers: Remove NewStoreFeature
remove NewStoreFeature

Fixes: #2296

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-02 21:28:36 +08:00
Julio Montes
80ab91ac2f runtime: virtcontainers/persist: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
7834f4127f virtcontainers: change memory_offset to uint64
`memory_offset` is used to increase the maximum amount of memory
supported in a VM, this offset is equal to the NVDIMM/PMEM device that
is hot added, in real use case workloads such devices are bigger than
4G, which is the current limit (uint32).

fixes #2006

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00
Peng Tao
1f5b229bef runtime: remove FIXME in SandboxState about CgroupPath
It is in real life usage as we put non constrained sandbox processes
(like shim) in a separate cgroup path.

Fixes: #1944
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-05-29 13:17:14 +08:00
Eric Ernst
7f1030d303 sandbox-bindmount: persist mount information
Without this, if the shim dies, we will not have a reliable way to
identify what mounts should be cleaned up if `containerd-shim-kata-v2
cleanup` is called for the sandbox.

Before this, if you `ctr run` with a sandbox bindmount defined and SIGKILL the
containerd-shim-kata-v2, you'll notice the sandbox bindmount left on
host.

With this change, the shim is able to get the sandbox bindmount
information from disk and do the appropriate cleanup.

Fixes #1896

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-05-21 12:54:35 -07:00
Christophe de Dinechin
dcb9f40394 config: Protect annotation for entropy_source
It would be undesirable to be given an annotation like "/dev/null".
Filter out bad annotation values.

Fixes: #1043

Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-04-22 15:26:40 +02:00
bin
532ff7c909 runtime: update virtcontainers API documentation
Virtcontainers API documentation is outdated, update documentation from the latest
source.

Fixes: #1455

Signed-off-by: bin <bin@hyper.sh>
2021-03-29 11:50:53 +08:00
Peng Tao
0153f76b07 runtime: gofmt code
Looks like we have merged a lot of code that is not properly formated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 14:37:46 +08:00
David Gibson
72cb9287a0 vhost-user-blk: Use PciPath type for vhost user devices
VhostUserDeviceAttrs::PCIAddr didn't actually store a PCI address
(DDDD:BB:DD.F), but rather a PCI path.  Use the PciPath type and
rename things to make that clearer.

TestHandleBlockVolume previously used the bizarre value "0001:01"
which is neither a PCI address nor a PCI path for this value.  Change
it to a valid PCI path - it appears the actual value didn't matter for
that test, as long as it was consistent.

Forward port of
3596058c67

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
74f5b5febe runtime/block: Use PciPath type through block code
BlockDrive::PCIAddr doesn't actually store a PCI address
(DDDD:BB:DD.F) but a PCI path.  Use the PciPath type and rename things
to make that clearer.

TestHandleBlockVolume() previously used a bizarre value "0002:01" for
the "PCI address" which was neither an actual PCI address, nor a PCI
path.  Update it to use a PCI path - the actual value appears not to
matter in this test, as long as its consistent throughout.

Forward port of
64751f377b

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
32b40f5fe4 runtime/network: Use PciPath type through network handling
The "PCI address" returned by Endpoint::PciPath() isn't actually a PCI
address (DDDD:BB:DD.F), but rather a PCI path.  Rename and use the
PciPath type to clean this up and the various parts of the network
code connected to it.

Forward port of
3e589713cf

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
Christophe de Dinechin
7c6aede5d4 config: Whitelist hypervisor annotations by name
Add a field "enable_annotations" to the runtime configuration that can
be used to whitelist annotations using a list of regular expressions,
which are used to match any part of the base annotation name, i.e. the
part after "io.katacontainers.config.hypervisor."

For example, the following configuraiton will match "virtio_fs_daemon",
"initrd" and "jailer_path", but not "path" nor "firmware":

  enable_annotations = [ "virtio.*", "initrd", "_path" ]

The default is an empty list of enabled annotations, which disables
annotations entirely.

If an anontation is rejected, the message is something like:

  annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled

Fixes: #901

Suggested-by: Peng Tao <tao.peng@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
4e89b885d2 config: Protect file_mem_backend against annotation attacks
This one could theoretically be used to overwrite data on the host.
It seems somewhat less risky than the earlier ones for a number
of reasons, but worth protecting a little anyway.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
aae9656d8b config: Protect vhost_user_store_path against annotation attacks
This path could be used to overwrite data on the host.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
b21a829c61 config: Protect ctlpath from annotation attack
This also adds annotation for ctlpath which were not present
before. It's better to implement the code consistenly right now to make
sure that we don't end up with a leaky implementation tacked on later.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
27b6620b23 config: Protect jailer_path annotation
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
bf13ff0a3a config: Protect virtio_fs_daemon annotation
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Julio Montes
6df165c19d runtime: add support for SGX
Support the `sgx.intel.com/epc` annotation that is defined by the intel
k8s plugin. This annotation enables SGX. Hardware-based isolation and
memory encryption.

For example, use `sgx.intel.com/epc = "64Mi"` to create a container
with 1 EPC section with pre-allocated memory.

At the time of writing this patch, SGX patches have not landed on the
linux kernel project.
The following github kernel fork contains all the SGX patches for the
host and guest: https://github.com/intel/kvm-sgx

fixes #483

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-01 08:24:29 -05:00
Penny Zheng
1099a28830 kata 2.0: delete use_vsock option and proxy abstraction
With kata containers moving to 2.0, (hybrid-)vsock will be the only
way to directly communicate between host and agent.
And kata-proxy as additional component to handle the multiplexing on
serial port is also no longer needed.
Cleaning up related unit tests, and also add another mock socket type
`MockHybridVSock` to deal with ttrpc-based hybrid-vsock mock server.

Fixes: #389

Signed-off-by: Penny Zheng penny.zheng@arm.com
2020-07-16 04:20:02 +00:00
Julio Montes
3eb694c518 device: add ColdPlug flag
Add ColdPlug flag to DeviceInfo and DeviceState to identify whether a device
must be or was cold plugged

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-07-08 09:32:35 -05:00
bin liu
bd8f03a5ef runtime: remove agent abstraction
This PR will delete agent abstraction and use Kata agent as the only one agent.

Fixes: #377

Signed-off-by: bin liu <bin@hyper.sh>
2020-07-08 10:07:40 +08:00
Jia He
fa9d619e8a qemu: add cpu_features option
[ port from runtime commit 0100af18a2afdd6dfcc95129ec6237ba4915b3e5 ]

To control whether guest can enable/disable some CPU features. E.g. pmu=off,
vmx=off. As discussed in the thread [1], the best approach is to let users
specify them. How about adding a new option in the configuration file.

Currently this patch only supports this option in qemu,no other vmm.

[1] https://github.com/kata-containers/runtime/pull/2559#issuecomment-603998256

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-06-29 20:16:11 -07:00
Penny Zheng
c2645f5d5a rate-limiter: add rate limiter configuration/annotation on VM level
Add configuration/annotation about network I/O throttling on VM level.
rx_rate_limiter_max_rate is dedicated to control network inbound
bandwidth per pod.
tx_rate_limiter_max_rate is dedicated to control network outbound
bandwidth per pod.

Fixes: #250

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2020-06-24 06:14:04 +00:00
Peng Tao
6de95bf36c gomod: update runtime import path
To use the kata-containers repo path.

Most of the change is generated by script:
find . -type f -name "*.go" |xargs sed -i -e \
's|github.com/kata-containers/runtime|github.com/kata-containers/kata-containers/src/runtime|g'

Fixes: #201
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-04-29 18:39:03 -07:00