kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2026-02-23 15:34:28 +01:00

Author	SHA1	Message	Date
alex.lyn	ba632ba825	runtime-rs: kata with multi-containers sharing one direct volume When multiple containers in a kata pod share one direct volume, it's important to make sure that the corresponding block device is mounted only once in the guest. This means that there should be only one mount entry for the device in the mount information. Fixes: #8328 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2023-11-15 10:37:01 +08:00
alex.lyn	d7594d830c	runtime-rs: correct the path from cid to device_id. When a direct volume is used by multiple containers in Kata, generating one shared path per cid causes I/O errors, because the same direct volume is then mounted more than once. To correct it, use the device_id instead of the cid, which ensures that the guest mounts the FS only once. Fixes: #8328 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2023-11-15 10:30:39 +08:00
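The two direct-volume commits above describe keying the shared guest path by device_id instead of cid. Under that scheme, a block device used by several containers gets exactly one mount entry. The Rust below is a minimal sketch of that idea, not the runtime-rs code. `DirectVolumeMounts`, `SHARED_BASE`, `acquire` and `release` are hypothetical names, and the reference count is an assumption about how the last user would trigger an unmount.

```rust
use std::collections::HashMap;

/// Hypothetical base directory for shared direct-volume mounts in the guest.
const SHARED_BASE: &str = "/hypothetical/shared/direct-volumes";

/// Tracks guest mounts of direct volumes, keyed by device_id rather than by
/// container id, so one block device maps to exactly one guest mount.
#[derive(Default)]
pub struct DirectVolumeMounts {
    // device_id -> (guest mount path, number of containers using it)
    mounts: HashMap<String, (String, usize)>,
}

impl DirectVolumeMounts {
    /// The shared path depends only on the device, never on the container id.
    pub fn shared_path_for(device_id: &str) -> String {
        format!("{}/{}", SHARED_BASE, device_id)
    }

    /// Registers one more user of `device_id`. Returns the guest path and
    /// whether this caller is the first user (and so must do the mount).
    pub fn acquire(&mut self, device_id: &str) -> (String, bool) {
        let entry = self
            .mounts
            .entry(device_id.to_string())
            .or_insert_with(|| (Self::shared_path_for(device_id), 0));
        entry.1 += 1;
        (entry.0.clone(), entry.1 == 1)
    }

    /// Drops one user. Returns true when the last user is gone and the
    /// single guest mount can be removed.
    pub fn release(&mut self, device_id: &str) -> bool {
        let remaining = match self.mounts.get_mut(device_id) {
            Some(entry) => {
                entry.1 -= 1;
                entry.1
            }
            None => return false,
        };
        if remaining == 0 {
            self.mounts.remove(device_id);
            return true;
        }
        false
    }

    /// Number of distinct guest mount entries currently tracked.
    pub fn mount_entries(&self) -> usize {
        self.mounts.len()
    }
}
```

A second container that acquires the same device_id gets the same path with `needs_mount == false`, so the guest ends up with a single mount entry.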
Fabiano Fidêncio	fd9b6d6837	Merge pull request #7623 from fidencio/topic/runtime-improve-vcpu-allocation-on-host-side runtime: Improve vCPU allocation for the VMMs	2023-11-14 14:10:54 +01:00
Xuewei Niu	49c2e6e23c	dragonball: Remove vhost-net dependency on virtio-net This patch removes the vhost-net dependency on virtio-net for the dbs-virtio-devices crate, so the vhost-net feature can be enabled without enabling the virtio-net device, error, etc. Fixes: #8423 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-14 15:35:10 +08:00
alex.lyn	4d65c2e8a2	runtime-rs: introduce `update_device` in trait Hypervisor Introduce the `update_device` method in the Hypervisor trait to enable device updates for VMMs. This method will initially be used for virtiofs Mount operations. Fixes: #7915 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2023-11-14 11:56:36 +08:00
James O. D. Hunt	7f666f783d	runtime-rs: ch: Fix TDX PR #8311 inadvertently broke the runtime-rs / Cloud Hypervisor TDX handling. It also introduced unrecoverable failure scenarios. Hence, replace the slow, fallible regex matching in the logging fast path with single-pass, non-failing multi-string log level matching. Also, add a unit test for `parse_ch_log_level()`. Fixes: #8418. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-11-13 08:49:47 +00:00
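The commit above swaps fallible regex matching for single-pass, non-failing log-level matching. The sketch below shows one way to do that: walk the line's tokens once and stop at the first level keyword. It is not the actual `parse_ch_log_level()`. The `LogLevel` enum, the keyword set and the Cloud Hypervisor log-line layout used in the examples are all assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogLevel {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical sketch: scan the line once, splitting on whitespace and ':',
/// and return the first token that names a log level. It never fails:
/// lines without a recognizable level fall back to `default`.
pub fn parse_log_level(line: &str, default: LogLevel) -> LogLevel {
    for token in line.split(|c: char| c.is_whitespace() || c == ':') {
        let level = match token {
            "ERROR" => LogLevel::Error,
            "WARN" => LogLevel::Warn,
            "INFO" => LogLevel::Info,
            "DEBUG" => LogLevel::Debug,
            "TRACE" => LogLevel::Trace,
            _ => continue, // not a level keyword; keep scanning
        };
        return level;
    }
    default
}
```

Because there is no regex to compile and no `Result` to propagate, this function cannot abort the logging path. An unexpected line format only costs the level information.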
Xuewei Niu	0a9125e629	Merge pull request #7675 from justxuewei/vhost-net	2023-11-12 20:38:18 +08:00
Xuewei Niu	d1deaf0538	dragonball: Minor changes for a comment from Bian - Add feature control for InsertNetworkDevice. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-12 14:14:10 +08:00
Xuewei Niu	e4f83e27c4	dragonball: vhost-net set_offload with acked features set_offload() for tap devices depends on acked features. Signed-off-by: Helin Guo <helinguo@linux.alibaba.com> Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-12 14:10:39 +08:00
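The dependency of `set_offload()` on acked features comes from a constraint on the tap device: it may only pass up offloaded (large or partially checksummed) packets that the guest driver agreed to accept. The sketch below shows the conventional mapping from guest-side virtio-net feature bits (virtio spec, `linux/virtio_net.h`) to `TUNSETOFFLOAD` flags (`linux/if_tun.h`). This is the mapping other VMMs use. Dragonball's actual code is not shown here, and `acked_features_to_tap_offload` is a hypothetical name.

```rust
// virtio-net feature bit numbers (virtio spec / linux/virtio_net.h).
pub const VIRTIO_NET_F_GUEST_CSUM: u64 = 1;
pub const VIRTIO_NET_F_GUEST_TSO4: u64 = 7;
pub const VIRTIO_NET_F_GUEST_TSO6: u64 = 8;
pub const VIRTIO_NET_F_GUEST_ECN: u64 = 9;
pub const VIRTIO_NET_F_GUEST_UFO: u64 = 10;

// TUNSETOFFLOAD flags (linux/if_tun.h).
pub const TUN_F_CSUM: u32 = 0x01;
pub const TUN_F_TSO4: u32 = 0x02;
pub const TUN_F_TSO6: u32 = 0x04;
pub const TUN_F_TSO_ECN: u32 = 0x08;
pub const TUN_F_UFO: u32 = 0x10;

/// Translate the features the guest driver acked into tap offload flags.
/// Only guest-side (receive) offloads matter: they describe what the tap
/// is allowed to deliver to the guest.
pub fn acked_features_to_tap_offload(acked: u64) -> u32 {
    let has = |bit: u64| acked & (1u64 << bit) != 0;
    let mut flags = 0;
    if has(VIRTIO_NET_F_GUEST_CSUM) {
        flags |= TUN_F_CSUM;
    }
    if has(VIRTIO_NET_F_GUEST_TSO4) {
        flags |= TUN_F_TSO4;
    }
    if has(VIRTIO_NET_F_GUEST_TSO6) {
        flags |= TUN_F_TSO6;
    }
    if has(VIRTIO_NET_F_GUEST_ECN) {
        flags |= TUN_F_TSO_ECN;
    }
    if has(VIRTIO_NET_F_GUEST_UFO) {
        flags |= TUN_F_UFO;
    }
    flags
}
```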
Xuewei Niu	6cd572dbbb	dragonball: Minor changes for Chao's comments - Remove two panic statements from InsertNetworkDevice test. - Rename `NUM_QUEUES` to `DEFAULT_NUM_QUEUES`, `QUEUE_SIZE` to `DEFAULT_QUEUE_SIZE` for vhost-net and virtio-net. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-12 14:10:39 +08:00
Xuewei Niu	dcdf3c6556	runtime-rs: Supply missing fields of NetworkConfig `test_networkconfig_to_netconfig` from clh depends on `NetworkConfig`, which gains some new fields in this PR. Therefore, this commit supplies the missing fields to the test. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-12 14:10:39 +08:00
Xuewei Niu	58e9709c1f	dragonball: Changes for ZizhengBian's comments - Dragonball's vhost-net feature does not depend on the virtio-net feature. - Remove `TapError` from dbs-virtio-devices's Error, and add two fields, `VirtioNet` and `VhostNet`. - Downgrade the visibility of two fields of `VhostNetDeviceMgr` from `pub(crate)`. - File an issue to record a todo for the network rate limiter. - Print internal errors with `{0:?}`. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-12 14:10:33 +08:00
Fabiano Fidêncio	5e9cf75937	vc: utils: Rename CalculateMilliCPUs() to CalculateCPUsF() With the change done in the last commit, instead of calculating milli cpus, we're actually converting the CPUs to a fraction number, a float. Let's update the function name (and associated vars) to represent that change. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-11-10 18:26:01 +01:00
Fabiano Fidêncio	e477ed0e86	runtime: Improve vCPU allocation for the VMMs First of all, this is a controversial piece, and I know that. In this commit we're trying to take a less greedy approach regarding the amount of vCPUs we allocate for the VMM, which will be advantageous mainly when using the `static_sandbox_resource_mgmt` feature, which is used by the confidential guests. The current approach we have basically does: * Gets the amount of vCPUs set in the config (an integer) * Gets the amount of vCPUs set as limit (an integer) * Sum those up * Starts / Updates the VMM to use that total amount of vCPUs The fact we're dealing with integers is logical, as we cannot request 500m vCPUs from the VMMs. However, in several cases it leads us to waste one vCPU. Let's take the example that we know the VMM requires 500m vCPUs to be running, and the workload sets 250m vCPUs as a resource limit. In that case, we'd do: * Gets the amount of vCPUs set in the config: 1 * Gets the amount of vCPUs set as limit: ceil(0.25) * 1 + ceil(0.25) = 1 + 1 = 2 vCPUs * Starts / Updates the VMM to use 2 vCPUs With the logic changed here, what we're doing is considering everything as float till just before we start / update the VMM. So, the flow described above would be: * Gets the amount of vCPUs set in the config: 0.5 * Gets the amount of vCPUs set as limit: 0.25 * ceil(0.5 + 0.25) = 1 vCPU * Starts / Updates the VMM to use 1 vCPU In the way I've written this patch we introduce zero regressions, as the default values set are still the same, and those will only be changed for the TEE use cases (although I can see firecracker, or any other user of `static_sandbox_resource_mgmt=true` taking advantage of this). There's, though, an implicit assumption in this patch that we'd need to make explicit, and that's that the default_vcpus / default_memory is the amount of vcpus / memory required by the VMM, and absolutely nothing else. Also, the amount set there should be reflected in the podOverhead for the specific runtime class. One other possible approach, which I am not that much in favour of taking as I think it's less clear, is that we could actually get the podOverhead amount, subtract it from the default_vcpus (treating the result as a float), then sum up what the user set as limit (as a float), and finally ceil the result. It could work, but IMHO this is less clear, and less explicit on what we're actually doing, and how the default_vcpus / default_memory should be used. Fixes: #6909 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2023-11-10 18:25:57 +01:00
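The commit above compares rounding each term up to a whole vCPU with keeping fractions until the last step. The sketch below shows only that arithmetic. The real change lives in the Go runtime (see `CalculateCPUsF()`), and both function names here are hypothetical.

```rust
/// Old behaviour (sketch): each term is rounded up to a whole vCPU, then summed.
pub fn vcpus_rounded_per_term(default_vcpus: f64, limit_vcpus: f64) -> u32 {
    (default_vcpus.ceil() + limit_vcpus.ceil()) as u32
}

/// New behaviour (sketch): keep fractional vCPUs and round up once, right
/// before the VMM is started or updated.
pub fn vcpus_rounded_once(default_vcpus: f64, limit_vcpus: f64) -> u32 {
    (default_vcpus + limit_vcpus).ceil() as u32
}
```

With the commit's example (VMM needs 0.5, workload limit 0.25), the first function returns 2 and the second returns 1. For whole-number inputs such as (2.0, 1.0), both return 3. This is consistent with the claim that the unchanged default values introduce no regression.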
Fabiano Fidêncio	b0157ad73a	runtime: confidential: Do not set the max_vcpu to cpu We don't have to do this since we're relying on the `static_sandbox_resource_mgmt` feature, which gives us the correct amount of memory and CPUs to be allocated. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-11-10 12:58:20 +01:00
Chao Wu	a62fb83c91	Merge pull request #8169 from openanolis/chao/fix_typo_shm runtime-rs: fix a typo in shm	2023-11-10 14:00:11 +08:00
gaohuatao	78df1bb851	agent: update AGENT_THREADS metrics value Fixes: #8369 Signed-off-by: gaohuatao <gaohuatao@bytedance.com>	2023-11-10 10:39:57 +08:00
Chao Wu	afb002c25c	runtime-rs: fix a typo in shm is_shim_volume should be is_shm_volume in shm_volume mod. fixes: #8168 Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>	2023-11-10 10:36:58 +08:00
Archana Shinde	1611723465	Merge pull request #8379 from likebreath/1103/clh_v36.0 Upgrade to Cloud Hypervisor v36.0	2023-11-08 21:10:41 -08:00
Archana Shinde	268d4d622f	Merge pull request #8389 from justxuewei/vm-capable-test runtime: Fix TestCheckHostIsVMContainerCapable instability issue	2023-11-08 12:14:04 -08:00
Archana Shinde	92a517156c	Merge pull request #8367 from amshinde/add-nerdctl-ipvlan-test network: Fix network hotplug for ipvlan and macvlan endpoints for qemu and add tests	2023-11-08 11:45:13 -08:00
Chelsea Mafrica	83e731328f	Merge pull request #8023 from cmaf/runtime-rs-ch-pause-resume runtime-rs: Update status for pause and resume	2023-11-08 11:34:47 -08:00
Xuewei Niu	acd9057c7b	runtime: Fix TestCheckHostIsVMContainerCapable instability issue TestCheckHostIsVMContainerCapable removes sysModuleDir to simulate the case where the kernel modules are not loaded. However, checkKernelModules() executes modprobe <module> if a module is not found in that directory, so loading those modules must be temporarily denied during the test. Fixes: #8390 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 22:40:08 +08:00
Fupan Li	100a73d2fd	Merge pull request #7531 from justxuewei/device-cgroup agent: Restrict device access at upper node of container's cgroup	2023-11-08 22:01:48 +08:00
Xuewei Niu	023d8dc01e	agent: Changes according to Pan's comments - Disable device cgroup restriction while the pod cgroup is not available. - Remove blacklist-related names and rename whitelist-related names to allowed_all. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 09:39:08 +08:00
Xuewei Niu	b5f3a8cb39	agent: Fix container launching failure with systemd cgroup The FSManager of the systemd cgroup manager is responsible for setting up the cgroup path, so launching a container fails if the FSManager is in read-only mode. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 09:39:07 +08:00
Xuewei Niu	6477825195	agent: Minor changes according to Zhou's comments The changes include: - Change the logging level for processed resources to debug. - Remove a todo for pod cgroup cleanup. - Add an anyhow context to `get_paths_and_mounts()`. - Remove code which denies access to VMROOTFS, since it won't take effect: if blacklist mode is in use, VMROOTFS is denied by default; otherwise, device cgroups won't be updated in whitelist mode. - Add a unit test for `default_allowed_devices()`. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 09:39:07 +08:00
Xuewei Niu	cec8044744	agent: Make devcg_info optional for LinuxContainer::new() The runk is a standard OCI runtime that isn't aware of the concept of a sandbox. Therefore, the `devcg_info` argument of `LinuxContainer::new()` does not need to be provided. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 09:39:07 +08:00
Xuewei Niu	ef4c3844a3	agent: Restrict device access at upper node of container's cgroup The target is to guarantee that containers cannot escape to access extra devices, like the VM rootfs, etc. Assume that there is a cgroup, such as `/A/B`. `B` is the container cgroup, and `A` is what we call the pod cgroup. No matter what permissions are set for the container (`B`), `A`'s permission is always `a : rwm`. As a result, containers could acquire permission to access other devices in the VM that do not belong to them. In order to set device cgroups properly, the pod cgroup is set first and the container cgroup after it. The `Sandbox` has a new field, `devcg_info`, to save cgroup states. To avoid setting the container cgroup too early, initialization must be done carefully. `inited`, one of the states, is a boolean indicating whether the pod cgroup is initialized. If not, the pod cgroup is created first and given default permissions. After that, the pause container cgroup is created and inherits the permissions from the pod cgroup. If whitelist mode, which allows containers to access all devices in the VM, is enabled, then device resources from the OCI spec are ignored. This feature does not support systemd cgroup and cgroup v2, since: - The systemd cgroup implementation in the Agent doesn't support the devices subsystem so far, see: https://github.com/kata-containers/kata-containers/issues/7506. - Cgroup v2's device controller depends on eBPF programs, which is out of the scope of cgroup. Fixes: #7507 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 09:39:07 +08:00
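The device-cgroup commits above depend on ordering: the pod cgroup is initialized once, before any container cgroup beneath it, and in whitelist (`allowed_all`) mode device cgroups are not updated. The sketch below models only that ordering as a list of planned actions. `devcg_info`, `inited` and `allowed_all` follow the commit messages. `DevCgInfo`, `plan_device_cgroups` and the action strings are hypothetical, and nothing here writes to a real cgroup.

```rust
/// Device cgroup state kept by the sandbox (field names follow the commit
/// messages; everything else in this sketch is hypothetical).
#[derive(Debug, Default)]
pub struct DevCgInfo {
    pub inited: bool,
    pub allowed_all: bool,
}

/// Returns, in order, the device cgroup actions a container start would need.
pub fn plan_device_cgroups(
    info: &mut DevCgInfo,
    pod: &str,
    container: &str,
    oci_rules: &[&str],
) -> Vec<String> {
    let mut steps = Vec::new();
    if info.allowed_all {
        // allowed_all (whitelist) mode: OCI device resources are ignored and
        // no device cgroup is updated.
        return steps;
    }
    if !info.inited {
        // The upper (pod) node must be restricted first; otherwise it keeps
        // "a *:* rwm" and children can regain access to any VM device.
        steps.push(format!("{}: deny a", pod));
        steps.push(format!("{}: allow default devices", pod));
        info.inited = true;
    }
    // Container rules are applied under the already-restricted pod node.
    for rule in oci_rules {
        steps.push(format!("{}/{}: {}", pod, container, rule));
    }
    steps
}
```

The first call (the pause container) initializes the pod node. Later containers only add their own rules underneath it.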
Archana Shinde	a6272733e7	network: Fix network hotplug for ipvlan and macvlan endpoints. Since moving from network coldplug to hotplug, the only case verified was veth endpoints. Support for network hotplug for ipvlan and macvlan was broken/not added. Fix it. Fixes: #8391 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2023-11-07 10:13:51 -08:00
James O. D. Hunt	59d0d4caff	runtime-rs: ch: Simplify VSOCK error handling Remove the redundant `VmConfigError::EmptyVsockSocketPath` error from the Cloud Hypervisor config crate since this scenario is already handled by the `VsockConfigError::NoVsockSocketPath` error. Fixes: #8385. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-11-07 17:45:38 +00:00
James O. D. Hunt	bdb83f8282	runtime-rs: ch: Remove unused function Remove the redundant `parse_mac()` function: this was never used and we already have an implementation in `crates/resource/src/network/utils/mod.rs`. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-11-07 17:45:38 +00:00
Xuewei Niu	8ea87405ed	runtime-rs: Remove virtio config from Backend Virtio-net and vhost-net share a common virtio config, and vhost-user-net uses another config, named `VhostUserConfig`. Thus, the virtio config could be added into `NetworkConfig` instead of `Backend`. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-07 19:35:02 +08:00
Xuewei Niu	ad66378bf5	runtime-rs: Move Dragonball stuff out of device drivers Move Dragonball struct conversions out of the device drivers to keep the drivers neutral. The conversions include `NetworkBackend` to `DragonballNetworkBackend` and `NetworkConfig` to `DragonballNetworkConfig`. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-07 19:35:02 +08:00
Xuewei Niu	3e0614cdf0	dragonball: Minor changes to comments Changes include: - Merge `VhostNetDeviceError` import item. - Replace if with match in `add_vhost_net_device()` Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-07 19:35:02 +08:00
Xuewei Niu	a047331a34	runtime-rs: Network config distinguishes backends Network backends determine the virtio dataplane implementations. Common protocols include virtio-net, vhost-net and vhost-user-net, etc. Network config has a new field named `backend` to specify which protocol to use. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-07 19:35:02 +08:00
Xuewei Niu	9203371833	dragonball: Introduce vhost-net device PLEASE NOTE THAT this pull request only implements vhost-net support for Dragonball, plus adaptation for Runtime-rs. This pull request DOESN'T provide a config item to choose which backend to use. To sum up, virtio-net, the default backend, is the only choice for users so far. This pull request introduces a vhost-net device for Dragonball. In addition, this pull request includes changes to Runtime-rs to improve network configuration abilities. The Dragonball part implements a vhost-net device and a vhost-net device manager, named `VhostNetDeviceMgr`, to manage vhost-net devices. `NetworkInterfaceConfig` is introduced as a high-level abstraction for network config, so Dragonball is able to distinguish network backends, e.g. virtio-net, vhost-net, vhost-user-net (WIP), etc. The Runtime-rs part adds support for multiple network backends as well. `NetworkConfig` has a couple of new fields, like `backend`, `use_shared_irq`, etc. Dragonball's network config structs implement the `From` trait, which allows them to be converted conveniently from Runtime-rs's network config. Fixes: #7674 Signed-off-by: Eric Ren <renzhen@linux.alibaba.com> Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com> Signed-off-by: wllenyj <wllenyj@linux.alibaba.com> Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-07 19:35:02 +08:00
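The network commits above add a `backend` field to runtime-rs's `NetworkConfig`, with virtio-net as the default. Dragonball's config types are derived from it through `From` conversions that live outside the device drivers. The sketch below illustrates that shape. The type names `NetworkBackend`, `NetworkConfig`, `DragonballNetworkBackend` and `DragonballNetworkConfig` and the `use_shared_irq` field come from the commits. The variant names, the `host_dev_name` field and the field types are assumptions, and vhost-user-net (WIP) is left out.

```rust
/// runtime-rs side (sketch): which virtio dataplane backs the interface.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkBackend {
    Virtio,
    Vhost,
}

impl Default for NetworkBackend {
    // virtio-net remains the default backend.
    fn default() -> Self {
        NetworkBackend::Virtio
    }
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct NetworkConfig {
    pub host_dev_name: String, // hypothetical field
    pub backend: NetworkBackend,
    pub use_shared_irq: Option<bool>,
}

/// Dragonball side (sketch): hypervisor-specific mirror types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DragonballNetworkBackend {
    VirtioNet,
    VhostNet,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DragonballNetworkConfig {
    pub host_dev_name: String,
    pub backend: DragonballNetworkBackend,
    pub use_shared_irq: Option<bool>,
}

impl From<NetworkBackend> for DragonballNetworkBackend {
    fn from(b: NetworkBackend) -> Self {
        match b {
            NetworkBackend::Virtio => DragonballNetworkBackend::VirtioNet,
            NetworkBackend::Vhost => DragonballNetworkBackend::VhostNet,
        }
    }
}

// Keeping the conversion here, not in the generic device driver, leaves the
// driver unaware of Dragonball-specific types.
impl From<&NetworkConfig> for DragonballNetworkConfig {
    fn from(c: &NetworkConfig) -> Self {
        DragonballNetworkConfig {
            host_dev_name: c.host_dev_name.clone(),
            backend: c.backend.into(),
            use_shared_irq: c.use_shared_irq,
        }
    }
}
```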
Beraldo Leal	dd530ba8ee	tests: fixes AMD errors TestCheckHostIsVMContainerCapable is failing on AMD machines. kata-check_amd64_test.go:96 has no AMD modules, and getCPUType is missing. Fixes #8384. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:59 +00:00
Beraldo Leal	7641c19f74	runtime: bump containerd for gogo deprecation This update includes necessary changes due to the version bump of containerd and its dependencies. It's part of a broader initiative to phase out gogo protobuf, which has been deprecated, and to align with the current supported libraries. Fixes #7420. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:59 +00:00
Beraldo Leal	16fa2c39e6	protocols: replace gogo/types.Empty and Any by Google versions. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Beraldo Leal	c61f4a8592	protocols: remove unused fieldpath option The +fieldpath option, specific to gogoprotobuf, enabled dynamic field access in protobuf messages, allowing nested fields to be accessed via string paths. This change is part of a larger effort to transition to the official Go protobuf library for better maintainability and community support. Upon review, no instances of dynamic field access were found in the codebase, confirming that the feature is not in use. By removing this unused feature, we simplify the build process and make it easier to complete the transition away from gogoprotobuf. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Beraldo Leal	c87bc60ea0	protocols: removing unused mappings Those mappings are not used by our .proto files and there is no difference between .pb.go files generated. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Beraldo Leal	c5d845b30a	agent: updating Cargo.lock files Probably previous changes missed updating Cargo.lock. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Beraldo Leal	5d88c78a6e	protocols: generating agent.pb.go `a3b003c345` modified agent but agent.pb.go was not updated. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Archana Shinde	036b7787dd	runtime-rs: Use PCI path from hypervisor for vfio devices Remove the earlier functionality that tries to assign PCI paths to vfio devices from the host, assuming PCI slots start from 1. Get this from the hypervisor instead. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2023-11-05 21:59:44 -08:00
Archana Shinde	c3ce6a1d15	runtime-rs: Provide PCI path to the agent for virtio-block If the PCI path for a block device is not empty, use it as the identifier for the agent instead of the virt path, which is valid only for MMIO devices. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2023-11-05 21:59:44 -08:00
Archana Shinde	a2bbbad711	runtime-rs: change hypervisor add_device trait to return device copy Block (virtio-blk) and vfio devices are currently not handled correctly by the agent, as the agent is not provided with correct PCI paths for these devices. The PCI paths for these devices can be inferred from the PCI information provided by the hypervisor when the device is added. Hence, change the add_device trait function to return a device copy with PCI info potentially provided by the hypervisor. This can then be provided to the agent to correctly detect devices within the VM. This commit includes an implementation of the PCI info update for cloud-hypervisor for virtio-blk devices, with stubs provided for other hypervisors. Remove Vsock from the DeviceType enum, as Vsock currently does not implement the Device trait; among other things, it has no attach and detach trait functions. Part of the reason is that these functions require Vsock to implement the Clone trait, as they need cloned copies to be passed down to the hypervisor. The change introduced for returning a device copy from the add_device hypervisor trait explicitly requires a device to implement the Copy trait. Hence, Vsock is removed from the DeviceType enum for now, as its implementation is incomplete and not currently used. Note, one of the blockers for adding the Clone trait to Vsock is that it currently includes a file handle, which cannot be cloned. For the Clone and Device traits to be implemented for Vsock, its implementation must change in the future so that it is cloneable. Fixes: #8283 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2023-11-05 21:59:44 -08:00
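In the three PCI-path commits above, `add_device` returns a copy of the device that the hypervisor may have enriched with PCI information. The agent is then given the PCI path when one exists and the virt path otherwise. The sketch below illustrates that flow. `Hypervisor`, `add_device` and `DeviceType` are named in the commits, but the real runtime-rs trait is async and has a different signature. `BlockConfig`'s fields, `FakeVmm`, `agent_block_id` and the slot-numbering PCI path format are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq, Default)]
pub struct BlockConfig {
    pub path_on_host: String,
    pub virt_path: String,        // e.g. "/dev/vda"; only meaningful for MMIO
    pub pci_path: Option<String>, // filled in by the hypervisor on hotplug
}

#[derive(Debug, Clone, PartialEq)]
pub enum DeviceType {
    Block(BlockConfig),
}

pub trait Hypervisor {
    /// Returns a copy of the device, possibly updated with PCI info.
    fn add_device(&mut self, device: DeviceType) -> Result<DeviceType, String>;
}

/// Hypothetical VMM that places devices on sequential root-bus slots.
pub struct FakeVmm {
    pub next_slot: u8,
}

impl Hypervisor for FakeVmm {
    fn add_device(&mut self, device: DeviceType) -> Result<DeviceType, String> {
        match device {
            DeviceType::Block(mut cfg) => {
                // Assumed format: two hex digits for the slot.
                cfg.pci_path = Some(format!("{:02x}", self.next_slot));
                self.next_slot += 1;
                Ok(DeviceType::Block(cfg))
            }
        }
    }
}

/// Prefer a non-empty PCI path; otherwise fall back to the virt path.
pub fn agent_block_id(cfg: &BlockConfig) -> &str {
    match cfg.pci_path.as_deref() {
        Some(p) if !p.is_empty() => p,
        _ => cfg.virt_path.as_str(),
    }
}
```

Returning the updated copy, instead of mutating shared state, lets the caller forward whatever the hypervisor reported straight to the agent.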
Bo Chen	071667f1ca	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v35.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #8378 Signed-off-by: Bo Chen <chen.bo@intel.com>	2023-11-03 10:47:06 -07:00
Fabiano Fidêncio	40cc397218	Merge pull request #8255 from cmaf/migrate-checks-fixes-links docs: Fix broken links	2023-11-01 14:46:30 +01:00
Beraldo Leal	afec54799e	libs: fixes dereferenced reference make check is giving us the following error: error: this expression creates a reference which is immediately dereferenced by the compiler. Fixes #8344 Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-10-31 15:55:32 -04:00
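The error quoted above is Clippy's `needless_borrow` lint. The affected code in libs is not shown here. The generic illustration below uses a hypothetical `path_len` helper: borrowing a value that is already a reference creates a `&&T`, which the compiler immediately dereferences again, and the fix is to drop the extra `&`.

```rust
use std::path::Path;

/// Hypothetical helper that takes a borrowed path.
fn path_len(p: &Path) -> usize {
    p.as_os_str().len()
}

pub fn example(path: &Path) -> usize {
    // Flagged by clippy::needless_borrow: `&path` is a `&&Path` that the
    // compiler immediately auto-dereferences back to `&Path`.
    // path_len(&path)

    // Fixed: pass the existing reference through unchanged.
    path_len(path)
}
```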


3879 Commits