kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2026-01-14 11:54:28 +01:00

Author	SHA1	Message	Date
Archana Shinde	97845c93d8	network: Fix network hotplug for ipvlan and macvlan endpoints. Since moving from network coldplug to hotplug, the only case verified was veth endpoints. Support for network hotplug for ipvlan and macvlan was broken/not added. Fix it. Fixes: #8391 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> (cherry picked from commit `a6272733e7`)	2023-11-08 11:50:02 -08:00
Fabiano Fidêncio	220a2a0300	Merge pull request #8339 from fidencio/topic/stable-3.2-backports-oct-31st-2023 stable-3.2 \| Backport everything needed after the release till Oct 31st 2023	2023-10-31 14:36:38 +01:00
Archana Shinde	3b9e6e7ee4	network: Fix network attach for ipvlan and macvlan We used the approach of cold-plugging network interface for pre-shimv2 support for docker.Since the hotplug approach was not required, we never really got to implementing hotplug support for certain network endpoints, ipvlan and macvlan being among them. Since moving to shimv2 interface as the default for runtime, we switched to hotplugging the network interface for supporting docker and nerdctl. This was done for veth endpoints only. Implement the hot-attach apis for ipvlan and macvlan as well to support ipvlan and macvlan networks with docker and nerdctl. Fixes: #8333 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> (cherry picked from commit `f53f86884f`)	2023-10-31 10:24:29 +01:00
Bo Chen	af3703bc38	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v35.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #8057 Signed-off-by: Bo Chen <chen.bo@intel.com> (cherry picked from commit `dfd0c9fa9a`)	2023-10-25 11:37:02 -07:00
James O. D. Hunt	92f283f062	runtime: Validate hypervisor section name in config file Previously, if you accidentally modified the name of the hypervisor section in the config file, the default golang runtime gives a cryptic error message ("`VM memory cannot be zero`"). This can be demonstrated using the `kata-runtime` utility program which uses the same golang config package as the actual runtime (`containerd-shim-kata-v2`): ```bash $ kata-runtime env >/dev/null; echo $? 0 $ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml $ kata-runtime env >/dev/null; echo $? VM memory cannot be zero 1 ``` The hypervisor name is now validated so that the behaviour becomes: ```bash $ kata-runtime env >/dev/null; echo $? 0 $ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml $ ./kata-runtime env >/dev/null; echo $? /etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo" 1 ``` Fixes: #8212. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com> (cherry picked from commit `3e8cf6959c`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-20 00:44:10 +02:00
Peteris Rudzusiks	668c8979f0	runtime: fix reading cgroup stats of sandboxes The cgroup stats come from resourcecontrol package in the form of pointers to structs. The sandbox Stat() method incorrectly was expecting structs. This caused the cpu and memory stats to always be 0, which in turn caused incorrect pod overhead metrics. Fixes #8035 Signed-off-by: Peteris Rudzusiks <rye@stripe.com> (cherry picked from commit `94e2ccc2d5`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 17:45:49 +02:00
Fabiano Fidêncio	9eb8723a5b	clh: arm: Use static_sandbox_resource_mgmt=true Users have noticed that this is needed, as CLH does not yet implement a way to hotplug resources on aarh64. With this patch, when building for x86_64, I can see the this is the resulting config: ``` $ ARCH=amd64 make ... $ cat config/configuration-clh.toml \| grep static_sandbox_resource_mgmt static_sandbox_resource_mgmt=false ``` And when building for aarch64: ``` $ ARCH=arm64 make ... $ cat config/configuration-clh.toml \| grep static_sandbox_resource_mgmt static_sandbox_resource_mgmt=true ``` Fixes: #7941 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `72599f1911`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 17:22:23 +02:00
Greg Kurz	e7579d20f7	runtime/qemu: Rework QMP/HMP support PR #6146 added the possibility to control QEMU with an extra HMP socket as an aid for debugging. This is great for development or bug chasing but this raises some concerns in production. The HMP monitor allows to temper with the VM state in a variety of ways. This could be intentionally or mistakenly used to inject subtle bugs in the VM that would be extremely hard if not even impossible to debug. We definitely don't want that to be enabled by default. The feature is currently wired to the `enable_debug` setting in the `[hypervisor.qemu]` section of the configuration file. This setting has historically been used to control "debug output" and it is used as such by some downstream users (e.g. Openshift). Forcing people to have the extra HMP backdoor at the same time is abusive and dangerous. A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give fine control on whether the HMP socket is wanted or not. This setting is still gated by `enable_debug = true` to make it clear it is for debug only. The default is to not have the HMP socket though. This isn't backward compatible with #6416 but it is for the sake of "better safe than sorry". An extra monitor socket makes the QEMU instance untrusted. A warning is thus logged to the journal when one is requested. While here, also allow the user to choose between HMP and QMP for the extra monitor socket. Motivation is that QMP offers way more options to control or introspect the VM than HMP does. Users can also ask for pretty json formatting well suited for human reading. This will improve the debugging experience. This feature is only made visible in the base and GPU configurations of QEMU for now. Fixes #7952 Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `1f16b6627b`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Greg Kurz	f0278f41d7	runtime/virtiofsd: Drop all references to "--cache=none" This syntax belongs to the legacy C virtiofsd implementation that we don't support anymore since kata-containers 3.1.3 because of other API breaking changes. People have been warned to switch from "none" to "never" since kata-containers 2.5.2. Let's officially do that. The compat code that would convert "none" to "never" isn't needed anymore. Just drop it. Fixes #7864 Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `72c510d057`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Greg Kurz	4679aa7712	runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr" The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated with the rust implementation. Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `81536f21af`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Fabiano Fidêncio	03d712ab25	runtime: Allow virtio_fs_extra_args annotation Some use cases may just require passing extra arguments to virtiofsd, and having this disabled by default makes it impossible to set when using kata-deploy, as changes in the configuration file would be overwritten by the daemon-set. With this in mind, let's allow users to pass whatever thet need (and here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg. Fixes: #7853 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `b1dd09a4d3`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	e0513094a0	runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure If we are running FC hypervisor, it is not started when prestart hooks are executed. So we should just ignore such error and just go ahead and run the hooks. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `2e4c874726`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	c17cbd30f0	runtime: fail early when starting docker container with FC FC does not support network device hotplug. Let's add a check to fail early when starting containers created by docker. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `21204caf20`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	7e6f8010bd	runtime: run prestart hooks before starting VM for FC Add a new hypervisor capability to tell if it supports device hotplug. If not, we should run prestart hooks before starting new VMs as nerdctl is using the prestart hooks to set up netns. To make nerdctl + FC to work, we need to run the prestart hooks before starting new VMs. Fixes: #6384 Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `32fd013716`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Fabiano Fidêncio	fa824af234	qemu: tdx: Workaround SMP issue with TDX 1.5 `...,sockets=1,cores=numvcpus,threads=1,...` must be used. Fixes: #7770 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `d1b54ede29`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Archana Shinde	07471cd7a6	qemu: tdx: Adapt to the TDX 1.5 stack QEMU for TDX 1.5 makes use of private memory map/unmap. Make changes to govmm to support this. Support for private backing fd for memory is added as knob to the qemu config. Userspace's map/unmap operations are done by fallocate() ioctl on the backing store fd. Reference: https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/ Fixes: #7770 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `1e34220c41`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	9ce8ee6c0c	runtime/fc: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `18d42da21e`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 11:13:31 +02:00
Peng Tao	f86bfe0da3	runtime/clh: fix image/initrd annotation handling We should make sure annotations are preferred over config options in image and initrd path handling. Fixes: #7705 Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `9fda7059a5`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 11:13:27 +02:00
Peng Tao	59fae423b5	runtime/qemu: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Add a helper function ImageOrInitrdAssetPath to make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `1a0092d631`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 11:13:24 +02:00
Jeremi Piotrowski	3b5c5bcfa4	runtime: clh: Support enabling iommu by enabling IOMMU on the default PCI segment. For hotplug to work we need a virtualized iommu and clh exposes one if there is some device or PCI segment that requests it. I would have preferred to add a separate PCI segment for hotplugging vfio devices but unfortunately kata assumes there is only one segment all over the place. See create_pci_root_bus_path(), split_vfio_pci_option() and grep for '0000'. Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on every device that is attached to it, which is why it is sprinkled all over the place. CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for that device. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `3a1db7a86b`)	2023-09-21 13:56:01 +02:00
Jeremi Piotrowski	18a8b8df03	runtime: Remove redundant check in checkPCIeConfig There is no way for this branch to be hit, as port is only set when it is different than config.NoPort. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `bfc93927fb`)	2023-09-21 13:55:47 +02:00
Jeremi Piotrowski	d86af5923f	runtime: Add test cases for checkPCIeConfig These test cases shows which options are valid for CLH/Qemu, and test that we correctly catch unsupported combinations. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `7c4e73b609`)	2023-09-21 13:55:43 +02:00
Jeremi Piotrowski	0a918d0d20	runtime: Check config for supported CLH (cold\|hot)_plug_vfio values The only supported options are hot_plug_vfio=root-port or no-port. cold_plug_vfio not supported yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `fc51e4b9eb`)	2023-09-21 13:55:38 +02:00
Jeremi Piotrowski	86201ace5a	runtime: clh: Add hot_plug_vfio entry to config hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have not implemented support for cold_plug_vfio in CLH yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `509771e6f5`)	2023-09-21 13:55:31 +02:00
Zvonko Kaiser	cddcde1d40	vfio: Fix vfio device ordering If modeVFIO is enabled we need 1st to attach the VFIO control group device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.Sort the devices starting with device #1 being the VFIO control group device and the next the actuall device(s) /dev/vfio/<group> Fixes: #7493 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-31 11:26:27 +00:00
Zvonko Kaiser	1fc715bc65	s390x: Add AP Attach/Detach test Now that we have propper AP device support add a unit test for testing the correct Attach/Detach of AP devices. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-23 13:44:19 +00:00
Zvonko Kaiser	545de5042a	vfio: Fix tests Now with more elaborate checking of cold\|hot plug ports we needed to update some of the tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:44 +00:00
Zvonko Kaiser	62aa6750ec	vfio: Added better handling of VFIO Control Devices Depending on the vfio_mode we need to mount the VFIO control device additionally into the container. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:42 +00:00
Zvonko Kaiser	dd422ccb69	vfio: Remove obsolete HotplugVFIOonRootBus Removing HotplugVFIOonRootBus which is obsolete with the latest PCI topology changes, users can set cold_plug_vfio or hot_plug_vfio either in the configuration.toml or via annotations. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:25:40 +00:00
Zvonko Kaiser	114542e2ba	s390x: Fixing device.Bus assignment The device.Bus was reset if a specific combination of configuration parameters were not met. With the new PCIe topology this should not happen anymore Fixes: #7381 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:24:26 +00:00
Peng Tao	581be92b25	Merge pull request #4492 from zvonkok/pcie-topology runtime: fix PCIe topology for GPUDirect use-case	2023-07-03 09:17:12 +08:00
Fabiano Fidêncio	6a21e20c63	runtime: Add "none" as a shared_fs option Currently, even when using devmapper, if the VMM supports virtio-fs / virtio-9p, that's used to share a few files between the host and the guest. This needed, as we need to share with the guest contents like secrets, certificates, and configurations, via Kubernetes objects like configMaps or secrets, and those are rotated and must be updated into the guest whenever the rotation happens. However, there are still use-cases users can live with just copying those files into the guest at the pod creation time, and for those there's absolutely no need to have a shared filesystem process running with no extra obvious benefit, consuming memory and even increasing the attack surface used by Kata Containers. For the case mentioned above, we should allow users, making it very clear which limitations it'll bring, to run Kata Containers with devmapper without actually having to use a shared file system, which is already the approach taken when using Firecracker as the VMM. Fixes: #7207 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-30 20:45:00 +02:00
Zvonko Kaiser	0f454d0c04	gpu: Fixing typos for PCIe topology changes Some comments and functions had typos and wrong capitalization. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-30 08:42:55 +00:00
Zvonko Kaiser	8330fb8ee7	gpu: Update unit tests Some tests are now failing due to the changes how PCIe is handled. Update the test accordingly. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-23 11:16:25 +00:00
Greg Kurz	a43ea24dfc	virtiofsd: Convert legacy `-o` sub-options to their `--` replacement The `-o` option is the legacy way to configure virtiofsd, inherited from the C implementation. The rust implementation honours it for compatibility but it logs deprecation warnings. Let's use the replacement options in the go shim code. Also drop references to `-o` from the configuration TOML file. Fixes #7111 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:54 +02:00
Greg Kurz	8e00dc6944	virtiofsd: Drop `-o no_posix_lock` The C implementation of virtiofsd had some kind of limited support for remote POSIX locks that was causing some workflows to fail with kata. Commit `432f9bea6e` hard coded `-o no_posix_lock` in order to enforce guest local POSIX locks and avoid the issues. We've switched to the rust implementation of virtiofsd since then, but it emits a warning about `-o` being deprecated. According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 : The C implementation of the daemon has limited support for remote POSIX locks, restricted exclusively to non-blocking operations. We tried to implement the same level of functionality in #2, but we finally decided against it because, in practice most applications will fail if non-blocking operations aren't supported. Implementing support for non-blocking isn't trivial and will probably require extending the kernel interface before we can even start working on the daemon side. There is thus no justification to pass `-o no_posix_lock` anymore. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:39 +02:00
Greg Kurz	2a15ad9788	virtiofsd: Stop using deprecated `-f` option The rust implementation of virtiofsd always runs foreground and spits a deprecation warning when `-f` is passed. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 10:30:40 +02:00
Zvonko Kaiser	72f2cb84e6	gpu: Reset cold or hot plug after overriding If we override the cold, hot plug with an annotation we need to reset the other plugging mechanism to NoPort otherwise both will be enabled. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:51:01 +00:00
Zvonko Kaiser	fbacc09646	gpu: PCIe topology, consider vhost-user-block in Virt In Virt the vhost-user-block is an PCIe device so we need to make sure to consider it as well. We're keeping track of vhost-user-block devices and deduce the correct amount of PCIe root ports. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:39:55 +00:00
Zvonko Kaiser	b11246c3aa	gpu: Various fixes for virt machine type The PCI qom path was not deduced correctly added regex for correct path walking. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:33:57 +00:00
Zvonko Kaiser	40101ea7db	vfio: Added annotation for hot(cold) plug Now it is possible to configure the PCIe topology via annotations and addded a simple test, checking for Invalid and RootPort Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	8f0d4e2612	vfio: Cleanup of Cold and Hot Plug Removed the configuration of PCIeRootPort and PCIeSwitchPort, those values can be deduced in createPCIeTopology Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b5c4677e0e	vfio: Rearrange the bus assignemnt Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup can be used by any module without affecting the topology. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b1aa8c8a24	gpu: Moved the PCIe configs to drivers The hypervisor_state file was the wrong location for the PCIe Port settings, moved everything under device umbrella, where it can be consumed more easily and we do not get into circular deps. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	55a66eb7fb	gpu: Add config to TOML Update cold-plug and hot-plug setting to include bridge, root and switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	da42801c38	gpu: Add config settings tests for hot-plug Updated all references and config settings for hot-plug to match cold-plug Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	de39fb7d38	runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology Fixes: #4491 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zhongtao Hu	355a24e0e1	Merge pull request #6289 from openanolis/runtime_vcpu_resize feat(runtime): vcpu resize capability	2023-06-13 10:54:11 +08:00
Yushuo	aaa96c749b	feat(runtime-rs): modify onlineCpuMemRequest Some vmms, such as dragonball, will actively help us perform online cpu operations when doing cpu hotplug. Under the old onlineCpuMem interface, it is difficult to adapt to this situation. So we modify the semantics of nb_cpus in onlineCpuMemRequest. In the original semantics, nb_cpus represents the number of newly added CPUs that need to be online. The modified semantics become that the number of online CPUs in the guest needs to be guaranteed. Fixes: #5030 Signed-off-by: Yushuo <y-shuo@linux.alibaba.com> Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>	2023-06-12 17:53:16 +08:00
James O. D. Hunt	8cb4238b46	packaging: Remove snap package Nobody has volunteered to maintain the (currently broken) snap build, so remove it. Fixes: #6769. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-06-12 09:24:09 +01:00

1 2 3 4 5 ...

1651 Commits