kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2026-01-14 11:54:28 +01:00

Author	SHA1	Message	Date
Peteris Rudzusiks	668c8979f0	runtime: fix reading cgroup stats of sandboxes The cgroup stats come from resourcecontrol package in the form of pointers to structs. The sandbox Stat() method incorrectly was expecting structs. This caused the cpu and memory stats to always be 0, which in turn caused incorrect pod overhead metrics. Fixes #8035 Signed-off-by: Peteris Rudzusiks <rye@stripe.com> (cherry picked from commit `94e2ccc2d5`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 17:45:49 +02:00
Greg Kurz	e7579d20f7	runtime/qemu: Rework QMP/HMP support PR #6146 added the possibility to control QEMU with an extra HMP socket as an aid for debugging. This is great for development or bug chasing but this raises some concerns in production. The HMP monitor allows to tamper with the VM state in a variety of ways. This could be intentionally or mistakenly used to inject subtle bugs in the VM that would be extremely hard if not even impossible to debug. We definitely don't want that to be enabled by default. The feature is currently wired to the `enable_debug` setting in the `[hypervisor.qemu]` section of the configuration file. This setting has historically been used to control "debug output" and it is used as such by some downstream users (e.g. Openshift). Forcing people to have the extra HMP backdoor at the same time is abusive and dangerous. A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give fine control on whether the HMP socket is wanted or not. This setting is still gated by `enable_debug = true` to make it clear it is for debug only. The default is to not have the HMP socket though. This isn't backward compatible with #6416 but it is for the sake of "better safe than sorry". An extra monitor socket makes the QEMU instance untrusted. A warning is thus logged to the journal when one is requested. While here, also allow the user to choose between HMP and QMP for the extra monitor socket. Motivation is that QMP offers way more options to control or introspect the VM than HMP does. Users can also ask for pretty json formatting well suited for human reading. This will improve the debugging experience. This feature is only made visible in the base and GPU configurations of QEMU for now. Fixes #7952 Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `1f16b6627b`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Greg Kurz	f0278f41d7	runtime/virtiofsd: Drop all references to "--cache=none" This syntax belongs to the legacy C virtiofsd implementation that we don't support anymore since kata-containers 3.1.3 because of other API breaking changes. People have been warned to switch from "none" to "never" since kata-containers 2.5.2. Let's officially do that. The compat code that would convert "none" to "never" isn't needed anymore. Just drop it. Fixes #7864 Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `72c510d057`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Greg Kurz	4679aa7712	runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr" The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated with the rust implementation. Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `81536f21af`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	e0513094a0	runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure If we are running FC hypervisor, it is not started when prestart hooks are executed. So we should just ignore such error and just go ahead and run the hooks. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `2e4c874726`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	c17cbd30f0	runtime: fail early when starting docker container with FC FC does not support network device hotplug. Let's add a check to fail early when starting containers created by docker. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `21204caf20`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	7e6f8010bd	runtime: run prestart hooks before starting VM for FC Add a new hypervisor capability to tell if it supports device hotplug. If not, we should run prestart hooks before starting new VMs as nerdctl is using the prestart hooks to set up netns. To make nerdctl + FC to work, we need to run the prestart hooks before starting new VMs. Fixes: #6384 Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `32fd013716`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Fabiano Fidêncio	fa824af234	qemu: tdx: Workaround SMP issue with TDX 1.5 `...,sockets=1,cores=numvcpus,threads=1,...` must be used. Fixes: #7770 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `d1b54ede29`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Archana Shinde	07471cd7a6	qemu: tdx: Adapt to the TDX 1.5 stack QEMU for TDX 1.5 makes use of private memory map/unmap. Make changes to govmm to support this. Support for private backing fd for memory is added as knob to the qemu config. Userspace's map/unmap operations are done by fallocate() ioctl on the backing store fd. Reference: https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/ Fixes: #7770 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `1e34220c41`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Peng Tao	9ce8ee6c0c	runtime/fc: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `18d42da21e`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 11:13:31 +02:00
Peng Tao	f86bfe0da3	runtime/clh: fix image/initrd annotation handling We should make sure annotations are preferred over config options in image and initrd path handling. Fixes: #7705 Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `9fda7059a5`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 11:13:27 +02:00
Peng Tao	59fae423b5	runtime/qemu: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Add a helper function ImageOrInitrdAssetPath to make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh> (cherry picked from commit `1a0092d631`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 11:13:24 +02:00
Jeremi Piotrowski	3b5c5bcfa4	runtime: clh: Support enabling iommu by enabling IOMMU on the default PCI segment. For hotplug to work we need a virtualized iommu and clh exposes one if there is some device or PCI segment that requests it. I would have preferred to add a separate PCI segment for hotplugging vfio devices but unfortunately kata assumes there is only one segment all over the place. See create_pci_root_bus_path(), split_vfio_pci_option() and grep for '0000'. Enabling the IOMMU on the default PCI segment requires enabling IOMMU on every device that is attached to it, which is why it is sprinkled all over the place. CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for that device. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `3a1db7a86b`)	2023-09-21 13:56:01 +02:00
Jeremi Piotrowski	0a918d0d20	runtime: Check config for supported CLH (cold|hot)_plug_vfio values The only supported options are hot_plug_vfio=root-port or no-port. cold_plug_vfio not supported yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `fc51e4b9eb`)	2023-09-21 13:55:38 +02:00
Zvonko Kaiser	cddcde1d40	vfio: Fix vfio device ordering If modeVFIO is enabled we need 1st to attach the VFIO control group device /dev/vfio/vfio and 2nd the actual device(s) afterwards. Sort the devices starting with device #1 being the VFIO control group device and the next the actual device(s) /dev/vfio/<group> Fixes: #7493 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-31 11:26:27 +00:00
Zvonko Kaiser	1fc715bc65	s390x: Add AP Attach/Detach test Now that we have proper AP device support add a unit test for testing the correct Attach/Detach of AP devices. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-23 13:44:19 +00:00
Zvonko Kaiser	545de5042a	vfio: Fix tests Now with more elaborate checking of cold|hot plug ports we needed to update some of the tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:44 +00:00
Zvonko Kaiser	62aa6750ec	vfio: Added better handling of VFIO Control Devices Depending on the vfio_mode we need to mount the VFIO control device additionally into the container. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:42 +00:00
Zvonko Kaiser	dd422ccb69	vfio: Remove obsolete HotplugVFIOonRootBus Removing HotplugVFIOonRootBus which is obsolete with the latest PCI topology changes, users can set cold_plug_vfio or hot_plug_vfio either in the configuration.toml or via annotations. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:25:40 +00:00
Zvonko Kaiser	114542e2ba	s390x: Fixing device.Bus assignment The device.Bus was reset if a specific combination of configuration parameters were not met. With the new PCIe topology this should not happen anymore Fixes: #7381 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:24:26 +00:00
Peng Tao	581be92b25	Merge pull request #4492 from zvonkok/pcie-topology runtime: fix PCIe topology for GPUDirect use-case	2023-07-03 09:17:12 +08:00
Fabiano Fidêncio	6a21e20c63	runtime: Add "none" as a shared_fs option Currently, even when using devmapper, if the VMM supports virtio-fs / virtio-9p, that's used to share a few files between the host and the guest. This is needed, as we need to share with the guest contents like secrets, certificates, and configurations, via Kubernetes objects like configMaps or secrets, and those are rotated and must be updated into the guest whenever the rotation happens. However, there are still use-cases users can live with just copying those files into the guest at the pod creation time, and for those there's absolutely no need to have a shared filesystem process running with no extra obvious benefit, consuming memory and even increasing the attack surface used by Kata Containers. For the case mentioned above, we should allow users, making it very clear which limitations it'll bring, to run Kata Containers with devmapper without actually having to use a shared file system, which is already the approach taken when using Firecracker as the VMM. Fixes: #7207 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-30 20:45:00 +02:00
Greg Kurz	a43ea24dfc	virtiofsd: Convert legacy `-o` sub-options to their `--` replacement The `-o` option is the legacy way to configure virtiofsd, inherited from the C implementation. The rust implementation honours it for compatibility but it logs deprecation warnings. Let's use the replacement options in the go shim code. Also drop references to `-o` from the configuration TOML file. Fixes #7111 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:54 +02:00
Greg Kurz	8e00dc6944	virtiofsd: Drop `-o no_posix_lock` The C implementation of virtiofsd had some kind of limited support for remote POSIX locks that was causing some workflows to fail with kata. Commit `432f9bea6e` hard coded `-o no_posix_lock` in order to enforce guest local POSIX locks and avoid the issues. We've switched to the rust implementation of virtiofsd since then, but it emits a warning about `-o` being deprecated. According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 : The C implementation of the daemon has limited support for remote POSIX locks, restricted exclusively to non-blocking operations. We tried to implement the same level of functionality in #2, but we finally decided against it because, in practice most applications will fail if non-blocking operations aren't supported. Implementing support for non-blocking isn't trivial and will probably require extending the kernel interface before we can even start working on the daemon side. There is thus no justification to pass `-o no_posix_lock` anymore. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:39 +02:00
Greg Kurz	2a15ad9788	virtiofsd: Stop using deprecated `-f` option The rust implementation of virtiofsd always runs foreground and spits a deprecation warning when `-f` is passed. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 10:30:40 +02:00
Zvonko Kaiser	72f2cb84e6	gpu: Reset cold or hot plug after overriding If we override the cold, hot plug with an annotation we need to reset the other plugging mechanism to NoPort otherwise both will be enabled. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:51:01 +00:00
Zvonko Kaiser	fbacc09646	gpu: PCIe topology, consider vhost-user-block in Virt In Virt the vhost-user-block is a PCIe device so we need to make sure to consider it as well. We're keeping track of vhost-user-block devices and deduce the correct amount of PCIe root ports. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:39:55 +00:00
Zvonko Kaiser	b11246c3aa	gpu: Various fixes for virt machine type The PCI qom path was not deduced correctly; added regex for correct path walking. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:33:57 +00:00
Zvonko Kaiser	40101ea7db	vfio: Added annotation for hot(cold) plug Now it is possible to configure the PCIe topology via annotations and added a simple test, checking for Invalid and RootPort Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	8f0d4e2612	vfio: Cleanup of Cold and Hot Plug Removed the configuration of PCIeRootPort and PCIeSwitchPort, those values can be deduced in createPCIeTopology Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b5c4677e0e	vfio: Rearrange the bus assignment Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup can be used by any module without affecting the topology. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b1aa8c8a24	gpu: Moved the PCIe configs to drivers The hypervisor_state file was the wrong location for the PCIe Port settings, moved everything under device umbrella, where it can be consumed more easily and we do not get into circular deps. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	55a66eb7fb	gpu: Add config to TOML Update cold-plug and hot-plug setting to include bridge, root and switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	da42801c38	gpu: Add config settings tests for hot-plug Updated all references and config settings for hot-plug to match cold-plug Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	de39fb7d38	runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology Fixes: #4491 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Yushuo	aaa96c749b	feat(runtime-rs): modify onlineCpuMemRequest Some vmms, such as dragonball, will actively help us perform online cpu operations when doing cpu hotplug. Under the old onlineCpuMem interface, it is difficult to adapt to this situation. So we modify the semantics of nb_cpus in onlineCpuMemRequest. In the original semantics, nb_cpus represents the number of newly added CPUs that need to be online. The modified semantics become that the number of online CPUs in the guest needs to be guaranteed. Fixes: #5030 Signed-off-by: Yushuo <y-shuo@linux.alibaba.com> Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>	2023-06-12 17:53:16 +08:00
Beraldo Leal	0e47cfc4c7	runtime: sending SIGKILL to qemu There is a race condition when virtiofsd is killed without finishing all the clients. Because of that, when a pod is stopped, QEMU detects virtiofsd is gone, which is legitimate. Sending a SIGTERM first before killing could introduce some latency during the shutdown. Fixes #6757. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-05-24 11:31:28 -04:00
Fabiano Fidêncio	9aae333343	Merge pull request #6871 from kmjohansen/bugfix/ptmx runtime: make debug console work with sandbox_cgroup_only	2023-05-23 22:24:51 +02:00
Archana Shinde	2c9efbe04c	Merge pull request #6907 from likebreath/0519/clh_v32.0 Upgrade to Cloud Hypervisor v32.0	2023-05-22 09:53:05 -07:00
Bo Chen	35c3d7b4bc	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v32.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #6632 Signed-off-by: Bo Chen <chen.bo@intel.com>	2023-05-19 12:49:45 -07:00
Krister Johansen	eff6ed2d5f	runtime: make debug console work with sandbox_cgroup_only If a hypervisor debug console is enabled and sandbox_cgroup_only is set, the hypervisor can fail to open /dev/ptmx, which prevents the sandbox from launching. This is caused by the absence of a device cgroup entry to allow access to /dev/ptmx. When sandbox_cgroup_only is not set, the hypervisor inherits the default unrestricted device cgroup, but with it enabled it runs into allow / deny list restrictions. Fix by adding an allowlist entry for /dev/ptmx when debug is enabled, sandbox_cgroup_only is true, and no /dev/ptmx is already in the list of devices. Fixes: #6870 Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>	2023-05-18 10:36:24 -07:00
Gabriela Cervantes	11a34a72e2	docs: Update container network model url This PR updates the container network model url that is part of the virtcontainers documentation. Fixes #6889 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2023-05-18 15:08:08 +00:00
Salvador Fuentes	b76058c979	Merge pull request #6721 from nedsouza/virtcontainers-qemu-go-coverage virtcontainers/qemu_test.go: Improve coverage	2023-05-16 11:11:43 -06:00
James O. D. Hunt	a96fcfd5be	Merge pull request #6735 from nedsouza/258/tests-coverage-compatoci virtcontainers/pkg/compatoci/: Improved coverage for Kata 2.0	2023-05-16 15:36:35 +01:00
Tamas K Lengyel	20cb875087	virtcontainers/qemu_test.go: Improve test coverage Rework TestQemuCreateVM routine to be a table driven test with various config variations passed to it. After CreateVM a handful of additional functions are exercised to improve code-coverage. Also add partial coverage for StartVM routine. Currently improving from 19.7% to 35.7% Credit PR to Hackathon Team3 Fixes: #267 Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>	2023-05-15 15:26:35 -04:00
LiuWeijie	50cc9c582f	tests: Improve coverage for virtcontainers/pkg/compatoci/ for Kata 2.0 Add test cases for ParseConfigJson function and GetContainerSpec function Fixes: #258 Signed-off-by: LiuWeijie <weijie.liu@intel.com>	2023-05-15 11:58:17 +08:00
Archana Shinde	32b39ee347	Merge pull request #6763 from nedsouza/266/tests_coverage_virtcontainers_fc virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%	2023-05-12 11:53:27 -07:00
Feng Wang	4e0dce6802	Merge pull request #6738 from fengwang666/oss-fix-fd-leak runtime: Fix virtiofs fd leak	2023-05-08 10:52:36 -07:00
Eduardo Berrocal	a4c0303d89	virtcontainers: Fixed static checks for improved test coverage for fc.go Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%. Fixed very simple static check fail on line 202. Fixes: #266 Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>	2023-05-07 00:17:36 -07:00
Peng Tao	65670e6b0a	Merge pull request #6699 from zvonkok/cold-plug-vfio gpu: cold plug VFIO devices	2023-05-05 10:04:29 +08:00


1033 Commits