kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2026-01-14 20:04:21 +01:00

Author	SHA1	Message	Date
Fabiano Fidêncio	9eb8723a5b	clh: arm: Use static_sandbox_resource_mgmt=true Users have noticed that this is needed, as CLH does not yet implement a way to hotplug resources on aarh64. With this patch, when building for x86_64, I can see the this is the resulting config: ``` $ ARCH=amd64 make ... $ cat config/configuration-clh.toml \| grep static_sandbox_resource_mgmt static_sandbox_resource_mgmt=false ``` And when building for aarch64: ``` $ ARCH=arm64 make ... $ cat config/configuration-clh.toml \| grep static_sandbox_resource_mgmt static_sandbox_resource_mgmt=true ``` Fixes: #7941 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> (cherry picked from commit `72599f1911`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 17:22:23 +02:00
Greg Kurz	e7579d20f7	runtime/qemu: Rework QMP/HMP support PR #6146 added the possibility to control QEMU with an extra HMP socket as an aid for debugging. This is great for development or bug chasing but this raises some concerns in production. The HMP monitor allows to temper with the VM state in a variety of ways. This could be intentionally or mistakenly used to inject subtle bugs in the VM that would be extremely hard if not even impossible to debug. We definitely don't want that to be enabled by default. The feature is currently wired to the `enable_debug` setting in the `[hypervisor.qemu]` section of the configuration file. This setting has historically been used to control "debug output" and it is used as such by some downstream users (e.g. Openshift). Forcing people to have the extra HMP backdoor at the same time is abusive and dangerous. A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give fine control on whether the HMP socket is wanted or not. This setting is still gated by `enable_debug = true` to make it clear it is for debug only. The default is to not have the HMP socket though. This isn't backward compatible with #6416 but it is for the sake of "better safe than sorry". An extra monitor socket makes the QEMU instance untrusted. A warning is thus logged to the journal when one is requested. While here, also allow the user to choose between HMP and QMP for the extra monitor socket. Motivation is that QMP offers way more options to control or introspect the VM than HMP does. Users can also ask for pretty json formatting well suited for human reading. This will improve the debugging experience. This feature is only made visible in the base and GPU configurations of QEMU for now. Fixes #7952 Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit `1f16b6627b`) Signed-off-by: Greg Kurz <groug@kaod.org>	2023-10-18 12:17:43 +02:00
Jeremi Piotrowski	3b5c5bcfa4	runtime: clh: Support enabling iommu by enabling IOMMU on the default PCI segment. For hotplug to work we need a virtualized iommu and clh exposes one if there is some device or PCI segment that requests it. I would have preferred to add a separate PCI segment for hotplugging vfio devices but unfortunately kata assumes there is only one segment all over the place. See create_pci_root_bus_path(), split_vfio_pci_option() and grep for '0000'. Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on every device that is attached to it, which is why it is sprinkled all over the place. CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for that device. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `3a1db7a86b`)	2023-09-21 13:56:01 +02:00
Jeremi Piotrowski	86201ace5a	runtime: clh: Add hot_plug_vfio entry to config hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have not implemented support for cold_plug_vfio in CLH yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com> (cherry picked from commit `509771e6f5`)	2023-09-21 13:55:31 +02:00
Peng Tao	581be92b25	Merge pull request #4492 from zvonkok/pcie-topology runtime: fix PCIe topology for GPUDirect use-case	2023-07-03 09:17:12 +08:00
Fabiano Fidêncio	6a21e20c63	runtime: Add "none" as a shared_fs option Currently, even when using devmapper, if the VMM supports virtio-fs / virtio-9p, that's used to share a few files between the host and the guest. This needed, as we need to share with the guest contents like secrets, certificates, and configurations, via Kubernetes objects like configMaps or secrets, and those are rotated and must be updated into the guest whenever the rotation happens. However, there are still use-cases users can live with just copying those files into the guest at the pod creation time, and for those there's absolutely no need to have a shared filesystem process running with no extra obvious benefit, consuming memory and even increasing the attack surface used by Kata Containers. For the case mentioned above, we should allow users, making it very clear which limitations it'll bring, to run Kata Containers with devmapper without actually having to use a shared file system, which is already the approach taken when using Firecracker as the VMM. Fixes: #7207 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-30 20:45:00 +02:00
Greg Kurz	a43ea24dfc	virtiofsd: Convert legacy `-o` sub-options to their `--` replacement The `-o` option is the legacy way to configure virtiofsd, inherited from the C implementation. The rust implementation honours it for compatibility but it logs deprecation warnings. Let's use the replacement options in the go shim code. Also drop references to `-o` from the configuration TOML file. Fixes #7111 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:54 +02:00
Zvonko Kaiser	55a66eb7fb	gpu: Add config to TOML Update cold-plug and hot-plug setting to include bridge, root and switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Fabiano Fidêncio	ca1531fe9d	runtime: Use static_sandbox_resource_mgmt=true for TEEs When this option is enabled the runtime will attempt to determine the appropriate sandbox size (memory, CPU) before booting the virtual machine. As TEEs do not support memory and CPU hotplug, this approach must be used. Fixes: #6818 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-05-17 19:21:52 +02:00
Fabiano Fidêncio	3a9d3c72aa	gpu: Rename the last bits from `gpu` to `nvidia-gpu` Let's specifically name the `gpu` runtime class as `nvidia-gpu`. By doing this we keep the door open and ease the life of the next vendor adding GPU support for Kata Containers. Fixes: #6553 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-05-16 13:47:52 +02:00
Fabiano Fidêncio	edfaae85cb	Merge pull request #6700 from fitzthum/snp-artifacts packaging: Add SEV-SNP artifacts to main	2023-05-11 10:47:10 +02:00
Fabiano Fidêncio	c937d0a5d4	Merge pull request #6591 from UnmeshDeodhar/add-sev-artifacts-to-main packaging: Add sev artifacts to main	2023-05-11 09:09:36 +02:00
Tobin Feldman-Fitzthum	0bb37bff78	config: Add SNP configuration SNP requires many specific configurations, so let's make a new SNP configuration file that we can use with the kata-qemu-snp runtime class. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com> Signed-off-by: Alex Carter <Alex.Carter@ibm.com>	2023-05-10 20:55:36 +00:00
Unmesh Deodhar	fb9c1fc36e	runtime: Add qemu-sev config Adding config file that can be used with qemu-sev runtime class. Since SEV has limited hotplug support, increase the pod overhead to account for fixed resource usage. Fixes: #6572 Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>	2023-05-10 12:19:56 -05:00
Zvonko Kaiser	138ada049c	gpu: Cold Plug VFIO toml setting Added the cold_plug_vfio setting to the qemu-toml.in with some epxlanation Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-27 11:04:45 +00:00
Fupan Li	a1568cd2f5	Merge pull request #6676 from zvonkok/gpu-runtime gpu: Add GPU enabled confguration and runtime	2023-04-19 13:01:49 +08:00
Zvonko Kaiser	a81fff706f	gpu: Adding a GPU enabled configuration We need to set hotplug on pci root port and enable at least one root port. Also set the guest-hooks-dir to the correct path Fixes: #6675 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-17 10:40:09 +00:00
Fabiano Fidêncio	dc662333df	runtime: Increase the dial_timeout When testing on AKS, we've been hitting the dial_timeout every now and then. Let's increase it to 45 seconds (instead of 30) for all the VMMs, and to 60 seconfs in case of TEEs. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-04-13 22:42:52 +02:00
Fabiano Fidêncio	3b3656d96d	Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main tdx: Add artefacts from the latest TDX tools release into main	2023-04-11 20:43:02 +02:00
Fabiano Fidêncio	98682805be	config: Add configuration for QEMU TDX As the QEMU configuration for TDX differs quite a lot from the normal QEMU configuration, let's add a new configuration file for the QEMU TDX. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-04-11 16:10:35 +02:00
Feng Wang	cbe6ad9034	runtime: support non-root for clh This change enables to run cloud-hypervisor VMM using a non-root user when rootless flag is set true in the configuration Fixes: #2567 Signed-off-by: Feng Wang <fwang@confluent.io>	2023-02-22 13:57:09 -08:00
zhaojizhuang	ca02c9f512	runtime: add reconnect timeout for vhost user block Fixes: #6075 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-02-13 14:33:46 +08:00
yaoyinnan	bdf20b5d26	rootfs: support EROFS filesystem For kata containers, rootfs is used in the read-only way. EROFS can noticably decrease metadata overhead. On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs. Fixes: #6063 Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2023-02-11 00:44:13 +08:00
zhaojizhuang	9092c23a2e	runtime: Add hmp for qemu Fixes: #6092 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-01-29 14:22:04 +08:00
Eric Ernst	6ee550e9a5	runtime: vCPUs pinning is sandbox specific, not hypervisor While at it, make sure we persist this and fix a misc typo. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-12 15:44:25 -08:00
Bin Liu	86a82cace9	runtime: change cache mode from none to never New Rust virtiofsd's `cache` mode doesn't support `none` mode, we should use `never` to replace it. Fixes: #6018 Signed-off-by: Bin Liu <bin@hyper.sh>	2023-01-10 17:29:48 +08:00
Manabu Sugimoto	c617bbe70d	runtime: Pass SELinux policy for containers to the agent Pass SELinux policy for containers to the agent if `disable_guest_selinux` is set to `false` in the runtime configuration. The `container_t` type is applied to the container process inside the guest by default. Users can also set a custom SELinux policy to the container process using `guest_selinux_label` in the runtime configuration. This will be an alternative configuration of Kubernetes' security context for SELinux because users cannot specify the policy in Kata through Kubernetes's security context. To apply SELinux policy to the container, the guest rootfs must be CentOS that is created and built with `SELINUX=yes`. Fixes: #4812 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2022-11-29 19:07:56 +09:00
liyuxuan.darfux	3bb145c63a	runtime: Support virtiofs queue size for qemu and make it configurable The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024 by default which is same as clh. Also make this value configurable. Fixes: #5694 Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>	2022-11-19 15:38:11 +08:00
LitFlwr0	2508d39b7c	runtime: added vcpus pinning logics Core VCPU threads pinning logics for issue 4476. Also provided docs. Fixes:#4476 Signed-off-by: LitFlwr0 <861690705@qq.com>	2022-11-04 17:52:42 +08:00
Joana Pecholt	ded60173d4	runtime: Enable choice between AMD SEV and SNP This is based on a patch from @niteeshkd that adds a config parameter to choose between AMD SEV and SEV-SNP VMs as the confidential guest type in case both types are supported. SEV is the default. Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>	2022-09-16 17:51:41 +02:00
Joana Pecholt	105eda5b9a	runtime: Initrd path option added to config Adds initrd configuration option to the configuration.toml that is generated for the setup using QEMU. Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>	2022-09-16 17:51:41 +02:00
Archana Shinde	7d52934ec1	Merge pull request #4798 from amshinde/use-iouring-qemu Use iouring for qemu block devices	2022-08-26 04:00:24 +05:30
Fabiano Fidêncio	c142fa2541	clh: Lift the sharedFS restriction used with TDX When booting the TDX kernel with `tdx_disable_filter`, as it's been done for QEMU, VirtioFS can work without any issues. Whether this will be part of the upstream kernel or not is a different story, but it easily could make it there as Cloud Hypervisor relies on the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the DMA API, making these devices compatible with TDX. See Sebastien Boeuf's explanation of this in the 3c973fa7ce208e7113f69424b7574b83f584885d commit: """ By using DMA API, the guest triggers the TDX codepath to share some of the guest memory, in particular the virtqueues and associated buffers so that the VMM and vhost-user backends/processes can access this memory. """ Fixes: #4977 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-08-24 17:14:05 +02:00
Archana Shinde	ed0f1d0b32	config: Add "block_device_aio" as a config option for qemu This configuration will allow users to choose between different I/O backends for qemu, with the default being io_uring. This will allow users to fallback to a different I/O mechanism while running on kernels olders than 5.1. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Manabu Sugimoto	4d89476c91	runtime: Fix DisableSelinux config Enable Kata runtime to handle `disable_selinux` flag properly in order to be able to change the status by the runtime configuration whether the runtime applies the SELinux label to VMM process. Fixes: #4599 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2022-07-06 15:50:28 +09:00
Fabiano Fidêncio	0939f5181b	config: Expose default_maxmemory Expose the newly added `default_maxmemory` to the project's Makefile and to the configuration files. Fixes: #4516 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-28 21:19:24 +02:00
Liang Zhou	ef925d40ce	runtime: enable sandbox feature on qemu Enable "-sandbox on" in qemu can introduce another protect layer on the host, to make the secure container more secure. The default option is disable because this feature may introduce some performance cost, even though user can enable /proc/sys/net/core/bpf_jit_enable to reduce the impact. Fixes: #2266 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-06-17 15:30:46 -07:00
Snir Sheriber	c67b9d2975	qemu: allow using legacy serial device for the console This allows to get guest early boot logs which are usually missed when virtconsole is used. - It utilizes previous work on the govmm side: https://github.com/kata-containers/govmm/pull/203 - unit test added Fixes: #4237 Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-17 12:06:11 +03:00
Fabiano Fidêncio	b6467ddd73	clh: Expose disk rate limiter config With everything implemented, let's now expose the disk rate limiter configuration options in the Cloud Hypervisor configuration file. Fixes: #4139 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:28:29 +02:00
Fabiano Fidêncio	7580bb5a78	clh: Expose net rate limiter config With everything implemented, let's now expose the net rate limiter configuration options in the Cloud Hypervisor configuration file. Fixes: #4017 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:28:13 +02:00
bin	9d5b03a1b7	runtime: delete debug option in virtiofsd virtiofsd's debug will be enabled if hypervisor's debug has been enabled, this will generate too many noisy logs from virtiofsd. Unbind the relationship of log level between virtiofsd and hypervisor, if users want to see debug log of virtiofsd, can set it by: virtio_fs_extra_args = ["-o", "log_level=debug"] Fixes: #3303 Signed-off-by: bin <bin@hyper.sh>	2022-04-07 19:55:22 +08:00
Fabiano Fidêncio	98750d792b	clh: Expose service offload configuration This configuration option is valid for all the hypervisor that are going to be used with the confidential containers effort, thus exposing the configuration option for Cloud Hypervisor as well. Fixes: #4022 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-01 11:15:55 +02:00
Evan Foster	afc567a9ae	storage: make k8s emptyDir creation configurable This change introduces the `disable_guest_empty_dir` config option, which allows the user to change whether a Kubernetes emptyDir volume is created on the guest (the default, for performance reasons), or the host (necessary if you want to pass data from the host to a guest via an emptyDir). Fixes #2053 Signed-off-by: Evan Foster <efoster@adobe.com>	2022-03-04 12:02:42 -08:00
Fabiano Fidêncio	12af632952	Merge pull request #3814 from fidencio/wip/disable-block-device-use-minor-fixes Minor fixes for the `disable_block_device_use` comments	2022-03-03 23:26:05 +01:00
Fabiano Fidêncio	97951a2d12	clh: Don't use SharedFS with Confidential Guests kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but two big things got overlooked while doing that. 1. virtio-fs, as of now, cannot be part of the trust boundary, so the Confidential Guest will not be using it. 2. virtio-block hotplug should be enabled in order to use virtio-block for the rootfs (used with the devmapper plugin). When trying to use cloud-hypervisor with TDX using virtio-fs, we're facing the following error on the guest kernel: ``` virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM ``` After checking and double-checking with virtiofs and cloud-hypervisor developers, it happens as confidential containers might put some limitations on the device, so it can't access all of the guests' memory and that's where this restriction seems to be coming from. Vivek mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may not be the best solution at the moment. @sboeuf put this in a very nice way: "if the virtio-fs driver doesn't support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the virtqueues and the buffers won't be marked as SHARED, meaning the VMM won't have access to it". Interestingly enough, it works with QEMU, and it may be due to some change done on the patched QEMU that @devimc is packaging, but we won't take the path to figure out what was the change and patch cloud-hypervisor on the same way, because of 1. Fixes: #3810 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:49:40 +01:00
Fabiano Fidêncio	76e4f6a2a3	Revert "hypervisors: Confidential Guests do not support Device hotplug" This reverts commit `df8ffecde0`, as device hotplug is supported and, more than that, is very much needed when using virtio-blk instead of virtio-fs. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 09:59:55 +01:00
Fabiano Fidêncio	fa8b93927c	config: qemu: Fix disable_block_device_use comments virtio-fs, instead of virtio-9p, is the default shared file system type in case virtio-blk is not used. Fixes: #3813 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-02 20:43:36 +01:00
Fabiano Fidêncio	9615c8bc9c	config: fc: Don't expose disable_block_device_use Relying on virtio-block is the only way to use Firecracker with Kata Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by Firecracker. As configuration doesn't make sense to be exposed, we hardcode the `false` value in the Firecracker configuration structure. Fixes: #3813 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-02 20:43:28 +01:00
Fabiano Fidêncio	de57466212	config: Expand confidential_guest comments Let's clarify that an error will be reported in case confidential_guest is enabled, but the hardware where Kata Containers is running doesn't provide the required feature set. Fixes: #3787 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-28 11:57:42 +01:00
Fabiano Fidêncio	641d475fa6	config: clh: Use "Intel TDX" instead of just "TDX" Let's use "Intel TDX" rather than just "TDX", as it can ease the understanding of the terminology. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-28 10:27:21 +01:00

1 2

75 Commits