kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2025-12-22 08:44:25 +01:00

Author	SHA1	Message	Date
Zvonko Kaiser	dded731db3	gpu: Add OVMF setting for MMIO aperture The default size of OVMF's aperture is too low to initialize PCIe devices with huge BARs Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	2a830177ca	gpu: Add fwcfg helper function Added driver util function for easier handling of VFIO devices outside of the VFIO module. At the sandbox level we may need to set options depending if we have a VFIO/PCIe device, like the fwCfg for confidential guests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	c8cf7ed3bc	gpu: Add ColdPlug of VFIO devices with devManager If we have a VFIO device and cold-plug is enabled we mark each device as ColdPlug=true and let the VFIO module do the attaching. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Fabiano Fidêncio	50ce33b02d	Merge pull request #6205 from fengwang666/non-root-clh runtime: support non-root for clh	2023-04-11 19:34:00 +02:00
Jakob Naucke	f666f8e2df	agent: Add VFIO-AP device handling Initial VFIO-AP support (#578) was simple, but somewhat hacky; a different code path would be chosen for performing the hotplug, and agent-side device handling was bound to knowing the assigned queue numbers (APQNs) through some other means; plus the code for awaiting them was written for the Go agent and never released. This code also artificially increased the hotplug timeout to wait for the (relatively expensive, thus limited to 5 seconds at the quickest) AP rescan, which is impractical for e.g. common k8s timeouts. Since then, the general handling logic was improved (#1190), but it assumed PCI in several places. In the runtime, introduce and parse AP devices. Annotate them as such when passing to the agent, and include information about the associated APQNs. The agent awaits the passed APQNs through uevents and triggers a rescan directly. Fixes: #3678 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 10:07:48 +09:00
Jakob Naucke	b546eca26f	runtime: Generalize VFIO devices Generalize VFIO devices to allow for adding AP in the next patch. The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed. The rationale for the removal is: - VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI - Logic of checking a subtype of mediated device is included in GetVFIODeviceType() - VFIOPciDeviceMediatedType() can simply fulfill the device addition based on a type categorized by GetVFIODeviceType() Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 10:06:37 +09:00
Jakob Naucke	4c527d00c7	agent: Rename VFIO handling to VFIO PCI handling e.g., split_vfio_option is PCI-specific and should instead be named split_vfio_pci_option. This mutually affects the runtime, most notably how the labels are named for the agent. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 07:43:39 +09:00
Feng Wang	cbe6ad9034	runtime: support non-root for clh This change enables running the cloud-hypervisor VMM as a non-root user when the rootless flag is set to true in the configuration Fixes: #2567 Signed-off-by: Feng Wang <fwang@confluent.io>	2023-02-22 13:57:09 -08:00
zhaojizhuang	ca02c9f512	runtime: add reconnect timeout for vhost user block Fixes: #6075 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-02-13 14:33:46 +08:00
Greg Kurz	334c4b8bdc	runtime: Drop QEMU log file support The QEMU log file is essentially about fine grain tracing of QEMU internals and mostly useful for developers, not production. Notably, the log file isn't limited in size, nor rotated in any way. It means that a container running in the VM could possibly flood the log file with a guest triggerable trace. For example, on openshift, the log file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means that each pod running with the kata runtime could potentially consume this amount of host RAM which is not acceptable. Error messages are best collected from QEMU's stderr as kata is doing now since PR #5736 was merged. Drop support for the QEMU log file because it doesn't bring any value but can certainly do harm. Fixes #6173 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-31 09:20:29 +01:00
zhaojizhuang	9092c23a2e	runtime: Add hmp for qemu Fixes: #6092 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-01-29 14:22:04 +08:00
Greg Kurz	39fe4a4b6f	runtime: Collect QEMU's stderr LaunchQemu now connects a pipe to QEMU's stderr and makes it usable by callers through a Go io.ReadCloser object. As explained in [0], all messages should be read from the pipe before calling cmd.Wait : introduce a LogAndWait helper to handle that. Fixes #5780 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:17 +01:00
Greg Kurz	a5319c6be6	runtime: Start QEMU undaemonized QEMU has always been started daemonized since the beginning. I could not find any justification for that though, but it certainly introduces a problem : QEMU stops logging errors when started this way, which isn't acceptable from a support standpoint. The QEMU community discourages the use of -daemonize ; mostly because libvirt, QEMU's primary consumer, doesn't use this option and prefers getting errors from QEMU's stderr through a pipe in order to enforce rollover. Now that virtcontainers knows how to start QEMU with a pre-established QMP connection, let's start QEMU without -daemonize. This requires to handle the reaping of QEMU when it terminates. Since cmd.Wait() is blocking, call it from a goroutine. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:11 +01:00
Greg Kurz	bf4e3a618f	runtime: Launch QEMU with cmd.Start() LaunchCustomQemu() currently starts QEMU with cmd.Run() which is supposed to block until the child process terminates. This assumes that QEMU daemonizes itself, otherwise LaunchCustomQemu() would block forever. The virtcontainers package indeed enables the Daemonize knob in the configuration but having such an implicit dependency on a supposedly configurable setting is ugly and fragile. cmd.Run() is : func (c *Cmd) Run() error { if err := c.Start(); err != nil { return err } return c.Wait() } Let's open-code this : govmm calls cmd.Start() and returns the cmd to virtcontainers which calls cmd.Wait(). If QEMU doesn't start, e.g. missing binary, there won't be any errors to collect from QEMU output. Just drop these lines in govmm. Similarly there won't be any log file to read from in virtcontainers. Drop that as well. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:11 +01:00
Greg Kurz	8a1723a5cb	runtime: Pre-establish the QMP connection Running QEMU daemonized ensures that the QMP socket is ready to accept connections when LaunchQemu() returns. In order to be able to run QEMU undaemonized, let's handle that part upfront. Create a listener socket and connect to it. Pass the listener to QEMU and pass the connected socket to QMP : this ensures that we cannot fail to establish QMP connection and that we can detect if QEMU exits before accepting the connection. This is basically what libvirt does. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:11 +01:00
Manabu Sugimoto	c617bbe70d	runtime: Pass SELinux policy for containers to the agent Pass SELinux policy for containers to the agent if `disable_guest_selinux` is set to `false` in the runtime configuration. The `container_t` type is applied to the container process inside the guest by default. Users can also set a custom SELinux policy to the container process using `guest_selinux_label` in the runtime configuration. This will be an alternative configuration of Kubernetes' security context for SELinux because users cannot specify the policy in Kata through Kubernetes's security context. To apply SELinux policy to the container, the guest rootfs must be CentOS that is created and built with `SELINUX=yes`. Fixes: #4812 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2022-11-29 19:07:56 +09:00
Bin Liu	1dfd845f51	runtime: go fix code for 1.19 We have started to use golang 1.19, some features are not supported later, so run `go fix` to fix them. Fixes: #5750 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-11-25 11:29:18 +08:00
Fabiano Fidêncio	df3d9878d5	Merge pull request #5695 from darfux/virtiofs-queue-size runtime: Support virtiofs queue size for qemu and make it configurable	2022-11-22 20:04:30 +01:00
liyuxuan.darfux	3bb145c63a	runtime: Support virtiofs queue size for qemu and make it configurable The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024 by default which is same as clh. Also make this value configurable. Fixes: #5694 Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>	2022-11-19 15:38:11 +08:00
Alexandru Matei	a04afab74d	qemu: early exit from Check if the process was stopped Fixes: #5625 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2022-11-10 22:43:32 +02:00
Alexandru Matei	7e481f2179	qemu: set stopped only if StopVM is successful Fixes: #5624 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2022-11-10 22:43:32 +02:00
wangyongchao.bj	04bbce8dc3	virtcontainers: add warn log record for qmp hotplug cpu error The error from the failed qmp hotplug cpu command was hidden, which was unfriendly to users tracing hotplug cpu errors. This PR improves the hotplug cpu error log by adding the real qemu command error to `failed to hot add vCPUs`. From the error message, we can get the reason the qmp command failed for the hotplug cpu operation. Fixes: #5234 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2022-09-23 08:22:30 +08:00
Feng Wang	f914319874	runtime: store the user name in hypervisor config The user name will be used to delete the user instead of relying on uid lookup because uid can be reused. Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-13 10:32:55 -07:00
Feng Wang	5cafe21770	runtime: make StopVM thread-safe StopVM can be invoked by multiple threads and needs to be thread-safe Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-12 21:56:15 -07:00
Feng Wang	c3015927a3	runtime: add more debug logs for non-root user operation Previously the logging was insufficient and made debugging difficult Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-12 21:38:57 -07:00
Eric Ernst	e0142db24f	hypervisor: Add GetTotalMemoryMB to interface It'll be useful to get the total memory provided to the guest (hotplugged + coldplugged). We'll use this information when calculating how much memory we can add at a time when utilizing ACPI hotplug. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-08-30 16:37:47 -07:00
Archana Shinde	7d52934ec1	Merge pull request #4798 from amshinde/use-iouring-qemu Use iouring for qemu block devices	2022-08-26 04:00:24 +05:30
Tim Zhang	8d4d98587f	Merge pull request #4746 from liubin/fix/4745-add-log-field runtime: explicitly mark the source of the log is from qemu.log	2022-08-08 15:21:01 +08:00
Archana Shinde	c1e3b8f40f	govmm: Refactor qmp functions for adding block device Instead of passing a bunch of arguments to qmp functions for adding block devices, use govmm BlockDevice structure to reduce these. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	00860a7e43	qmp: Pass aio backend while adding block device Allow govmm to pass aio backend while adding block device. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Bin Liu	85f4e7caf6	runtime: explicitly mark the source of the log is from qemu.log In qemu.StopVM(), if debug is enabled, the shim will dump logs from qemu.log, but users don't know which logs are from qemu.log and shim itself. Adding some additional messages will help users to distinguish these logs. Fixes: #4745 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-07-26 16:08:59 +08:00
Fabiano Fidêncio	aa561b49f5	Merge pull request #4540 from fidencio/topic/default_maxmemory Add `default_maxmemory` config option	2022-06-30 12:08:15 +02:00
Fabiano Fidêncio	323271403e	virtcontainers: Remove unused function While working on the previous commits, some of the functions became unused. Let's simply remove them. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-28 21:19:24 +02:00
Fabiano Fidêncio	58ff2bd5c9	clh,qemu: Adapt to using default_maxmemory Let's adapt Cloud Hypervisor's and QEMU's code to properly behave to the newly added `default_maxmemory` config. While implementing this, a change of behaviour (or a bug fix, depending on how you see it) has been introduced as if a pod requests more memory than the amount available in the host, instead of failing to start the pod, we simply hotplug the maximum amount of memory available, mimicking better the runc behaviour. Fixes: #4516 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-28 21:19:24 +02:00
Eric Ernst	bdf5e5229b	virtcontainers: validate hypervisor config outside of hypervisor itself Depending on the user of it, the hypervisor from hypervisor interface could have differing view on what is valid or not. To help decouple, let's instead check the hypervisor config validity as part of the sandbox creation, rather than as part of the CreateVM call within the hypervisor interface implementation. Fixes: #4251 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-27 11:53:41 -07:00
Eric Ernst	469e098543	katautils: don't do validation when loading hypervisor config Policy for what's valid/invalid within the config varies by VMM, host, and by silicon architecture. Let's keep katautils simple for just translating a toml to the hypervisor config structure, and leave validation to virtcontainers. Without this change, we're doing duplicate validation. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-27 10:13:26 -07:00
Bin Liu	27b1bb5ed9	Merge pull request #4467 from egernst/device-pkg device package cleanup/refactor	2022-06-27 14:40:53 +08:00
Eric Ernst	f9e96c6506	runtime: device: move to top level package Let's move device package to runtime/pkg instead of being buried under virtcontainers. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-26 21:31:29 -07:00
Fabiano Fidêncio	133528dd14	Merge pull request #4503 from amshinde/multi-queue-block block: Leverage multiqueue for virtio-block	2022-06-23 12:17:11 +02:00
Fabiano Fidêncio	78e27de6c3	Merge pull request #4358 from zvonkok/memreserve runtime: Add heuristic to get the right value(s) for mem-reserve	2022-06-22 13:41:23 +02:00
Archana Shinde	e227b4c404	block: Leverage multiqueue for virtio-block Similar to network, we can use multiple queues for virtio-block devices. This can help improve storage performance. This commit changes the number of queues for block devices to the number of cpus for cloud-hypervisor and qemu. Today the default number of cpus a VM starts with is 1. Hence the queues used will be 1. This change will help improve performance when the default cold-plugged cpus is greater than one by changing this in the config file. This may also help when we use the sandboxing feature with k8s that passes down the sum of the resources required down to Kata. Fixes #4502 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-06-21 12:38:53 -07:00
Zvonko Kaiser	e7e7dc9dfe	runtime: Add heuristic to get the right value(s) for mem-reserve Fixes: #2938 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2022-06-21 03:44:28 -07:00
Liang Zhou	ef925d40ce	runtime: enable sandbox feature on qemu Enabling "-sandbox on" in qemu can introduce another protection layer on the host, to make the secure container more secure. The default option is disabled because this feature may introduce some performance cost, even though the user can enable /proc/sys/net/core/bpf_jit_enable to reduce the impact. Fixes: #2266 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-06-17 15:30:46 -07:00
Fabiano Fidêncio	811ac6a8ce	Merge pull request #4282 from r4f4/runtime-dedup-types-import runtime: remove duplicate 'types' import	2022-05-19 22:15:36 +02:00
Chelsea Mafrica	d8be0f8e9f	Merge pull request #4281 from r4f4/runtime-qemu-comments runtime: sync docstrings with function names	2022-05-19 09:17:38 -07:00
Rafael Fonseca	7a5ccd1264	runtime: sync docstrings with function names The functions were renamed but their docstrings were not. Fixes #4006 Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>	2022-05-19 14:31:47 +02:00
Rafael Fonseca	ce2e521a0f	runtime: remove duplicate 'types' import Fallout of `09f7962ff` Fixes #4285 Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>	2022-05-19 13:49:47 +02:00
Snir Sheriber	44814dce19	qemu: treat console kernel params within appendConsole as it is tightly coupled with the appended console device additionally have it tested Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-17 12:05:31 +03:00
Jianyong Wu	982c32358a	Merge pull request #4031 from Jaylyn-Ren/kata-spdk Virtcontainers: Enable hot plugging vhost-user-blk device on ARM	2022-04-29 12:16:38 +08:00
bin	9d5b03a1b7	runtime: delete debug option in virtiofsd virtiofsd's debug will be enabled if hypervisor's debug has been enabled, this will generate too many noisy logs from virtiofsd. Unbind the relationship of log level between virtiofsd and hypervisor; if users want to see the debug log of virtiofsd, they can set it with: virtio_fs_extra_args = ["-o", "log_level=debug"] Fixes: #3303 Signed-off-by: bin <bin@hyper.sh>	2022-04-07 19:55:22 +08:00


179 Commits