Added driver util function for easier handling of VFIO
devices outside of the VFIO module. At the sandbox level
we may need to set options depending if we have a VFIO/PCIe
device, like the fwCfg for confiential guests.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Some functions may be used in other modules then only in
the VFIO module, extract them and make them available to
other layers like sandbox.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
If we have a VFIO device and cold-plug is enabled
we mark each device as ColdPlug=true and let the VFIO
module do the attaching.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
RawDevics are used to get PCIe device info early before the sandbox
is started to make better PCIe topology decisions
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For the hypervisor to distinguish between PCIe components, adding
a new enum that can be used for hot-plug and cold-plug of PCIe devices
Fixes: #6687
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to set hotplug on pci root port and enable at least one
root port. Also set the guest-hooks-dir to the correct path
Fixes: #6675
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
On some systems a GPU is in a IOMMU group with a PCI Bridge and
PCI Host Bridge. Per default no PCI Bridge needs to be passed-through.
When scanning the IOMMU group, ignore devices with a 0x60 class ID prefix.
Fixes: #6663
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
When testing on AKS, we've been hitting the dial_timeout every now and
then. Let's increase it to 45 seconds (instead of 30) for all the VMMs,
and to 60 seconfs in case of TEEs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Booting up TDX takes more time than booting up a normal VM. Those
values are being already used as part of the CCv0 branch, and we're just
bringing them to the `main` branch as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The socket file for shim management is created in /run/kata
and it isn't deleted after the container is stopped. After
running and stopping thousands of containers /run folder
will run out of space.
Fixes#6622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Co-authored-by: Greg Kurz <groug@kaod.org>
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since TDX doesn't support readonly memslot, TDVF cannot be mapped as
pflash device and it actually works as RAM. "-bios" option is chosen to
load TDVF.
OVMF is the opensource firmware that implements the TDVF support. Thus
the command line to specify and load TDVF is ``-bios OVMF.fd``
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we also check /sys/firmwares/tdx for TDX guest
protection, as the location may depend on whether TDX Seam is being used
or not.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The kata monitor metrics API returns a huge size response,
if containers or sandboxs are a large number,
focus on what we need will be harder.
Fixes: #6500
Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there
is no firmware offered, it should be disabled.
Fixes: #6468
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This PR is a continuing work for (kata-containers#3679).
This generalizes the previous VFIO device handling which only
focuses on PCI to include AP (IBM Z specific).
Fixes: kata-containers#3678
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Initial VFIO-AP support (#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.
Since then, the general handling logic was improved (#1190), but it
assumed PCI in several places.
In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.
The agent awaits the passed APQNs through uevents and triggers a
rescan directly.
Fixes: #3678
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Generalize VFIO devices to allow for adding AP in the next patch.
The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed.
The rationale for the revomal is:
- VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI
- Logic of checking a subtype of mediated device is included in GetVFIODeviceType()
- VFIOPciDeviceMediatedType() can simply fulfill the device addition based
on a type categorized by GetVFIODeviceType()
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
e.g., split_vfio_option is PCI-specific and should instead be named
split_vfio_pci_option. This mutually affects the runtime, most notably
how the labels are named for the agent.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
This adds /dev/mshv to the list of sandbox devices so that VMMs can
create Hyper-V VMs.
In our testing, this also doesn't error out in case /dev/mshv isn't
present.
Fixes#6454.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
On hotplug of memory as containers are started, remount all ephemeral mounts with size option set to the total sandbox memory
Fixes: #6417
Signed-off-by: Sidhartha Mani <sidhartha_mani@apple.com>
When update the nydusd to 2.2, the argument "--hybrid-mode" cause
the following error:
thread 'main' panicked at 'ArgAction::SetTrue / ArgAction::SetFalse is defaulted'
Maybe we should remove it to upgrad nydusd
Fixes: #6407
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
$ make install
make: *** No rule to make target 'containerd-shim-kata-v2', needed by 'install-containerd-shim-v2'. Stop.
Spotted when building kata-runtime with a different name for
SHIMV2_OUTPUT. For instance, trying to keep different runtime binaries
installed at the same time, one from master and another from lets say,
the CCv0 branch, with the following small change applied.
diff --git a/src/runtime/Makefile b/src/runtime/Makefile
index 95efaff78..2bab9eb75 100644
--- a/src/runtime/Makefile
+++ b/src/runtime/Makefile
@@ -231,7 +231,7 @@ SED = sed
CLI_DIR = cmd
SHIMV2 = containerd-shim-kata-v2
-SHIMV2_OUTPUT = $(bCURDIR)/$(SHIMV2)
+SHIMV2_OUTPUT = $(CURDIR)/$(SHIMV2)-ccv0
SHIMV2_DIR = $(CLI_DIR)/$(SHIMV2)
MONITOR = kata-monitor
Fixes: #6398
Signed-off-by: Eduardo Lima (Etrunko) <etrunko@redhat.com>
This patch re-generates the client code for Cloud Hypervisor v30.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #6375
Signed-off-by: Bo Chen <chen.bo@intel.com>
Fix path check bypassed issuse introduced by #6082,
use filepath.Clean() to clean path before check
Fixes: #6082
Signed-off-by: XDTG <click1799@163.com>
This change enables to run cloud-hypervisor VMM using a non-root user
when rootless flag is set true in the configuration
Fixes: #2567
Signed-off-by: Feng Wang <fwang@confluent.io>
Currently the symbolic link for virtiofsd which is used as
a valid path is not updated on every CI run. Fix it by
using the actual path of installation.
Fixes: #6311
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Normally we return the context when creating a trace span so that the
ordering of spans w.r.t. calls is maintained in tracing output. Add
missing context for StartVM() for Cloud Hypervisor.
Fixes#6271
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>