Now that we have propper AP device support add a
unit test for testing the correct Attach/Detach of AP devices.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removing HotplugVFIOonRootBus which is obsolete with the latest PCI
topology changes, users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore
Fixes: #7381
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.
This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.
However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.
For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.
Fixes: #7207
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If we override the cold, hot plug with an annotation
we need to reset the other plugging mechanism to NoPort
otherwise both will be enabled.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In Virt the vhost-user-block is an PCIe device so
we need to make sure to consider it as well. We're keeping
track of vhost-user-block devices and deduce the correct
amount of PCIe root ports.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Now it is possible to configure the PCIe topology via annotations
and addded a simple test, checking for Invalid and RootPort
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removed the configuration of PCIeRootPort and PCIeSwitchPort, those
values can be deduced in createPCIeTopology
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Supports both online and offline modes of interaction with simple-kbs
for SEV/SEV-ES confidential guests.
Fixes: #6795
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
The sev package provides utilities for launching AMD SEV and SEV-ES
confidential guests.
Fixes: #6795
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
With this fix the vCPU pinning feature chooses the correct
physical cores to pin the vCPU threads on rather than always using core 0.
Fixes#6831
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
Added driver util function for easier handling of VFIO
devices outside of the VFIO module. At the sandbox level
we may need to set options depending if we have a VFIO/PCIe
device, like the fwCfg for confiential guests.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Some functions may be used in other modules then only in
the VFIO module, extract them and make them available to
other layers like sandbox.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
If we have a VFIO device and cold-plug is enabled
we mark each device as ColdPlug=true and let the VFIO
module do the attaching.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For the hypervisor to distinguish between PCIe components, adding
a new enum that can be used for hot-plug and cold-plug of PCIe devices
Fixes: #6687
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Expanded tests on signals_test.go to cover more lines of code. 'go test' won't show 100% coverage (only 66.7%), because one test need to spawn a new
process (since it is testing a function that calls os.Exit(1)).
Fixes: #256
Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
On some systems a GPU is in a IOMMU group with a PCI Bridge and
PCI Host Bridge. Per default no PCI Bridge needs to be passed-through.
When scanning the IOMMU group, ignore devices with a 0x60 class ID prefix.
Fixes: #6663
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The socket file for shim management is created in /run/kata
and it isn't deleted after the container is stopped. After
running and stopping thousands of containers /run folder
will run out of space.
Fixes#6622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Co-authored-by: Greg Kurz <groug@kaod.org>
Since TDX doesn't support readonly memslot, TDVF cannot be mapped as
pflash device and it actually works as RAM. "-bios" option is chosen to
load TDVF.
OVMF is the opensource firmware that implements the TDVF support. Thus
the command line to specify and load TDVF is ``-bios OVMF.fd``
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The kata monitor metrics API returns a huge size response,
if containers or sandboxs are a large number,
focus on what we need will be harder.
Fixes: #6500
Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there
is no firmware offered, it should be disabled.
Fixes: #6468
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This PR is a continuing work for (kata-containers#3679).
This generalizes the previous VFIO device handling which only
focuses on PCI to include AP (IBM Z specific).
Fixes: kata-containers#3678
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Initial VFIO-AP support (#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.
Since then, the general handling logic was improved (#1190), but it
assumed PCI in several places.
In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.
The agent awaits the passed APQNs through uevents and triggers a
rescan directly.
Fixes: #3678
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Generalize VFIO devices to allow for adding AP in the next patch.
The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed.
The rationale for the revomal is:
- VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI
- Logic of checking a subtype of mediated device is included in GetVFIODeviceType()
- VFIOPciDeviceMediatedType() can simply fulfill the device addition based
on a type categorized by GetVFIODeviceType()
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
e.g., split_vfio_option is PCI-specific and should instead be named
split_vfio_pci_option. This mutually affects the runtime, most notably
how the labels are named for the agent.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>