Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.
This is *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.
However, there are still use-cases where users can live with just copying
those files into the guest at pod creation time, and for those there's
no need to keep a shared filesystem process running: it brings no
obvious extra benefit, consumes memory, and even increases the attack
surface of Kata Containers.
For the case mentioned above, we should allow users to run Kata
Containers with devmapper without actually having to use a shared file
system, making it very clear which limitations this brings. This is
already the approach taken when using Firecracker as the VMM.
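As a rough sketch of the idea (the field and constant names here are
illustrative, not the actual shim code), the runtime would simply not
launch virtiofsd when no shared filesystem is configured:

    // Sketch only: decide whether a shared-FS daemon is needed at all.
    // A value of "none" mirrors the Firecracker behaviour: no virtiofsd
    // process; files are copied into the guest at pod creation time.
    func needSharedFS(cfg HypervisorConfig) bool {
        return cfg.SharedFS != "none"
    }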
Fixes: #7207
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but logs deprecation warnings.
Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.
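For illustration, the new-style invocation built by the shim looks
roughly like this (the long flags are the rust virtiofsd replacements;
the helper itself is hypothetical):

    // Build rust-virtiofsd arguments without the deprecated `-o` syntax.
    func virtiofsdArgs(sharedDir, socketPath string) []string {
        return []string{
            "--shared-dir", sharedDir, // was: -o source=<dir>
            "--socket-path", socketPath,
            "--cache", "auto", // was: -o cache=auto
            "--announce-submounts", // was: -o announce_submounts
        }
    }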
Fixes: #7111
Signed-off-by: Greg Kurz <groug@kaod.org>
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.
We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.
According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :
  The C implementation of the daemon has limited support for
  remote POSIX locks, restricted exclusively to non-blocking
  operations. We tried to implement the same level of
  functionality in #2, but we finally decided against it because,
  in practice most applications will fail if non-blocking
  operations aren't supported.

  Implementing support for non-blocking isn't trivial and will
  probably require extending the kernel interface before we can
  even start working on the daemon side.
There is thus no justification to pass `-o no_posix_lock` anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
The rust implementation of virtiofsd always runs in the foreground
and prints a deprecation warning when `-f` is passed.
Signed-off-by: Greg Kurz <groug@kaod.org>
If we override the cold or hot plug mechanism with an annotation,
we need to reset the other plugging mechanism to NoPort,
otherwise both will be enabled.
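A minimal sketch of the reset logic (type and field names are
illustrative):

    // Overriding one VFIO plugging mechanism via annotation disables
    // the other one, so only a single mechanism stays enabled.
    func applyColdPlugOverride(cfg *HypervisorConfig, port PCIePort) {
        cfg.ColdPlugVFIO = port
        cfg.HotPlugVFIO = NoPort // reset the other mechanism
    }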
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In Virt, the vhost-user-block device is a PCIe device, so we need
to make sure to consider it as well. We keep track of
vhost-user-block devices and deduce the correct
number of PCIe root ports.
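Conceptually, the deduction now accounts for both device types, along
these lines (illustrative sketch):

    // Every passed-through device needs its own PCIe root port in the
    // guest, so vhost-user-block devices are counted alongside VFIO ones.
    func requiredRootPorts(vfioDevs, vhostUserBlkDevs []string) uint32 {
        return uint32(len(vfioDevs) + len(vhostUserBlkDevs))
    }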
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Now it is possible to configure the PCIe topology via annotations.
Also added a simple test, checking for Invalid and RootPort.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removed the configuration of PCIeRootPort and PCIeSwitchPort; those
values can be deduced in createPCIeTopology.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The hypervisor_state file was the wrong location for the PCIe port
settings. Moved everything under the device umbrella, where it can be
consumed more easily and we do not run into circular deps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Some VMMs, such as Dragonball, will actively help us
perform online CPU operations when doing CPU hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation, so we modify the semantics
of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represented the number of
newly added CPUs that need to be onlined; under the modified
semantics, it is the total number of CPUs that must be online
in the guest.
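A sketch of what the new contract means for the guest side
(hypothetical helper; nb_cpus is now a target, not a delta):

    // With the new semantics, the guest only onlines the CPUs that are
    // still missing; if the VMM (e.g. dragonball) already onlined them,
    // nothing is left to do.
    func cpusToOnline(targetOnline, currentlyOnline uint32) uint32 {
        if targetOnline <= currentlyOnline {
            return 0
        }
        return targetOnline - currentlyOnline
    }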
Fixes: #5030
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Nobody has volunteered to maintain the (currently broken) snap build, so
remove it.
Fixes: #6769
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
After we have a guest kernel with a builtin initramfs that provides
the rootfs measurement capability, and a Kata rootfs image with a
hash device, we need to set the related root hash value and
measurement config in the kernel params of the Kata configuration file.
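For illustration, this amounts to appending kernel parameters along
these lines (the exact parameter names depend on the initramfs; treat
them, and the Param helper, as an example):

    // Sketch: hand the dm-verity root hash of the rootfs image to the
    // measurement-capable initramfs via the kernel command line.
    // Param is assumed to be the runtime's Key/Value parameter type.
    func withMeasuredRootfs(params []Param, rootHash string) []Param {
        return append(params,
            Param{Key: "rootfs_verity.scheme", Value: "dm-verity"},
            Param{Key: "rootfs_verity.hash", Value: rootHash},
        )
    }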
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
There is a race condition when virtiofsd is killed before all of its
clients have finished. Because of that, when a pod is stopped, QEMU
detects that virtiofsd is gone, which is legitimate.
Sending a SIGTERM first, before killing, could introduce some latency
during the shutdown.
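The shutdown sequence then becomes, roughly (a sketch using os/exec,
syscall and time; the waitDone channel and the grace period are
assumptions):

    // Ask virtiofsd to exit cleanly first; only force-kill it if it
    // has not terminated within the grace period.
    func stopVirtiofsd(cmd *exec.Cmd, waitDone <-chan struct{}) {
        _ = cmd.Process.Signal(syscall.SIGTERM)
        select {
        case <-waitDone: // closed once cmd.Wait() has returned
        case <-time.After(5 * time.Second):
            _ = cmd.Process.Kill()
        }
    }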
Fixes: #6757
Signed-off-by: Beraldo Leal <bleal@redhat.com>
This patch re-generates the client code for Cloud Hypervisor v32.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #6632
Signed-off-by: Bo Chen <chen.bo@intel.com>
If a hypervisor debug console is enabled and sandbox_cgroup_only is set,
the hypervisor can fail to open /dev/ptmx, which prevents the sandbox
from launching.
This is caused by the absence of a device cgroup entry allowing access
to /dev/ptmx. When sandbox_cgroup_only is not set, the hypervisor
inherits the default unrestricted device cgroup, but with it enabled it
runs into allow / deny list restrictions.
Fix by adding an allowlist entry for /dev/ptmx when debug is enabled,
sandbox_cgroup_only is true, and /dev/ptmx is not already in the list
of devices.
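The added entry looks roughly like this, using the OCI runtime-spec Go
types (/dev/ptmx is char device 5:2):

    // ptmxAllowRule returns a device cgroup allowlist entry for
    // /dev/ptmx. specs is github.com/opencontainers/runtime-spec/specs-go.
    func ptmxAllowRule() specs.LinuxDeviceCgroup {
        major, minor := int64(5), int64(2)
        return specs.LinuxDeviceCgroup{
            Allow:  true,
            Type:   "c",
            Major:  &major,
            Minor:  &minor,
            Access: "rwm",
        }
    }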
Fixes: #6870
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
This PR updates the container network model URL that is part of the
virtcontainers documentation.
Fixes: #6889
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.
As TEEs do not support memory and CPU hotplug, this approach must be
used.
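Conceptually, the sizing is a one-shot calculation before boot rather
than a series of hotplug operations (illustrative sketch, not the
actual sizing code):

    // Sketch: add the pod's aggregate resource requests to the base VM
    // size up front, since TEEs cannot hotplug CPUs or memory later.
    func staticSandboxSize(baseVCPUs, workloadVCPUs uint32,
        baseMemMB, workloadMemMB uint64) (vcpus uint32, memMB uint64) {
        return baseVCPUs + workloadVCPUs, baseMemMB + workloadMemMB
    }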
Fixes: #6818
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The AmdSev firmware package should be used with
measured direct boot. If the expected hashes are not
injected into the firmware binary by the VMM, the
guest will not boot. This is required for security.
Currently the main branch does not have the extended
shim support for SEV, which tells the VMM to inject
the expected hashes.
We ship the standard OVMF package to use with SNP,
so let's switch SEV to that for now. This will need
to be changed back when shim support for SEV(-ES)
is added to main.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
We have been using the C version of virtiofsd on ppc64le. Now that the
issues with rust virtiofsd have been fixed, let's switch to it.
Fixes: #4259
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Supports both online and offline modes of interaction with simple-kbs
for SEV/SEV-ES confidential guests.
Fixes: #6795
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
The sev package provides utilities for launching AMD SEV and SEV-ES
confidential guests.
Fixes: #6795
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Let's specifically name the `gpu` runtime class as `nvidia-gpu`. By
doing this we keep the door open and ease the life of the next vendor
adding GPU support for Kata Containers.
Fixes: #6553
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Rework the TestQemuCreateVM routine to be a table-driven test with
various config variations passed to it. After CreateVM, a handful
of additional functions are exercised to improve code coverage.
Also add partial coverage for the StartVM routine.
Currently improving coverage from 19.7% to 35.7%.
Credit for this PR goes to Hackathon Team3.
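The test now follows the usual Go table-driven shape, along these
lines (the case fields are illustrative):

    func TestQemuCreateVM(t *testing.T) {
        testCases := []struct {
            name    string
            config  HypervisorConfig
            wantErr bool
        }{
            {name: "default config", config: HypervisorConfig{}, wantErr: false},
            // ...more config variations...
        }
        for _, tc := range testCases {
            t.Run(tc.name, func(t *testing.T) {
                // Create the VM with tc.config, check the error against
                // tc.wantErr, then exercise follow-up calls for coverage.
            })
        }
    }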
Fixes: #267
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
With this fix, the vCPU pinning feature chooses the correct physical
cores to pin the vCPU threads to, rather than always using core 0.
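At its core, the fix is about using the computed core ID in the
affinity call rather than a zero value (sketch using
golang.org/x/sys/unix; the surrounding names are illustrative):

    // Pin one vCPU thread to its assigned physical core.
    func pinVCPUThread(tid, physCore int) error {
        var set unix.CPUSet
        set.Zero()
        set.Set(physCore) // previously this was effectively always core 0
        return unix.SchedSetaffinity(tid, &set)
    }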
Fixes: #6831
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>