Commit Graph

1631 Commits

Author SHA1 Message Date
Manabu Sugimoto
416445e7eb docs: Remove installation step in virtcontainers doc
Remove the installation step in the virtcontainers doc
because the virtcontainers install/uninstall targets have
been removed by 86723b51ae
and they are not used anymore.

Fixes: #7637

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-14 15:15:24 +09:00
Manabu Sugimoto
cc922be5ec versions: Update firecracker version to 1.4.0
This patch upgrades Firecracker version from v1.1.0 to v1.4.0.

* Generate swagger models for v1.4.0 (from `firecracker.yaml`)
  - The version of go-swagger used is v0.30.0
* The firecracker v1.4.0 includes the following changes.
  - Added
    * Added support for custom CPU templates allowing users to adjust vCPU features
    exposed to the guest via CPUID, MSRs and ARM registers.
    * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU
    as Neoverse N1.
    * Added support for the virtio-rng entropy device. The device is optional. A
    single device can be enabled per VM using the /entropy endpoint.
    * Added a cpu-template-helper tool for assisting with creating and managing
    custom CPU templates.
  - Changed
    * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit
    (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process.
  - Fixed
    * Fixed feature flags in T2S CPU template on Intel Ice Lake.
    * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host.
    * Fixed a performance regression in the jailer logic for closing open file
    descriptors.
    * A race condition that has been identified between the API thread and the VMM
    thread due to a misconfiguration of the api_event_fd.
    * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host.
    * Fixed passing through cache information from host in CPUID leaf 0x80000006.
    * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES
    MSR to 1 in accordance with an Intel microcode update.
    * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the
    IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode
    update.
    * Fixed passing through cache information from host in CPUID leaf 0x80000005.
    * Fixed the T2A CPU template to disable SVM (nested virtualization).
    * Fixed the T2A CPU template to set EferLmsleUnsupported bit
    (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:48:13 +09:00
Wedson Almeida Filho
4fbe0a3a53 runtime: bind-mount mounted block device into container
When the mounted block device isn't a layer, we want to mount it into
containers, but since it's already mounted with the correct fs (e.g.,
tar, ext4, etc.) in the pod, we just bind-mount it into the container.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
7e1b1949d4 runtime: add support for kata overlays
When at least one `io.katacontainers.fs-opt.layer` option is added to
the rootfs, it gets inserted into the VM as a layer, and the file system
is mounted as an overlay of all layers using the overlayfs driver.

Additionally, if the `io.katacontainers.fs-opt.block_device=file` option
is present in a layer, it is mounted as a block device backed by a file
on the host.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Zvonko Kaiser
cddcde1d40 vfio: Fix vfio device ordering
If modeVFIO is enabled we need 1st to attach the VFIO control group
device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.Sort the
devices starting with device #1 being the VFIO control group device and
the next the actuall device(s)
/dev/vfio/<group>

Fixes: #7493

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-31 11:26:27 +00:00
Zvonko Kaiser
1fc715bc65 s390x: Add AP Attach/Detach test
Now that we have propper AP device support add a
unit test for testing the correct Attach/Detach of AP devices.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-23 13:44:19 +00:00
Zvonko Kaiser
545de5042a vfio: Fix tests
Now with more elaborate checking of cold|hot plug ports
we needed to update some of the tests.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:44 +00:00
Zvonko Kaiser
62aa6750ec vfio: Added better handling of VFIO Control Devices
Depending on the vfio_mode we need to mount the
VFIO control device additionally into the container.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:42 +00:00
Zvonko Kaiser
dd422ccb69 vfio: Remove obsolete HotplugVFIOonRootBus
Removing HotplugVFIOonRootBus which is obsolete with the latest PCI
topology changes, users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:25:40 +00:00
Zvonko Kaiser
114542e2ba s390x: Fixing device.Bus assignment
The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore

Fixes: #7381

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:24:26 +00:00
Peng Tao
581be92b25 Merge pull request #4492 from zvonkok/pcie-topology
runtime: fix PCIe topology for GPUDirect use-case
2023-07-03 09:17:12 +08:00
Fabiano Fidêncio
6a21e20c63 runtime: Add "none" as a shared_fs option
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.

This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.

However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.

For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.

Fixes: #7207

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-30 20:45:00 +02:00
Zvonko Kaiser
0f454d0c04 gpu: Fixing typos for PCIe topology changes
Some comments and functions had typos and wrong capitalization.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-30 08:42:55 +00:00
Zvonko Kaiser
8330fb8ee7 gpu: Update unit tests
Some tests are now failing due to the changes how PCIe is
handled. Update the test accordingly.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-23 11:16:25 +00:00
Greg Kurz
a43ea24dfc virtiofsd: Convert legacy -o sub-options to their -- replacement
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.

Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.

Fixes #7111

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:54 +02:00
Greg Kurz
8e00dc6944 virtiofsd: Drop -o no_posix_lock
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.

We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.

According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :

   The C implementation of the daemon has limited support for
   remote POSIX locks, restricted exclusively to non-blocking
   operations. We tried to implement the same level of
   functionality in #2, but we finally decided against it because,
   in practice most applications will fail if non-blocking
   operations aren't supported.

   Implementing support for non-blocking isn't trivial and will
   probably require extending the kernel interface before we can
   even start working on the daemon side.

There is thus no justification to pass `-o no_posix_lock` anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:39 +02:00
Greg Kurz
2a15ad9788 virtiofsd: Stop using deprecated -f option
The rust implementation of virtiofsd always runs foreground and
spits a deprecation warning when `-f` is passed.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 10:30:40 +02:00
Zvonko Kaiser
72f2cb84e6 gpu: Reset cold or hot plug after overriding
If we override the cold, hot plug with an annotation
we need to reset the other plugging mechanism to NoPort
otherwise both will be enabled.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:51:01 +00:00
Zvonko Kaiser
fbacc09646 gpu: PCIe topology, consider vhost-user-block in Virt
In Virt the vhost-user-block is an PCIe device so
we need to make sure to consider it as well. We're keeping
track of vhost-user-block devices and deduce the correct
amount of PCIe root ports.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:39:55 +00:00
Zvonko Kaiser
b11246c3aa gpu: Various fixes for virt machine type
The PCI qom path was not deduced correctly added regex for correct
path walking.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:33:57 +00:00
Zvonko Kaiser
40101ea7db vfio: Added annotation for hot(cold) plug
Now it is possible to configure the PCIe topology via annotations
and addded a simple test, checking for Invalid and RootPort

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
8f0d4e2612 vfio: Cleanup of Cold and Hot Plug
Removed the configuration of PCIeRootPort and PCIeSwitchPort, those
values can be deduced in createPCIeTopology

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b5c4677e0e vfio: Rearrange the bus assignemnt
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b1aa8c8a24 gpu: Moved the PCIe configs to drivers
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
55a66eb7fb gpu: Add config to TOML
Update cold-plug and hot-plug setting to include bridge, root and
switch-port

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
da42801c38 gpu: Add config settings tests for hot-plug
Updated all references and config settings for hot-plug to match
cold-plug

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
de39fb7d38 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
Fixes: #4491

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zhongtao Hu
355a24e0e1 Merge pull request #6289 from openanolis/runtime_vcpu_resize
feat(runtime): vcpu resize capability
2023-06-13 10:54:11 +08:00
Yushuo
aaa96c749b feat(runtime-rs): modify onlineCpuMemRequest
Some vmms, such as dragonball, will actively help us
perform online cpu operations when doing cpu hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation.

So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be online. The modified
semantics become that the number of online CPUs in the guest
needs to be guaranteed.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
James O. D. Hunt
8cb4238b46 packaging: Remove snap package
Nobody has volunteered to maintain the (currently broken) snap build, so
remove it.

Fixes: #6769.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-06-12 09:24:09 +01:00
Wang, Arron
f62b2670c0 config: Add root hash value and measure config to kernel params
After we have a guest kernel with builtin initramfs which
provide the rootfs measurement capability and Kata rootfs
image with hash device, we need set related root hash value
and measure config to the kernel params in kata configuration file.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:34:13 +02:00
Beraldo Leal
0e47cfc4c7 runtime: sending SIGKILL to qemu
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes #6757.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-05-24 11:31:28 -04:00
Fabiano Fidêncio
9aae333343 Merge pull request #6871 from kmjohansen/bugfix/ptmx
runtime: make debug console work with sandbox_cgroup_only
2023-05-23 22:24:51 +02:00
Archana Shinde
2c9efbe04c Merge pull request #6907 from likebreath/0519/clh_v32.0
Upgrade to Cloud Hypervisor v32.0
2023-05-22 09:53:05 -07:00
GabyCT
6796af511b Merge pull request #6890 from GabyCT/topic/fixurlvirt
docs: Update container network model url
2023-05-19 15:10:26 -06:00
Bo Chen
35c3d7b4bc runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v32.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #6632

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-05-19 12:49:45 -07:00
Fabiano Fidêncio
0364620844 Merge pull request #6819 from fidencio/topic/use-static-sandbox-resource-mgmt-for-TEEs
runtime: Use static_sandbox_resource_mgmt=true for TEEs
2023-05-18 22:38:31 +02:00
Krister Johansen
eff6ed2d5f runtime: make debug console work with sandbox_cgroup_only
If a hypervisor debug console is enabled and sandbox_cgroup_only is set,
the hypervisor can fail to open /dev/ptmx, which prevents the sandbox
from launching.

This is caused by the absence of a device cgroup entry to allow access
to /dev/ptmx.  When sandbox_cgroup_only is not set, the hypervisor
inherits the default unrestrcited device cgroup, but with it enabled it
runs into allow / deny list restrictions.

Fix by adding an allowlist entry for /dev/ptmx when debug is enabled,
sandbox_cgroup_only is true, and no /dev/ptmx is already in the list of
devices.

Fixes: #6870

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2023-05-18 10:36:24 -07:00
Gabriela Cervantes
11a34a72e2 docs: Update container network model url
This PR updates the container network model url that is part of the
virtcontainers documentation.

Fixes #6889

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-18 15:08:08 +00:00
Fabiano Fidêncio
3a4b924226 Merge pull request #6833 from rye-stripe/bugfix/vcpu-pinning
resource-control: fix setting CPU affinities on Linux
2023-05-18 08:12:39 +02:00
Fabiano Fidêncio
e762f70920 Merge pull request #6838 from rye-stripe/bugfix/use-enable-vcpus-pinning-from-toml
runtime: use enable_vcpus_pinning from toml
2023-05-17 21:30:44 +02:00
Fabiano Fidêncio
ca1531fe9d runtime: Use static_sandbox_resource_mgmt=true for TEEs
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.

As TEEs do not support memory and CPU hotplug, this approach must be
used.

Fixes: #6818

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-17 19:21:52 +02:00
Fabiano Fidêncio
8ce14e709a Merge pull request #6810 from fitzthum/snp-enable
gha: Enable SEV-SNP tests on main
2023-05-17 15:29:54 +02:00
Wainer Moschetta
259158f1c3 Merge pull request #6789 from dubek/add-sev-package
runtime: Port sev package to main
2023-05-17 10:02:19 -03:00
Tobin Feldman-Fitzthum
cbb9fe8b81 config: Use standard OVMF with SEV
The AmdSev firmware package should be used with
measured direct boot. If the expected hashes are not
injected into the firmware binary by the VMM, the
guest will not boot. This is required for security.

Currently the main branch does not have the extended
shim support for SEV, which tells the VMM to inject
the expected hashes.

We ship the standard OVMF package to use with SNP,
so let's switch SEV to that for now. This will need
to be changed back when shim support for SEV(-ES)
is added to main.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:36:04 +02:00
Fabiano Fidêncio
9630c13ac0 Merge pull request #6845 from fidencio/topic/yet-more-nvidia-gpu-naming-fixes
gpu: Rename the last bits from `gpu` to `nvidia-gpu`
2023-05-17 09:05:12 +02:00
Amulya Meka
3ccc29030d Merge pull request #6780 from Amulyam24/rust-virtfs
ppc64le: switch virtiofsd from C to rust version
2023-05-17 09:36:28 +05:30
Salvador Fuentes
b76058c979 Merge pull request #6721 from nedsouza/virtcontainers-qemu-go-coverage
virtcontainers/qemu_test.go: Improve coverage
2023-05-16 11:11:43 -06:00
James O. D. Hunt
a96fcfd5be Merge pull request #6735 from nedsouza/258/tests-coverage-compatoci
virtcontainers/pkg/compatoci/: Improved coverage for  for Kata 2.0
2023-05-16 15:36:35 +01:00
Amulyam24
c5a59caca1 ppc64le: switch virtiofsd from C to rust version
We have been using the C version of virtiofsd on ppc64le. Now that the issue with
rust virtiofsd have been fixed, let's switch to it.

Fixes: #4259

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-05-16 14:46:19 +02:00