Commit Graph

1821 Commits

Author SHA1 Message Date
Fabiano Fidêncio
516468815e cc: Merge from main to CCv0 - Aug 14th
Conflicts:
	src/agent/src/rpc.rs

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 09:22:03 +02:00
Manabu Sugimoto
cc922be5ec versions: Update firecracker version to 1.4.0
This patch upgrades Firecracker version from v1.1.0 to v1.4.0.

* Generate swagger models for v1.4.0 (from `firecracker.yaml`)
  - The version of go-swagger used is v0.30.0
* The firecracker v1.4.0 includes the following changes.
  - Added
    * Added support for custom CPU templates allowing users to adjust vCPU features
    exposed to the guest via CPUID, MSRs and ARM registers.
    * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU
    as Neoverse N1.
    * Added support for the virtio-rng entropy device. The device is optional. A
    single device can be enabled per VM using the /entropy endpoint.
    * Added a cpu-template-helper tool for assisting with creating and managing
    custom CPU templates.
  - Changed
    * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit
    (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process.
  - Fixed
    * Fixed feature flags in T2S CPU template on Intel Ice Lake.
    * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host.
    * Fixed a performance regression in the jailer logic for closing open file
    descriptors.
    * A race condition that has been identified between the API thread and the VMM
    thread due to a misconfiguration of the api_event_fd.
    * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host.
    * Fixed passing through cache information from host in CPUID leaf 0x80000006.
    * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES
    MSR to 1 in accordance with an Intel microcode update.
    * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the
    IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode
    update.
    * Fixed passing through cache information from host in CPUID leaf 0x80000005.
    * Fixed the T2A CPU template to disable SVM (nested virtualization).
    * Fixed the T2A CPU template to set EferLmsleUnsupported bit
    (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:48:13 +09:00
stevenhorsman
a0ebfbf18a runtime: Re-added hypervisor annotations
- Add support for setting the sandbox name and namespace
in the hypervisor config, which is needed in the remote hypervisor
implementation to get the pod name and namespace for the remote pod
create request

Fixes: #7588
Co-authored-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Co-authored-by: Yohei Ueda <yohei@jp.ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-08-08 14:04:37 +01:00
Fabiano Fidêncio
5f5e05a77f CC: Merge from main to CCv0 - Aug 7th, 2023
Conflicts:
	src/runtime/pkg/containerd-shim-v2/create.go
	tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh
	tools/packaging/scripts/lib.sh

Fixes: #7563
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-07 11:12:04 +02:00
Fabiano Fidêncio
7164ced4dc CCv0: Merge from main -- August 1st
Conflicts:
	src/runtime/pkg/katautils/config.go
	src/runtime/virtcontainers/container.go
	src/runtime/virtcontainers/hypervisor.go
	src/runtime/virtcontainers/qemu_arch_base.go
	src/runtime/virtcontainers/sandbox.go
	tests/integration/kubernetes/gha-run.sh
	tests/integration/kubernetes/setup.sh
	tools/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml
	tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh
	tools/packaging/kata-deploy/scripts/kata-deploy.sh
	tools/packaging/kernel/kata_config_version
	versions.yaml

Fixes: #7433

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-04 22:15:09 +02:00
Wedson Almeida Filho
4fbe0a3a53 runtime: bind-mount mounted block device into container
When the mounted block device isn't a layer, we want to mount it into
containers, but since it's already mounted with the correct fs (e.g.,
tar, ext4, etc.) in the pod, we just bind-mount it into the container.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
7e1b1949d4 runtime: add support for kata overlays
When at least one `io.katacontainers.fs-opt.layer` option is added to
the rootfs, it gets inserted into the VM as a layer, and the file system
is mounted as an overlay of all layers using the overlayfs driver.

Additionally, if the `io.katacontainers.fs-opt.block_device=file` option
is present in a layer, it is mounted as a block device backed by a file
on the host.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Zvonko Kaiser
cddcde1d40 vfio: Fix vfio device ordering
If modeVFIO is enabled we need 1st to attach the VFIO control group
device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.Sort the
devices starting with device #1 being the VFIO control group device and
the next the actuall device(s)
/dev/vfio/<group>

Fixes: #7493

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-31 11:26:27 +00:00
ChengyuZhu6
57b932c127 kata-runtime: Add configurable image request timeout
Add ImageRequestTimeout field in the config struct, set RequestTimeout
by configured image request timeout, add image_request_timeout to
default configuration files, add image request timeout to annotations
and add image timeout annotation to sandbox config documentation.

exp:

configure the image request timout in the configuration:
[image]
image_request_timeout = 300

configure the image request timeout in the yaml:
annotations:
      "io.katacontainers.config.runtime.image_request_timeout": "300"

Fixes: #7389

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-07-27 18:18:54 +02:00
Fabiano Fidêncio
068e535b9d runtime: tdx: Adjust QEMU TDX path
We need to use qemu-system-x86_64-tdx-experimental instead of
qemu-system-x86_64-tdx.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-25 00:39:52 +02:00
Zvonko Kaiser
1fc715bc65 s390x: Add AP Attach/Detach test
Now that we have propper AP device support add a
unit test for testing the correct Attach/Detach of AP devices.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-23 13:44:19 +00:00
Zvonko Kaiser
545de5042a vfio: Fix tests
Now with more elaborate checking of cold|hot plug ports
we needed to update some of the tests.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:44 +00:00
Zvonko Kaiser
62aa6750ec vfio: Added better handling of VFIO Control Devices
Depending on the vfio_mode we need to mount the
VFIO control device additionally into the container.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:42 +00:00
Zvonko Kaiser
dd422ccb69 vfio: Remove obsolete HotplugVFIOonRootBus
Removing HotplugVFIOonRootBus which is obsolete with the latest PCI
topology changes, users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:25:40 +00:00
Zvonko Kaiser
114542e2ba s390x: Fixing device.Bus assignment
The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore

Fixes: #7381

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:24:26 +00:00
Zvonko Kaiser
3b9f8fdbcb CCv0: Adding CDI support for cold and hot-plug of VFIO devices
We need to do proper sandbox sizing when we're doing cold-plug introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass-through annotations for accumulated CPU,Memory and now CDI
devices. With that information sandbox sizing can be derived correctly.

Fixes: #7331

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-19 06:55:58 +00:00
stevenhorsman
15647a000e runtime: Ignore cyclomatic complexity
Ignore cyclomatic complexity failure. I have fixed this in my PR waiting
to forward port remote-hypervisor support into main

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-07-11 19:55:36 +01:00
stevenhorsman
7188a60e25 runtime: Fix bad merge
- Fix the HotPlug type

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-07-11 19:47:45 +01:00
stevenhorsman
f4d7011f3b CCv0: Merge main into CCv0 branch
- Merge remote-tracking branch 'upstream/main' into CCv0
- Note excludes 532755ce31 due to incompatiblity

Fixes: #7278
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-07-11 14:45:58 +01:00
stevenhorsman
335a456425 config: Update remote hypervisor config
- Add annotation enablement for machine_type, default_memory and
default_vcpus
- Remove note that says that cpu and memory settings are ignored.

Fixes: #7256
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-07-07 08:37:46 +01:00
Peng Tao
581be92b25 Merge pull request #4492 from zvonkok/pcie-topology
runtime: fix PCIe topology for GPUDirect use-case
2023-07-03 09:17:12 +08:00
Fabiano Fidêncio
6a21e20c63 runtime: Add "none" as a shared_fs option
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.

This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.

However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.

For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.

Fixes: #7207

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-30 20:45:00 +02:00
Zvonko Kaiser
0f454d0c04 gpu: Fixing typos for PCIe topology changes
Some comments and functions had typos and wrong capitalization.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-30 08:42:55 +00:00
stevenhorsman
51eb0c5130 runtime: SEV sysconfig fix
- SEV and SNP need a different sysconfig path

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-06-29 20:52:57 +01:00
Steve Horsman
70e6e40a8d Merge pull request #7134 from stevenhorsman/CCv0-merge-19th-june
CCv0: Merge main into CCv0 branch
2023-06-27 09:16:49 +01:00
Zvonko Kaiser
8330fb8ee7 gpu: Update unit tests
Some tests are now failing due to the changes how PCIe is
handled. Update the test accordingly.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-23 11:16:25 +00:00
Pradipta Banerjee
004f07f076 runtime: Add support for key annotations to remote hyp
In order to support different pod VM instance type via
remote hypervisor implementation (cloud-api-adaptor),
we need to pass machine_type, default_vcpus
and default_memory annotations to cloud-api-adaptor.

The cloud-api-adaptor then uses these annotations to spin
up the appropriate cloud instance.

Reference PR for cloud-api-adaptor
https://github.com/confidential-containers/cloud-api-adaptor/pull/1088

Fixes: #7140
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2023-06-21 20:22:36 +05:30
stevenhorsman
5a4a89c108 runtime: Remove duplicated variables
Remove duplicated variables that were in `CCv0` and merged in from main

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-06-20 15:01:54 +01:00
stevenhorsman
64a27d962b CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #7083
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-06-19 11:24:03 +01:00
Greg Kurz
a43ea24dfc virtiofsd: Convert legacy -o sub-options to their -- replacement
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.

Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.

Fixes #7111

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:54 +02:00
Greg Kurz
8e00dc6944 virtiofsd: Drop -o no_posix_lock
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.

We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.

According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :

   The C implementation of the daemon has limited support for
   remote POSIX locks, restricted exclusively to non-blocking
   operations. We tried to implement the same level of
   functionality in #2, but we finally decided against it because,
   in practice most applications will fail if non-blocking
   operations aren't supported.

   Implementing support for non-blocking isn't trivial and will
   probably require extending the kernel interface before we can
   even start working on the daemon side.

There is thus no justification to pass `-o no_posix_lock` anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:39 +02:00
Greg Kurz
2a15ad9788 virtiofsd: Stop using deprecated -f option
The rust implementation of virtiofsd always runs foreground and
spits a deprecation warning when `-f` is passed.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 10:30:40 +02:00
Zvonko Kaiser
72f2cb84e6 gpu: Reset cold or hot plug after overriding
If we override the cold, hot plug with an annotation
we need to reset the other plugging mechanism to NoPort
otherwise both will be enabled.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:51:01 +00:00
Zvonko Kaiser
fbacc09646 gpu: PCIe topology, consider vhost-user-block in Virt
In Virt the vhost-user-block is an PCIe device so
we need to make sure to consider it as well. We're keeping
track of vhost-user-block devices and deduce the correct
amount of PCIe root ports.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:39:55 +00:00
Zvonko Kaiser
b11246c3aa gpu: Various fixes for virt machine type
The PCI qom path was not deduced correctly added regex for correct
path walking.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:33:57 +00:00
Zvonko Kaiser
40101ea7db vfio: Added annotation for hot(cold) plug
Now it is possible to configure the PCIe topology via annotations
and addded a simple test, checking for Invalid and RootPort

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
8f0d4e2612 vfio: Cleanup of Cold and Hot Plug
Removed the configuration of PCIeRootPort and PCIeSwitchPort, those
values can be deduced in createPCIeTopology

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b5c4677e0e vfio: Rearrange the bus assignemnt
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b1aa8c8a24 gpu: Moved the PCIe configs to drivers
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
55a66eb7fb gpu: Add config to TOML
Update cold-plug and hot-plug setting to include bridge, root and
switch-port

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
da42801c38 gpu: Add config settings tests for hot-plug
Updated all references and config settings for hot-plug to match
cold-plug

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
de39fb7d38 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
Fixes: #4491

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zhongtao Hu
355a24e0e1 Merge pull request #6289 from openanolis/runtime_vcpu_resize
feat(runtime): vcpu resize capability
2023-06-13 10:54:11 +08:00
Unmesh Deodhar
f4ee2a622f runtime: Update snp qemu command name
Main merge back to CCv0 caused snp qemu build to move from install_qemu to install_qemu_experimental.
Thus, reflecting this change into the qemu snp command.

Fixes: #7059

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-06-12 12:34:42 -05:00
Yushuo
aaa96c749b feat(runtime-rs): modify onlineCpuMemRequest
Some vmms, such as dragonball, will actively help us
perform online cpu operations when doing cpu hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation.

So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be online. The modified
semantics become that the number of online CPUs in the guest
needs to be guaranteed.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
James O. D. Hunt
8cb4238b46 packaging: Remove snap package
Nobody has volunteered to maintain the (currently broken) snap build, so
remove it.

Fixes: #6769.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-06-12 09:24:09 +01:00
Wang, Arron
f62b2670c0 config: Add root hash value and measure config to kernel params
After we have a guest kernel with builtin initramfs which
provide the rootfs measurement capability and Kata rootfs
image with hash device, we need set related root hash value
and measure config to the kernel params in kata configuration file.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:34:13 +02:00
Fabiano Fidêncio
bdb214aa34 runtimne: Add back the IMAGETDXPATH
This was mistakenly removed as part of the rebase.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-30 10:17:43 +02:00
stevenhorsman
8b7b88f341 runtime: Update FIRMWARETDVFPATH
Correct path

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-30 10:13:29 +02:00
stevenhorsman
66ca2f1bc4 qemu: static-check disable
Disable gocyclo on large complex function in CCv0 branch

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-25 17:05:16 +01:00