Commit Graph

65 Commits

Author SHA1 Message Date
Fabiano Fidêncio
edfaae85cb Merge pull request #6700 from fitzthum/snp-artifacts
packaging: Add SEV-SNP artifacts to main
2023-05-11 10:47:10 +02:00
Fabiano Fidêncio
c937d0a5d4 Merge pull request #6591 from UnmeshDeodhar/add-sev-artifacts-to-main
packaging: Add sev artifacts to main
2023-05-11 09:09:36 +02:00
Tobin Feldman-Fitzthum
0bb37bff78 config: Add SNP configuration
SNP requires many specific configurations, so let's make
a new SNP configuration file that we can use with the
kata-qemu-snp runtime class.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-05-10 20:55:36 +00:00
Unmesh Deodhar
fb9c1fc36e runtime: Add qemu-sev config
Adding config file that can be used with qemu-sev runtime class.
Since SEV has limited hotplug support, increase
the pod overhead to account for fixed resource usage.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Zvonko Kaiser
138ada049c gpu: Cold Plug VFIO toml setting
Added the cold_plug_vfio setting to the qemu-toml.in with some
epxlanation

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 11:04:45 +00:00
Fupan Li
a1568cd2f5 Merge pull request #6676 from zvonkok/gpu-runtime
gpu: Add GPU enabled confguration and runtime
2023-04-19 13:01:49 +08:00
Zvonko Kaiser
a81fff706f gpu: Adding a GPU enabled configuration
We need to set hotplug on pci root port and enable at least one
root port. Also set the guest-hooks-dir to the correct path

Fixes: #6675

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:40:09 +00:00
Fabiano Fidêncio
dc662333df runtime: Increase the dial_timeout
When testing on AKS, we've been hitting the dial_timeout every now and
then.  Let's increase it to 45 seconds (instead of 30) for all the VMMs,
and to 60 seconfs in case of TEEs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 22:42:52 +02:00
Fabiano Fidêncio
3b3656d96d Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main
tdx: Add artefacts from the latest TDX tools release into main
2023-04-11 20:43:02 +02:00
Fabiano Fidêncio
98682805be config: Add configuration for QEMU TDX
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 16:10:35 +02:00
Feng Wang
cbe6ad9034 runtime: support non-root for clh
This change enables to run cloud-hypervisor VMM using a non-root user
when rootless flag is set true in the configuration

Fixes: #2567

Signed-off-by: Feng Wang <fwang@confluent.io>
2023-02-22 13:57:09 -08:00
zhaojizhuang
ca02c9f512 runtime: add reconnect timeout for vhost user block
Fixes: #6075
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-02-13 14:33:46 +08:00
yaoyinnan
bdf20b5d26 rootfs: support EROFS filesystem
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.

On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.

Fixes: #6063

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-11 00:44:13 +08:00
zhaojizhuang
9092c23a2e runtime: Add hmp for qemu
Fixes: #6092
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-29 14:22:04 +08:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
LitFlwr0
2508d39b7c runtime: added vcpus pinning logics
Core VCPU threads pinning logics for issue 4476. Also provided docs.

Fixes:#4476
Signed-off-by: LitFlwr0 <861690705@qq.com>
2022-11-04 17:52:42 +08:00
Joana Pecholt
ded60173d4 runtime: Enable choice between AMD SEV and SNP
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Joana Pecholt
105eda5b9a runtime: Initrd path option added to config
Adds initrd configuration option to the configuration.toml that is
generated for the setup using QEMU.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Archana Shinde
7d52934ec1 Merge pull request #4798 from amshinde/use-iouring-qemu
Use iouring for qemu block devices
2022-08-26 04:00:24 +05:30
Fabiano Fidêncio
c142fa2541 clh: Lift the sharedFS restriction used with TDX
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.

Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.

See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""

Fixes: #4977

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 17:14:05 +02:00
Archana Shinde
ed0f1d0b32 config: Add "block_device_aio" as a config option for qemu
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Manabu Sugimoto
4d89476c91 runtime: Fix DisableSelinux config
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.

Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-07-06 15:50:28 +09:00
Fabiano Fidêncio
0939f5181b config: Expose default_maxmemory
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Liang Zhou
ef925d40ce runtime: enable sandbox feature on qemu
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.

The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.

Fixes: #2266

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-17 15:30:46 -07:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Fabiano Fidêncio
b6467ddd73 clh: Expose disk rate limiter config
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4139

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:29 +02:00
Fabiano Fidêncio
7580bb5a78 clh: Expose net rate limiter config
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4017

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:13 +02:00
bin
9d5b03a1b7 runtime: delete debug option in virtiofsd
virtiofsd's debug will be enabled if hypervisor's debug has been
enabled, this will generate too many noisy logs from virtiofsd.

Unbind the relationship of log level between virtiofsd and
hypervisor, if users want to see debug log of virtiofsd,
can set it by:

  virtio_fs_extra_args = ["-o", "log_level=debug"]

Fixes: #3303

Signed-off-by: bin <bin@hyper.sh>
2022-04-07 19:55:22 +08:00
Fabiano Fidêncio
98750d792b clh: Expose service offload configuration
This configuration option is valid for all the hypervisor that are going
to be used with the confidential containers effort, thus exposing the
configuration option for Cloud Hypervisor as well.

Fixes: #4022

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-01 11:15:55 +02:00
Evan Foster
afc567a9ae storage: make k8s emptyDir creation configurable
This change introduces the `disable_guest_empty_dir` config option,
which allows the user to change whether a Kubernetes emptyDir volume is
created on the guest (the default, for performance reasons), or the host
(necessary if you want to pass data from the host to a guest via an
emptyDir).

Fixes #2053

Signed-off-by: Evan Foster <efoster@adobe.com>
2022-03-04 12:02:42 -08:00
Fabiano Fidêncio
12af632952 Merge pull request #3814 from fidencio/wip/disable-block-device-use-minor-fixes
Minor fixes for the `disable_block_device_use` comments
2022-03-03 23:26:05 +01:00
Fabiano Fidêncio
97951a2d12 clh: Don't use SharedFS with Confidential Guests
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.

1. virtio-fs, as of now, cannot be part of the trust boundary, so the
   Confidential Guest will not be using it.

2. virtio-block hotplug should be enabled in order to use virtio-block
   for the rootfs (used with the devmapper plugin).

When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```

After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.

@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".

Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.

Fixes: #3810

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:40 +01:00
Fabiano Fidêncio
76e4f6a2a3 Revert "hypervisors: Confidential Guests do not support Device hotplug"
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 09:59:55 +01:00
Fabiano Fidêncio
fa8b93927c config: qemu: Fix disable_block_device_use comments
virtio-fs, instead of virtio-9p, is the default shared file system type
in case virtio-blk is not used.

Fixes: #3813

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-02 20:43:36 +01:00
Fabiano Fidêncio
9615c8bc9c config: fc: Don't expose disable_block_device_use
Relying on virtio-block is the *only* way to use Firecracker with Kata
Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by
Firecracker.

As configuration doesn't make sense to be exposed, we hardcode the
`false` value in the Firecracker configuration structure.

Fixes: #3813

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-02 20:43:28 +01:00
Fabiano Fidêncio
de57466212 config: Expand confidential_guest comments
Let's clarify that an error will be reported in case confidential_guest
is enabled, but the hardware where Kata Containers is running doesn't
provide the required feature set.

Fixes: #3787

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 11:57:42 +01:00
Fabiano Fidêncio
641d475fa6 config: clh: Use "Intel TDX" instead of just "TDX"
Let's use "Intel TDX" rather than just "TDX", as it can ease the
understanding of the terminology.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 10:27:21 +01:00
Fabiano Fidêncio
0bafa2def9 config: clh: Mention supported TEEs
Let's mention the supported TEEs to be used with confidential guests.

Right now, Cloud Hyperisor supports only Intel TDX, used together with
TD Shim.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 10:24:33 +01:00
Tanweer Noor
082d538cb4 runtime: make selinux configurable
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml

Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
2022-02-25 10:33:46 -08:00
Fabiano Fidêncio
a13b4d5ad8 clh: Add firmware to the config file
"firmware" option was already present for a while, but it's never been
exposed to the configuration file before.

Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so called
`td-shim`[0] with Cloud Hypervisor.

[0]: https://github.com/confidential-containers/td-shim

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
a8827e0c78 hypervisors: Confidential Guests do not support NVDIMM
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
f50ff9f798 hypervisors: Confidential Guests do not support Memory hotplug
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
df8ffecde0 hypervisors: Confidential Guests do not support Device hotplug
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
28c4c044e6 hypervisors: Confidential Guests do not support VCPUs hotplug
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".

The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.

One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
29ee870d20 clh: Add confidential_guest to the config file
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.

Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
luodaowen.backend
3175aad5ba virtiofs-nydus: add lazyload support for kata with clh
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.

Fixes #3654

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-19 21:55:31 +08:00
luodaowen.backend
2d9f89aec7 feature(nydusd): add nydusd support to introduse lazyload ability
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.

Fixes #2724

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-11 21:41:17 +08:00