Commit Graph

85 Commits

Author SHA1 Message Date
Hyounggyu Choi
bb1d4adaa9 config: add SE configuration
This is to add SE configuration which is used by kata runtime.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-12-04 21:08:49 +01:00
Yohei Ueda
2910e333a8 runtime: Use static resource in remote hypervisor
This patch updates the template configuration file for
the remote hypervisor to set static_sandbox_resource_mgmt
to be true.  The remote hypervisor uses the peer pod config
to determine the sandbox size, so requires this to be set to
true by default.

Fixes: #6616
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(based on commit 938447803b)
2023-11-17 13:33:27 +00:00
stevenhorsman
26d56678a9 config: Add initial remote hypervisor config
- Remote hypervisor template config
- Add annotation enablement for machine_type, default_memory and
default_vcpus for flexible instance types

Fixes: #6349
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
(based on commits 7c9a791d67
and 335a456425)
2023-11-17 13:33:24 +00:00
Liu Wenyuan
9542211e71 configuration: add configuration for StratoVirt hypervisor.
Add configuration-stratovirt.toml.in to generate the StratoVirt configuration,
and parser to deliver config to StratoVirt.

Fixes: #7794

Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2023-11-16 20:47:26 +08:00
Alexandru Matei
d507d189bb fc: Add support for noflush cache option
Firecracker supports noflush semantic via Unsafe cache type.
There is no support for direct i/o, remove it from config file

Fixes: #7823

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Alexandru Matei
2ca781518a clh: Direct IO support for block devices
Clh suports direct i/o for disks. It doesn't
offer any support for noflush, removed passing
of option to cloud-hypervisor internal config

Fixes: #7798

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Fabiano Fidêncio
84c0d59d23 Merge pull request #7985 from fidencio/topic/clh-use-static_sandbox_resource_mgmt-as-default-on-arm
clh: arm: Use static_sandbox_resource_mgmt=true
2023-09-19 09:25:34 +02:00
Fabiano Fidêncio
72599f1911 clh: arm: Use static_sandbox_resource_mgmt=true
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarh64.

With this patch, when building for x86_64, I can see the this is the
resulting config:
```
$ ARCH=amd64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false

```

And when building for aarch64:
```
$ ARCH=arm64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```

Fixes: #7941

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 14:14:10 +02:00
Greg Kurz
1f16b6627b runtime/qemu: Rework QMP/HMP support
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.

The HMP monitor allows to temper with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.

The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.

A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine control on whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".

An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.

While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.

This feature is only made visible in the base and GPU configurations
of QEMU for now.

Fixes #7952

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-18 12:13:01 +02:00
Jeremi Piotrowski
3a1db7a86b runtime: clh: Support enabling iommu
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.

Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.

CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
509771e6f5 runtime: clh: Add hot_plug_vfio entry to config
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Dan Mihai
d0e0610679 runtime: config: use the SEV initrd for SNP
Thanks Unmesh Deodhar!

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
67fed26f18 runtime: Use TDX image with in the qemu-tdx config
Let's make sure we use the TDX image as part of the QEMU TDX
configuration, which will help us to have the policies tested here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 14:28:08 +00:00
stevenhorsman
8815ed0665 runtime: Remove config warnings
Remove configuration file shared_fs = none warnings
now that there is a solution to updating configMaps, secrets etc

Fixes: #7210
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-08-11 16:31:08 +01:00
Peng Tao
581be92b25 Merge pull request #4492 from zvonkok/pcie-topology
runtime: fix PCIe topology for GPUDirect use-case
2023-07-03 09:17:12 +08:00
Fabiano Fidêncio
6a21e20c63 runtime: Add "none" as a shared_fs option
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.

This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.

However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.

For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.

Fixes: #7207

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-30 20:45:00 +02:00
Greg Kurz
a43ea24dfc virtiofsd: Convert legacy -o sub-options to their -- replacement
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.

Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.

Fixes #7111

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:54 +02:00
Zvonko Kaiser
55a66eb7fb gpu: Add config to TOML
Update cold-plug and hot-plug setting to include bridge, root and
switch-port

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Fabiano Fidêncio
ca1531fe9d runtime: Use static_sandbox_resource_mgmt=true for TEEs
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.

As TEEs do not support memory and CPU hotplug, this approach must be
used.

Fixes: #6818

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-17 19:21:52 +02:00
Fabiano Fidêncio
3a9d3c72aa gpu: Rename the last bits from gpu to nvidia-gpu
Let's specifically name the `gpu` runtime class as `nvidia-gpu`.  By
doing this we keep the door open and ease the life of the next vendor
adding GPU support for Kata Containers.

Fixes: #6553

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-16 13:47:52 +02:00
Fabiano Fidêncio
edfaae85cb Merge pull request #6700 from fitzthum/snp-artifacts
packaging: Add SEV-SNP artifacts to main
2023-05-11 10:47:10 +02:00
Fabiano Fidêncio
c937d0a5d4 Merge pull request #6591 from UnmeshDeodhar/add-sev-artifacts-to-main
packaging: Add sev artifacts to main
2023-05-11 09:09:36 +02:00
Tobin Feldman-Fitzthum
0bb37bff78 config: Add SNP configuration
SNP requires many specific configurations, so let's make
a new SNP configuration file that we can use with the
kata-qemu-snp runtime class.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-05-10 20:55:36 +00:00
Unmesh Deodhar
fb9c1fc36e runtime: Add qemu-sev config
Adding config file that can be used with qemu-sev runtime class.
Since SEV has limited hotplug support, increase
the pod overhead to account for fixed resource usage.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Zvonko Kaiser
138ada049c gpu: Cold Plug VFIO toml setting
Added the cold_plug_vfio setting to the qemu-toml.in with some
epxlanation

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 11:04:45 +00:00
Fupan Li
a1568cd2f5 Merge pull request #6676 from zvonkok/gpu-runtime
gpu: Add GPU enabled confguration and runtime
2023-04-19 13:01:49 +08:00
Zvonko Kaiser
a81fff706f gpu: Adding a GPU enabled configuration
We need to set hotplug on pci root port and enable at least one
root port. Also set the guest-hooks-dir to the correct path

Fixes: #6675

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:40:09 +00:00
Fabiano Fidêncio
dc662333df runtime: Increase the dial_timeout
When testing on AKS, we've been hitting the dial_timeout every now and
then.  Let's increase it to 45 seconds (instead of 30) for all the VMMs,
and to 60 seconfs in case of TEEs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 22:42:52 +02:00
Fabiano Fidêncio
3b3656d96d Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main
tdx: Add artefacts from the latest TDX tools release into main
2023-04-11 20:43:02 +02:00
Fabiano Fidêncio
98682805be config: Add configuration for QEMU TDX
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 16:10:35 +02:00
Feng Wang
cbe6ad9034 runtime: support non-root for clh
This change enables to run cloud-hypervisor VMM using a non-root user
when rootless flag is set true in the configuration

Fixes: #2567

Signed-off-by: Feng Wang <fwang@confluent.io>
2023-02-22 13:57:09 -08:00
zhaojizhuang
ca02c9f512 runtime: add reconnect timeout for vhost user block
Fixes: #6075
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-02-13 14:33:46 +08:00
yaoyinnan
bdf20b5d26 rootfs: support EROFS filesystem
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.

On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.

Fixes: #6063

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-11 00:44:13 +08:00
zhaojizhuang
9092c23a2e runtime: Add hmp for qemu
Fixes: #6092
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-29 14:22:04 +08:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
LitFlwr0
2508d39b7c runtime: added vcpus pinning logics
Core VCPU threads pinning logics for issue 4476. Also provided docs.

Fixes:#4476
Signed-off-by: LitFlwr0 <861690705@qq.com>
2022-11-04 17:52:42 +08:00
Joana Pecholt
ded60173d4 runtime: Enable choice between AMD SEV and SNP
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Joana Pecholt
105eda5b9a runtime: Initrd path option added to config
Adds initrd configuration option to the configuration.toml that is
generated for the setup using QEMU.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Archana Shinde
7d52934ec1 Merge pull request #4798 from amshinde/use-iouring-qemu
Use iouring for qemu block devices
2022-08-26 04:00:24 +05:30
Fabiano Fidêncio
c142fa2541 clh: Lift the sharedFS restriction used with TDX
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.

Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.

See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""

Fixes: #4977

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 17:14:05 +02:00
Archana Shinde
ed0f1d0b32 config: Add "block_device_aio" as a config option for qemu
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Manabu Sugimoto
4d89476c91 runtime: Fix DisableSelinux config
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.

Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-07-06 15:50:28 +09:00
Fabiano Fidêncio
0939f5181b config: Expose default_maxmemory
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Liang Zhou
ef925d40ce runtime: enable sandbox feature on qemu
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.

The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.

Fixes: #2266

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-17 15:30:46 -07:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Fabiano Fidêncio
b6467ddd73 clh: Expose disk rate limiter config
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4139

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:29 +02:00
Fabiano Fidêncio
7580bb5a78 clh: Expose net rate limiter config
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4017

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:13 +02:00