Commit Graph

63 Commits

Author SHA1 Message Date
Archana Shinde
1e34220c41 qemu: tdx: Adapt to the TDX 1.5 stack
QEMU for TDX 1.5 makes use of private memory map/unmap.
Make changes to govmm to support this. Support for private backing fd
for memory is added as knob to the qemu config.

Userspace's map/unmap operations are done by fallocate() ioctl on the
backing store fd.
Reference:
https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/

Fixes: #7770

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Zvonko Kaiser
114542e2ba s390x: Fixing device.Bus assignment
The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore

Fixes: #7381

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:24:26 +00:00
Peng Tao
581be92b25 Merge pull request #4492 from zvonkok/pcie-topology
runtime: fix PCIe topology for GPUDirect use-case
2023-07-03 09:17:12 +08:00
Fabiano Fidêncio
6a21e20c63 runtime: Add "none" as a shared_fs option
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.

This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.

However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.

For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.

Fixes: #7207

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-30 20:45:00 +02:00
Zvonko Kaiser
b1aa8c8a24 gpu: Moved the PCIe configs to drivers
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
de39fb7d38 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
Fixes: #4491

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Jakob Naucke
b546eca26f runtime: Generalize VFIO devices
Generalize VFIO devices to allow for adding AP in the next patch.
The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed.

The rationale for the revomal is:

- VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI
- Logic of checking a subtype of mediated device is included in GetVFIODeviceType()
- VFIOPciDeviceMediatedType() can simply fulfill the device addition based
on a type categorized by GetVFIODeviceType()

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 10:06:37 +09:00
yaoyinnan
bdf20b5d26 rootfs: support EROFS filesystem
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.

On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.

Fixes: #6063

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-11 00:44:13 +08:00
Bin Liu
1dfd845f51 runtime: go fix code for 1.19
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.

Fixes: #5750

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-11-25 11:29:18 +08:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
Bin Liu
27b1bb5ed9 Merge pull request #4467 from egernst/device-pkg
device package cleanup/refactor
2022-06-27 14:40:53 +08:00
Eric Ernst
f9e96c6506 runtime: device: move to top level package
Let's move device package to runtime/pkg instead of being buried under
virtcontainers.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Zvonko Kaiser
e7e7dc9dfe runtime: Add heuristic to get the right value(s) for mem-reserve
Fixes: #2938

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-06-21 03:44:28 -07:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Snir Sheriber
44814dce19 qemu: treat console kernel params within appendConsole
as it is tightly coupled with the appended console device
additionally have it tested

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:05:31 +03:00
Fabiano Fidêncio
76e4f6a2a3 Revert "hypervisors: Confidential Guests do not support Device hotplug"
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 09:59:55 +01:00
Jakob Naucke
eda8ea154a runtime: Gofmt fixes
- Mostly blank lines after `+build` -- see
  https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
  `gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go

Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-28 17:24:47 +01:00
Fabiano Fidêncio
f50ff9f798 hypervisors: Confidential Guests do not support Memory hotplug
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
df8ffecde0 hypervisors: Confidential Guests do not support Device hotplug
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Samuel Ortiz
fa0e9dc6b1 virtcontainers: Make all Linux VMMs only build on Linux
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
c91035d0e1 virtcontainers: Move non QEMU specific constants to hypervisor.go
Hotplugging errors and 9pfs size are not particularily QEMU specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
10ae05914c virtcontainers: Move guest protection definitions to hypervisor.go
They're not QEMU specific, other VMMs may implement support for it.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:31 +01:00
Julio Montes
1f29478b09 runtime: suppport split firmware
firmware can be split into FIRMWARE_VARS.fd (UEFI variables as
configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI
variables can be customized per each user while UEFI code is kept same.

fixes #3583

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-02-01 13:40:19 -06:00
Fabiano Fidêncio
ec6655af87 govmm: Use govmm from our own pkg
Let's stop using govmm from kata-containers/govmm and let's start using
it from our own repo.

Fixes: #3495

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-01-19 18:02:46 +01:00
Eric Ernst
2227c46c25 vc: hypervisor: use our own logger
This'll end up moving to hypervisors pkg, but let's stop using virtLog,
instead introduce hvLogger.

Fixes: #2884

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
1e7cb4bc3a macvlan: drop bridged part of name
The fact that we need to "bridge" the endpoint is a bit irrelevant. To
be consistent with the rest of the endpoints, let's just call this
"macvlan"

Fixes: #3050

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-16 16:44:29 -08:00
Jakob Naucke
b7b89905d4 virtcontainers: Lint protection types
Protection types like tdxProtection or seProtection were marked nolint,
remove this. As a side effect, ARM needs dummy tests for these.

Fixes: #2801
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-11-09 18:36:32 +01:00
Yujia Qiao
6cc8000cae cli: Show available guest protection in env output
Show available guest protections in the
`kata-runtime env` output. Also bump the formatVersion.

Fixes: #1982

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-25 21:44:56 +08:00
Yujia Qiao
2063b13805 virtcontainers: Add func AvailableGuestProtections
Add functions to return guestProtection as a string slice, which
can be then used in `kata-runtime env` output.

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-25 21:44:01 +08:00
Jianyong Wu
8acfc154de check: fix typecheck failure in qemu_arm64_test.go
fix typecheck failure in qemu_arm64_test.go

Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-06 15:53:35 +02:00
Jakob Naucke
ff9728f032 virtcontainers: nolint guestProtection
Exclude from lint checking for it is ultimately only used in
architecture-specific code.

Fixes: #2273
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-06 15:53:35 +02:00
David Gibson
cc4983eeac runtime: Remove unused qemuArchBase.appendBridges definition
qemuArchBase.appendBridges is never actually used, because the bare
qemuArchBase type is itself never used (outside of unit tests).  Instead
*all* the subclasses of qemuArchBase override appendBridges() to call
the very similar, but not identical genericAppendBridges.  So, we can
remove the qemuArchBase.appendBridges implementation.

Furthermore, all those subclasses override appendBridges() in exactly
the same way, and so we can remove *those* definitions and replace the
base class qemuArchBase appendBridges() with that version, calling
genericAppendBridges().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-23 10:15:08 +10:00
Peng Tao
a9de761d71 runtime: drop qemu-lite support
As the project is not maintained and we have not been testing against it
for a long time.

Fixes: #2529
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-30 16:58:12 +08:00
Julio Montes
2859600a6f runtime: virtcontainers: make rootfs image read-only
Improve security by making rootfs image read-only, nobody
will be able to modify it from the guest.

fixes #1916

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-23 13:20:42 -05:00
Julio Montes
47d95dc1c6 runtime: virtcontainers: fix govet fieldalignment
Fix structures alignment

fixes #2271

Depends-on: github.com/kata-containers/tests#3727

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Marcel Apfelbaum
789a59549e virtcontainers: Remove the pc machine
Keeping around two different x86 machines has no added value
and require more tests and maintenance. Prefer the q35 machine
since it has more features and drop the pc machine.

Fixes #1953
Depends-on: github.com/kata-containers/tests#3586
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2021-06-22 11:54:07 +03:00
Julio Montes
361bee91f7 runtime/virtcontrainers: fix alignment structures
fix alignment of qemuArchBase and HypervisorConfig structures

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00
Julio Montes
7834f4127f virtcontainers: change memory_offset to uint64
`memory_offset` is used to increase the maximum amount of memory
supported in a VM, this offset is equal to the NVDIMM/PMEM device that
is hot added, in real use case workloads such devices are bigger than
4G, which is the current limit (uint32).

fixes #2006

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00
Jakob Naucke
c0c05c73e1 virtcontainers: Add support for Secure Execution
Secure Execution is a confidential computing technology on s390x (IBM Z
& LinuxONE). Enable the correspondent virtualization technology in QEMU
(where it is referred to as "Protected Virtualization").

- Introduce enableProtection and appendProtectionDevice functions for
  QEMU s390x.
- Introduce CheckCmdline to check for "prot_virt=1" being present on the
  kernel command line.
- Introduce CPUFacilities and avilableGuestProtection for hypervisor
  s390x to check for CPU support.

Fixes: #1771

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-06-07 10:50:33 +02:00
Julio Montes
0affe8860d virtcontainers: define confidential guest framework
Define the structure and functions needed to support confidential
guests, this commit doesn't add support for any specific technology,
support for TDX, SEV, PEF and others will be added in following
commits.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-05-06 10:09:05 -05:00
Jakob Naucke
3ee61776d6 virtcontainers: Enable virtio-fs on s390x
Allow and configure vhost-user-fs devices (virtio-fs) on s390x. As a
consequence, appendVhostUserDevice now takes a context, which affects
its signature for other architectures.

Fixes: #1753

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:54:08 +02:00
Jakob Naucke
adba4532a4 virtcontainers: Revert "virtcontainers: Allow s390x appendVhostUserDevice"
This reverts commit 7f60911333.
Patch allowed other vhost user devices besides FS not supported on s390x
and failed to attach a CCW device number, which results in the
inavailability to use more devices after vhost-user-fs-ccw.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:43:33 +02:00
Jakob Naucke
7f60911333 virtcontainers: Allow s390x appendVhostUserDevice
Remove the prohibition of vhost-user devices on s390x, which are by now
supported (e.g. vhost-user-fs-ccw). As a consequence,
appendVhostUserDevice no longer needs an error in its signature.
This enables virtio-fs support on s390x.

Fixes: #1469

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-20 12:20:32 +02:00
Chelsea Mafrica
e5aa4e7eb4 Merge pull request #1563 from Jakob-Naucke/s390x-missing-contexts
virtcontainers: Fix missing contexts in s390x
2021-03-30 09:38:28 -07:00
GabyCT
17840cb573 Merge pull request #1546 from devimc/2021-03-24/supportQEMU6
runtime: add support for QEMU 6
2021-03-29 14:33:16 -06:00
Jakob Naucke
31ced01eba virtcontainers: Fix missing contexts in s390x
#1389 has added a context for many signatures to improve trace spans.
Functions specific to s390x lack this. Add context where required. This
affects some common code signatures, since some functions that do not
require context on other architectures do require it on s390x.
Also remove an unnecessary import in test_qemu_s390x.go.

Fixes: #1562

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:49:27 +02:00
Julio Montes
1555bfd8b5 runtime: add support for QEMU 6
Use `on` and `off` to enable or disable features,
`no` prefix is deprecated

fixes #1545

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-03-24 10:55:35 -06:00
Peng Tao
0153f76b07 runtime: gofmt code
Looks like we have merged a lot of code that is not properly formated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 14:37:46 +08:00
Chelsea Mafrica
6b0dc60dda runtime: Fix ordering of trace spans
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.

Fixes #1355

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-16 17:39:28 -07:00
Jianyong Wu
b7a1f752c0 arm64: enable acpi for qemu/virt.
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;

Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-01-29 22:12:43 +08:00