Commit Graph

35 Commits

Author SHA1 Message Date
Feng Wang
1cfe59304d runtime: Run QEMU using a non-root user/group
A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.

Fixes #2444

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-17 11:28:44 -07:00
Julio Montes
47d95dc1c6 runtime: virtcontainers: fix govet fieldalignment
Fix structures alignment

fixes #2271

Depends-on: github.com/kata-containers/tests#3727

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Hui Zhu
cb6b7667cd runtime: Add option "enable_guest_swap" to config hypervisor.qemu
This commit add option "enable_guest_swap" to config hypervisor.qemu.
It will enable swap in the guest. Default false.
When enable_guest_swap is enabled, insert a raw file to the guest as the
swap device if the swappiness of a container (set by annotation
"io.katacontainers.container.resource.swappiness") is bigger than 0.
The size of the swap device should be
swap_in_bytes (set by annotation
"io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
If swap_in_bytes and memory_limit_in_bytes is not set, the size should be
default_memory.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:22:06 +08:00
Jakob Naucke
cfd690b638 virtcontainers: Use virtio-blk-ccw on s390x
if virtio-blk-pci were to be used

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-07-08 14:59:47 +02:00
Julio Montes
361bee91f7 runtime/virtcontrainers: fix alignment structures
fix alignment of qemuArchBase and HypervisorConfig structures

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00
Julio Montes
7834f4127f virtcontainers: change memory_offset to uint64
`memory_offset` is used to increase the maximum amount of memory
supported in a VM, this offset is equal to the NVDIMM/PMEM device that
is hot added, in real use case workloads such devices are bigger than
4G, which is the current limit (uint32).

fixes #2006

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00
Jakob Naucke
c0c05c73e1 virtcontainers: Add support for Secure Execution
Secure Execution is a confidential computing technology on s390x (IBM Z
& LinuxONE). Enable the correspondent virtualization technology in QEMU
(where it is referred to as "Protected Virtualization").

- Introduce enableProtection and appendProtectionDevice functions for
  QEMU s390x.
- Introduce CheckCmdline to check for "prot_virt=1" being present on the
  kernel command line.
- Introduce CPUFacilities and avilableGuestProtection for hypervisor
  s390x to check for CPU support.

Fixes: #1771

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-06-07 10:50:33 +02:00
Carlos Venegas
2234b73090 metrics: Add virtiofsd exporter
Export proc stats for virtiofsd.

This commit only adds for hypervisors that have support for it.

- qemu
- cloud-hypervisor

Fixes: #1926

Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
2021-06-02 16:06:00 +00:00
Julio Montes
0affe8860d virtcontainers: define confidential guest framework
Define the structure and functions needed to support confidential
guests, this commit doesn't add support for any specific technology,
support for TDX, SEV, PEF and others will be added in following
commits.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-05-06 10:09:05 -05:00
Julio Montes
88cf3db601 runtime: implement CPUFlags function
`CPUFlags` returns a map with all the CPU flags, these CPU flags
may help us to identiry whether a system support confidential computing
or not.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-05-03 09:33:13 -05:00
Christophe de Dinechin
dcb9f40394 config: Protect annotation for entropy_source
It would be undesirable to be given an annotation like "/dev/null".
Filter out bad annotation values.

Fixes: #1043

Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-04-22 15:26:40 +02:00
James O. D. Hunt
9256e590dc shutdown: Don't sever console watcher too early
Fixed logic used to handle static agent tracing.

For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.

Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.

The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.

Fixes: #1696.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:22:00 +01:00
Chelsea Mafrica
6b0dc60dda runtime: Fix ordering of trace spans
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.

Fixes #1355

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-16 17:39:28 -07:00
Jianyong Wu
b7a1f752c0 arm64: enable acpi for qemu/virt.
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;

Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-01-29 22:12:43 +08:00
bin liu
4e3a8c0124 runtime: remove global sandbox variable
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
Fupan Li
d22c7cf00b Merge pull request #1013 from liubin/feature/1012-dump-guest-memroy-on-panic
Dump guest memory when kernel panic for QEMU
2020-11-09 09:46:28 +08:00
James O. D. Hunt
5ced96e96d hypervisor: Remove unused methods
Deleted `HypervisorConfig`'s unused  `CustomFirmwareAsset()` and
`JailerAssetPath()` methods.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:15:47 +00:00
bin liu
40418f6d88 runtime: add geust memory dump
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox

Fixes: #1012

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-05 16:04:21 +08:00
Christophe de Dinechin
7c6aede5d4 config: Whitelist hypervisor annotations by name
Add a field "enable_annotations" to the runtime configuration that can
be used to whitelist annotations using a list of regular expressions,
which are used to match any part of the base annotation name, i.e. the
part after "io.katacontainers.config.hypervisor."

For example, the following configuraiton will match "virtio_fs_daemon",
"initrd" and "jailer_path", but not "path" nor "firmware":

  enable_annotations = [ "virtio.*", "initrd", "_path" ]

The default is an empty list of enabled annotations, which disables
annotations entirely.

If an anontation is rejected, the message is something like:

  annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled

Fixes: #901

Suggested-by: Peng Tao <tao.peng@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
4e89b885d2 config: Protect file_mem_backend against annotation attacks
This one could theoretically be used to overwrite data on the host.
It seems somewhat less risky than the earlier ones for a number
of reasons, but worth protecting a little anyway.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
aae9656d8b config: Protect vhost_user_store_path against annotation attacks
This path could be used to overwrite data on the host.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
b21a829c61 config: Protect ctlpath from annotation attack
This also adds annotation for ctlpath which were not present
before. It's better to implement the code consistenly right now to make
sure that we don't end up with a leaky implementation tacked on later.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
27b6620b23 config: Protect jailer_path annotation
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Christophe de Dinechin
bf13ff0a3a config: Protect virtio_fs_daemon annotation
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-14 16:10:12 +02:00
Julio Montes
6df165c19d runtime: add support for SGX
Support the `sgx.intel.com/epc` annotation that is defined by the intel
k8s plugin. This annotation enables SGX. Hardware-based isolation and
memory encryption.

For example, use `sgx.intel.com/epc = "64Mi"` to create a container
with 1 EPC section with pre-allocated memory.

At the time of writing this patch, SGX patches have not landed on the
linux kernel project.
The following github kernel fork contains all the SGX patches for the
host and guest: https://github.com/intel/kvm-sgx

fixes #483

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-01 08:24:29 -05:00
Qi Feng Huo
f5598a1bc2 Subject: [PATCH] qemu: add annotations for iommu_platform
for s390x virtio devices

Add iommu_platform annotations for qemu for ccw,
other supported devices can also make use of that.

  Fixes #603

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
2020-08-28 11:25:14 +08:00
Penny Zheng
7f3e8959c5 console-watcher: use console watcher to monitor guest console outputs
Import new console watcher to monitor guest console outputs, and will be
only effective when we turn on enable_debug option.
Guest console outputs may include guest kernel debug info, agent debug info,
etc.

Fixes: #389

Signed-off-by: Penny Zheng penny.zheng@arm.com
2020-07-16 05:26:19 +00:00
Penny Zheng
1099a28830 kata 2.0: delete use_vsock option and proxy abstraction
With kata containers moving to 2.0, (hybrid-)vsock will be the only
way to directly communicate between host and agent.
And kata-proxy as additional component to handle the multiplexing on
serial port is also no longer needed.
Cleaning up related unit tests, and also add another mock socket type
`MockHybridVSock` to deal with ttrpc-based hybrid-vsock mock server.

Fixes: #389

Signed-off-by: Penny Zheng penny.zheng@arm.com
2020-07-16 04:20:02 +00:00
bin liu
3cf8b470cd runtime: delete Stateful from SandboxConfig
Since all containers are started from shim v2, `Stateful` is not needed.

Fixes: #332

Signed-off-by: bin liu <bin@hyper.sh>
2020-07-08 21:59:44 +08:00
Jia He
fa9d619e8a qemu: add cpu_features option
[ port from runtime commit 0100af18a2afdd6dfcc95129ec6237ba4915b3e5 ]

To control whether guest can enable/disable some CPU features. E.g. pmu=off,
vmx=off. As discussed in the thread [1], the best approach is to let users
specify them. How about adding a new option in the configuration file.

Currently this patch only supports this option in qemu,no other vmm.

[1] https://github.com/kata-containers/runtime/pull/2559#issuecomment-603998256

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-06-29 20:16:11 -07:00
Penny Zheng
bd8658e362 rate-limiter: check if hypervisor supports built-in rate limiter
As for some hypervisors, like firecracker, they support built-in rate limiter
to control network I/O bandwidth on VMM level. And for some hypervisors, like qemu,
they don't.

Fixes: #250

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2020-06-24 06:14:34 +00:00
Penny Zheng
c2645f5d5a rate-limiter: add rate limiter configuration/annotation on VM level
Add configuration/annotation about network I/O throttling on VM level.
rx_rate_limiter_max_rate is dedicated to control network inbound
bandwidth per pod.
tx_rate_limiter_max_rate is dedicated to control network outbound
bandwidth per pod.

Fixes: #250

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2020-06-24 06:14:04 +00:00
Adrian Moreno
b97287090b qemu: enable iommu on q35
Add a configuration option and a Pod Annotation

If activated:
- Add kernel parameters to load iommu
- Add irqchip=split in the kvm options
- Add a vIOMMU to the VM

Fixes #2694
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-06-22 16:37:20 +02:00
Peng Tao
6de95bf36c gomod: update runtime import path
To use the kata-containers repo path.

Most of the change is generated by script:
find . -type f -name "*.go" |xargs sed -i -e \
's|github.com/kata-containers/runtime|github.com/kata-containers/kata-containers/src/runtime|g'

Fixes: #201
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-04-29 18:39:03 -07:00
Peng Tao
a02a8bda66 runtime: move all code to src/runtime
To prepare for merging into kata-containers repository.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-04-27 19:39:25 -07:00