A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.
Fixes#2444
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This commit add option "enable_guest_swap" to config hypervisor.qemu.
It will enable swap in the guest. Default false.
When enable_guest_swap is enabled, insert a raw file to the guest as the
swap device if the swappiness of a container (set by annotation
"io.katacontainers.container.resource.swappiness") is bigger than 0.
The size of the swap device should be
swap_in_bytes (set by annotation
"io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
If swap_in_bytes and memory_limit_in_bytes is not set, the size should be
default_memory.
Fixes: #2201
Signed-off-by: Hui Zhu <teawater@antfin.com>
`memory_offset` is used to increase the maximum amount of memory
supported in a VM, this offset is equal to the NVDIMM/PMEM device that
is hot added, in real use case workloads such devices are bigger than
4G, which is the current limit (uint32).
fixes#2006
Signed-off-by: Julio Montes <julio.montes@intel.com>
Secure Execution is a confidential computing technology on s390x (IBM Z
& LinuxONE). Enable the correspondent virtualization technology in QEMU
(where it is referred to as "Protected Virtualization").
- Introduce enableProtection and appendProtectionDevice functions for
QEMU s390x.
- Introduce CheckCmdline to check for "prot_virt=1" being present on the
kernel command line.
- Introduce CPUFacilities and avilableGuestProtection for hypervisor
s390x to check for CPU support.
Fixes: #1771
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Export proc stats for virtiofsd.
This commit only adds for hypervisors that have support for it.
- qemu
- cloud-hypervisor
Fixes: #1926
Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
Define the structure and functions needed to support confidential
guests, this commit doesn't add support for any specific technology,
support for TDX, SEV, PEF and others will be added in following
commits.
Signed-off-by: Julio Montes <julio.montes@intel.com>
`CPUFlags` returns a map with all the CPU flags, these CPU flags
may help us to identiry whether a system support confidential computing
or not.
Signed-off-by: Julio Montes <julio.montes@intel.com>
It would be undesirable to be given an annotation like "/dev/null".
Filter out bad annotation values.
Fixes: #1043
Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Fixed logic used to handle static agent tracing.
For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.
Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.
The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.
Fixes: #1696.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.
Fixes#1355
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;
Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.
Signed-off-by: bin liu <bin@hyper.sh>
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox
Fixes: #1012
Signed-off-by: bin liu <bin@hyper.sh>
Add a field "enable_annotations" to the runtime configuration that can
be used to whitelist annotations using a list of regular expressions,
which are used to match any part of the base annotation name, i.e. the
part after "io.katacontainers.config.hypervisor."
For example, the following configuraiton will match "virtio_fs_daemon",
"initrd" and "jailer_path", but not "path" nor "firmware":
enable_annotations = [ "virtio.*", "initrd", "_path" ]
The default is an empty list of enabled annotations, which disables
annotations entirely.
If an anontation is rejected, the message is something like:
annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled
Fixes: #901
Suggested-by: Peng Tao <tao.peng@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This one could theoretically be used to overwrite data on the host.
It seems somewhat less risky than the earlier ones for a number
of reasons, but worth protecting a little anyway.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This also adds annotation for ctlpath which were not present
before. It's better to implement the code consistenly right now to make
sure that we don't end up with a leaky implementation tacked on later.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Support the `sgx.intel.com/epc` annotation that is defined by the intel
k8s plugin. This annotation enables SGX. Hardware-based isolation and
memory encryption.
For example, use `sgx.intel.com/epc = "64Mi"` to create a container
with 1 EPC section with pre-allocated memory.
At the time of writing this patch, SGX patches have not landed on the
linux kernel project.
The following github kernel fork contains all the SGX patches for the
host and guest: https://github.com/intel/kvm-sgxfixes#483
Signed-off-by: Julio Montes <julio.montes@intel.com>
for s390x virtio devices
Add iommu_platform annotations for qemu for ccw,
other supported devices can also make use of that.
Fixes#603
Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
Import new console watcher to monitor guest console outputs, and will be
only effective when we turn on enable_debug option.
Guest console outputs may include guest kernel debug info, agent debug info,
etc.
Fixes: #389
Signed-off-by: Penny Zheng penny.zheng@arm.com
With kata containers moving to 2.0, (hybrid-)vsock will be the only
way to directly communicate between host and agent.
And kata-proxy as additional component to handle the multiplexing on
serial port is also no longer needed.
Cleaning up related unit tests, and also add another mock socket type
`MockHybridVSock` to deal with ttrpc-based hybrid-vsock mock server.
Fixes: #389
Signed-off-by: Penny Zheng penny.zheng@arm.com
[ port from runtime commit 0100af18a2afdd6dfcc95129ec6237ba4915b3e5 ]
To control whether guest can enable/disable some CPU features. E.g. pmu=off,
vmx=off. As discussed in the thread [1], the best approach is to let users
specify them. How about adding a new option in the configuration file.
Currently this patch only supports this option in qemu,no other vmm.
[1] https://github.com/kata-containers/runtime/pull/2559#issuecomment-603998256
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
As for some hypervisors, like firecracker, they support built-in rate limiter
to control network I/O bandwidth on VMM level. And for some hypervisors, like qemu,
they don't.
Fixes: #250
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Add configuration/annotation about network I/O throttling on VM level.
rx_rate_limiter_max_rate is dedicated to control network inbound
bandwidth per pod.
tx_rate_limiter_max_rate is dedicated to control network outbound
bandwidth per pod.
Fixes: #250
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Add a configuration option and a Pod Annotation
If activated:
- Add kernel parameters to load iommu
- Add irqchip=split in the kvm options
- Add a vIOMMU to the VM
Fixes#2694
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
To use the kata-containers repo path.
Most of the change is generated by script:
find . -type f -name "*.go" |xargs sed -i -e \
's|github.com/kata-containers/runtime|github.com/kata-containers/kata-containers/src/runtime|g'
Fixes: #201
Signed-off-by: Peng Tao <bergwolf@hyper.sh>