Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.
We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.
During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).
From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest. (a factor of 48 results in using 75% of
provided memory for hotplug). Using prior example of a guest with 256Mi
RAM, 256 Mi * 48 = 12 Gi; 12GiB is upper end of what we should expect
can be hotplugged successfully into the guest.
Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.
If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.
Fixes: #4847
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calcualting
how much memory we can add at a time when utilizing ACPI hotplug.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
With the current TDX kernel used with Kata Containers, `tdx_guest` is
not needed, as TDX_GUEST is now a kernel configuration.
With this in mind, let's just drop the kernel parameter.
Fixes: #4981
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As right now the TDX guest kernel doesn't support "serial" console,
let's switch to using HVC in this case.
Fixes: #4980
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The runtime will crash when trying to resize memory when memory hotplug
is not allowed.
This happens because we cannot simply set the hotplug amount to zero,
leading is to not set memory hotplug at all, and later then trying to
access the value of a nil pointer.
Fixes: #4979
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
While doing tests using `ctr`, I've noticed that I've been hitting those
timeouts more frequently than expected.
Till we find the root cause of the issue (which is *not* in the Kata
Containers), let's increase the timeouts when dealing with a
Confidential Guest.
Fixes: #4978
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.
Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.
See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""
Fixes: #4977
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If the API server is not ready, the mount call will fail, so before
mounting share fs, we should wait the nydusd is started and
the API server is ready.
Fixes: #4710
Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.
Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
In qemu.StopVM(), if debug is enabled, the shim will dump logs
from qemu.log, but users don't know which logs are from qemu.log
and shim itself. Adding some additional messages will
help users to distinguish these logs.
Fixes: #4745
Signed-off-by: Bin Liu <bin@hyper.sh>
Return code is an int32 type, so if an error occurred, the default value
may be zero, this value will be created as a normal exit code.
Set return code to 255 will let the caller(for example Kubernetes) know
that there are some problems with the pod/container.
Fixes: #4419
Signed-off-by: liubin <liubin0329@gmail.com>
Ideally this config validation would be in a seperate package
(katautils?), but that would introduce circular dependency since we'd
call it from vc, and it depends on vc types (which, shouldn't be vc, but
probably a hypervisor package instead).
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
While working on the previous commits, some of the functions become
non-used. Let's simply remove them.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's adapt Cloud Hypervisor's and QEMU's code to properly behave to the
newly added `default_maxmemory` config.
While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced as if a pod requests more memory
than the amount avaiable in the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, mimicing
better the runc behaviour.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.
By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Depending on the user of it, the hypervisor from hypervisor interface
could have differing view on what is valid or not. To help decouple,
let's instead check the hypervisor config validity as part of the
sandbox creation, rather than as part of the CreateVM call within the
hypervisor interface implementation.
Fixes: #4251
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Policy for whats valid/invalid within the config varies by VMM, host,
and by silicon architecture. Let's keep katautils simple for just
translating a toml to the hypervisor config structure, and leave
validation to virtcontainers.
Without this change, we're doing duplicate validation.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Rather than have device package depend on persist, let's define the
(almost duplicate) structures within device itself, and have the Kata
Container's persist pkg import these.
This'll help avoid unecessary dependencies within our core packages.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>