Under certain circumstances[0] Kata will attempt to use SHPC hotplug
for PCI devices on the guest. In fact we explicitly enable SHPC on
our PCI to PCI bridges, regardless of the qemu default.
SHPC was designed a long, long time ago for physical hotplugging and
works very poorly for a virtual environment. In particular it has a
mandatory 5s delay to allow a (real, human) operator to back out the
operation if they press a button by mistake. This alone makes it
unusable for a fast start up application like Kata.
Worse, the agent forces a PCI rescan during startup. That will race
with the SHPC hotplug operation causing the device to go into a bad
state where config space can't be accessed from the guest at all.
The only reason we've sort of gotten away with this is that our
default guest kernel configuration triggers what's arguably a kernel
bug effectively disabling SHPC. That makes the agent rescan the only
reason we see the new device.
Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on
the q35 machine, we can explicitly disable SHPC in all cases. It's
nothing but trouble.
fixes#2174
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
qemuArchBase.appendBridges is never actually used, because the bare
qemuArchBase type is itself never used (outside of unit tests). Instead
*all* the subclasses of qemuArchBase override appendBridges() to call
the very similar, but not identical genericAppendBridges. So, we can
remove the qemuArchBase.appendBridges implementation.
Furthermore, all those subclasses override appendBridges() in exactly
the same way, and so we can remove *those* definitions and replace the
base class qemuArchBase appendBridges() with that version, calling
genericAppendBridges().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.
The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.
Fixes#2665
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.
Fixes#2444
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This patch adds the configuration option that allows to use hugepages
with Cloud Hypervisor guests.
Fixes: #2648
Signed-off-by: Bo Chen <chen.bo@intel.com>
We recently updated to using qemu-6.1 (from qemu 5.2). Unfortunately one
breaking change in qemu 6.0 wasn't caught by the CI.
The query-cpus QMP command has been removed, replaced by query-cpus-fast
(which has been available since qemu 2.12). govmm already had support for
query-cpus-fast, we just weren't using it, so the change is quite easy.
fixes#2643
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The new API is based on containerd's cgroups package.
With that conversion we can simpligy the virtcontainers sandbox code and
also uniformize our cgroups external API dependency. We now only depend
on containerd/cgroups for everything cgroups related.
Depends-on: github.com/kata-containers/tests#3805
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Eventually, we will convert the virtcontainers and the whole Kata
runtime code base to only rely on that package.
This will make Kata only depends on the simpler containerd cgroups API.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
The only process we are adding there is the container host one, and
there is no such thing anymore.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
This is a simplification of the host cgroup handling by partitioning the
host cgroups into 2: A sandbox cgroup and an overhead cgroup.
The sandbox cgroup is always created and initialized. The overhead
cgroup is only available when sandbox_cgroup_only is unset, and is
unconstrained on all controllers. The goal of having an overhead cgroup
is to be more flexible on how we manage a pod overhead. Having such
cgroup will allow for setting a fixed overhead per pod, for a subset of
controllers, while at the same time not having the pod being accounted
for those resources.
When sandbox_cgroup_only is not set, we move all non vCPU threads
to the overhead cgroup and let them run unconstrained. When it is set,
all pod related processes and threads will run in the sandbox cgroup.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Regardless of the sandbox_cgroup_only setting, we create the sandbox
cgroup manager and set the sandbox cgroup path at the same time.
Without doing this, the hypervisor constraint routine is mostly a NOP as
the sandbox state cgroup path is not initialized.
Fixes#2184
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Sync the virtcontainers api.md document, add `ConfidentialGuest` `EntropySourceList` `GuestSwap` three
fields to the HypervisorConfig API.
Fixes#2625
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
sync the virtcontainers api.md document, add SandboxBindMounts field to the SandboxConfig API.
And update the order of the SandboxConfig API fields.
Fixes#2621
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.
Fixes: #2615
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any go variant of string to int
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".
This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.
Fixes: #2592
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec
Fixes: #2539
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
There is no need to keep multiple copies of the license file in
different directory. We can just use the top level one for the project.
Fixes: #2553
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Use ExecuteBlockdevAddWithDriverCache with swap in
hotplugAddBlockDevice to handle swap file cannot work OK with
ExecuteBlockdevAddWithCache issue.
Fixes: #2548
Signed-off-by: Hui Zhu <teawater@antfin.com>