A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no good reason for a non-root user to access
the shared directory, and allowing it is potentially dangerous.
Fixes: #2589
[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Retrieve the absolute sandbox storage path. We will soon need this to
monitor the creation/deletion of new kata sandboxes.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The storage path we use to collect the sandbox files is defined in the
virtcontainers/persist/fs package.
We create the runtime socket in that storage path by hardcoding the
full path in the SocketAddress() function in the runtime package.
This commit splits the hardcoded path into the storage path and the socket
address path, so that the runtime package can provide the storage path to
any component that needs it.
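A sketch of the split, under stated assumptions: the storage path value `/run/vc/sbs`, the socket file name `shim-monitor.sock`, and the helper name `SandboxStoragePath` are illustrative placeholders, not necessarily what the persist/fs and runtime packages define. The point is that the socket address is derived from one exported storage path instead of a single hardcoded string.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Assumed storage path; the real value lives in virtcontainers/persist/fs.
const runStoragePath = "/run/vc/sbs"

// Hypothetical socket file name, for illustration only.
const monitorSocketName = "shim-monitor.sock"

// SandboxStoragePath (hypothetical) returns the absolute directory holding
// the files of all sandboxes, so other components can reuse it.
func SandboxStoragePath() string {
	return runStoragePath
}

// SocketAddress builds the per-sandbox socket address on top of the
// shared storage path.
func SocketAddress(id string) string {
	return fmt.Sprintf("unix://%s", filepath.Join(SandboxStoragePath(), id, monitorSocketName))
}

func main() {
	fmt.Println(SocketAddress("abc")) // unix:///run/vc/sbs/abc/shim-monitor.sock
}
```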
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
In order to retrieve the list of sandboxes, we poll the container engine
every 15 seconds via the CRI. Once we have the list we have to inspect
each pod to find out the kata ones.
This commit extends the sandbox cache to keep track of all the pods,
marking the kata ones, so that on subsequent polls only new sandboxes
need to be inspected to determine which ones use the kata runtime.
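A minimal sketch of such a cache, assuming pods are identified by a string ID and that a caller-supplied `isKata` callback stands in for the (expensive) CRI inspection. The type and method names are hypothetical. Each poll inspects only pods missing from the cache and drops pods that disappeared.

```go
package main

import (
	"fmt"
	"sort"
)

// sandboxCache remembers every pod seen so far and whether it runs with
// the kata runtime, so each pod is inspected at most once.
type sandboxCache struct {
	pods map[string]bool // pod ID -> true if it is a kata sandbox
}

func newSandboxCache() *sandboxCache {
	return &sandboxCache{pods: make(map[string]bool)}
}

// update reconciles the cache with the pod IDs from the latest poll.
// isKata is called only for pods not already cached. It returns how many
// pods had to be inspected.
func (c *sandboxCache) update(current []string, isKata func(id string) bool) int {
	seen := make(map[string]struct{}, len(current))
	inspected := 0
	for _, id := range current {
		seen[id] = struct{}{}
		if _, ok := c.pods[id]; !ok {
			c.pods[id] = isKata(id)
			inspected++
		}
	}
	// Forget pods that no longer exist (deleting during range is safe).
	for id := range c.pods {
		if _, ok := seen[id]; !ok {
			delete(c.pods, id)
		}
	}
	return inspected
}

// kataSandboxes returns the sorted IDs of cached kata sandboxes.
func (c *sandboxCache) kataSandboxes() []string {
	var ids []string
	for id, kata := range c.pods {
		if kata {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	runtimes := map[string]bool{"a": true, "b": false, "c": true}
	isKata := func(id string) bool { return runtimes[id] }

	c := newSandboxCache()
	fmt.Println(c.update([]string{"a", "b"}, isKata))      // 2: both pods are new
	fmt.Println(c.update([]string{"a", "b", "c"}, isKata)) // 1: only "c" is new
	fmt.Println(c.kataSandboxes())                         // [a c]
}
```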
Fixes: #2563
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
This is an unexpected event (likely caused by a change in how containerd/CRI-O
record the lower-level runtime in the pod) and should be more visible:
raise the log level to "warning".
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Change the logger in the Trace call in newContainer from sandbox.Logger()
to nil. Passing nil causes errors to be logged by kataTraceLogger instead
of the sandbox logger, so the message is no longer reported as part of the
sandbox subsystem when it actually belongs to the container subsystem.
kataTraceLogger will not report it as part of the container subsystem
either, but the container logger has not been created at this point, and
we already use kataTraceLogger in other places where a subsystem's logger
does not exist yet, so this PR makes the call consistent with the rest of
the code.
Fixes: #2665
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Call StopTracing with s.rootCtx, which is the root context for tracing,
instead of s.ctx, which is parent to a subset of trace spans.
Fixes: #2661
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
A randomly generated user/group is used to start the QEMU VMM process.
The group that owns /dev/kvm is also added to the QEMU process's groups
to grant it access to the device.
Fixes: #2444
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This patch adds a configuration option that allows using hugepages
with Cloud Hypervisor guests.
Fixes: #2648
Signed-off-by: Bo Chen <chen.bo@intel.com>
We recently updated to using qemu-6.1 (from qemu 5.2). Unfortunately one
breaking change in qemu 6.0 wasn't caught by the CI.
The query-cpus QMP command has been removed, replaced by query-cpus-fast
(which has been available since qemu 2.12). govmm already had support for
query-cpus-fast, we just weren't using it, so the change is quite easy.
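To show what the replacement command returns, here is a small Go sketch that builds the `query-cpus-fast` QMP request and parses a sample reply into a vCPU-index to host-thread-ID map. It uses `encoding/json` directly rather than govmm's wrapper, the sample reply is abbreviated and made up for illustration, and only the `cpu-index`, `qom-path`, and `thread-id` fields are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuInfoFast holds the query-cpus-fast fields used here.
type cpuInfoFast struct {
	CPUIndex int    `json:"cpu-index"`
	QOMPath  string `json:"qom-path"`
	ThreadID int    `json:"thread-id"`
}

type qmpResponse struct {
	Return []cpuInfoFast `json:"return"`
}

// vcpuThreadIDs maps each vCPU index to its host thread ID.
func vcpuThreadIDs(reply []byte) (map[int]int, error) {
	var resp qmpResponse
	if err := json.Unmarshal(reply, &resp); err != nil {
		return nil, err
	}
	ids := make(map[int]int, len(resp.Return))
	for _, c := range resp.Return {
		ids[c.CPUIndex] = c.ThreadID
	}
	return ids, nil
}

func main() {
	cmd, _ := json.Marshal(map[string]string{"execute": "query-cpus-fast"})
	fmt.Println(string(cmd)) // {"execute":"query-cpus-fast"}

	// Illustrative reply, abbreviated.
	reply := []byte(`{"return": [
		{"cpu-index": 0, "qom-path": "/machine/unattached/device[0]", "thread-id": 1234},
		{"cpu-index": 1, "qom-path": "/machine/unattached/device[2]", "thread-id": 1235}
	]}`)
	ids, err := vcpuThreadIDs(reply)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // map[0:1234 1:1235]
}
```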
Fixes: #2643
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The new API is based on containerd's cgroups package.
With that conversion we can simplify the virtcontainers sandbox code and
also unify our external cgroups API dependency. We now only depend
on containerd/cgroups for everything cgroups related.
Depends-on: github.com/kata-containers/tests#3805
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Eventually, we will convert the virtcontainers and the whole Kata
runtime code base to only rely on that package.
This will make Kata depend only on the simpler containerd cgroups API.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
The only process we are adding there is the container host one, and
there is no such thing anymore.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
This simplifies host cgroup handling by partitioning the host cgroups
into two: a sandbox cgroup and an overhead cgroup.
The sandbox cgroup is always created and initialized. The overhead
cgroup is only available when sandbox_cgroup_only is unset, and is
unconstrained on all controllers. The goal of having an overhead cgroup
is to be more flexible in how we manage pod overhead. Such a cgroup will
allow setting a fixed overhead per pod for a subset of controllers,
without charging those resources to the pod itself.
When sandbox_cgroup_only is not set, we move all non-vCPU threads
to the overhead cgroup and let them run unconstrained. When it is set,
all pod-related processes and threads run in the sandbox cgroup.
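The placement rule above can be sketched as a pure function. The cgroup names and the `placeThreads` helper are hypothetical, and thread IDs are plain integers here rather than anything read from `/proc`. vCPU threads always land in the sandbox cgroup; other threads go to the overhead cgroup unless `sandbox_cgroup_only` is set.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical cgroup names, for illustration only.
const (
	sandboxCgroup  = "sandbox"
	overheadCgroup = "overhead"
)

// placeThreads decides which host cgroup each VMM thread joins.
func placeThreads(threads []int, vcpus map[int]bool, sandboxCgroupOnly bool) map[string][]int {
	placement := map[string][]int{}
	for _, tid := range threads {
		target := sandboxCgroup
		if !sandboxCgroupOnly && !vcpus[tid] {
			// Non-vCPU thread: run unconstrained in the overhead cgroup.
			target = overheadCgroup
		}
		placement[target] = append(placement[target], tid)
	}
	for _, tids := range placement {
		sort.Ints(tids)
	}
	return placement
}

func main() {
	threads := []int{100, 101, 102, 103}
	vcpus := map[int]bool{102: true, 103: true}

	fmt.Println(placeThreads(threads, vcpus, false)) // map[overhead:[100 101] sandbox:[102 103]]
	fmt.Println(placeThreads(threads, vcpus, true))  // map[sandbox:[100 101 102 103]]
}
```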
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Regardless of the sandbox_cgroup_only setting, we create the sandbox
cgroup manager and set the sandbox cgroup path at the same time.
Otherwise, the hypervisor constraint routine is mostly a no-op, because
the sandbox state cgroup path is not initialized.
Fixes: #2184
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Sync the virtcontainers api.md document: add the `ConfidentialGuest`,
`EntropySourceList`, and `GuestSwap` fields to the HypervisorConfig API.
Fixes: #2625
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Sync the virtcontainers api.md document: add the SandboxBindMounts field
to the SandboxConfig API and update the order of the SandboxConfig API fields.
Fixes: #2621
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
If a device has no cgroup permission entry, as is the case for /dev/null
and /dev/urandom, it needs to be added to the cgroup.
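A minimal Go sketch of "add the device only if it is not already permitted". `deviceRule` simplifies the OCI `linux.resources.devices` entry (the real type uses `*int64` for major/minor so nil can act as a wildcard, which this sketch ignores), and `hasRule`/`ensureAllowed` are hypothetical helpers. Device numbers 1:3 and 1:9 are the standard Linux numbers for /dev/null and /dev/urandom.

```go
package main

import "fmt"

// deviceRule is a simplified OCI device cgroup entry.
type deviceRule struct {
	Allow  bool
	Type   string // "c" or "b"
	Major  int64
	Minor  int64
	Access string // e.g. "rwm"
}

// hasRule reports whether rules already allows the given device.
func hasRule(rules []deviceRule, typ string, major, minor int64) bool {
	for _, r := range rules {
		if r.Allow && r.Type == typ && r.Major == major && r.Minor == minor {
			return true
		}
	}
	return false
}

// ensureAllowed appends an allow rule when the device has none yet.
func ensureAllowed(rules []deviceRule, typ string, major, minor int64, access string) []deviceRule {
	if hasRule(rules, typ, major, minor) {
		return rules
	}
	return append(rules, deviceRule{Allow: true, Type: typ, Major: major, Minor: minor, Access: access})
}

func main() {
	rules := []deviceRule{{Allow: false, Access: "rwm"}} // deny everything by default
	rules = ensureAllowed(rules, "c", 1, 3, "rwm")       // /dev/null
	rules = ensureAllowed(rules, "c", 1, 9, "rwm")       // /dev/urandom
	rules = ensureAllowed(rules, "c", 1, 3, "rwm")       // already present, no-op
	fmt.Println(len(rules))                              // 3
}
```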
Fixes: #2615
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
In getThreadIDs(), the cpuID variable is derived from a string that
contains a space. As a result, strings.SplitAfter returns the cpuID with
a leading space. This makes any Go string-to-integer conversion fail
(strconv.ParseInt() in our case). This patch makes sure that the
leading space is removed, so the string passed to strconv.ParseInt()
is "CPUID" and not " CPUID".
This was caused by a change in the naming scheme of vCPU threads
in Firecracker after v0.19.1.
Fixes: #2592
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Add the default Unix devices, such as /dev/null and /dev/urandom, to
the container's resource cgroup spec.
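For reference, a Go sketch listing common default character devices with their standard Linux major:minor numbers and appending an allow rule for each. The exact set Kata adds is an assumption here, and `deviceCgroup` simplifies the OCI `LinuxDeviceCgroup` type (the real one uses `*int64` for major/minor).

```go
package main

import "fmt"

// deviceCgroup is a simplified OCI LinuxDeviceCgroup entry.
type deviceCgroup struct {
	Allow  bool
	Type   string
	Major  int64
	Minor  int64
	Access string
}

// defaultDevice names a standard Unix character device and its numbers.
type defaultDevice struct {
	path         string
	major, minor int64
}

// Assumed default set; the devices Kata actually adds may differ.
var defaultUnixDevices = []defaultDevice{
	{"/dev/null", 1, 3},
	{"/dev/zero", 1, 5},
	{"/dev/full", 1, 7},
	{"/dev/random", 1, 8},
	{"/dev/urandom", 1, 9},
	{"/dev/tty", 5, 0},
	{"/dev/ptmx", 5, 2},
}

// addDefaultDevices appends an "rwm" allow rule for each default device.
func addDefaultDevices(spec []deviceCgroup) []deviceCgroup {
	for _, d := range defaultUnixDevices {
		spec = append(spec, deviceCgroup{Allow: true, Type: "c", Major: d.major, Minor: d.minor, Access: "rwm"})
	}
	return spec
}

func main() {
	spec := []deviceCgroup{{Allow: false, Access: "rwm"}} // deny-all rule first
	spec = addDefaultDevices(spec)
	for _, d := range defaultUnixDevices {
		fmt.Printf("%s c %d:%d\n", d.path, d.major, d.minor)
	}
	fmt.Println(len(spec)) // 8
}
```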
Fixes: #2539
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
There is no need to keep multiple copies of the license file in
different directories. We can just use the top-level one for the project.
Fixes: #2553
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Use ExecuteBlockdevAddWithDriverCache for swap in hotplugAddBlockDevice,
because swap files do not work correctly when added with
ExecuteBlockdevAddWithCache.
Fixes: #2548
Signed-off-by: Hui Zhu <teawater@antfin.com>