There is a new "ReadOnly" option added to nvdimm device in qemu
and now added to kata. However, qemu used for arm64 is a little
old and has no this feature. Here we remove this feature for arm.
Depends-on: github.com/kata-containers/tests#3831
Fixes: #2320
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
It looks like the version check for cloud hypervisor (clh) was added
initially when clh was actively evolving its API. We no longer need the
version check as clh API has been fairly stable for its recent releases.
Fixes: #1991
Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d08603bebb)
When running a firecracker instance jailed, block devices
are not removed correctly, as the jailerRoot path is not
stripped from the PATCH command sent to the FC API.
This patch differentiates the jailed case from the non-jailed
one and allows the firecracker instance to be properly
terminated.
Fixes#2387
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
(cherry picked from commit 64dd35ba4f)
Improve security by making rootfs image read-only, nobody
will be able to modify it from the guest.
fixes#1916
Signed-off-by: Julio Montes <julio.montes@intel.com>
We'd better stop the virtiofsd specifically after stop qemu,
instead of depending on the qemu's termination to notify virtiofsd
to exit.
Fixes: #2211
Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
Architectures that do not support memory hotplugging will fail when
memory limits are set because that amount is hotplugged. Issue a warning
instead. The long-term solution is virtio-mem.
Fixes: #1412
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
This commit will fix two problems:
- Virtiofsd process ID returned to the caller will always be 0,
the pid var is never being assigned a value.
- Socket listen fd may leak in case of failure of starting virtiofsd process.
This is a port of be9ca0d58bFixes: #1931
Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit 773deca2f6)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Without this, if the shim dies, we will not have a reliable way to
identify what mounts should be cleaned up if `containerd-shim-kata-v2
cleanup` is called for the sandbox.
Before this, if you `ctr run` with a sandbox bindmount defined and SIGKILL the
containerd-shim-kata-v2, you'll notice the sandbox bindmount left on
host.
With this change, the shim is able to get the sandbox bindmount
information from disk and do the appropriate cleanup.
Fixes#1896
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 7f1030d303)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
If for any reason there's an error when trying to setup the sandbox
bindmounts, make sure we roll back any mounts already created when
setting up the sandbox.
Without this, we'd leave shared directory mount and potentially
sandbox-bindmounts on the host.
Fixes: #1895
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 089a7484e1)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
allow to set dialing timeout in configuration.toml
default is 30s
Fixes: #1789
(cherry-picked 01b56d6cbf)
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Got:
FATA[0000] run pod sandbox: rpc error: code = Unknown desc = failed to
create containerd task: Add 189759MB virtio-mem-pci fail QMP command
failed: backend memory size must be multiple of 0x200000: unknown
This commit let sizeMB be multiple of 2Mib to fix the issue.
Fixes: #1796
Signed-off-by: Hui Zhu <teawater@antfin.com>
Quotes from the cloud-hypervisor release v15.0:
This release is the first in a new version numbering scheme to represent that
we believe Cloud Hypervisor is maturing and entering a period of stability.
With this new release we are beginning our new stability guarantees.
Other highlights from the latest release include: 1) Network device rate
limiting; 2) Support for runtime control of `virtio-net` guest offload;
3) `--api-socket` supports file descriptor parameter; 4) Bug fixes on
`virtio-pmem`, PCI BARs alignment, `virtio-net`, etc.; 5) Deprecation of
the "LinuxBoot" protocol for ELF and bzImage in the coming release.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v15.0
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.
[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.mdFixes: #1779
Signed-off-by: Bo Chen <chen.bo@intel.com>
Allow and configure vhost-user-fs devices (virtio-fs) on s390x. As a
consequence, appendVhostUserDevice now takes a context, which affects
its signature for other architectures.
Fixes: #1753
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
This reverts commit 7f60911333.
Patch allowed other vhost user devices besides FS not supported on s390x
and failed to attach a CCW device number, which results in the
inavailability to use more devices after vhost-user-fs-ccw.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
cgroupsCreate will just keep the CPU resources infomation but not the
others.
Set it to c.config.Resources will clean most of resources of the
container.
This commit remove it to handle the issue.
Fixes: #1758
Signed-off-by: Hui Zhu <teawater@antfin.com>
The pointer that send to newContainer in CreateContainer and
createContainers is not the pointer that point to the address in
s.config.Containers.
This commit fix this issue.
Fixes: #1758
Signed-off-by: Hui Zhu <teawater@antfin.com>
If the QEMU VMM fails to launch, we currently fail to kill virtiofsd,
resulting in leftover processes running on the host. Let's make sure we
kill these, and explicitly cleanup the virtiofs socket on the
filesystem.
Ideally we'll migrate QEMU to utilize the same virtiofsd interface that
CLH uses, but let's fix this bug as a first step.
Fixes: #1755
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Users can specify extra arguments for virtiofsd in a pod spec using the
io.katacontainers.config.hypervisor.virtio_fs_extra_args annontation.
However, this annotation was ignored so far by the runtime. This commit
fixes the issue by processing the annotation value (if present) and
translating it to the corresponding hypervisor configuration item.
Fixes#1523
Signed-off-by: Pavel Mores <pmores@redhat.com>
It would be undesirable to be given an annotation like "/dev/null".
Filter out bad annotation values.
Fixes: #1043
Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus runtime should pass this fsGroup
for EphemeralStorage to guest and set it properly on
the emptyDir volume in guest.
Fixes: #1580
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Update:
- Make the type of errnoRet in oci.proto oneof
- Update seccomp_grpc_to_oci that can set errnoRet as EPREM if the
value is empty.
- Update the oci.pb.go based on the above fixes
- Add seccomp errnoRet and flags option to configs in rustjail
Fixes: #1719
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Remove the prohibition of vhost-user devices on s390x, which are by now
supported (e.g. vhost-user-fs-ccw). As a consequence,
appendVhostUserDevice no longer needs an error in its signature.
This enables virtio-fs support on s390x.
Fixes: #1469
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Rather than relying on the system clock, use a channel timeout to avoid
problems if the system time changed.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Fixed logic used to handle static agent tracing.
For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.
Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.
The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.
Fixes: #1696.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Previously, the hypervisors were sending a signal and then checking to
see if the process had died by sending the magic null signal (`0`). However,
that doesn't work as it was written: the logic was assuming sending the
null signal to a process that was dead would return `ESRCH`, but it
doesn't: you first need to you `wait(2)` for the process before sending
that signal. This means that previously, all affected hypervisors would
appear to take `timeout` seconds to end, even though they had _already_
finished.
Now, the hypervisors true end time will be seen as we wait for the
processes before sending the null signal to ensure the process has
finished.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Refactored some of the hypervisors to remove the duplicated code used to
trigger a shutdown.
Also added some unit tests.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.
In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address. That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about. clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.
Until https://github.com/kata-containers/kata-containers/pull/1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.
With #1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us. We can use that to remove this special case.
fixes#1431
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>