Commit Graph

360 Commits

Author SHA1 Message Date
Jianyong Wu
5d86916e03 qemu/arm: remove nvdimm/"ReadOnly" option on arm64
There is a new "ReadOnly" option added to nvdimm device in qemu
and now added to kata. However, qemu used for arm64 is a little
old and has no this feature. Here we remove this feature for arm.

Depends-on: github.com/kata-containers/tests#3831
Fixes: #2320
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-08-17 15:33:32 +08:00
Fabiano Fidêncio
b52f719e4b Merge pull request #2366 from likebreath/0730/backport_clh_v17.0
stable-2.1 | versions: Upgrade to Cloud Hypervisor v17.0
2021-08-12 22:07:07 +02:00
Bo Chen
b6a8a71afa versions: Upgrade to Cloud Hypervisor v17.0
Highlights from the Cloud Hypervisor release v17.0: 1) ARM64 NUMA
support using ACPI; 2) `Seccomp` support for MSHV backend; 3) Hotplug of
macvtap devices; 4) Improved SGX support; 5) Inflight tracking for
`vhost-user` devices; 6) Bug fixes.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v17.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #2333

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit cc0bb9aebc)
2021-08-11 10:52:58 -07:00
Bo Chen
87b7d054fb runtime: Remove the version check for cloud hypervisor
It looks like the version check for cloud hypervisor (clh) was added
initially when clh was actively evolving its API. We no longer need the
version check as clh API has been fairly stable for its recent releases.

Fixes: #1991

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d08603bebb)
2021-08-11 10:48:37 -07:00
Anastassios Nanos
adcd2aeaea virtcontainers: fc: properly remove jailed block device
When running a firecracker instance jailed, block devices
are not removed correctly, as the jailerRoot path is not
stripped from the PATCH command sent to the FC API.

This patch differentiates the jailed case from the non-jailed
one and allows the firecracker instance to be properly
terminated.

Fixes #2387

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
(cherry picked from commit 64dd35ba4f)
2021-08-09 16:16:12 +01:00
Julio Montes
d3bab50496 runtime: virtcontainers: make rootfs image read-only
Improve security by making rootfs image read-only, nobody
will be able to modify it from the guest.

fixes #1916

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-08-04 10:38:41 +03:00
fupan.lfp
0f7a54adf3 qemu: stop the virtiofsd specifically
We'd better stop the virtiofsd specifically after stop qemu,
instead of depending on the qemu's termination to notify virtiofsd
to exit.

Fixes: #2211

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-08-02 10:30:40 +03:00
Julio Montes
8138a16b8b runtime: do not hot-remove PMEM devices
PMEM devices cannot be hot-removed from a running VM.

fixes #2018

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-08-02 10:30:40 +03:00
Jakob Naucke
d1de06c9ea virtcontainers: Don't fail memory hotplug
Architectures that do not support memory hotplugging will fail when
memory limits are set because that amount is hotplugged. Issue a warning
instead. The long-term solution is virtio-mem.

Fixes: #1412
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-08-02 10:30:40 +03:00
Bo Chen
b35506d124 versions: Upgrade to cloud-hypervisor v16.0
Highlights from the Cloud Hypervisor release v16.0: 1) Improved live
migration support; 2) Improved `vhost-user` support; 3) ARM64 ACPI and
UEFI support; 4) Bug fixes.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v16.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #1992

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 85c40001da)
2021-06-15 10:26:09 -07:00
bin
8019f7322d virtiofsd: Fix file descriptors leak and return correct PID
This commit will fix two problems:
- Virtiofsd process ID returned to the caller will always be 0,
   the pid var is never being assigned a value.
- Socket listen fd may leak in case of failure of starting virtiofsd process.
  This is a port of be9ca0d58b

Fixes: #1931

Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit 773deca2f6)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-31 16:56:15 +02:00
Eric Ernst
c51891fee7 sandbox-bindmount: persist mount information
Without this, if the shim dies, we will not have a reliable way to
identify what mounts should be cleaned up if `containerd-shim-kata-v2
cleanup` is called for the sandbox.

Before this, if you `ctr run` with a sandbox bindmount defined and SIGKILL the
containerd-shim-kata-v2, you'll notice the sandbox bindmount left on
host.

With this change, the shim is able to get the sandbox bindmount
information from disk and do the appropriate cleanup.

Fixes #1896

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 7f1030d303)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-24 15:08:08 +02:00
Eric Ernst
b137c7ac33 sandbox: Cleanup if failure to setup sandbox-bindmount occurs
If for any reason there's an error when trying to setup the sandbox
bindmounts, make sure we roll back any mounts already created when
setting up the sandbox.

Without this, we'd leave shared directory mount and potentially
sandbox-bindmounts on the host.

Fixes: #1895

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 089a7484e1)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-24 15:08:04 +02:00
Peng Tao
7086f91e1f runtime: sandbox delete should succeed after verifying sandbox state
Otherwise we might block delete and create orphan containers.

Fixes: #1039

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 35151f1786)
2021-05-14 09:41:38 +02:00
Snir Sheriber
c0bdba2350 runtime: make dialing timeout configurable
allow to set dialing timeout in configuration.toml
default is 30s

Fixes: #1789
(cherry-picked 01b56d6cbf)
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-05-12 14:17:34 +03:00
Hui Zhu
7f7c3fc8ec qemu.go: qemu: resizeMemory: Fix virtio-mem resize overflow issue
This commit change sizeByte from uint32 to uint64 to fix overflow issue.

Fixes: #1796

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-05-06 14:13:50 +08:00
Hui Zhu
c9053ea3fb qemu.go: qemu: setupVirtioMem: let sizeMB be multiple of 2Mib
Got:
FATA[0000] run pod sandbox: rpc error: code = Unknown desc = failed to
create containerd task: Add 189759MB virtio-mem-pci fail QMP command
failed: backend memory size must be multiple of 0x200000: unknown

This commit let sizeMB be multiple of 2Mib to fix the issue.

Fixes: #1796

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-05-06 14:13:48 +08:00
Eric Ernst
1c0d3afd55 Merge pull request #1754 from Jakob-Naucke/fix-virtiofs-s390x
virtcontainers: Fix virtio-fs on s390x
2021-04-30 09:28:12 -07:00
Fabiano Fidêncio
2e0221125a Merge pull request #1780 from likebreath/0429/clh_v15.0
versions: Upgrade to cloud-hypervisor v15.0
2021-04-30 18:20:36 +02:00
Fabiano Fidêncio
29fdfcfebc Merge pull request #1725 from liubin/liubin/1724-not-return-if-get-api-socket-failed
clh: return error if apiSocketPath failed
2021-04-30 18:16:45 +02:00
Fabiano Fidêncio
dc23adcd50 Merge pull request #1743 from alrs/fix-runtime-err
runtime: fix dropped error
2021-04-30 18:15:22 +02:00
Fabiano Fidêncio
bd486f7bf3 Merge pull request #1720 from ManaSugi/update-seccomp-spec
agent: Update seccomp configuration for errnoRet and flags
2021-04-30 10:52:42 +02:00
Bo Chen
1ca6bedf3e versions: Upgrade to cloud-hypervisor v15.0
Quotes from the cloud-hypervisor release v15.0:

This release is the first in a new version numbering scheme to represent that
we believe Cloud Hypervisor is maturing and entering a period of stability.
With this new release we are beginning our new stability guarantees.

Other highlights from the latest release include: 1) Network device rate
limiting; 2) Support for runtime control of `virtio-net` guest offload;
3) `--api-socket` supports file descriptor parameter; 4) Bug fixes on
`virtio-pmem`, PCI BARs alignment, `virtio-net`, etc.; 5) Deprecation of
the "LinuxBoot" protocol for ELF and bzImage in the coming release.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v15.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #1779

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-04-29 10:56:22 -07:00
Jakob Naucke
3ee61776d6 virtcontainers: Enable virtio-fs on s390x
Allow and configure vhost-user-fs devices (virtio-fs) on s390x. As a
consequence, appendVhostUserDevice now takes a context, which affects
its signature for other architectures.

Fixes: #1753

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:54:08 +02:00
Jakob Naucke
adba4532a4 virtcontainers: Revert "virtcontainers: Allow s390x appendVhostUserDevice"
This reverts commit 7f60911333.
Patch allowed other vhost user devices besides FS not supported on s390x
and failed to attach a CCW device number, which results in the
inavailability to use more devices after vhost-user-fs-ccw.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:43:33 +02:00
Eric Ernst
b20dff8027 Merge pull request #1759 from kata-containers/fix_update
Fix the issue that sandbox size is not right after update
2021-04-28 14:48:24 -07:00
Eric Ernst
23a8179184 Merge pull request #1756 from egernst/leave-no-virtiofs-behind
qemu: kill virtiofsd if failure to start VMM
2021-04-27 17:16:33 -07:00
Wainer dos Santos Moschetta
3677640811 runtime/virtcontainers: Fix typo on qmp error msg
"negotiate" was misspelled on qemu's qmp error message.

Fixes #1764
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-04-27 11:52:42 -04:00
Hui Zhu
0787ea8073 cgroupsCreate: not set resources to c.config.Resources
cgroupsCreate will just keep the CPU resources infomation but not the
others.
Set it to c.config.Resources will clean most of resources of the
container.

This commit remove it to handle the issue.

Fixes: #1758

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-04-27 16:44:30 +08:00
Hui Zhu
831224aa22 Sandbox: Fix ContainerConfig ptr in CreateContainer and createContainers
The pointer that send to newContainer in CreateContainer and
createContainers is not the pointer that point to the address in
s.config.Containers.

This commit fix this issue.

Fixes: #1758

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-04-27 16:44:22 +08:00
Eric Ernst
a57c8ab1be qemu: kill virtiofsd if failure to start VMM
If the QEMU VMM fails to launch, we currently fail to kill virtiofsd,
resulting in leftover processes running on the host. Let's make sure we
kill these, and explicitly cleanup the virtiofs socket on the
filesystem.

Ideally we'll migrate QEMU to utilize the same virtiofsd interface that
CLH uses, but let's fix this bug as a first step.

Fixes: #1755

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-04-26 21:07:20 -07:00
bin
0d0a520d42 clh: return error if apiSocketPath failed
If apiSocketPath failed, should return the error, but not nil

Fixes: #1724

Signed-off-by: bin <bin@hyper.sh>
2021-04-25 10:25:42 +08:00
Lars Lehtonen
fc6bb01a7f runtime: fix dropped error
Fixes: #212

Signed-off-by: Lars Lehtonen <lars.lehtonen@gmail.com>
2021-04-24 14:18:50 -07:00
Fabiano Fidêncio
fe2311cd4c Merge pull request #1739 from pmores/virtiofsd-extra-args-annotation-handling
add io.katacontainers.config.hypervisor.virtio_fs_extra_args handling
2021-04-23 23:22:01 +02:00
Pavel Mores
30ff6ee88b runtime: handle io.katacontainers.config.hypervisor.virtio_fs_extra_args
Users can specify extra arguments for virtiofsd in a pod spec using the
io.katacontainers.config.hypervisor.virtio_fs_extra_args annontation.
However, this annotation was ignored so far by the runtime.  This commit
fixes the issue by processing the annotation value (if present) and
translating it to the corresponding hypervisor configuration item.

Fixes #1523

Signed-off-by: Pavel Mores <pmores@redhat.com>
2021-04-23 21:09:28 +02:00
Fabiano Fidêncio
5eaf7a9982 Merge pull request #1049 from c3d/feature/1043-entropy-source-annotation
Entropy source annotation
2021-04-23 20:16:11 +02:00
Fabiano Fidêncio
b41d9a99b4 Merge pull request #1703 from lifupan/main_fix
fix the issue of missing set fsGroup for EphemeralStorage
2021-04-22 20:29:36 +02:00
Christophe de Dinechin
dcb9f40394 config: Protect annotation for entropy_source
It would be undesirable to be given an annotation like "/dev/null".
Filter out bad annotation values.

Fixes: #1043

Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-04-22 15:26:40 +02:00
fupan.lfp
628d55bf4c kata-agent: fix the issue of fsGroup missing
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus runtime should pass this fsGroup
for EphemeralStorage to guest and set it properly on
the emptyDir volume in guest.

Fixes: #1580

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-22 21:08:52 +08:00
Chelsea Mafrica
1c222c75ac Merge pull request #1697 from jodh-intel/improve-agent-shutdown-handling
Improve agent shutdown handling
2021-04-20 21:25:36 -07:00
Manabu Sugimoto
81c5ff1231 agent: Update seccomp configuration for errnoRet and flags
Update:
- Make the type of errnoRet in oci.proto oneof
- Update seccomp_grpc_to_oci that can set errnoRet as EPREM if the
value is empty.
- Update the oci.pb.go based on the above fixes
- Add seccomp errnoRet and flags option to configs in rustjail

Fixes: #1719

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-04-21 12:16:58 +09:00
Fabiano Fidêncio
4c177b5c40 Merge pull request #1599 from Jakob-Naucke/virtiofs-s390x
Enable virtio-fs on s390x
2021-04-20 21:07:15 +02:00
Carlos Venegas
cd27308755 Merge pull request #1432 from dgibson/bug1431
block: Generate PCI path for virtio-blk devices on clh
2021-04-20 12:00:09 -05:00
Fabiano Fidêncio
9df86d28a5 Merge pull request #1678 from cmaf/remove-spans-healthcheck
runtime: Disable trace for healthcheck
2021-04-20 18:38:47 +02:00
Jakob Naucke
7f60911333 virtcontainers: Allow s390x appendVhostUserDevice
Remove the prohibition of vhost-user devices on s390x, which are by now
supported (e.g. vhost-user-fs-ccw). As a consequence,
appendVhostUserDevice no longer needs an error in its signature.
This enables virtio-fs support on s390x.

Fixes: #1469

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-20 12:20:32 +02:00
James O. D. Hunt
de2631e711 utils: Make WaitLocalProcess safer
Rather than relying on the system clock, use a channel timeout to avoid
problems if the system time changed.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:46:42 +01:00
James O. D. Hunt
9256e590dc shutdown: Don't sever console watcher too early
Fixed logic used to handle static agent tracing.

For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.

Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.

The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.

Fixes: #1696.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:22:00 +01:00
James O. D. Hunt
51ab870091 utils: Improve WaitLocalProcess
Previously, the hypervisors were sending a signal and then checking to
see if the process had died by sending the magic null signal (`0`). However,
that doesn't work as it was written: the logic was assuming sending the
null signal to a process that was dead would return `ESRCH`, but it
doesn't: you first need to you `wait(2)` for the process before sending
that signal. This means that previously, all affected hypervisors would
appear to take `timeout` seconds to end, even though they had _already_
finished.

Now, the hypervisors true end time will be seen as we wait for the
processes before sending the null signal to ensure the process has
finished.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 14:51:06 +01:00
James O. D. Hunt
507ef6369e utils: Add waitLocalProcess function
Refactored some of the hypervisors to remove the duplicated code used to
trigger a shutdown.

Also added some unit tests.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 14:51:03 +01:00
David Gibson
1d5098de70 agent/block: Generate PCI path for virtio-blk devices on clh
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until https://github.com/kata-containers/kata-containers/pull/1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With #1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes #1431

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-13 13:29:24 +10:00