Commit Graph

1087 Commits

Author SHA1 Message Date
Jianyong Wu
5d86916e03 qemu/arm: remove nvdimm/"ReadOnly" option on arm64
There is a new "ReadOnly" option added to nvdimm device in qemu
and now added to kata. However, qemu used for arm64 is a little
old and has no this feature. Here we remove this feature for arm.

Depends-on: github.com/kata-containers/tests#3831
Fixes: #2320
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-08-17 15:33:32 +08:00
Fabiano Fidêncio
b52f719e4b Merge pull request #2366 from likebreath/0730/backport_clh_v17.0
stable-2.1 | versions: Upgrade to Cloud Hypervisor v17.0
2021-08-12 22:07:07 +02:00
Bo Chen
b6a8a71afa versions: Upgrade to Cloud Hypervisor v17.0
Highlights from the Cloud Hypervisor release v17.0: 1) ARM64 NUMA
support using ACPI; 2) `Seccomp` support for MSHV backend; 3) Hotplug of
macvtap devices; 4) Improved SGX support; 5) Inflight tracking for
`vhost-user` devices; 6) Bug fixes.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v17.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #2333

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit cc0bb9aebc)
2021-08-11 10:52:58 -07:00
Bo Chen
87b7d054fb runtime: Remove the version check for cloud hypervisor
It looks like the version check for cloud hypervisor (clh) was added
initially when clh was actively evolving its API. We no longer need the
version check as clh API has been fairly stable for its recent releases.

Fixes: #1991

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d08603bebb)
2021-08-11 10:48:37 -07:00
Anastassios Nanos
adcd2aeaea virtcontainers: fc: properly remove jailed block device
When running a firecracker instance jailed, block devices
are not removed correctly, as the jailerRoot path is not
stripped from the PATCH command sent to the FC API.

This patch differentiates the jailed case from the non-jailed
one and allows the firecracker instance to be properly
terminated.

Fixes #2387

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
(cherry picked from commit 64dd35ba4f)
2021-08-09 16:16:12 +01:00
Julio Montes
b54ca3de6a vendor: update govmm
Bring read-only nvdimm support

Shortlog:
335fa81 qemu: fix golangci-lint errors
61b6378 .github/workflows: reimplement github actions CI
9d6e797 go: support go modules
0d21263 qemu: support read-only nvdimm
ff34d28 qemu: Consistent parameter building

Signed-off-by: Julio Montes <julio.montes@intel.com>
(Based-on: 070590fb53)
2021-08-04 16:35:12 +03:00
Julio Montes
d3bab50496 runtime: virtcontainers: make rootfs image read-only
Improve security by making rootfs image read-only, nobody
will be able to modify it from the guest.

fixes #1916

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-08-04 10:38:41 +03:00
fupan.lfp
0f7a54adf3 qemu: stop the virtiofsd specifically
We'd better stop the virtiofsd specifically after stop qemu,
instead of depending on the qemu's termination to notify virtiofsd
to exit.

Fixes: #2211

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-08-02 10:30:40 +03:00
fupan.lfp
f3cf46a621 shimv2: fix the issue of leaking the hypervisor processes
Since we only send an shutdown qmp command to qemu when do
stopSandbox, and didn't wait until qemu process's exit, thus
we'd better to make sure it had exited when shimv2 terminated.
Thus here to do the last cleanup of the hypervisor.

Fixes: #2198

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-08-02 10:30:40 +03:00
David Gibson
2e045fc105 agent: Fix to parsing of /proc/self/mountinfo
get_mounts() parses /proc/self/mountinfo in order to get the mountpoints
for various cgroup filesystems.  One of the entries in mountinfo is the
"device" for each filesystem, but for virtual filesystems like /proc, /sys
and cgroups, the device entry is arbitrary.  Depending on the exact rootfs
setup, it can end up being "-".

This breaks get_mounts() because it uses " - " as a separator.  There
really is a " - " separator in mountinfo, but in this case the device entry
shows up as a second one.  Fix this, by changing a split to a splitn, which
will effectively only consider the first " - " in the line.

While we're there, make the warning message more useful, by having it
actually show which line it wasn't able to parse.

fixes #2182

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-02 10:30:40 +03:00
Tim Zhang
aa4a3053a0 agent: enhance tests of execute_hook
Use which to find the full path of exe before run execute_hook
to avoid error: 'No such file or directory'

Fixes: #2172

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-08-02 10:30:40 +03:00
bin
0fd747c752 agent: update netlink libraries
Update rtnetlink to use crate.io to make cargo vendor work.
Add vendor/ to .gitignore.

Fixes: #2111

Signed-off-by: bin <bin@hyper.sh>
2021-08-02 10:30:40 +03:00
fupan.lfp
754e73ca99 shimv2: fix the issue of leaking wait goroutines
After create an container/exec successfully, containerd
would wait it immediately, and if start it failed, there
is no chance to send value to exitCh, thus the wait goroutine
would blocked for ever and had no chance to exit.

Fixes: #2087

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-08-02 10:30:40 +03:00
Julio Montes
8138a16b8b runtime: do not hot-remove PMEM devices
PMEM devices cannot be hot-removed from a running VM.

fixes #2018

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-08-02 10:30:40 +03:00
Jakob Naucke
d1de06c9ea virtcontainers: Don't fail memory hotplug
Architectures that do not support memory hotplugging will fail when
memory limits are set because that amount is hotplugged. Issue a warning
instead. The long-term solution is virtio-mem.

Fixes: #1412
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-08-02 10:30:40 +03:00
Tim Zhang
8dbf4f3ade agent: Upgrade mio to v0.7.13 to fix epoll_fd leak problem
Fixes: #2035
Fixes: tokio-rs/tokio/#3809

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-08-02 10:30:40 +03:00
Julien Ropé
962cb94c6b runtime: report finish time in containers stats
Make sure we report the exit time for the container when we answer a "Status" request.

Fixes: #2098

Signed-off-by: Julien Ropé <jrope@redhat.com>
2021-06-23 17:51:33 +02:00
Gabriela Cervantes
2162422bd9 docs: Update url for installation guides
This PR updates the correct url for kata installation guides in kata 2.x

Fixes #2069

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2021-06-21 20:09:53 +00:00
Bo Chen
b35506d124 versions: Upgrade to cloud-hypervisor v16.0
Highlights from the Cloud Hypervisor release v16.0: 1) Improved live
migration support; 2) Improved `vhost-user` support; 3) ARM64 ACPI and
UEFI support; 4) Bug fixes.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v16.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #1992

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 85c40001da)
2021-06-15 10:26:09 -07:00
Gabriela Cervantes
d761bb22e7 docs: Update README for runtime documentation
This PR removes old links that were used in kata 1.x but not
longer valid for kata 2.x

Fixes #2019

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2021-06-14 16:44:23 -05:00
Shengjing Zhu
db8d853b99 runtime: remove covertool from cli test
covertool has no active since 2018 and is not compatible with go1.16

  ../vendor/github.com/dlespiau/covertool/pkg/cover/cover.go:76:29: cannot use f (type dummyTestDeps) as type testing.testDeps in argument to testing.MainStart:
  dummyTestDeps does not implement testing.testDeps (missing SetPanicOnExit0 method)

Fixes: #1862

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
(cherry picked from commit 1b60705646)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-06-08 10:00:51 +02:00
Tim Zhang
1cc2ad3f34 agent: Fix fd leak caused by netlink
See also: little-dude/netlink#165

Fixes: #1952

Because the author of netlink has no time to maintain the crate
(https://github.com/little-dude/netlink/issues/161), so we
need to switch the dependency to github temporarily.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-06-03 17:24:09 +08:00
Tim Zhang
ac34f6dfd9 agent: Upgrade tokio-vsock to fix fd leak of vsock socket
Fixes: #1950

The further information: rust-vsock/vsock-rs#15

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-06-03 10:29:33 +08:00
Julio Montes
ff206cf6cf Merge pull request #1946 from fidencio/wip/weekly-backports-to-stable-2.1
[stable-2.1] Weekly backports to stable-2.1 branch, May 31st 2021
2021-06-01 16:00:42 -05:00
fupan.lfp
915fea7b1f cgroup: fix the issue of set mem.limit and mem.swap
When update memory limit, we should adapt the write sequence
for memory and swap memory, so it won't fail because
the new value and the old value don't fit kernel's
validation.

Fixes: #1917

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
(cherry picked from commit 30f4834c5b)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-31 16:56:15 +02:00
fupan.lfp
a05e137710 agent: re-enable the standard SIGPIPE behavior
The Rust standard library had suppressed the default SIGPIPE
behavior, see https://github.com/rust-lang/rust/pull/13158.
Since the parent's signal handler would be inherited by it's child
process, thus we should re-enable the standard SIGPIPE behavior as a
workaround.

Fixes: #1887

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
(cherry picked from commit 0ae364c8eb)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-31 16:56:15 +02:00
bin
8019f7322d virtiofsd: Fix file descriptors leak and return correct PID
This commit will fix two problems:
- Virtiofsd process ID returned to the caller will always be 0,
   the pid var is never being assigned a value.
- Socket listen fd may leak in case of failure of starting virtiofsd process.
  This is a port of be9ca0d58b

Fixes: #1931

Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit 773deca2f6)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-31 16:56:15 +02:00
bin
e48c9d426d runtime: and cgroup and SandboxCgroupOnly check for check sub-command
In kata-runtime check sub-command, checks cgroups and SandboxCgroupOnly
to show message if the SandboxCgroupOnly is not set to true
and cgroup v2 is used.

Fixes: #1927

Signed-off-by: bin <bin@hyper.sh>
2021-05-28 16:36:49 +08:00
quanweiZhou
7874ab33d4 agent: fix start container failed when dropping all capabilities
When starting a container and dropping all capabilities,
the init child process has no permission to read the exec.fifo
file because the parent set the file mode 0o622. So change the exec.fifo file mode to 0o644.

fixes #1913

Signed-off-by: quanweiZhou <quanweiZhou@linux.alibaba.com>
(cherry picked from commit 3e4ebe10ac)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-24 15:08:15 +02:00
Eric Ernst
c51891fee7 sandbox-bindmount: persist mount information
Without this, if the shim dies, we will not have a reliable way to
identify what mounts should be cleaned up if `containerd-shim-kata-v2
cleanup` is called for the sandbox.

Before this, if you `ctr run` with a sandbox bindmount defined and SIGKILL the
containerd-shim-kata-v2, you'll notice the sandbox bindmount left on
host.

With this change, the shim is able to get the sandbox bindmount
information from disk and do the appropriate cleanup.

Fixes #1896

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 7f1030d303)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-24 15:08:08 +02:00
Eric Ernst
b137c7ac33 sandbox: Cleanup if failure to setup sandbox-bindmount occurs
If for any reason there's an error when trying to setup the sandbox
bindmounts, make sure we roll back any mounts already created when
setting up the sandbox.

Without this, we'd leave shared directory mount and potentially
sandbox-bindmounts on the host.

Fixes: #1895

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 089a7484e1)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-24 15:08:04 +02:00
fupan.lfp
9266c2460a rustjail: separated the propagation flags from mount flags
Since the propagation flags couldn't be combinted with the
standard mount flags, and they should be used with the remount,
thus it's better to split them from the standard mount flags.

Fixes: #1699

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
(cherry picked from commit e5fe572f51)
2021-05-14 09:42:00 +02:00
Peng Tao
7086f91e1f runtime: sandbox delete should succeed after verifying sandbox state
Otherwise we might block delete and create orphan containers.

Fixes: #1039

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 35151f1786)
2021-05-14 09:41:38 +02:00
Chelsea Mafrica
9a32a3e16d Merge pull request #1835 from snir911/backport_configure_timeout
stable-2.1 | runtime: make dialing timeout configurable
2021-05-12 13:14:37 -07:00
Fabiano Fidêncio
123f7d53cb Merge pull request #1830 from Tim-Zhang/fix-reap-for-stable-2.1
stable-2.1 | agent: avoid reaping the exit signal of execute_hook in the reaper
2021-05-12 20:26:42 +02:00
Snir Sheriber
c0bdba2350 runtime: make dialing timeout configurable
allow to set dialing timeout in configuration.toml
default is 30s

Fixes: #1789
(cherry-picked 01b56d6cbf)
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-05-12 14:17:34 +03:00
Eric Ernst
1b3cf2fb7d kata-monitor: export get stats for sandbox
Gathering stats for a given sandbox is pretty useful; let's export a
function from katamonitor pkg to do this.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 3787306107)
2021-05-12 11:44:58 +02:00
Eric Ernst
59b9e5d0f8 kata-runtime: add metrics command
For easier debug, let's add subcommand to kata-runtime for gathering
metrics associated with a given sandbox.

kata-runtime metrics --sandbox-id foobar

Fixes: #1815

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 8068a4692f)
2021-05-12 11:44:53 +02:00
Tim Zhang
828a304883 agent: avoid reaping the exit signal of execute_hook in the reaper
Fixes: #1826

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-05-12 16:33:44 +08:00
Eric Ernst
d3690952e6 runtime: shim: dedup client, socket addr code
(1) Add an accessor function, SocketAddress, to the shim-v2 code for
determining the shim's abstract domain socket address, given the sandbox
ID.

(2) In kata monitor, create a function, BuildShimClient, for obtaining the appropriate
http.Client for communicating with the shim's monitoring endpoint.

(3) Update the kata CLI and kata-monitor code to make use of these.

(4) Migrate some kata monitor methods to be functions, in order to ease
future reuse.

(5) drop unused namespace from functions where it is no longer needed.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
(cherry picked from commit 3caed6f88d)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-10 15:35:53 +02:00
Fabiano Fidêncio
7f7c794da4 runtime: Short the shim-monitor path
Instead of having something like
"/containerd-shim/$namespace/$sandboxID/shim-monitor.sock", let's change
the approach to:
* create the file in a more neutral location "/run/vc", instead of
  "/containerd-shim";
* drop the namespace, as the sandboxID should be unique;
* remove ".sock" from the socket name.

This will result on a name that looks like:
"/run/vc/$sandboxID/shim-monitor"

Fixes: #497

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 4bc006c8a4)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-10 15:35:47 +02:00
bin
3f1b7c9127 cli: delete tracing code for kata-runtime binary
There are no pod/container operations in kata-runtime binary,
tracing in this package is meaningless.

Fixes: #1748

Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit 13c23fec11)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-10 15:35:36 +02:00
Snir Sheriber
68cad37720 agent: Set fixed NOFILE limit value for kata-agent
Some applications may fail if NOFILE limit is set to unlimited.
Although in some environments this value is explicitly overridden,
lets set it to a more sane value in case it doesn't.

Fixes #1715
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
(cherry picked from commit a188577ebf)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-10 15:34:28 +02:00
bin
7c9067cc9d docs: add per-Pod Kata configurations for enable_pprof
Now enabling enable_pprof for individual pods is supported,
but not documented.

This commit will add per-Pod Kata configurations for `enable_pprof`
in file `docs/how-to/how-to-set-sandbox-config-kata.md`

Fixes: #1744

Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit 95e54e3f48)
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-10 15:34:05 +02:00
Tim Zhang
0e2df80bda Merge pull request #1814 from liubin/fix/1804-select-sandbox-ctx
[backport]runtime: use s.ctx instead ctx for checking cancellation
2021-05-07 19:43:14 +08:00
bin
79831fafaf runtime: use s.ctx instead ctx for checking cancellation
s.ctx should be used for checking cancellation, and the
local ctx is used for tracing.

Fixes: #1804

Signed-off-by: bin <bin@hyper.sh>
2021-05-06 17:22:53 +08:00
Hui Zhu
7f7c3fc8ec qemu.go: qemu: resizeMemory: Fix virtio-mem resize overflow issue
This commit change sizeByte from uint32 to uint64 to fix overflow issue.

Fixes: #1796

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-05-06 14:13:50 +08:00
Hui Zhu
c9053ea3fb qemu.go: qemu: setupVirtioMem: let sizeMB be multiple of 2Mib
Got:
FATA[0000] run pod sandbox: rpc error: code = Unknown desc = failed to
create containerd task: Add 189759MB virtio-mem-pci fail QMP command
failed: backend memory size must be multiple of 0x200000: unknown

This commit let sizeMB be multiple of 2Mib to fix the issue.

Fixes: #1796

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-05-06 14:13:48 +08:00
Eric Ernst
1c0d3afd55 Merge pull request #1754 from Jakob-Naucke/fix-virtiofs-s390x
virtcontainers: Fix virtio-fs on s390x
2021-04-30 09:28:12 -07:00
Fabiano Fidêncio
2e0221125a Merge pull request #1780 from likebreath/0429/clh_v15.0
versions: Upgrade to cloud-hypervisor v15.0
2021-04-30 18:20:36 +02:00