Commit Graph

350 Commits

Author SHA1 Message Date
zhaojizhuang
ca02c9f512 runtime: add reconnect timeout for vhost user block
Fixes: #6075
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-02-13 14:33:46 +08:00
Bin Liu
95602c8c08 Merge pull request #5999 from yaoyinnan/5998/feat/cgroup-metrics
runtime: support cgroup v2 metrics marshal guest metrics
2023-02-11 19:26:24 +08:00
Bin Liu
ecbd94d80c Merge pull request #6064 from yaoyinnan/6063/feat/rootfs-erofs
rootfs: support EROFS filesystem
2023-02-11 11:10:23 +08:00
yaoyinnan
bdf20b5d26 rootfs: support EROFS filesystem
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.

On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.

Fixes: #6063

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-11 00:44:13 +08:00
GabyCT
86501d5f6f Merge pull request #6200 from gkurz/improve-appendFDs-doc
runtime: Improve documentation of appendFDs
2023-02-09 15:50:37 -06:00
yaoyinnan
01765e1734 runtime: support cgroup v2 metrics marshal guest metrics
Support to use cgroup v2 metrics marshal guest metrics.

Fixes: #5998

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-09 19:14:09 +08:00
Bin Liu
56071c6e7b virtiofsd: change cache mod to const
Change cache mod from literal to const and place them in one place.

Also set default cache mode from `none` to `never` in
`pkg/katautils/config-settings.go.in`.

Fixes: #6151

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-02-08 15:06:52 +08:00
Bin Liu
71a3b73cb0 Merge pull request #6223 from d3c3mber/rm-unused-shim-config
runtime: remove not used shim configurations
2023-02-08 10:00:52 +08:00
d3c3mber
390916b33c runtime: remove not used shim configurations
ShimPath and ShimDebug are not needed anymore.

Fixes: #6147

Signed-off-by: d3c3mber <tangbo_gl_2022@163.com>
2023-02-07 14:06:12 +08:00
GabyCT
7fc35f19eb Merge pull request #6056 from jongwu/perm_deny
arm64/CI: fix unit test failure on arm64
2023-02-03 10:53:38 -06:00
Jianyong Wu
59f104c022 runtime: skip unit test that fail regularly on aarch64
There are lots of unit test cases fails regularly on aarch64, including
TestIOCopy, create_tmpfs. Temporarily skip it for now and enable it
after them get fixed.

Fixes: #6194
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-02-03 11:34:39 +08:00
Greg Kurz
3c48f2202c runtime: Improve documentation of appendFDs
The cmd.ExtraFiles feature that is used to implement appendFDs takes an
array of arbitray file descriptors and internally renumbers them to be
consecutive starting from 3, using dup2().

This isn't especially obvious : document it for the sake of clarity.

Fixes #6199

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-02 12:52:10 +01:00
Peng Tao
a34f36f8f4 Merge pull request #6149 from openanolis/fix_kata_runtime
runtime:fix stat uds path
2023-02-02 11:00:07 +08:00
Greg Kurz
334c4b8bdc runtime: Drop QEMU log file support
The QEMU log file is essentially about fine grain tracing of QEMU
internals and mostly useful for developpers, not production. Notably,
the log file isn't limited in size, nor rotated in any way. It means
that a container running in the VM could possibly flood the log file
with a guest triggerable trace. For example, on openshift, the log
file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means
that each pod running with the kata runtime could potentially consume
this amount of host RAM which is not acceptable.

Error messages are best collected from QEMU's stderr as kata is doing
now since PR #5736 was merged. Drop support for the QEMU log file
because it doesn't bring any value but can certainly do harm.

Fixes #6173

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-31 09:20:29 +01:00
Zhongtao Hu
1e531b44dc runtime:fix stat uds path
os.Stat("unix:///run/vc/sbs/sid/shim-monitor.sock") will fail,
should be os.Stat("/run/vc/sbs/sid/shim-monitor.sock")

Fixes:#6148
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-01-29 15:08:13 +08:00
zhaojizhuang
9092c23a2e runtime: Add hmp for qemu
Fixes: #6092
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-29 14:22:04 +08:00
Greg Kurz
af125b1498 Merge pull request #5736 from gkurz/no-qemu-daemonize
runtime: Start QEMU undaemonized and get logs
2023-01-27 16:33:48 +01:00
Greg Kurz
39fe4a4b6f runtime: Collect QEMU's stderr
LaunchQemu now connects a pipe to QEMU's stderr and makes it
usable by callers through a Go io.ReadCloser object. As
explained in [0], all messages should be read from the pipe
before calling cmd.Wait : introduce a LogAndWait helper to handle
that.

Fixes #5780

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:17 +01:00
Greg Kurz
bf4e3a618f runtime: Launch QEMU with cmd.Start()
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.

cmd.Run() is :

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().

If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Greg Kurz
8a4f08cb0f govmm: Optionally pass QMP listener to QEMU
QEMU's -qmp option can be passed the file descriptor of a socket that
is already in listening mode. This is done with by passing `fd=XXX`
to `-qmp` instead of a path. Note that these two options are mutually
exclusive : QEMU errors out if both are passed, so we check that as
well in the validation function.

While here add the `path=` stanza in the path based case for clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:08:48 +01:00
Greg Kurz
219bb8e7d0 govmm: Optionally start QMP with a pre-configured connection
When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 19:16:47 +01:00
Peng Tao
7d1a604bad Merge pull request #6060 from ls-ggg/6055/service.mu-deadlock
runtime:all APIs are hang in the service.mu
2023-01-18 10:50:00 +08:00
Bin Liu
790f45190b Merge pull request #6074 from zhaojizhuang/enablevhostuserstore
runtime: paas enablevhostuserstore annotation to hypervisor config
2023-01-17 11:43:43 +08:00
ls
69fc8de712 runtime:all APIs are hang in the service.mu
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time

Fixes: #6059

Signed-off-by: ls <335814617@qq.com>
2023-01-16 14:45:37 +08:00
Eric Ernst
458fe865ea Merge pull request #6052 from egernst/add-darwin-skeletons
Add darwin skeletons
2023-01-13 13:14:16 -08:00
zhaojizhuang
cf1bae3521 runtime: paas enablevhostuserstore annotation to hypervisor config
Fixes: #6073
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-13 17:07:38 +08:00
Samuel Ortiz
a9626682af virtcontainers: resourcecontrol: Add skeleton for Darwin
Cgroups do not exist on Darwin, so use an empty implementation for
resourcecontrol for the time being. In the process, ensure that the
utilized cgroup handling (ie, isSystemdCgroup) is kept in general file,
since we use this to help assess/constrain the container spec we pass to
the guest.

Fixes: #6051

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:53:28 -08:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Eric Ernst
f137048be3 resource-control: add helper function for setting CPU affinity
Let's abstract the CPU affinity

Fixes: #6044

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-11 17:55:53 -08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Bin Liu
2c10b37172 Merge pull request #5991 from dcantah/darwin-sigs
runtime: Define Darwin handled signals list
2023-01-07 11:19:48 +08:00
Fabiano Fidêncio
175794458f Merge pull request #5972 from bergwolf/github/hook
fix moby prestart hook handling
2023-01-06 14:54:39 +01:00
Samuel Ortiz
3b4420eb8e runtime: Define Darwin handled signals list
Fixes: #5990

Some signals may not be defined on non Linux host OSes, like
SIGSTKFLT for example. It's also not defined on certain architectures,
but irrelevant for this.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 17:50:47 -08:00
Danny Canter
24b05a99b6 schedcore: Make buildable on !linux
Fixes: #5983

sched-core only makes sense on Linux hosts. Let's add stub/error for
other platforms.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 11:51:04 -08:00
Peng Tao
d085389127 vc: fix up UT for CreateSandbox API change
Need to adapt the UT as well.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:42 +08:00
Peng Tao
578a9c25f0 vc: rescan network endpoints after running prestart hooks
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.

Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:41 +08:00
Peng Tao
cb84b0fb02 katautils: run prestart hooks after starting VM
So that we can pass the hypervisor pid to the hook instead of the
runtime process's.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 10:52:32 +00:00
Zhongtao Hu
dae6670628 kata-runtime: add rust runtime path for kata-runtime exec
add rust runtime path for kata-runtime exec

Fixes:#5963
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-12-30 13:34:34 +08:00
Binbin Zhang
99485d871c shim: return hypervisor's pid not shim's pid
update outdated code comments

Fixes: #3234

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2022-12-14 11:16:11 +08:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
Bin Liu
1dfd845f51 runtime: go fix code for 1.19
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.

Fixes: #5750

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-11-25 11:29:18 +08:00
Bin Liu
06a604b753 Merge pull request #5720 from YchauWang/wyc-docs-test-22
runtime: add log record to the qemu config method `appendDevices` for…
2022-11-24 13:15:06 +08:00
wangyongchao.bj
30a7ebf430 runtime: Log invalid devices in QEMU config
When the user tried to add new devices to the VM, there is no error info for the invalid
 device. This PR adds a log record to the `appendDevices` for the invalid device of the
 qemu config.

Fixes: #5719

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2022-11-23 09:09:45 +08:00
Fabiano Fidêncio
df3d9878d5 Merge pull request #5695 from darfux/virtiofs-queue-size
runtime: Support virtiofs queue size for qemu and make it configurable
2022-11-22 20:04:30 +01:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
Fabiano Fidêncio
d94718fb30 runtime: Fix gofmt issues
It seems that bumping the version of golang and golangci-lint new format
changes are required.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 14:16:12 +01:00
Fabiano Fidêncio
16b8375095 golang: Stop using io/ioutils
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 13:43:25 +01:00
LitFlwr0
2508d39b7c runtime: added vcpus pinning logics
Core VCPU threads pinning logics for issue 4476. Also provided docs.

Fixes:#4476
Signed-off-by: LitFlwr0 <861690705@qq.com>
2022-11-04 17:52:42 +08:00
snir911
288e337a6f Merge pull request #5434 from Rouzip/remove-doNetNS
add EnterNetNS in virtcontainers
2022-10-30 11:19:07 +02:00
Fabiano Fidêncio
190e623c40 Merge pull request #5317 from Champ-Goblem/fix-containerd-stats
shim: Ensure pagesize is set when reporting hugetlb stats
2022-10-24 10:24:49 +02:00