Commit Graph

329 Commits

Author SHA1 Message Date
Peng Tao
7d1a604bad Merge pull request #6060 from ls-ggg/6055/service.mu-deadlock
runtime:all APIs are hang in the service.mu
2023-01-18 10:50:00 +08:00
Bin Liu
790f45190b Merge pull request #6074 from zhaojizhuang/enablevhostuserstore
runtime: paas enablevhostuserstore annotation to hypervisor config
2023-01-17 11:43:43 +08:00
ls
69fc8de712 runtime:all APIs are hang in the service.mu
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time

Fixes: #6059

Signed-off-by: ls <335814617@qq.com>
2023-01-16 14:45:37 +08:00
Eric Ernst
458fe865ea Merge pull request #6052 from egernst/add-darwin-skeletons
Add darwin skeletons
2023-01-13 13:14:16 -08:00
zhaojizhuang
cf1bae3521 runtime: paas enablevhostuserstore annotation to hypervisor config
Fixes: #6073
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-13 17:07:38 +08:00
Samuel Ortiz
a9626682af virtcontainers: resourcecontrol: Add skeleton for Darwin
Cgroups do not exist on Darwin, so use an empty implementation for
resourcecontrol for the time being. In the process, ensure that the
utilized cgroup handling (ie, isSystemdCgroup) is kept in general file,
since we use this to help assess/constrain the container spec we pass to
the guest.

Fixes: #6051

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:53:28 -08:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Eric Ernst
f137048be3 resource-control: add helper function for setting CPU affinity
Let's abstract the CPU affinity

Fixes: #6044

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-11 17:55:53 -08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Bin Liu
2c10b37172 Merge pull request #5991 from dcantah/darwin-sigs
runtime: Define Darwin handled signals list
2023-01-07 11:19:48 +08:00
Fabiano Fidêncio
175794458f Merge pull request #5972 from bergwolf/github/hook
fix moby prestart hook handling
2023-01-06 14:54:39 +01:00
Samuel Ortiz
3b4420eb8e runtime: Define Darwin handled signals list
Fixes: #5990

Some signals may not be defined on non Linux host OSes, like
SIGSTKFLT for example. It's also not defined on certain architectures,
but irrelevant for this.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 17:50:47 -08:00
Danny Canter
24b05a99b6 schedcore: Make buildable on !linux
Fixes: #5983

sched-core only makes sense on Linux hosts. Let's add stub/error for
other platforms.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 11:51:04 -08:00
Peng Tao
d085389127 vc: fix up UT for CreateSandbox API change
Need to adapt the UT as well.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:42 +08:00
Peng Tao
578a9c25f0 vc: rescan network endpoints after running prestart hooks
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.

Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:41 +08:00
Peng Tao
cb84b0fb02 katautils: run prestart hooks after starting VM
So that we can pass the hypervisor pid to the hook instead of the
runtime process's.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 10:52:32 +00:00
Zhongtao Hu
dae6670628 kata-runtime: add rust runtime path for kata-runtime exec
add rust runtime path for kata-runtime exec

Fixes:#5963
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-12-30 13:34:34 +08:00
Binbin Zhang
99485d871c shim: return hypervisor's pid not shim's pid
update outdated code comments

Fixes: #3234

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2022-12-14 11:16:11 +08:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
Bin Liu
1dfd845f51 runtime: go fix code for 1.19
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.

Fixes: #5750

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-11-25 11:29:18 +08:00
Bin Liu
06a604b753 Merge pull request #5720 from YchauWang/wyc-docs-test-22
runtime: add log record to the qemu config method `appendDevices` for…
2022-11-24 13:15:06 +08:00
wangyongchao.bj
30a7ebf430 runtime: Log invalid devices in QEMU config
When the user tried to add new devices to the VM, there is no error info for the invalid
 device. This PR adds a log record to the `appendDevices` for the invalid device of the
 qemu config.

Fixes: #5719

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2022-11-23 09:09:45 +08:00
Fabiano Fidêncio
df3d9878d5 Merge pull request #5695 from darfux/virtiofs-queue-size
runtime: Support virtiofs queue size for qemu and make it configurable
2022-11-22 20:04:30 +01:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
Fabiano Fidêncio
d94718fb30 runtime: Fix gofmt issues
It seems that bumping the version of golang and golangci-lint new format
changes are required.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 14:16:12 +01:00
Fabiano Fidêncio
16b8375095 golang: Stop using io/ioutils
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 13:43:25 +01:00
LitFlwr0
2508d39b7c runtime: added vcpus pinning logics
Core VCPU threads pinning logics for issue 4476. Also provided docs.

Fixes:#4476
Signed-off-by: LitFlwr0 <861690705@qq.com>
2022-11-04 17:52:42 +08:00
snir911
288e337a6f Merge pull request #5434 from Rouzip/remove-doNetNS
add EnterNetNS in virtcontainers
2022-10-30 11:19:07 +02:00
Fabiano Fidêncio
190e623c40 Merge pull request #5317 from Champ-Goblem/fix-containerd-stats
shim: Ensure pagesize is set when reporting hugetlb stats
2022-10-24 10:24:49 +02:00
Rouzip
39363ffbfb runtime: remove same function
Add EnterNetNS in virtcontainers to remove same function.

FIXes #5394

Signed-off-by: Rouzip <1226015390@qq.com>
2022-10-17 10:59:13 +08:00
Vijay Dhanraj
435c8f181a acrn: Enable ACRN hypervisor support for Kata 2.x release
Currently ACRN hypervisor support in Kata2.x releases is broken.
This commit re-enables ACRN hypervisor support and also refactors
the code so as to remove dependency on Sandbox.

Fixes #3027

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
2022-10-07 07:40:32 -07:00
Champ-Goblem
89e62d4edf shim: Ensure pagesize is set when reporting hugetbl stats
The containerd stats method and metrics API are broken with Kata 2.5.x, the stats fail to load and the metrics API responds with status code 500

This seems to be down to the conversion from the stats reported by the agent RPC `StatsContainer` where the field `Pagesize` is not
completed by the `setHugetlbStats` method. In the case where multiple sized tables stats are reported, this causes containerd to register two metrics
with the same label set, rather than each being partitioned by the `page` label.

Fixes: #5316
Signed-off-by: Champ-Goblem <cameron@northflank.com>
2022-10-04 09:16:30 +01:00
Peng Tao
8a2df6b31c Merge pull request #4931 from jpecholt/snp-support
Added SNP-Support for Kata-Containers
2022-09-27 14:17:54 +08:00
Joana Pecholt
ded60173d4 runtime: Enable choice between AMD SEV and SNP
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Joana Pecholt
22bda0838c runtime: Support for AMD SEV-SNP VMs
This commit adds AMD SEV-SNP as a confidential guest option to the
runtime. Information on required components such as OVMF, QEMU and
a kernel supporting SEV-SNP are defined in the versions file and
corresponding configs are added.

Note: The CPU model 'host' provided by the current SNP-QEMU does
not support all SNP capabilities yet, which is why this option is
changed to EPYC-v4.

Note: The guest's physical address space reduction specified with
ReducedPhysBits is 1. Details are can be found in Section 15.34.6
here https://www.amd.com/system/files/TechDocs/24593.pdf

Fixes #4437

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Feng Wang
f914319874 runtime: store the user name in hypervisor config
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-13 10:32:55 -07:00
Feng Wang
c3015927a3 runtime: add more debug logs for non-root user operation
Previously the logging was insufficient and made debugging difficult

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-12 21:38:57 -07:00
Archana Shinde
7d52934ec1 Merge pull request #4798 from amshinde/use-iouring-qemu
Use iouring for qemu block devices
2022-08-26 04:00:24 +05:30
Peng Tao
a06d819b24 runtime: cri-o annotations have been moved to podman
Let's swith to depending on podman which also simplies indirect
dependency on kubernetes components. And it helps to avoid cri-o
security issues like CVE-2022-1708 as well.

Fixes: #4972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 18:11:37 +08:00
Chelsea Mafrica
fcc1e0c617 runtime: tracing: End root span at end of trace
The root span should exist the duration of the trace. Defer ending span
until the end of the trace instead of end of function. Add the span to
the service struct to do so.

Fixes #4902

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-12 13:15:39 -07:00
Archana Shinde
c1e3b8f40f govmm: Refactor qmp functions for adding block device
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
598884f374 govmm: Refactor code to get rid of redundant code
Get rid of redundant return values from function.
args and blockdevArgs used to return different values to maintain
compatilibity between qemu versions. These are exactly the same now.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
00860a7e43 qmp: Pass aio backend while adding block device
Allow govmm to pass aio backend while adding block device.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
e1b49d7586 config: Add block aio as a supported annotation
Allow Block AIO to be passed as a per pod annotation.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
ed0f1d0b32 config: Add "block_device_aio" as a config option for qemu
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
b6cd2348f5 govmm: Add io_uring as AIO type
io_uring was introduced as a new kernel IO interface in kernel 5.1.
It is designed for higher performance than the older Linux AIO API.
This feature was added in qemu 5.0.

Fixes #4645

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-03 10:43:12 -07:00
Archana Shinde
81cdaf0771 govmm: Correct documentation for Linux aio.
The comments for "native" aio are incorrect. Correct these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-03 10:41:50 -07:00
Zhongtao Hu
adfad44efe Merge remote-tracking branch 'origin/main' into runtime-rs-merge-tmp
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.

Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-01 11:12:48 +08:00
yaoyinnan
5c3155f7e2 runtime: Support for host cgroup v2
Support cgroup v2 on the host. Update vendor containerd/cgroups to add cgroup v2.

Fixes: #3073

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-07-28 10:30:45 +08:00
wllenyj
274598ae56 kata-runtime: add dragonball config check support.
add dragonball config check support.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-14 10:43:50 +08:00