Commit Graph

3669 Commits

Author SHA1 Message Date
James O. D. Hunt
7f666f783d runtime-rs: ch: Fix TDX
PR #8311 inadvertently broke the runtime-rs / Cloud Hypervisor TDX
handling. It also introduced unrecoverable failure scenarios. Hence,
replace slow, fallible regex matching in logging fast path with single pass
non-failing multi-string log level matching.

Also, added a unit test for `parse_ch_log_level()`.

Fixes: #8418.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-13 08:49:47 +00:00
Xuewei Niu
0a9125e629 Merge pull request #7675 from justxuewei/vhost-net 2023-11-12 20:38:18 +08:00
Xuewei Niu
d1deaf0538 dragonball: Minor changes for a comment from Bian
- Add feature control for InsertNetworkDevice.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:14:10 +08:00
Xuewei Niu
e4f83e27c4 dragonball: vhost-net set_offload with acked features
set_offload() for tap devices depends on acked features.

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:39 +08:00
Xuewei Niu
6cd572dbbb dragonball: Minor changes for Chao's comments
- Remove two panic statements from InsertNetworkDevice test.
- Rename `NUM_QUEUES` to `DEFAULT_NUM_QUEUES`, `QUEUE_SIZE` to
  `DEFAULT_QUEUE_SIZE` for vhost-net and virtio-net.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:39 +08:00
Xuewei Niu
dcdf3c6556 runtime-rs: Supply missing fields of NetworkConfig
`test_networkconfig_to_netconfig` from clh depends on `NetworkConfig` which
has some new fields in this PR. Therefore, this commit gives the test
missing fields.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:39 +08:00
Xuewei Niu
58e9709c1f dragonball: Changes for ZizhengBian's comments
- Dragonball's vhost-net feature not depends on virtio-net feature.
- Remove `TapError` from dbs-virtio-devices's Error, and add `VirtioNet`
  and `VhostNet` two fields.
- Downgrade visiblity of two fields of `VhostNetDeviceMgr` from
  `pub(crate)`.
- File an issue to record a todo for network rate limiter.
- Print internal errors with `{0:?}.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:33 +08:00
Chao Wu
a62fb83c91 Merge pull request #8169 from openanolis/chao/fix_typo_shm
runtime-rs: fix a typo in shm
2023-11-10 14:00:11 +08:00
gaohuatao
78df1bb851 agent: update AGENT_THREADS metrics value
Fixes: #8369

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2023-11-10 10:39:57 +08:00
Chao Wu
afb002c25c runtime-rs: fix a typo in shm
is_shim_volume should be is_shm_volume in shm_volume mod.

fixes: #8168
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-11-10 10:36:58 +08:00
Archana Shinde
1611723465 Merge pull request #8379 from likebreath/1103/clh_v36.0
Upgrade to Cloud Hypervisor v36.0
2023-11-08 21:10:41 -08:00
Archana Shinde
268d4d622f Merge pull request #8389 from justxuewei/vm-capable-test
runtime: Fix TestCheckHostIsVMContainerCapable unstablity issue
2023-11-08 12:14:04 -08:00
Archana Shinde
92a517156c Merge pull request #8367 from amshinde/add-nerdctl-ipvlan-test
network: Fix network hotplug for ipvlan and macvlan endpoints for qemu and add tests
2023-11-08 11:45:13 -08:00
Chelsea Mafrica
83e731328f Merge pull request #8023 from cmaf/runtime-rs-ch-pause-resume
runtime-rs: Update status for pause and resume
2023-11-08 11:34:47 -08:00
Xuewei Niu
acd9057c7b runtime: Fix TestCheckHostIsVMContainerCapable unstablity issue
TestCheckHostIsVMContainerCapable removes sysModuleDir to simulate a
case that the kernel modules are not loaded. However,
checkKernelModules() executes modprobe <module> if a module not
found in that directory. Loading those modules is required to be denied
temporarily.

Fixes: #8390

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 22:40:08 +08:00
Fupan Li
100a73d2fd Merge pull request #7531 from justxuewei/device-cgroup
agent: Restrict device access at upper node of container's cgroup
2023-11-08 22:01:48 +08:00
Xuewei Niu
023d8dc01e agent: Changes according to Pan's comments
- Disable device cgroup restriction while pod cgroup is not available.
- Remove balcklist-related names and change whitelist-related names to
  allowed_all.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:08 +08:00
Xuewei Niu
b5f3a8cb39 agent: Fix container launching failure with systemd cgroup
FSManager of systemd cgroup manager is responsible for setting up cgroup
path. The container launching will be failed if the FSManager is in
read-only mode.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
6477825195 agent: Minor changes according to Zhou's comments
The changes include:

- Change to debug logging level for resources after processed.
- Remove a todo for pod cgroup cleanup.
- Add an anyhow context to `get_paths_and_mounts()`.
- Remove code which denys access to VMROOTFS since it won't take effect. If
  blackmode is in use, the VMROOTFS will be denyed as default. Otherwise,
  device cgroups won't be updated in whitelist mode.
- Add a unit test for `default_allowed_devices()`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
cec8044744 agent: Make devcg_info optional for LinuxContainer::new()
The runk is a standard OCI runtime that isnt' aware of concept of sandbox.
Therefore, the `devcg_info` argument of `LinuxContainer::new()` is
unneccessary to be provided.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
ef4c3844a3 agent: Restrict device access at upper node of container's cgroup
The target is to guarantee that containers couldn't escape to access extra
devices, like vm rootfs, etc.

Assume that there is a cgroup, such as `/A/B`. The `B` is container cgroup,
and the `A` is what we called pod cgroup. No matter what permissions are
set for the container (`B`), the `A`'s permission is always `a *:* rwm`. It
leads that containers could acquire permission to access to other devices
in VM that not belongs to themselves.

In order to set devices cgroup properly, the order of setting cgroups is
that the pod cgroup comes first and the container cgroup comes after.

The `Sandbox` has a new field, `devcg_info`, to save cgroup states. To
avoid setting container cgroup too early, an initialization should be done
carefully. `inited`, one of the states, is a boolean to indicate if the pod
cgroup is initialized. If no, the pod cgroup should be created firstly, and
set default permissions. After that, the pause container cgroup is created
and inherits the permissions from the pod cgroup.

If whitelist mode which allows containers to access all devices in VM is
enabled,  then device resources from OCI spec are ignored.

This feature not supports systemd cgroup and cgroup v2, since:

- Systemd cgroup implemented on Agent hasn't supported devices subsystem so
  far, see: https://github.com/kata-containers/kata-containers/issues/7506.
- Cgroup v2's device controller depends on eBPF programs, which is out of
  scope of cgroup.

Fixes: #7507

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Archana Shinde
a6272733e7 network: Fix network hotplug for ipvlan and macvlan endpoints.
Since moving from network coldplug to hotplug, the only case verified
was veth endpoints. Support for network hotplug for ipvlan and macvlan was
broken/not added. Fix it.

Fixes: #8391

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-07 10:13:51 -08:00
James O. D. Hunt
59d0d4caff runtime-rs: ch: Simplify VSOCK error handling
Remove the redundant `VmConfigError::EmptyVsockSocketPath` error from
the Cloud Hypervisor config crate since this scenario is already handled
by the `VsockConfigError::NoVsockSocketPath` error.

Fixes: #8385.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-07 17:45:38 +00:00
James O. D. Hunt
bdb83f8282 runtime-rs: ch: Remove unused function
Remove the redundant `parse_mac()` function: this was never used and we
already have an implementation in `crates/resource/src/network/utils/mod.rs`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-07 17:45:38 +00:00
Xuewei Niu
8ea87405ed runtime-rs: Remove virtio config from Backend
Virtio-net and vhost-net share a common virtio config, and vhost-user-net
uses another config, named `VhostUserConfig`. Thus, the virtio config could
be added into `NetworkConfig` instead of `Backend`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
ad66378bf5 runtime-rs: Move Dragonball stuff out of device drivers
Moving Dragonball structs convertions out of device drivers to keep driver
neutral. The convertions include `NetworkBackend` to
`DragonballNetworkBackend` and `NetworkConfig` to
`DragonballNetworkConfig`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
3e0614cdf0 dragonball: Minor changes to comments
Changes include:

- Merge `VhostNetDeviceError` import item.
- Replace if with match in `add_vhost_net_device()`

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
a047331a34 runtime-rs: Network config distinguishes backends
Network backends determine the virtio dataplane implementations. Common
protocols include virtio-net, vhost-net and vhost-user-net, etc. Network
config has a new field named `backend` to specify which protocol to use.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
9203371833 dragonball: Introduce vhost-net device
PLEASE NOTE THAT this pull request just implements vhost-net support for
Dragonball, and adaptation for the Runtime-rs. And this pull request
DOESN'T provide an item to config which backend to use. To sum up,
virtio-net as a default backend is only choice for the user so far.

This pull request introduces vhost-net device for the Dragonball. In
addition, this pull request includes changes of Runtime-rs to improve
network configuration abilities.

The Dragonball part implements a vhost-net device and a vhost-net device
manager, named `VhostNetDeviceMgr`, to manage vhost-net device.
`NetworkInterfaceConfig` is introduced as a high-level abstract for network
config. Then, the Dragonball is able to distinguish network backends, e.g.
virtio-net, vhost-net, vhost-user-net(WIP), etc.

The Runtime-rs part adds support of multiple network backends as well.
`NetworkConfig` has a couple of new fields, like `backend`,
`use_shared_irq`, etc. And Dragonball's network config structs are
implmented `From` trait which allow to be converted from the Runtime-rs's
network config conveniently.

Fixes: #7674

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Beraldo Leal
dd530ba8ee tests: fixes AMD errors
TestCheckHostIsVMContainerCapable is failing on AMD machines.
kata-check_amd64_test.go:96 has no AMD modules, also getCPUType is
missing.

Fixes #8384.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
7641c19f74 runtime: bump containerd for gogo deprecation
This update includes necessary changes due to the version bump of
containerd and its dependencies. It's part of a broader initiative to
phase out gogo protobuf, which has been deprecated, and to align with
the current supported libraries.

Fixes #7420.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
16fa2c39e6 protocols: replace gogo/types.Empty and Any
by Google versions.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c61f4a8592 protocols: remove unused fieldpath option
The +fieldpath option, specific to gogoprotobuf, enabled dynamic field
access in protobuf messages, allowing nested fields to be accessed via
string paths.

This change is part of a larger effort to transition to the official Go
protobuf library for better maintainability and community support.
Upon review, no instances of dynamic field access were found in the
codebase, confirming that the feature is not in use.

By removing this unused feature, we simplify the build process and make
it easier to complete the transition away from gogoprotobuf.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c87bc60ea0 protocols: removing unused mappings
Those mappings are not used by our .proto files and there is no
difference between .pb.go files generated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c5d845b30a agent: updating Cargo.lock files
Probably previous changes missed updating Cargo.lock.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
5d88c78a6e protocols: generating agent.pb.go
a3b003c345 modified agent but agent.pb.go
was not updated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Archana Shinde
036b7787dd runtime-rs: Use PCI path from hypervisor for vfio devices
Remove earlier functionality that tries to assign PCI path to vfio
devices from the host assuming pci slots to start from 1.
Get this from the hypervisor instead.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00
Archana Shinde
c3ce6a1d15 runtime-rs: Provide PCI path to the agent for virtio-block
If PCI path for block device is not empty for a block device, use
that as identifier for agent instead of virt path which is valid only
for mmio devices.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00
Archana Shinde
a2bbbad711 runtime-rs: change hypervisor add_device trait to return device copy
Block(virtio-blk) and vfio devices are currently not handled correctly
by the agent as the agent is not provided with correct PCI paths for
these devices.

The PCI paths for these devices can be inferred from the PCI information
provided by the hypervisor when the device is added.
Hence changing the add_device trait function to return a device copy
with PCI info potentially provided by the hypervisor. This can then be
provided to the agent to correctly detect devices within the VM.

This commit includes implementation for PCI info update for
cloud-hupervisor for virtio-blk devices with stubs provided for other
hypervisors.

Removing Vsock from the DeviceType enum as Vsock currently does not
implement the Device Trait, it has no attach and detach trait functions
among others. Part of the reason is because these functions require Vsock
to implement Clone trait as these functions need cloned copies to be
passed down the hypervisor.

The change introduced for returning a device copy from the add_device
hypervisor trait explicitly requires a device to implement
Copy trait. Hence removing Vsock from the DeviceType enum for now, as
its implementation is incomplete and not currently used.

Note, one of the blockers for adding the Clone trait to Vsock is that it
currently includes a file handle which cannot be cloned. For Clone and
Device Traits to be implemented for Vsock, it requires an implementation
change in the future for it to be cloneable.

Fixes: #8283

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00
Bo Chen
071667f1ca runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8378

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-11-03 10:47:06 -07:00
Fabiano Fidêncio
40cc397218 Merge pull request #8255 from cmaf/migrate-checks-fixes-links
docs: Fix broken links
2023-11-01 14:46:30 +01:00
Fabiano Fidêncio
53cda12a71 Merge pull request #8311 from TimePrinciple/log-system-enhancement
runtime-rs: Log system enhancement
2023-10-31 10:14:41 +01:00
Archana Shinde
148c565b2f Merge pull request #8289 from BbolroC/skip-create-tmpfs-s390x
agent: Skip flaky create_tmpfs on s390x
2023-10-30 22:26:28 -07:00
Ruoqing He
4ad2cfe0c2 runtime-rs: Log system enhancement
By modifying RuntimeLevelFilter drain to improve logging control,
enabling isolation of change effect of the loggers between components,
tuning clh logs to be logged according to their log levels
given by cloud-hypervisor.

Fixes: #8310

Signed-off-by: Ruoqing He <linuxwatcher@outlook.com>
2023-10-31 04:57:46 +00:00
David Esparza
2a17d3889e Merge pull request #8334 from amshinde/ipvlan-nerdctl-fix
network: Fix network attach for ipvlan and macvlan
2023-10-30 16:00:32 -06:00
Chao Wu
7d26604061 Merge pull request #7831 from lisongqian/feat/dragonball_trace
dragonball: add tracing feature for dragonball
2023-10-30 17:27:30 +08:00
James O. D. Hunt
d7e410ad2b Merge pull request #8314 from jodh-intel/kata-ctl-show-confidential-guest
kata-runtime/kata-ctl: Add security details to output
2023-10-30 07:41:22 +00:00
Songqian Li
2f533c3003 dragonball: add tracing feature for dragonball
This PR adds the tracing capability for dragonball and it depends on the tracing::Subscriber of the upper layer.

Fixes: #7249

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-28 19:52:24 +08:00
Chao Wu
f1f4410537 Merge pull request #7695 from lisongqian/feat/legacy_metrics
dragonball: add metrics support for legacy device
2023-10-28 16:48:57 +08:00
Archana Shinde
f53f86884f network: Fix network attach for ipvlan and macvlan
We used the approach of cold-plugging network interface for pre-shimv2
support for docker.Since the hotplug approach was not required,
we never really got to implementing hotplug support for certain network
endpoints, ipvlan and macvlan being among them.

Since moving to shimv2 interface as the default for
runtime, we switched to hotplugging the network interface for supporting
docker and nerdctl. This was done for veth endpoints only.

Implement the hot-attach apis for ipvlan and macvlan as well to support
ipvlan and macvlan networks with docker and nerdctl.

Fixes: #8333

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-27 21:42:37 -07:00