CCv0: Merge main into CCv0 branch

Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4200
Signed-off-by: Megan Wright <megan.wright@.ibm.com>
This commit is contained in:
Megan Wright
2022-05-04 11:26:50 +01:00
64 changed files with 5110 additions and 539 deletions

View File

@@ -14,6 +14,7 @@ TOOLS =
TOOLS += agent-ctl
TOOLS += trace-forwarder
TOOLS += runk
STANDARD_TARGETS = build check clean install test vendor

View File

@@ -132,6 +132,7 @@ The table below lists the remaining parts of the project:
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |

View File

@@ -11,6 +11,7 @@ Kata Containers design documents:
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
- [Design for direct-assigned volume](direct-blk-device-assignment.md)
---

View File

@@ -0,0 +1,253 @@
# Motivation
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, its cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, its difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. Its also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
* **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
type MountInfo struct {
// The type of the volume (ie. block)
VolumeType string `json:"volume-type"`
// The device backing the volume.
Device string `json:"device"`
// The filesystem type to be mounted on the volume.
FsType string `json:"fstype"`
// Additional metadata to pass to the agent regarding this volume.
Metadata map[string]string `json:"metadata,omitempty"`
// Additional mount options.
Options []string `json:"options,omitempty"`
}
```
Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime, it shouldn't container any secrets (such as SMB mount password).
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mount-info.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
`containerd-shim-kata-v2` provides an API for sandbox management through a Unix domain socket. Two new handlers are proposed: `/direct-volume/stats` and `/direct-volume/resize`:
Example:
```bash
$ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volume/stats/[urlSafeVolumePath]'
$ curl --unix-socket "$shim_socket_path" -I -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath"": [volumePath], "Size": "123123" }'
```
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
### Kata agent changes
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
Two new APIs and three new GRPC objects are added to GRPC protocol between the shim and agent for resizing and getting volume stats:
```protobuf
rpc GetVolumeStats(VolumeStatsRequest) returns (VolumeStatsResponse);
rpc ResizeVolume(ResizeVolumeRequest) returns (google.protobuf.Empty);
message VolumeStatsRequest {
// The volume path on the guest outside the container
string volume_guest_path = 1;
}
message ResizeVolumeRequest {
// Full VM guest path of the volume (outside the container)
string volume_guest_path = 1;
uint64 size = 2;
}
// This should be kept in sync with CSI NodeGetVolumeStatsResponse (https://github.com/container-storage-interface/spec/blob/v1.5.0/csi.proto)
message VolumeStatsResponse {
// This field is OPTIONAL.
repeated VolumeUsage usage = 1;
// Information about the current condition of the volume.
// This field is OPTIONAL.
// This field MUST be specified if the VOLUME_CONDITION node
// capability is supported.
VolumeCondition volume_condition = 2;
}
message VolumeUsage {
enum Unit {
UNKNOWN = 0;
BYTES = 1;
INODES = 2;
}
// The available capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 available = 1;
// The total capacity in specified Unit. This field is REQUIRED.
// The value of this field MUST NOT be negative.
uint64 total = 2;
// The used capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 used = 3;
// Units by which values are measured. This field is REQUIRED.
Unit unit = 4;
}
// VolumeCondition represents the current condition of a volume.
message VolumeCondition {
// Normal volumes are available for use and operating optimally.
// An abnormal volume does not meet these criteria.
// This field is REQUIRED.
bool abnormal = 1;
// The message describing the condition of the volume.
// This field is REQUIRED.
string message = 2;
}
```
### Step by step walk-through
Given the following definition:
```YAML
---
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
runtime-class: kata-qemu
containers:
- name: app
image: centos
command: ["/bin/sh"]
args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
volumeMounts:
- name: persistent-storage
mountPath: /data
volumes:
- name: persistent-storage
persistentVolumeClaim:
claimName: ebs-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
skip-hostmount: "true"
name: ebs-claim
spec:
accessModes:
- ReadWriteOncePod
volumeMode: Filesystem
storageClassName: ebs-sc
resources:
requests:
storage: 4Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
csi.storage.k8s.io/fstype: ext4
```
Lets assume that changes have been made in the `aws-ebs-csi-driver` node driver.
**Node publish volume**
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Node unstage volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Use the volume in sandbox**
1. Upon the request to start a container, the `containerd-shim-kata-v2` examines the container spec,
and iterates through the mounts. For each mount, if there is a `mountInfo.json` file under `/run/kata-containers/shared/direct-volumes/[mount source path]`,
it generates a `storage` GRPC object after overwriting the mount spec with the information in `mountInfo.json`.
2. The shim sends the storage objects to kata-agent through TTRPC.
3. The shim writes a file with the sandbox id as the name under `/run/kata-containers/shared/direct-volumes/[mount source path]`.
4. The kata-agent mounts the storage objects for the container.
**Node expand volume**
1. In the node CSI driver, the `NodeExpandVolume` API invokes: `kata-runtime direct-volume resize -volume-path "/kubelet/a/b/c/d/sdf" -size 8Gi`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to resize the volume.
4. The shim handles the request, asks the hypervisor to resize the block device and sends a GRPC request to Kata agent to resize the filesystem.
5. Kata agent receives the request and resizes the filesystem.
**Node get volume stats**
1. In the node CSI driver, the `NodeGetVolumeStats` API invokes: `kata-runtime direct-volume stats -volume-path "/kubelet/a/b/c/d/sdf"`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to get the volume stats.
4. The shim handles the request and forwards it to the Kata agent.
5. Kata agent receives the request and returns the filesystem stats.

View File

@@ -39,7 +39,7 @@ Details of each solution and a summary are provided below.
Kata Containers with QEMU has complete compatibility with Kubernetes.
Depending on the host architecture, Kata Containers supports various machine types,
for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
for example `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `q35`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](architecture/README.md#configuration) file.
@@ -60,9 +60,8 @@ Machine accelerators are architecture specific and can be used to improve the pe
and enable specific features of the machine types. The following machine accelerators
are used in Kata Containers:
- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
memory device to the Virtual Machine.
- NVDIMM: This machine accelerator is x86 specific and only supported by `q35` machine types.
`nvdimm` is used to provide the root filesystem as a persistent memory device to the Virtual Machine.
#### Hotplug devices

View File

@@ -81,7 +81,7 @@
- Download the standard `systemd(1)` service file and install to
`/etc/systemd/system/`:
- https://raw.githubusercontent.com/containerd/containerd/master/containerd.service
- https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
> **Notes:**
>

View File

@@ -3,4 +3,4 @@
Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:
- [Intel](Intel-GPU-passthrough-and-Kata.md)
- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)
- [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)

View File

@@ -0,0 +1,372 @@
# Using NVIDIA GPU device with Kata Containers
An NVIDIA GPU device can be passed to a Kata Containers container using GPU
passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
(NVIDIA vGPU mode).
NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to one
VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
is accessed exclusively by the NVIDIA driver running in the VM to which it is
assigned. The GPU is not shared among VMs.
NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have
simultaneous, direct access to a single physical GPU, using the same NVIDIA
graphics drivers that are deployed on non-virtualized operating systems. By
doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance,
compute performance, and application compatibility, together with the
cost-effectiveness and scalability brought about by sharing a GPU among multiple
workloads. A vGPU can be either time-sliced or Multi-Instance GPU (MIG)-backed
with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
| Technology | Description | Behavior | Detail |
| --- | --- | --- | --- |
| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
## Hardware Requirements
NVIDIA GPUs Recommended for Virtualization:
- NVIDIA Tesla (T4, M10, P6, V100 or newer)
- NVIDIA Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BARs window, for example, NVIDIA Tesla P100,
K40m
```sh
$ lspci -s d0:00.0 -vv | grep Region
Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
```
For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Some hardware vendors use different name in BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
If one is using a GPU based on the Ampere architecture and later additionally
SR-IOV needs to be enabled for the vGPU use-case.
The following steps outline the workflow for using an NVIDIA GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
line.
## Install and configure Kata Containers
To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
version 1.3.0 or above. Follow the [Kata Containers setup
instructions](../install/README.md) to install the latest version of Kata.
To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
version 1.11.0 or above.
The following configuration in the Kata `configuration.toml` file as shown below
can work:
Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
Hotplug driver):
```sh
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
driver):
```sh
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU
support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
with the necessary GPU support.
The following kernel config options need to be enabled:
```sh
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
# Support for loading modules (Required for load NVIDIA drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (Required for large BARs device)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```sh
# Disable Open Source NVIDIA driver nouveau
# It conflicts with NVIDIA official driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
It is worth checking that it is not enabled in your kernel configuration to
prevent any conflicts.
Build the Kata Containers kernel with the previous config options, using the
instructions described in [Building Kata Containers
kernel](../../tools/packaging/kernel). For further details on building and
installing guest kernels, see [the developer
guide](../Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports NVIDIA GPU:
```sh
## Build guest kernel with ../../tools/packaging/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 5.15.23 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
```
To build NVIDIA Driver in Kata container, `linux-headers` is required.
This is a way to generate deb packages for `linux-headers`:
> **Note**:
> Run `make rpm-pkg` to build the rpm package.
> Run `make deb-pkg` to build the deb package.
>
```sh
$ cd kata-linux-5.15.23-89
$ make deb-pkg
```
Before using the new guest kernel, please update the `kernel` parameters in
`configuration.toml`.
```sh
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## NVIDIA GPU pass-through mode with Kata Containers
Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device on host:
```sh
$ sudo lspci -nn -D | grep -i nvidia
0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
```
> PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
> `10de:20b9` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```sh
$ BDF="0000:d0:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
```
The previous output shows that the GPU belongs to IOMMU group 192. The next
step is to bind the GPU to the VFIO-PCI driver.
```sh
$ BDF="0000:d0:00.0"
$ DEV="/sys/bus/pci/devices/$BDF"
$ echo "vfio-pci" > $DEV/driver_override
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
# To return the device to the standard driver, we simply clear the
# driver_override and reprobe the device, ex:
$ echo > $DEV/preferred_driver
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
```
3. Check the IOMMU group number under `/dev/vfio`:
```sh
$ ls -l /dev/vfio
total 0
crw------- 1 zvonkok zvonkok 243, 0 Mar 18 03:06 192
crw-rw-rw- 1 root root 10, 196 Mar 18 02:27 vfio
```
4. Start a Kata container with GPU device:
```sh
# You may need to `modprobe vhost-vsock` if you get
# host system doesn't support vsock: stat /dev/vhost-vsock
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the `lspci` output.
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
```
6. Additionally, you can check the PCI BARs space of the NVIDIA GPU device in the container:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
```
> **Note**: If you see a message similar to the above, the BAR space of the NVIDIA
> GPU has been successfully allocated.
## NVIDIA vGPU mode with Kata Containers
NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM.
> **TODO**: Will follow up with instructions
## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
rootfs base image for a distribution of your choice. This is going to be used as
a base for a NVIDIA enabled guest OS. Use the `EXTRA_PKGS` variable to install
all the needed packages to compile the drivers. Also copy the kernel development
packages from the previous `make deb-pkg` into `$ROOTFS_DIR`.
```sh
export EXTRA_PKGS="gcc make curl gnupg"
```
Having the `$ROOTFS_DIR` exported in the previous step we can now install all the
need parts in the guest OS. In this case we have an Ubuntu based rootfs.
First off all mount the special filesystems into the rootfs
```sh
$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
```
Now we can enter `chroot`
```sh
$ sudo chroot ${ROOTFS_DIR}
```
Inside the rootfs one is going to install the drivers and toolkit to enable easy
creation of GPU containers with Kata. We can also use this rootfs for any other
container not specifically only for GPUs.
As a prerequisite install the copied kernel development packages
```sh
$ sudo dpkg -i *.deb
```
Get the driver run file, since we need to build the driver against a kernel that
is not running on the host we need the ability to specify the exact version we
want the driver to build against. Take the kernel version one used for building
the NVIDIA kernel (`5.15.23-nvidia-gpu`).
```sh
$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
$ chmod +x NVIDIA-Linux-x86_64-510.54.run
# Extract the source files so we can run the installer with arguments
$ ./NVIDIA-Linux-x86_64-510.54.run -x
$ cd NVIDIA-Linux-x86_64-510.54
$ ./nvidia-installer -k 5.15.23-nvidia-gpu
```
Having the drivers installed we need to install the toolkit which will take care
of providing the right bits into the container.
```sh
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt update
$ apt install nvidia-container-toolkit
```
Create the hook execution file for Kata:
```
# Content of $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
#!/bin/bash -x
/usr/bin/nvidia-container-toolkit -debug $@
```
As a last step one can do some cleanup of files or package caches. Build the
rootfs and configure it for use with Kata according to the development guide.
Enable the `guest_hook_path` in Kata's `configuration.toml`
```sh
guest_hook_path = "/usr/share/oci/hooks"
```
One has build a NVIDIA rootfs, kernel and now we can run any GPU container
without installing the drivers into the container. Check NVIDIA device status
with `nvidia-smi`
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
Fri Mar 18 10:36:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54 Driver Version: 510.54 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A30X Off | 00000000:02:00.0 Off | 0 |
| N/A 38C P0 67W / 230W | 0MiB / 24576MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
As a last step one can remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers

View File

@@ -1,293 +0,0 @@
# Using Nvidia GPU device with Kata Containers
An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode). 
Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
exclusively by the Nvidia driver running in the VM to which it is assigned.
The GPU is not shared among VMs.
Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
direct access to a single physical GPU, using the same Nvidia graphics drivers that are
deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
with unparalleled graphics performance, compute performance, and application compatibility,
together with the cost-effectiveness and scalability brought about by sharing a GPU
among multiple workloads.
| Technology | Description | Behaviour | Detail |
| --- | --- | --- | --- |
| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
## Hardware Requirements
Nvidia GPUs Recommended for Virtualization:
- Nvidia Tesla (T4, M10, P6, V100 or newer)
- Nvidia Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m
```
$ lspci -s 04:00.0 -vv | grep Region
Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
```
For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Some hardware vendors use different name in BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
The following steps outline the workflow for using an Nvidia GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
## Install and configure Kata Containers
To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
Follow the [Kata Containers setup instructions](../install/README.md)
to install the latest version of Kata.
To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
The following configuration in the Kata `configuration.toml` file as shown below can work:
Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU support.
To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
necessary GPU support.
The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
# Support for loading modules (Required for load Nvidia drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (Required for large BARs device)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```
# Disable Open Source Nvidia driver nouveau
# It conflicts with Nvidia official driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
Build the Kata Containers kernel with the previous config options,
using the instructions described in [Building Kata Containers kernel](../../tools/packaging/kernel).
For further details on building and installing guest kernels,
see [the developer guide](../Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports Nvidia GPU:
```
## Build guest kernel with ../../tools/packaging/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 4.19.86 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
```
To build Nvidia Driver in Kata container, `kernel-devel` is required.
This is a way to generate rpm packages for `kernel-devel`:
```
$ cd kata-linux-4.19.86-68
$ make rpm-pkg
Output RPMs:
~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
```
> **Note**:
> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer.
> - Run `make deb-pkg` to build the deb package.
Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`.
```
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## Nvidia GPU pass-through mode with Kata Containers
Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device on host:
```
$ sudo lspci -nn -D | grep -i nvidia
0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
```
> PCI address `0000:04:00.0` is assigned to the hardware GPU device.
> `10de:15f8` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```
$ BDF="0000:04:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
/sys/kernel/iommu_groups/45
```
The previous output shows that the GPU belongs to IOMMU group 45.
3. Check the IOMMU group number under `/dev/vfio`:
```
$ ls -l /dev/vfio
total 0
crw------- 1 root root 248, 0 Feb 28 09:57 45
crw------- 1 root root 248, 1 Feb 28 09:57 54
crw-rw-rw- 1 root root 10, 196 Feb 28 09:57 vfio
```
4. Start a Kata container with GPU device:
```
$ sudo docker run -it --runtime=kata-runtime --cap-add=ALL --device /dev/vfio/45 centos /bin/bash
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
```
$ lspci -nn -D | grep '10de:15f8'
0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
```
6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
```
$ lspci -s 01:01.0 -vv | grep Region
Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
```
> **Note**: If you see a message similar to the above, the BAR space of the Nvidia
> GPU has been successfully allocated.
## Nvidia vGPU mode with Kata Containers
Nvidia vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM.
> **Note**: There is no suitable test environment, so it is not written here.
## Install Nvidia Driver in Kata Containers
Download the official Nvidia driver from
[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
for example `NVIDIA-Linux-x86_64-418.87.01.run`.
Install the `kernel-devel`(generated in the previous steps) for guest kernel:
```
$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm
```
Here is an example to extract, compile and install Nvidia driver:
```
## Extract
$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
## Compile and install (It will take some time)
$ cd NVIDIA-Linux-x86_64-418.87.01
$ sudo ./nvidia-installer -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
Or just run one command line:
```
$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
To view detailed logs of the installer:
```
$ tail -f /var/log/nvidia-installer.log
```
Load Nvidia driver module manually
```
# Optionalgenerate modules.dep and map files for Nvidia driver
$ sudo depmod
# Load module
$ sudo modprobe nvidia-drm
# Check module
$ lsmod | grep nvidia
nvidia_drm 45056 0
nvidia_modeset 1093632 1 nvidia_drm
nvidia 18202624 1 nvidia_modeset
drm_kms_helper 159744 1 nvidia_drm
drm 364544 3 nvidia_drm,drm_kms_helper
i2c_core 65536 3 nvidia,drm_kms_helper,drm
ipmi_msghandler 49152 1 nvidia
```
Check Nvidia device status with `nvidia-smi`
```
$ nvidia-smi
Tue Mar 3 00:03:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:01:01.0 Off | 0 |
| N/A 27C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers

1
src/agent/Cargo.lock generated
View File

@@ -2256,6 +2256,7 @@ dependencies = [
"async-trait",
"capctl",
"caps",
"cfg-if 0.1.10",
"cgroups-rs",
"futures",
"inotify",

View File

@@ -82,3 +82,13 @@ lto = true
[features]
seccomp = ["rustjail/seccomp"]
standard-oci-runtime = ["rustjail/standard-oci-runtime"]
[[bin]]
name = "kata-agent"
path = "src/main.rs"
[[bin]]
name = "oci-kata-agent"
path = "src/main.rs"
required-features = ["standard-oci-runtime"]

View File

@@ -33,6 +33,14 @@ ifeq ($(SECCOMP),yes)
override EXTRA_RUSTFEATURES += seccomp
endif
##VAR STANDARD_OCI_RUNTIME=yes|no define if agent enables standard oci runtime feature
STANDARD_OCI_RUNTIME := no
# Enable standard oci runtime feature of rust build
ifeq ($(STANDARD_OCI_RUNTIME),yes)
override EXTRA_RUSTFEATURES += standard-oci-runtime
endif
ifneq ($(EXTRA_RUSTFEATURES),)
override EXTRA_RUSTFEATURES := --features "$(EXTRA_RUSTFEATURES)"
endif

View File

@@ -25,6 +25,7 @@ path-absolutize = "1.2.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.8" }
rlimit = "0.5.3"
cfg-if = "0.1.0"
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros"] }
futures = "0.3.17"
@@ -38,3 +39,4 @@ tempfile = "3.1.0"
[features]
seccomp = ["libseccomp"]
standard-oci-runtime = []

View File

@@ -0,0 +1,82 @@
// SPDX-License-Identifier: Apache-2.0
//
// Copyright 2021 Sony Group Corporation
//
use anyhow::{anyhow, Result};
use nix::errno::Errno;
use nix::pty;
use nix::sys::{socket, uio};
use nix::unistd::{self, dup2};
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;
pub fn setup_console_socket(csocket_path: &str) -> Result<Option<RawFd>> {
if csocket_path.is_empty() {
return Ok(None);
}
let socket_fd = socket::socket(
socket::AddressFamily::Unix,
socket::SockType::Stream,
socket::SockFlag::empty(),
None,
)?;
match socket::connect(
socket_fd,
&socket::SockAddr::Unix(socket::UnixAddr::new(Path::new(csocket_path))?),
) {
Ok(()) => Ok(Some(socket_fd)),
Err(errno) => Err(anyhow!("failed to open console fd: {}", errno)),
}
}
pub fn setup_master_console(socket_fd: RawFd) -> Result<()> {
let pseudo = pty::openpty(None, None)?;
let pty_name: &[u8] = b"/dev/ptmx";
let iov = [uio::IoVec::from_slice(pty_name)];
let fds = [pseudo.master];
let cmsg = socket::ControlMessage::ScmRights(&fds);
socket::sendmsg(socket_fd, &iov, &[cmsg], socket::MsgFlags::empty(), None)?;
unistd::setsid()?;
let ret = unsafe { libc::ioctl(pseudo.slave, libc::TIOCSCTTY) };
Errno::result(ret).map_err(|e| anyhow!(e).context("ioctl TIOCSCTTY"))?;
dup2(pseudo.slave, std::io::stdin().as_raw_fd())?;
dup2(pseudo.slave, std::io::stdout().as_raw_fd())?;
dup2(pseudo.slave, std::io::stderr().as_raw_fd())?;
unistd::close(socket_fd)?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use std::fs::File;
use std::os::unix::net::UnixListener;
use std::path::PathBuf;
use tempfile::{self, tempdir};
const CONSOLE_SOCKET: &str = "console-socket";
#[test]
fn test_setup_console_socket() {
let dir = tempdir()
.map_err(|e| anyhow!(e).context("tempdir failed"))
.unwrap();
let socket_path = dir.path().join(CONSOLE_SOCKET);
let _listener = UnixListener::bind(&socket_path).unwrap();
let ret = setup_console_socket(socket_path.to_str().unwrap());
assert!(ret.is_ok());
}
}

View File

@@ -23,6 +23,8 @@ use crate::cgroups::fs::Manager as FsManager;
#[cfg(test)]
use crate::cgroups::mock::Manager as FsManager;
use crate::cgroups::Manager;
#[cfg(feature = "standard-oci-runtime")]
use crate::console;
use crate::log_child;
use crate::process::Process;
#[cfg(feature = "seccomp")]
@@ -64,7 +66,7 @@ use tokio::sync::Mutex;
use crate::utils;
const EXEC_FIFO_FILENAME: &str = "exec.fifo";
pub const EXEC_FIFO_FILENAME: &str = "exec.fifo";
const INIT: &str = "INIT";
const NO_PIVOT: &str = "NO_PIVOT";
@@ -74,6 +76,10 @@ const CLOG_FD: &str = "CLOG_FD";
const FIFO_FD: &str = "FIFO_FD";
const HOME_ENV_KEY: &str = "HOME";
const PIDNS_FD: &str = "PIDNS_FD";
const CONSOLE_SOCKET_FD: &str = "CONSOLE_SOCKET_FD";
#[cfg(feature = "standard-oci-runtime")]
const OCI_AGENT_BINARY: &str = "oci-kata-agent";
#[derive(Debug)]
pub struct ContainerStatus {
@@ -82,7 +88,7 @@ pub struct ContainerStatus {
}
impl ContainerStatus {
fn new() -> Self {
pub fn new() -> Self {
ContainerStatus {
pre_status: ContainerState::Created,
cur_status: ContainerState::Created,
@@ -99,6 +105,12 @@ impl ContainerStatus {
}
}
impl Default for ContainerStatus {
fn default() -> Self {
Self::new()
}
}
pub type Config = CreateOpts;
type NamespaceType = String;
@@ -106,7 +118,7 @@ lazy_static! {
// This locker ensures the child exit signal will be received by the right receiver.
pub static ref WAIT_PID_LOCKER: Arc<Mutex<bool>> = Arc::new(Mutex::new(false));
static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
pub static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
let mut m = HashMap::new();
m.insert("user", CloneFlags::CLONE_NEWUSER);
m.insert("ipc", CloneFlags::CLONE_NEWIPC);
@@ -119,7 +131,7 @@ lazy_static! {
};
// type to name hashmap, better to be in NAMESPACES
static ref TYPETONAME: HashMap<&'static str, &'static str> = {
pub static ref TYPETONAME: HashMap<&'static str, &'static str> = {
let mut m = HashMap::new();
m.insert("ipc", "ipc");
m.insert("user", "user");
@@ -236,6 +248,8 @@ pub struct LinuxContainer {
pub status: ContainerStatus,
pub created: SystemTime,
pub logger: Logger,
#[cfg(feature = "standard-oci-runtime")]
pub console_socket: PathBuf,
}
#[derive(Serialize, Deserialize, Debug)]
@@ -359,7 +373,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
)));
}
}
log_child!(cfd_log, "child process start run");
let buf = read_sync(crfd)?;
let spec_str = std::str::from_utf8(&buf)?;
@@ -379,6 +392,9 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let cm: FsManager = serde_json::from_str(cm_str)?;
#[cfg(feature = "standard-oci-runtime")]
let csocket_fd = console::setup_console_socket(&std::env::var(CONSOLE_SOCKET_FD)?)?;
let p = if spec.process.is_some() {
spec.process.as_ref().unwrap()
} else {
@@ -670,10 +686,19 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let _ = unistd::close(crfd);
let _ = unistd::close(cwfd);
unistd::setsid().context("create a new session")?;
if oci_process.terminal {
unsafe {
libc::ioctl(0, libc::TIOCSCTTY);
cfg_if::cfg_if! {
if #[cfg(feature = "standard-oci-runtime")] {
if let Some(csocket_fd) = csocket_fd {
console::setup_master_console(csocket_fd)?;
} else {
return Err(anyhow!("failed to get console master socket fd"));
}
}
else {
unistd::setsid().context("create a new session")?;
unsafe { libc::ioctl(0, libc::TIOCSCTTY) };
}
}
}
@@ -926,8 +951,24 @@ impl BaseContainer for LinuxContainer {
let _ = unistd::close(pid);
});
let exec_path = std::env::current_exe()?;
cfg_if::cfg_if! {
if #[cfg(feature = "standard-oci-runtime")] {
let exec_path = PathBuf::from(OCI_AGENT_BINARY);
}
else {
let exec_path = std::env::current_exe()?;
}
}
let mut child = std::process::Command::new(exec_path);
#[allow(unused_mut)]
let mut console_name = PathBuf::from("");
#[cfg(feature = "standard-oci-runtime")]
if !self.console_socket.as_os_str().is_empty() {
console_name = self.console_socket.clone();
}
let mut child = child
.arg("init")
.stdin(child_stdin)
@@ -937,7 +978,8 @@ impl BaseContainer for LinuxContainer {
.env(NO_PIVOT, format!("{}", self.config.no_pivot_root))
.env(CRFD_FD, format!("{}", crfd))
.env(CWFD_FD, format!("{}", cwfd))
.env(CLOG_FD, format!("{}", cfd_log));
.env(CLOG_FD, format!("{}", cfd_log))
.env(CONSOLE_SOCKET_FD, console_name);
if p.init {
child = child.env(FIFO_FD, format!("{}", fifofd));
@@ -1419,8 +1461,16 @@ impl LinuxContainer {
.unwrap()
.as_secs(),
logger: logger.new(o!("module" => "rustjail", "subsystem" => "container", "cid" => id)),
#[cfg(feature = "standard-oci-runtime")]
console_socket: Path::new("").to_path_buf(),
})
}
#[cfg(feature = "standard-oci-runtime")]
pub fn set_console_socket(&mut self, console_socket: &Path) -> Result<()> {
self.console_socket = console_socket.to_path_buf();
Ok(())
}
}
fn setgroups(grps: &[libc::gid_t]) -> Result<()> {
@@ -1460,7 +1510,7 @@ use std::process::Stdio;
use std::time::Duration;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
pub async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
let logger = logger.new(o!("action" => "execute-hook"));
let binary = PathBuf::from(h.path.as_str());

View File

@@ -30,6 +30,8 @@ extern crate regex;
pub mod capabilities;
pub mod cgroups;
#[cfg(feature = "standard-oci-runtime")]
pub mod console;
pub mod container;
pub mod mount;
pub mod pipestream;
@@ -529,6 +531,31 @@ mod tests {
};
}
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
}
};
}
#[test]
fn test_process_grpc_to_oci() {
#[derive(Debug)]

View File

@@ -32,16 +32,21 @@ use crate::log_child;
// Info reveals information about a particular mounted filesystem. This
// struct is populated from the content in the /proc/<pid>/mountinfo file.
#[derive(std::fmt::Debug)]
#[derive(std::fmt::Debug, PartialEq)]
pub struct Info {
mount_point: String,
optional: String,
fstype: String,
}
const MOUNTINFOFORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const MOUNTINFO_FORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const MOUNTINFO_PATH: &str = "/proc/self/mountinfo";
const PROC_PATH: &str = "/proc";
const ERR_FAILED_PARSE_MOUNTINFO: &str = "failed to parse mountinfo file";
const ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS: &str =
"failed to parse final fields in mountinfo file";
// since libc didn't defined this const for musl, thus redefined it here.
#[cfg(all(target_os = "linux", target_env = "gnu", not(target_arch = "s390x")))]
const PROC_SUPER_MAGIC: libc::c_long = 0x00009fa0;
@@ -518,7 +523,7 @@ pub fn pivot_rootfs<P: ?Sized + NixPath + std::fmt::Debug>(path: &P) -> Result<(
}
fn rootfs_parent_mount_private(path: &str) -> Result<()> {
let mount_infos = parse_mount_table()?;
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let mut max_len = 0;
let mut mount_point = String::from("");
@@ -546,8 +551,8 @@ fn rootfs_parent_mount_private(path: &str) -> Result<()> {
// Parse /proc/self/mountinfo because comparing Dev and ino does not work from
// bind mounts
fn parse_mount_table() -> Result<Vec<Info>> {
let file = File::open("/proc/self/mountinfo")?;
fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
let file = File::open(mountinfo_path)?;
let reader = BufReader::new(file);
let mut infos = Vec::new();
@@ -569,7 +574,7 @@ fn parse_mount_table() -> Result<Vec<Info>> {
let (_id, _parent, _major, _minor, _root, mount_point, _opts, optional) = scan_fmt!(
&line,
MOUNTINFOFORMAT,
MOUNTINFO_FORMAT,
i32,
i32,
i32,
@@ -578,12 +583,17 @@ fn parse_mount_table() -> Result<Vec<Info>> {
String,
String,
String
)?;
)
.map_err(|_| anyhow!(ERR_FAILED_PARSE_MOUNTINFO))?;
let fields: Vec<&str> = line.split(" - ").collect();
if fields.len() == 2 {
let (fstype, _source, _vfs_opts) =
scan_fmt!(fields[1], "{} {} {}", String, String, String)?;
let final_fields: Vec<&str> = fields[1].split_whitespace().collect();
if final_fields.len() != 3 {
return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS));
}
let fstype = final_fields[0].to_string();
let mut optional_new = String::new();
if optional != "-" {
@@ -598,7 +608,7 @@ fn parse_mount_table() -> Result<Vec<Info>> {
infos.push(info);
} else {
return Err(anyhow!("failed to parse mount info file".to_string()));
return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO));
}
}
@@ -619,7 +629,7 @@ fn chroot<P: ?Sized + NixPath>(_path: &P) -> Result<(), nix::Error> {
pub fn ms_move_root(rootfs: &str) -> Result<bool> {
unistd::chdir(rootfs)?;
let mount_infos = parse_mount_table()?;
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let root_path = Path::new(rootfs);
let abs_root_buf = root_path.absolutize()?;
@@ -1046,10 +1056,12 @@ fn readonly_path(path: &str) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::assert_result;
use crate::skip_if_not_root;
use std::fs::create_dir;
use std::fs::create_dir_all;
use std::fs::remove_dir_all;
use std::io;
use std::os::unix::fs;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
@@ -1508,6 +1520,121 @@ mod tests {
}
}
#[test]
fn test_parse_mount_table() {
#[derive(Debug)]
struct TestData<'a> {
mountinfo_data: Option<&'a str>,
result: Result<Vec<Info>>,
}
let tests = &[
TestData {
mountinfo_data: Some(
"22 933 0:20 / /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
),
result: Ok(vec![Info {
mount_point: "/sys".to_string(),
optional: "shared:2".to_string(),
fstype: "sysfs".to_string(),
}]),
},
TestData {
mountinfo_data: Some(
r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
),
result: Ok(vec![
Info {
mount_point: "/sys".to_string(),
optional: "".to_string(),
fstype: "sysfs".to_string(),
},
Info {
mount_point: "/tmp/dir".to_string(),
optional: "shared:2".to_string(),
fstype: "tmpfs".to_string(),
},
]),
},
TestData {
mountinfo_data: Some(
"22 933 0:20 /foo\040-\040bar /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
),
result: Ok(vec![Info {
mount_point: "/sys".to_string(),
optional: "shared:2".to_string(),
fstype: "sysfs".to_string(),
}]),
},
TestData {
mountinfo_data: Some(""),
result: Ok(vec![]),
},
TestData {
mountinfo_data: Some("invalid line data - sysfs sysfs rw"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs sysfs rw rw"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec shared:2 - x - x"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("-"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("--"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("- -"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some(" - "),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some(
r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
invalid line
81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: None,
result: Err(anyhow!(io::Error::from_raw_os_error(libc::ENOENT))),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let tempdir = tempdir().unwrap();
let mountinfo_path = tempdir.path().join("mountinfo");
if let Some(mountinfo_data) = d.mountinfo_data {
std::fs::write(&mountinfo_path, mountinfo_data).unwrap();
}
let result = parse_mount_table(mountinfo_path.to_str().unwrap());
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[test]
fn test_dev_rel_path() {
// Valid device paths

View File

@@ -5,7 +5,7 @@
use oci::Spec;
#[derive(Debug)]
#[derive(Serialize, Deserialize, Debug, Default, Clone)]
pub struct CreateOpts {
pub cgroup_name: String,
pub use_systemd_cgroup: bool,

View File

@@ -417,3 +417,59 @@ fn reset_sigpipe() {
use crate::config::AgentConfig;
use std::os::unix::io::{FromRawFd, RawFd};
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::test_utils::TestUserType;
#[tokio::test]
async fn test_create_logger_task() {
#[derive(Debug)]
struct TestData {
vsock_port: u32,
test_user: TestUserType,
result: Result<()>,
}
let tests = &[
TestData {
// non-root user cannot use privileged vsock port
vsock_port: 1,
test_user: TestUserType::NonRootOnly,
result: Err(anyhow!(nix::errno::Errno::from_i32(libc::EACCES))),
},
TestData {
// passing vsock_port 0 causes logger task to write to stdout
vsock_port: 0,
test_user: TestUserType::Any,
result: Ok(()),
},
];
for (i, d) in tests.iter().enumerate() {
if d.test_user == TestUserType::RootOnly {
skip_if_not_root!();
} else if d.test_user == TestUserType::NonRootOnly {
skip_if_root!();
}
let msg = format!("test[{}]: {:?}", i, d);
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
// rfd is closed by the use of PipeStream in the crate_logger_task function,
// but we will attempt to close in case of a failure
let _ = unistd::close(rfd);
unistd::close(wfd).unwrap();
});
let (shutdown_tx, shutdown_rx) = channel(true);
shutdown_tx.send(true).unwrap();
let result = create_logger_task(rfd, d.vsock_port, shutdown_rx).await;
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
}

View File

@@ -1017,6 +1017,7 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::test_utils::TestUserType;
use crate::{skip_if_not_root, skip_loop_if_not_root, skip_loop_if_root};
use protobuf::RepeatedField;
use protocols::agent::FSGroup;
@@ -1026,13 +1027,6 @@ mod tests {
use std::path::PathBuf;
use tempfile::tempdir;
#[derive(Debug, PartialEq)]
enum TestUserType {
RootOnly,
NonRootOnly,
Any,
}
#[test]
fn test_mount() {
#[derive(Debug)]

View File

@@ -1875,7 +1875,7 @@ async fn do_add_swap(sandbox: &Arc<Mutex<Sandbox>>, req: &AddSwapRequest) -> Res
// - config.json at /<CONTAINER_BASE>/<cid>/config.json
// - container rootfs bind mounted at /<CONTAINER_BASE>/<cid>/rootfs
// - modify container spec root to point to /<CONTAINER_BASE>/<cid>/rootfs
fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
pub fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
let spec_root = if let Some(sr) = &spec.root {
sr
} else {

View File

@@ -5,7 +5,14 @@
#![allow(clippy::module_inception)]
#[cfg(test)]
mod test_utils {
pub mod test_utils {
#[derive(Debug, PartialEq)]
pub enum TestUserType {
RootOnly,
NonRootOnly,
Any,
}
#[macro_export]
macro_rules! skip_if_root {
() => {

View File

@@ -180,6 +180,78 @@ block_device_driver = "virtio-blk"
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
#
# These options are related to network rate limiter at the VMM level, and are
# based on the Cloud Hypervisor I/O throttling. Those are disabled by default
# and we strongly advise users to refer the Cloud Hypervisor official
# documentation for a better understanding of its internals:
# https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/io_throttling.md
#
# Bandwidth rate limiter options
#
# net_rate_limiter_bw_max_rate controls network I/O bandwidth (size in bits/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#net_rate_limiter_bw_max_rate = 0
#
# net_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
# set to a non zero value.
#net_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
# net_rate_limiter_ops_max_rate controls network I/O bandwidth (size in ops/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#net_rate_limiter_ops_max_rate = 0
#
# net_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
# set to a non zero value.
#net_rate_limiter_ops_one_time_burst = 0
#
# These options are related to disk rate limiter at the VMM level, and are
# based on the Cloud Hypervisor I/O throttling. Those are disabled by default
# and we strongly advise users to refer the Cloud Hypervisor official
# documentation for a better understanding of its internals:
# https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/io_throttling.md
#
# Bandwidth rate limiter options
#
# disk_rate_limiter_bw_max_rate controls disk I/O bandwidth (size in bits/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_bw_max_rate = 0
#
# disk_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
# disk_rate_limiter_ops_max_rate controls disk I/O bandwidth (size in ops/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_ops_max_rate = 0
#
# disk_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_ops_one_time_burst = 0
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)

View File

@@ -74,70 +74,78 @@ type factory struct {
}
type hypervisor struct {
Path string `toml:"path"`
JailerPath string `toml:"jailer_path"`
Kernel string `toml:"kernel"`
CtlPath string `toml:"ctlpath"`
Initrd string `toml:"initrd"`
Image string `toml:"image"`
Firmware string `toml:"firmware"`
FirmwareVolume string `toml:"firmware_volume"`
MachineAccelerators string `toml:"machine_accelerators"`
CPUFeatures string `toml:"cpu_features"`
KernelParams string `toml:"kernel_params"`
MachineType string `toml:"machine_type"`
BlockDeviceDriver string `toml:"block_device_driver"`
EntropySource string `toml:"entropy_source"`
SharedFS string `toml:"shared_fs"`
VirtioFSDaemon string `toml:"virtio_fs_daemon"`
VirtioFSCache string `toml:"virtio_fs_cache"`
VhostUserStorePath string `toml:"vhost_user_store_path"`
FileBackedMemRootDir string `toml:"file_mem_backend"`
GuestHookPath string `toml:"guest_hook_path"`
GuestMemoryDumpPath string `toml:"guest_memory_dump_path"`
HypervisorPathList []string `toml:"valid_hypervisor_paths"`
JailerPathList []string `toml:"valid_jailer_paths"`
CtlPathList []string `toml:"valid_ctlpaths"`
VirtioFSDaemonList []string `toml:"valid_virtio_fs_daemon_paths"`
VirtioFSExtraArgs []string `toml:"virtio_fs_extra_args"`
PFlashList []string `toml:"pflashes"`
VhostUserStorePathList []string `toml:"valid_vhost_user_store_paths"`
FileBackedMemRootList []string `toml:"valid_file_mem_backends"`
EntropySourceList []string `toml:"valid_entropy_sources"`
EnableAnnotations []string `toml:"enable_annotations"`
RxRateLimiterMaxRate uint64 `toml:"rx_rate_limiter_max_rate"`
TxRateLimiterMaxRate uint64 `toml:"tx_rate_limiter_max_rate"`
MemOffset uint64 `toml:"memory_offset"`
VirtioFSCacheSize uint32 `toml:"virtio_fs_cache_size"`
DefaultMaxVCPUs uint32 `toml:"default_maxvcpus"`
MemorySize uint32 `toml:"default_memory"`
MemSlots uint32 `toml:"memory_slots"`
DefaultBridges uint32 `toml:"default_bridges"`
Msize9p uint32 `toml:"msize_9p"`
PCIeRootPort uint32 `toml:"pcie_root_port"`
NumVCPUs int32 `toml:"default_vcpus"`
BlockDeviceCacheSet bool `toml:"block_device_cache_set"`
BlockDeviceCacheDirect bool `toml:"block_device_cache_direct"`
BlockDeviceCacheNoflush bool `toml:"block_device_cache_noflush"`
EnableVhostUserStore bool `toml:"enable_vhost_user_store"`
DisableBlockDeviceUse bool `toml:"disable_block_device_use"`
MemPrealloc bool `toml:"enable_mem_prealloc"`
HugePages bool `toml:"enable_hugepages"`
VirtioMem bool `toml:"enable_virtio_mem"`
IOMMU bool `toml:"enable_iommu"`
IOMMUPlatform bool `toml:"enable_iommu_platform"`
Debug bool `toml:"enable_debug"`
DisableNestingChecks bool `toml:"disable_nesting_checks"`
EnableIOThreads bool `toml:"enable_iothreads"`
DisableImageNvdimm bool `toml:"disable_image_nvdimm"`
HotplugVFIOOnRootBus bool `toml:"hotplug_vfio_on_root_bus"`
DisableVhostNet bool `toml:"disable_vhost_net"`
GuestMemoryDumpPaging bool `toml:"guest_memory_dump_paging"`
ConfidentialGuest bool `toml:"confidential_guest"`
GuestSwap bool `toml:"enable_guest_swap"`
Rootless bool `toml:"rootless"`
DisableSeccomp bool `toml:"disable_seccomp"`
DisableSeLinux bool `toml:"disable_selinux"`
Path string `toml:"path"`
JailerPath string `toml:"jailer_path"`
Kernel string `toml:"kernel"`
CtlPath string `toml:"ctlpath"`
Initrd string `toml:"initrd"`
Image string `toml:"image"`
Firmware string `toml:"firmware"`
FirmwareVolume string `toml:"firmware_volume"`
MachineAccelerators string `toml:"machine_accelerators"`
CPUFeatures string `toml:"cpu_features"`
KernelParams string `toml:"kernel_params"`
MachineType string `toml:"machine_type"`
BlockDeviceDriver string `toml:"block_device_driver"`
EntropySource string `toml:"entropy_source"`
SharedFS string `toml:"shared_fs"`
VirtioFSDaemon string `toml:"virtio_fs_daemon"`
VirtioFSCache string `toml:"virtio_fs_cache"`
VhostUserStorePath string `toml:"vhost_user_store_path"`
FileBackedMemRootDir string `toml:"file_mem_backend"`
GuestHookPath string `toml:"guest_hook_path"`
GuestMemoryDumpPath string `toml:"guest_memory_dump_path"`
HypervisorPathList []string `toml:"valid_hypervisor_paths"`
JailerPathList []string `toml:"valid_jailer_paths"`
CtlPathList []string `toml:"valid_ctlpaths"`
VirtioFSDaemonList []string `toml:"valid_virtio_fs_daemon_paths"`
VirtioFSExtraArgs []string `toml:"virtio_fs_extra_args"`
PFlashList []string `toml:"pflashes"`
VhostUserStorePathList []string `toml:"valid_vhost_user_store_paths"`
FileBackedMemRootList []string `toml:"valid_file_mem_backends"`
EntropySourceList []string `toml:"valid_entropy_sources"`
EnableAnnotations []string `toml:"enable_annotations"`
RxRateLimiterMaxRate uint64 `toml:"rx_rate_limiter_max_rate"`
TxRateLimiterMaxRate uint64 `toml:"tx_rate_limiter_max_rate"`
MemOffset uint64 `toml:"memory_offset"`
DiskRateLimiterBwMaxRate int64 `toml:"disk_rate_limiter_bw_max_rate"`
DiskRateLimiterBwOneTimeBurst int64 `toml:"disk_rate_limiter_bw_one_time_burst"`
DiskRateLimiterOpsMaxRate int64 `toml:"disk_rate_limiter_ops_max_rate"`
DiskRateLimiterOpsOneTimeBurst int64 `toml:"disk_rate_limiter_ops_one_time_burst"`
NetRateLimiterBwMaxRate int64 `toml:"net_rate_limiter_bw_max_rate"`
NetRateLimiterBwOneTimeBurst int64 `toml:"net_rate_limiter_bw_one_time_burst"`
NetRateLimiterOpsMaxRate int64 `toml:"net_rate_limiter_ops_max_rate"`
NetRateLimiterOpsOneTimeBurst int64 `toml:"net_rate_limiter_ops_one_time_burst"`
VirtioFSCacheSize uint32 `toml:"virtio_fs_cache_size"`
DefaultMaxVCPUs uint32 `toml:"default_maxvcpus"`
MemorySize uint32 `toml:"default_memory"`
MemSlots uint32 `toml:"memory_slots"`
DefaultBridges uint32 `toml:"default_bridges"`
Msize9p uint32 `toml:"msize_9p"`
PCIeRootPort uint32 `toml:"pcie_root_port"`
NumVCPUs int32 `toml:"default_vcpus"`
BlockDeviceCacheSet bool `toml:"block_device_cache_set"`
BlockDeviceCacheDirect bool `toml:"block_device_cache_direct"`
BlockDeviceCacheNoflush bool `toml:"block_device_cache_noflush"`
EnableVhostUserStore bool `toml:"enable_vhost_user_store"`
DisableBlockDeviceUse bool `toml:"disable_block_device_use"`
MemPrealloc bool `toml:"enable_mem_prealloc"`
HugePages bool `toml:"enable_hugepages"`
VirtioMem bool `toml:"enable_virtio_mem"`
IOMMU bool `toml:"enable_iommu"`
IOMMUPlatform bool `toml:"enable_iommu_platform"`
Debug bool `toml:"enable_debug"`
DisableNestingChecks bool `toml:"disable_nesting_checks"`
EnableIOThreads bool `toml:"enable_iothreads"`
DisableImageNvdimm bool `toml:"disable_image_nvdimm"`
HotplugVFIOOnRootBus bool `toml:"hotplug_vfio_on_root_bus"`
DisableVhostNet bool `toml:"disable_vhost_net"`
GuestMemoryDumpPaging bool `toml:"guest_memory_dump_paging"`
ConfidentialGuest bool `toml:"confidential_guest"`
GuestSwap bool `toml:"enable_guest_swap"`
Rootless bool `toml:"rootless"`
DisableSeccomp bool `toml:"disable_seccomp"`
DisableSeLinux bool `toml:"disable_selinux"`
}
type runtime struct {
@@ -482,6 +490,34 @@ func (h hypervisor) getInitrdAndImage() (initrd string, image string, err error)
return
}
func (h hypervisor) getDiskRateLimiterBwMaxRate() int64 {
return h.DiskRateLimiterBwMaxRate
}
func (h hypervisor) getDiskRateLimiterBwOneTimeBurst() int64 {
if h.DiskRateLimiterBwOneTimeBurst != 0 && h.getDiskRateLimiterBwMaxRate() == 0 {
kataUtilsLogger.Warn("The DiskRateLimiterBwOneTimeBurst is set but DiskRateLimiterBwMaxRate is not set, this option will be ignored.")
h.DiskRateLimiterBwOneTimeBurst = 0
}
return h.DiskRateLimiterBwOneTimeBurst
}
func (h hypervisor) getDiskRateLimiterOpsMaxRate() int64 {
return h.DiskRateLimiterOpsMaxRate
}
func (h hypervisor) getDiskRateLimiterOpsOneTimeBurst() int64 {
if h.DiskRateLimiterOpsOneTimeBurst != 0 && h.getDiskRateLimiterOpsMaxRate() == 0 {
kataUtilsLogger.Warn("The DiskRateLimiterOpsOneTimeBurst is set but DiskRateLimiterOpsMaxRate is not set, this option will be ignored.")
h.DiskRateLimiterOpsOneTimeBurst = 0
}
return h.DiskRateLimiterOpsOneTimeBurst
}
func (h hypervisor) getRxRateLimiterCfg() uint64 {
return h.RxRateLimiterMaxRate
}
@@ -490,6 +526,34 @@ func (h hypervisor) getTxRateLimiterCfg() uint64 {
return h.TxRateLimiterMaxRate
}
func (h hypervisor) getNetRateLimiterBwMaxRate() int64 {
return h.NetRateLimiterBwMaxRate
}
func (h hypervisor) getNetRateLimiterBwOneTimeBurst() int64 {
if h.NetRateLimiterBwOneTimeBurst != 0 && h.getNetRateLimiterBwMaxRate() == 0 {
kataUtilsLogger.Warn("The NetRateLimiterBwOneTimeBurst is set but NetRateLimiterBwMaxRate is not set, this option will be ignored.")
h.NetRateLimiterBwOneTimeBurst = 0
}
return h.NetRateLimiterBwOneTimeBurst
}
func (h hypervisor) getNetRateLimiterOpsMaxRate() int64 {
return h.NetRateLimiterOpsMaxRate
}
func (h hypervisor) getNetRateLimiterOpsOneTimeBurst() int64 {
if h.NetRateLimiterOpsOneTimeBurst != 0 && h.getNetRateLimiterOpsMaxRate() == 0 {
kataUtilsLogger.Warn("The NetRateLimiterOpsOneTimeBurst is set but NetRateLimiterOpsMaxRate is not set, this option will be ignored.")
h.NetRateLimiterOpsOneTimeBurst = 0
}
return h.NetRateLimiterOpsOneTimeBurst
}
func (h hypervisor) getIOMMUPlatform() bool {
if h.IOMMUPlatform {
kataUtilsLogger.Info("IOMMUPlatform is enabled by default.")
@@ -828,52 +892,60 @@ func newClhHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
}
return vc.HypervisorConfig{
HypervisorPath: hypervisor,
HypervisorPathList: h.HypervisorPathList,
KernelPath: kernel,
InitrdPath: initrd,
ImagePath: image,
FirmwarePath: firmware,
MachineAccelerators: machineAccelerators,
KernelParams: vc.DeserializeParams(strings.Fields(kernelParams)),
HypervisorMachineType: machineType,
NumVCPUs: h.defaultVCPUs(),
DefaultMaxVCPUs: h.defaultMaxVCPUs(),
MemorySize: h.defaultMemSz(),
MemSlots: h.defaultMemSlots(),
MemOffset: h.defaultMemOffset(),
VirtioMem: h.VirtioMem,
EntropySource: h.GetEntropySource(),
EntropySourceList: h.EntropySourceList,
DefaultBridges: h.defaultBridges(),
DisableBlockDeviceUse: h.DisableBlockDeviceUse,
SharedFS: sharedFS,
VirtioFSDaemon: h.VirtioFSDaemon,
VirtioFSDaemonList: h.VirtioFSDaemonList,
VirtioFSCacheSize: h.VirtioFSCacheSize,
VirtioFSCache: h.VirtioFSCache,
MemPrealloc: h.MemPrealloc,
HugePages: h.HugePages,
FileBackedMemRootDir: h.FileBackedMemRootDir,
FileBackedMemRootList: h.FileBackedMemRootList,
Debug: h.Debug,
DisableNestingChecks: h.DisableNestingChecks,
BlockDeviceDriver: blockDriver,
BlockDeviceCacheSet: h.BlockDeviceCacheSet,
BlockDeviceCacheDirect: h.BlockDeviceCacheDirect,
BlockDeviceCacheNoflush: h.BlockDeviceCacheNoflush,
EnableIOThreads: h.EnableIOThreads,
Msize9p: h.msize9p(),
HotplugVFIOOnRootBus: h.HotplugVFIOOnRootBus,
PCIeRootPort: h.PCIeRootPort,
DisableVhostNet: true,
GuestHookPath: h.guestHookPath(),
VirtioFSExtraArgs: h.VirtioFSExtraArgs,
SGXEPCSize: defaultSGXEPCSize,
EnableAnnotations: h.EnableAnnotations,
DisableSeccomp: h.DisableSeccomp,
ConfidentialGuest: h.ConfidentialGuest,
DisableSeLinux: h.DisableSeLinux,
HypervisorPath: hypervisor,
HypervisorPathList: h.HypervisorPathList,
KernelPath: kernel,
InitrdPath: initrd,
ImagePath: image,
FirmwarePath: firmware,
MachineAccelerators: machineAccelerators,
KernelParams: vc.DeserializeParams(strings.Fields(kernelParams)),
HypervisorMachineType: machineType,
NumVCPUs: h.defaultVCPUs(),
DefaultMaxVCPUs: h.defaultMaxVCPUs(),
MemorySize: h.defaultMemSz(),
MemSlots: h.defaultMemSlots(),
MemOffset: h.defaultMemOffset(),
VirtioMem: h.VirtioMem,
EntropySource: h.GetEntropySource(),
EntropySourceList: h.EntropySourceList,
DefaultBridges: h.defaultBridges(),
DisableBlockDeviceUse: h.DisableBlockDeviceUse,
SharedFS: sharedFS,
VirtioFSDaemon: h.VirtioFSDaemon,
VirtioFSDaemonList: h.VirtioFSDaemonList,
VirtioFSCacheSize: h.VirtioFSCacheSize,
VirtioFSCache: h.VirtioFSCache,
MemPrealloc: h.MemPrealloc,
HugePages: h.HugePages,
FileBackedMemRootDir: h.FileBackedMemRootDir,
FileBackedMemRootList: h.FileBackedMemRootList,
Debug: h.Debug,
DisableNestingChecks: h.DisableNestingChecks,
BlockDeviceDriver: blockDriver,
BlockDeviceCacheSet: h.BlockDeviceCacheSet,
BlockDeviceCacheDirect: h.BlockDeviceCacheDirect,
BlockDeviceCacheNoflush: h.BlockDeviceCacheNoflush,
EnableIOThreads: h.EnableIOThreads,
Msize9p: h.msize9p(),
HotplugVFIOOnRootBus: h.HotplugVFIOOnRootBus,
PCIeRootPort: h.PCIeRootPort,
DisableVhostNet: true,
GuestHookPath: h.guestHookPath(),
VirtioFSExtraArgs: h.VirtioFSExtraArgs,
SGXEPCSize: defaultSGXEPCSize,
EnableAnnotations: h.EnableAnnotations,
DisableSeccomp: h.DisableSeccomp,
ConfidentialGuest: h.ConfidentialGuest,
DisableSeLinux: h.DisableSeLinux,
NetRateLimiterBwMaxRate: h.getNetRateLimiterBwMaxRate(),
NetRateLimiterBwOneTimeBurst: h.getNetRateLimiterBwOneTimeBurst(),
NetRateLimiterOpsMaxRate: h.getNetRateLimiterOpsMaxRate(),
NetRateLimiterOpsOneTimeBurst: h.getNetRateLimiterOpsOneTimeBurst(),
DiskRateLimiterBwMaxRate: h.getDiskRateLimiterBwMaxRate(),
DiskRateLimiterBwOneTimeBurst: h.getDiskRateLimiterBwOneTimeBurst(),
DiskRateLimiterOpsMaxRate: h.getDiskRateLimiterOpsMaxRate(),
DiskRateLimiterOpsOneTimeBurst: h.getDiskRateLimiterOpsOneTimeBurst(),
}, nil
}

View File

@@ -810,6 +810,14 @@ func TestNewClhHypervisorConfig(t *testing.T) {
kernelPath := path.Join(tmpdir, "kernel")
imagePath := path.Join(tmpdir, "image")
virtioFsDaemon := path.Join(tmpdir, "virtiofsd")
netRateLimiterBwMaxRate := int64(1000)
netRateLimiterBwOneTimeBurst := int64(1000)
netRateLimiterOpsMaxRate := int64(0)
netRateLimiterOpsOneTimeBurst := int64(1000)
diskRateLimiterBwMaxRate := int64(1000)
diskRateLimiterBwOneTimeBurst := int64(1000)
diskRateLimiterOpsMaxRate := int64(0)
diskRateLimiterOpsOneTimeBurst := int64(1000)
for _, file := range []string{imagePath, hypervisorPath, kernelPath, virtioFsDaemon} {
err := createEmptyFile(file)
@@ -817,11 +825,19 @@ func TestNewClhHypervisorConfig(t *testing.T) {
}
hypervisor := hypervisor{
Path: hypervisorPath,
Kernel: kernelPath,
Image: imagePath,
VirtioFSDaemon: virtioFsDaemon,
VirtioFSCache: "always",
Path: hypervisorPath,
Kernel: kernelPath,
Image: imagePath,
VirtioFSDaemon: virtioFsDaemon,
VirtioFSCache: "always",
NetRateLimiterBwMaxRate: netRateLimiterBwMaxRate,
NetRateLimiterBwOneTimeBurst: netRateLimiterBwOneTimeBurst,
NetRateLimiterOpsMaxRate: netRateLimiterOpsMaxRate,
NetRateLimiterOpsOneTimeBurst: netRateLimiterOpsOneTimeBurst,
DiskRateLimiterBwMaxRate: diskRateLimiterBwMaxRate,
DiskRateLimiterBwOneTimeBurst: diskRateLimiterBwOneTimeBurst,
DiskRateLimiterOpsMaxRate: diskRateLimiterOpsMaxRate,
DiskRateLimiterOpsOneTimeBurst: diskRateLimiterOpsOneTimeBurst,
}
config, err := newClhHypervisorConfig(hypervisor)
if err != nil {
@@ -852,6 +868,39 @@ func TestNewClhHypervisorConfig(t *testing.T) {
t.Errorf("Expected VirtioFSCache %v, got %v", true, config.VirtioFSCache)
}
if config.NetRateLimiterBwMaxRate != netRateLimiterBwMaxRate {
t.Errorf("Expected value for network bandwidth rate limiter %v, got %v", netRateLimiterBwMaxRate, config.NetRateLimiterBwMaxRate)
}
if config.NetRateLimiterBwOneTimeBurst != netRateLimiterBwOneTimeBurst {
t.Errorf("Expected value for network bandwidth one time burst %v, got %v", netRateLimiterBwOneTimeBurst, config.NetRateLimiterBwOneTimeBurst)
}
if config.NetRateLimiterOpsMaxRate != netRateLimiterOpsMaxRate {
t.Errorf("Expected value for network operations rate limiter %v, got %v", netRateLimiterOpsMaxRate, config.NetRateLimiterOpsMaxRate)
}
// We expect 0 (zero) here as netRateLimiterOpsMaxRate is not set (set to zero).
if config.NetRateLimiterOpsOneTimeBurst != 0 {
t.Errorf("Expected value for network operations one time burst %v, got %v", netRateLimiterOpsOneTimeBurst, config.NetRateLimiterOpsOneTimeBurst)
}
if config.DiskRateLimiterBwMaxRate != diskRateLimiterBwMaxRate {
t.Errorf("Expected value for disk bandwidth rate limiter %v, got %v", diskRateLimiterBwMaxRate, config.DiskRateLimiterBwMaxRate)
}
if config.DiskRateLimiterBwOneTimeBurst != diskRateLimiterBwOneTimeBurst {
t.Errorf("Expected value for disk bandwidth one time burst %v, got %v", diskRateLimiterBwOneTimeBurst, config.DiskRateLimiterBwOneTimeBurst)
}
if config.DiskRateLimiterOpsMaxRate != diskRateLimiterOpsMaxRate {
t.Errorf("Expected value for disk operations rate limiter %v, got %v", diskRateLimiterOpsMaxRate, config.DiskRateLimiterOpsMaxRate)
}
// We expect 0 (zero) here as diskRateLimiterOpsMaxRate is not set (set to zero).
if config.DiskRateLimiterOpsOneTimeBurst != 0 {
t.Errorf("Expected value for disk operations one time burst %v, got %v", diskRateLimiterOpsOneTimeBurst, config.DiskRateLimiterOpsOneTimeBurst)
}
}
func TestHypervisorDefaults(t *testing.T) {

View File

@@ -165,6 +165,7 @@ type cloudHypervisor struct {
APIClient clhClient
ctx context.Context
id string
devicesIds map[string]string
vmconfig chclient.VmConfig
state CloudHypervisorState
config HypervisorConfig
@@ -358,6 +359,7 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net
clh.id = id
clh.state.state = clhNotReady
clh.devicesIds = make(map[string]string)
clh.Logger().WithField("function", "CreateVM").Info("creating Sandbox")
@@ -442,6 +444,11 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net
disk := chclient.NewDiskConfig(imagePath)
disk.SetReadonly(true)
diskRateLimiterConfig := clh.getDiskRateLimiterConfig()
if diskRateLimiterConfig != nil {
disk.SetRateLimiterConfig(*diskRateLimiterConfig)
}
if clh.vmconfig.Disks != nil {
*clh.vmconfig.Disks = append(*clh.vmconfig.Disks, *disk)
} else {
@@ -665,7 +672,11 @@ func (clh *cloudHypervisor) hotplugAddBlockDevice(drive *config.BlockDrive) erro
clhDisk := *chclient.NewDiskConfig(drive.File)
clhDisk.Readonly = &drive.ReadOnly
clhDisk.VhostUser = func(b bool) *bool { return &b }(false)
clhDisk.Id = &driveID
diskRateLimiterConfig := clh.getDiskRateLimiterConfig()
if diskRateLimiterConfig != nil {
clhDisk.SetRateLimiterConfig(*diskRateLimiterConfig)
}
pciInfo, _, err := cl.VmAddDiskPut(ctx, clhDisk)
@@ -673,6 +684,7 @@ func (clh *cloudHypervisor) hotplugAddBlockDevice(drive *config.BlockDrive) erro
return fmt.Errorf("failed to hotplug block device %+v %s", drive, openAPIClientError(err))
}
clh.devicesIds[driveID] = pciInfo.GetId()
drive.PCIPath, err = clhPciInfoToPath(pciInfo)
return err
@@ -686,11 +698,11 @@ func (clh *cloudHypervisor) hotPlugVFIODevice(device *config.VFIODev) error {
// Create the clh device config via the constructor to ensure default values are properly assigned
clhDevice := *chclient.NewVmAddDevice()
clhDevice.Path = &device.SysfsDev
clhDevice.Id = &device.ID
pciInfo, _, err := cl.VmAddDevicePut(ctx, clhDevice)
if err != nil {
return fmt.Errorf("Failed to hotplug device %+v %s", device, openAPIClientError(err))
}
clh.devicesIds[device.ID] = pciInfo.GetId()
// clh doesn't use bridges, so the PCI path is simply the slot
// number of the device. This will break if clh starts using
@@ -751,13 +763,15 @@ func (clh *cloudHypervisor) HotplugRemoveDevice(ctx context.Context, devInfo int
ctx, cancel := context.WithTimeout(context.Background(), clhHotPlugAPITimeout*time.Second)
defer cancel()
originalDeviceID := clh.devicesIds[deviceID]
remove := *chclient.NewVmRemoveDevice()
remove.Id = &deviceID
remove.Id = &originalDeviceID
_, err := cl.VmRemoveDevicePut(ctx, remove)
if err != nil {
err = fmt.Errorf("failed to hotplug remove (unplug) device %+v: %s", devInfo, openAPIClientError(err))
}
delete(clh.devicesIds, deviceID)
return nil, err
}
@@ -1304,6 +1318,52 @@ func (clh *cloudHypervisor) addVSock(cid int64, path string) {
clh.vmconfig.Vsock = chclient.NewVsockConfig(cid, path)
}
func (clh *cloudHypervisor) getRateLimiterConfig(bwSize, bwOneTimeBurst, opsSize, opsOneTimeBurst int64) *chclient.RateLimiterConfig {
if bwSize == 0 && opsSize == 0 {
return nil
}
rateLimiterConfig := chclient.NewRateLimiterConfig()
if bwSize != 0 {
bwTokenBucket := chclient.NewTokenBucket(bwSize, int64(utils.DefaultRateLimiterRefillTimeMilliSecs))
if bwOneTimeBurst != 0 {
bwTokenBucket.SetOneTimeBurst(bwOneTimeBurst)
}
rateLimiterConfig.SetBandwidth(*bwTokenBucket)
}
if opsSize != 0 {
opsTokenBucket := chclient.NewTokenBucket(opsSize, int64(utils.DefaultRateLimiterRefillTimeMilliSecs))
if opsOneTimeBurst != 0 {
opsTokenBucket.SetOneTimeBurst(opsOneTimeBurst)
}
rateLimiterConfig.SetOps(*opsTokenBucket)
}
return rateLimiterConfig
}
func (clh *cloudHypervisor) getNetRateLimiterConfig() *chclient.RateLimiterConfig {
return clh.getRateLimiterConfig(
int64(utils.RevertBytes(uint64(clh.config.NetRateLimiterBwMaxRate/8))),
int64(utils.RevertBytes(uint64(clh.config.NetRateLimiterBwOneTimeBurst/8))),
clh.config.NetRateLimiterOpsMaxRate,
clh.config.NetRateLimiterOpsOneTimeBurst)
}
func (clh *cloudHypervisor) getDiskRateLimiterConfig() *chclient.RateLimiterConfig {
return clh.getRateLimiterConfig(
int64(utils.RevertBytes(uint64(clh.config.DiskRateLimiterBwMaxRate/8))),
int64(utils.RevertBytes(uint64(clh.config.DiskRateLimiterBwOneTimeBurst/8))),
clh.config.DiskRateLimiterOpsMaxRate,
clh.config.DiskRateLimiterOpsOneTimeBurst)
}
func (clh *cloudHypervisor) addNet(e Endpoint) error {
clh.Logger().WithField("endpoint-type", e).Debugf("Adding Endpoint of type %v", e)
@@ -1323,9 +1383,15 @@ func (clh *cloudHypervisor) addNet(e Endpoint) error {
"tap": tapPath,
}).Info("Adding Net")
netRateLimiterConfig := clh.getNetRateLimiterConfig()
net := chclient.NewNetConfig()
net.Mac = &mac
net.Tap = &tapPath
if netRateLimiterConfig != nil {
net.SetRateLimiterConfig(*netRateLimiterConfig)
}
if clh.vmconfig.Net != nil {
*clh.vmconfig.Net = append(*clh.vmconfig.Net, *net)
} else {
@@ -1435,5 +1501,5 @@ func (clh *cloudHypervisor) vmInfo() (chclient.VmInfo, error) {
}
func (clh *cloudHypervisor) IsRateLimiterBuiltin() bool {
return false
return true
}

View File

@@ -52,17 +52,21 @@ func newClhConfig() (HypervisorConfig, error) {
}
return HypervisorConfig{
KernelPath: testClhKernelPath,
ImagePath: testClhImagePath,
HypervisorPath: testClhPath,
NumVCPUs: defaultVCPUs,
BlockDeviceDriver: config.VirtioBlock,
MemorySize: defaultMemSzMiB,
DefaultBridges: defaultBridges,
DefaultMaxVCPUs: uint32(64),
SharedFS: config.VirtioFS,
VirtioFSCache: virtioFsCacheAlways,
VirtioFSDaemon: testVirtiofsdPath,
KernelPath: testClhKernelPath,
ImagePath: testClhImagePath,
HypervisorPath: testClhPath,
NumVCPUs: defaultVCPUs,
BlockDeviceDriver: config.VirtioBlock,
MemorySize: defaultMemSzMiB,
DefaultBridges: defaultBridges,
DefaultMaxVCPUs: uint32(64),
SharedFS: config.VirtioFS,
VirtioFSCache: virtioFsCacheAlways,
VirtioFSDaemon: testVirtiofsdPath,
NetRateLimiterBwMaxRate: int64(0),
NetRateLimiterBwOneTimeBurst: int64(0),
NetRateLimiterOpsMaxRate: int64(0),
NetRateLimiterOpsOneTimeBurst: int64(0),
}, nil
}
@@ -191,6 +195,181 @@ func TestCloudHypervisorAddNetCheckEnpointTypes(t *testing.T) {
}
}
// Check AddNet properly sets up the network rate limiter
func TestCloudHypervisorNetRateLimiter(t *testing.T) {
assert := assert.New(t)
tapPath := "/path/to/tap"
validVeth := &VethEndpoint{}
validVeth.NetPair.TapInterface.TAPIface.Name = tapPath
type args struct {
bwMaxRate int64
bwOneTimeBurst int64
opsMaxRate int64
opsOneTimeBurst int64
}
//nolint: govet
tests := []struct {
name string
args args
expectsRateLimiter bool
expectsBwBucketToken bool
expectsOpsBucketToken bool
}{
// Bandwidth
{
"Bandwidth | max rate with one time burst",
args{
bwMaxRate: int64(1000),
bwOneTimeBurst: int64(10000),
},
true, // expectsRateLimiter
true, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
{
"Bandwidth | max rate without one time burst",
args{
bwMaxRate: int64(1000),
},
true, // expectsRateLimiter
true, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
{
"Bandwidth | no max rate with one time burst",
args{
bwOneTimeBurst: int64(10000),
},
false, // expectsRateLimiter
false, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
{
"Bandwidth | no max rate and no one time burst",
args{},
false, // expectsRateLimiter
false, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
// Operations
{
"Operations | max rate with one time burst",
args{
opsMaxRate: int64(1000),
opsOneTimeBurst: int64(10000),
},
true, // expectsRateLimiter
false, // expectsBwBucketToken
true, // expectsOpsBucketToken
},
{
"Operations | max rate without one time burst",
args{
opsMaxRate: int64(1000),
},
true, // expectsRateLimiter
false, // expectsBwBucketToken
true, // expectsOpsBucketToken
},
{
"Operations | no max rate with one time burst",
args{
opsOneTimeBurst: int64(10000),
},
false, // expectsRateLimiter
false, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
{
"Operations | no max rate and no one time burst",
args{},
false, // expectsRateLimiter
false, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
// Bandwidth and Operations
{
"Bandwidth and Operations | max rate with one time burst",
args{
bwMaxRate: int64(1000),
bwOneTimeBurst: int64(10000),
opsMaxRate: int64(1000),
opsOneTimeBurst: int64(10000),
},
true, // expectsRateLimiter
true, // expectsBwBucketToken
true, // expectsOpsBucketToken
},
{
"Bandwidth and Operations | max rate without one time burst",
args{
bwMaxRate: int64(1000),
opsMaxRate: int64(1000),
},
true, // expectsRateLimiter
true, // expectsBwBucketToken
true, // expectsOpsBucketToken
},
{
"Bandwidth and Operations | no max rate with one time burst",
args{
bwOneTimeBurst: int64(10000),
opsOneTimeBurst: int64(10000),
},
false, // expectsRateLimiter
false, // expectsBwBucketToken
false, // expectsOpsBucketToken
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
clhConfig, err := newClhConfig()
assert.NoError(err)
clhConfig.NetRateLimiterBwMaxRate = tt.args.bwMaxRate
clhConfig.NetRateLimiterBwOneTimeBurst = tt.args.bwOneTimeBurst
clhConfig.NetRateLimiterOpsMaxRate = tt.args.opsMaxRate
clhConfig.NetRateLimiterOpsOneTimeBurst = tt.args.opsOneTimeBurst
clh := &cloudHypervisor{}
clh.config = clhConfig
clh.APIClient = &clhClientMock{}
if err := clh.addNet(validVeth); err != nil {
t.Errorf("cloudHypervisor.addNet() error = %v", err)
} else {
netConfig := (*clh.vmconfig.Net)[0]
assert.Equal(netConfig.HasRateLimiterConfig(), tt.expectsRateLimiter)
if tt.expectsRateLimiter {
rateLimiterConfig := netConfig.GetRateLimiterConfig()
assert.Equal(rateLimiterConfig.HasBandwidth(), tt.expectsBwBucketToken)
assert.Equal(rateLimiterConfig.HasOps(), tt.expectsOpsBucketToken)
if tt.expectsBwBucketToken {
bwBucketToken := rateLimiterConfig.GetBandwidth()
assert.Equal(bwBucketToken.GetSize(), int64(utils.RevertBytes(uint64(tt.args.bwMaxRate/8))))
assert.Equal(bwBucketToken.GetOneTimeBurst(), int64(utils.RevertBytes(uint64(tt.args.bwOneTimeBurst/8))))
}
if tt.expectsOpsBucketToken {
opsBucketToken := rateLimiterConfig.GetOps()
assert.Equal(opsBucketToken.GetSize(), int64(tt.args.opsMaxRate))
assert.Equal(opsBucketToken.GetOneTimeBurst(), int64(tt.args.opsOneTimeBurst))
}
}
}
})
}
}
func TestCloudHypervisorBootVM(t *testing.T) {
clh := &cloudHypervisor{}
clh.APIClient = &clhClientMock{}
@@ -384,6 +563,7 @@ func TestCloudHypervisorHotplugAddBlockDevice(t *testing.T) {
clh := &cloudHypervisor{}
clh.config = clhConfig
clh.APIClient = &clhClientMock{}
clh.devicesIds = make(map[string]string)
clh.config.BlockDeviceDriver = config.VirtioBlock
err = clh.hotplugAddBlockDevice(&config.BlockDrive{Pmem: false})
@@ -406,6 +586,7 @@ func TestCloudHypervisorHotplugRemoveDevice(t *testing.T) {
clh := &cloudHypervisor{}
clh.config = clhConfig
clh.APIClient = &clhClientMock{}
clh.devicesIds = make(map[string]string)
_, err = clh.HotplugRemoveDevice(context.Background(), &config.BlockDrive{}, BlockDev)
assert.NoError(err, "Hotplug remove block device expected no error")

View File

@@ -930,7 +930,7 @@ func (fc *firecracker) fcAddNetDevice(ctx context.Context, endpoint Endpoint) {
// The implementation of rate limiter is based on TBF.
// Rate Limiter defines a token bucket with a maximum capacity (size) to store tokens, and an interval for refilling purposes (refill_time).
// The refill-rate is derived from size and refill_time, and it is the constant rate at which the tokens replenish.
refillTime := uint64(1000)
refillTime := uint64(utils.DefaultRateLimiterRefillTimeMilliSecs)
var rxRateLimiter models.RateLimiter
rxSize := fc.config.RxRateLimiterMaxRate
if rxSize > 0 {
@@ -938,7 +938,7 @@ func (fc *firecracker) fcAddNetDevice(ctx context.Context, endpoint Endpoint) {
// kata-defined rxSize is in bits with scaling factors of 1000, but firecracker-defined
// rxSize is in bytes with scaling factors of 1024, need reversion.
rxSize = revertBytes(rxSize / 8)
rxSize = utils.RevertBytes(rxSize / 8)
rxTokenBucket := models.TokenBucket{
RefillTime: &refillTime,
Size: &rxSize,
@@ -955,7 +955,7 @@ func (fc *firecracker) fcAddNetDevice(ctx context.Context, endpoint Endpoint) {
// kata-defined txSize is in bits with scaling factors of 1000, but firecracker-defined
// txSize is in bytes with scaling factors of 1024, need reversion.
txSize = revertBytes(txSize / 8)
txSize = utils.RevertBytes(txSize / 8)
txTokenBucket := models.TokenBucket{
RefillTime: &refillTime,
Size: &txSize,
@@ -1266,15 +1266,3 @@ func (fc *firecracker) GenerateSocket(id string) (interface{}, error) {
func (fc *firecracker) IsRateLimiterBuiltin() bool {
return true
}
// In firecracker, it accepts the size of rate limiter in scaling factors of 2^10(1024)
// But in kata-defined rate limiter, for better Human-readability, we prefer scaling factors of 10^3(1000).
// func revertByte reverts num from scaling factors of 1000 to 1024, e.g. 10000000(10MB) to 10485760.
func revertBytes(num uint64) uint64 {
a := num / 1000
b := num % 1000
if a == 0 {
return num
}
return 1024*revertBytes(a) + b
}

View File

@@ -50,17 +50,6 @@ func TestFCTruncateID(t *testing.T) {
assert.Equal(expectedID, id)
}
func TestRevertBytes(t *testing.T) {
assert := assert.New(t)
//10MB
testNum := uint64(10000000)
expectedNum := uint64(10485760)
num := revertBytes(testNum)
assert.Equal(expectedNum, num)
}
func TestFCParseVersion(t *testing.T) {
assert := assert.New(t)

View File

@@ -380,12 +380,48 @@ type HypervisorConfig struct {
// Enable SGX. Hardware-based isolation and memory encryption.
SGXEPCSize int64
// DiskRateLimiterBwRate is used to control disk I/O bandwidth on VM level.
// The same value, defined in bits per second, is used for inbound and outbound bandwidth.
DiskRateLimiterBwMaxRate int64
// DiskRateLimiterBwOneTimeBurst is used to control disk I/O bandwidth on VM level.
// This increases the initial max rate and this initial extra credit does *NOT* replenish
// and can be used for an *initial* burst of data.
DiskRateLimiterBwOneTimeBurst int64
// DiskRateLimiterOpsRate is used to control disk I/O operations on VM level.
// The same value, defined in operations per second, is used for inbound and outbound bandwidth.
DiskRateLimiterOpsMaxRate int64
// DiskRateLimiterOpsOneTimeBurst is used to control disk I/O operations on VM level.
// This increases the initial max rate and this initial extra credit does *NOT* replenish
// and can be used for an *initial* burst of data.
DiskRateLimiterOpsOneTimeBurst int64
// RxRateLimiterMaxRate is used to control network I/O inbound bandwidth on VM level.
RxRateLimiterMaxRate uint64
// TxRateLimiterMaxRate is used to control network I/O outbound bandwidth on VM level.
TxRateLimiterMaxRate uint64
// NetRateLimiterBwRate is used to control network I/O bandwidth on VM level.
// The same value, defined in bits per second, is used for inbound and outbound bandwidth.
NetRateLimiterBwMaxRate int64
// NetRateLimiterBwOneTimeBurst is used to control network I/O bandwidth on VM level.
// This increases the initial max rate and this initial extra credit does *NOT* replenish
// and can be used for an *initial* burst of data.
NetRateLimiterBwOneTimeBurst int64
// NetRateLimiterOpsRate is used to control network I/O operations on VM level.
// The same value, defined in operations per second, is used for inbound and outbound bandwidth.
NetRateLimiterOpsMaxRate int64
// NetRateLimiterOpsOneTimeBurst is used to control network I/O operations on VM level.
// This increases the initial max rate and this initial extra credit does *NOT* replenish
// and can be used for an *initial* burst of data.
NetRateLimiterOpsOneTimeBurst int64
// MemOffset specifies memory space for nvdimm device
MemOffset uint64

View File

@@ -38,6 +38,7 @@ import (
pkgUtils "github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
"github.com/kata-containers/kata-containers/src/runtime/pkg/uuid"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/config"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/drivers"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
vcTypes "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
@@ -129,6 +130,8 @@ const (
fallbackFileBackedMemDir = "/dev/shm"
qemuStopSandboxTimeoutSecs = 15
qomPathPrefix = "/machine/peripheral/"
)
// agnostic list of kernel parameters
@@ -1395,31 +1398,68 @@ func (q *qemu) hotplugAddVhostUserBlkDevice(ctx context.Context, vAttr *config.V
}()
driver := "vhost-user-blk-pci"
addr, bridge, err := q.arch.addDeviceToBridge(ctx, vAttr.DevID, types.PCI)
if err != nil {
return err
}
defer func() {
if err != nil {
q.arch.removeDeviceFromBridge(vAttr.DevID)
machineType := q.HypervisorConfig().HypervisorMachineType
switch machineType {
case QemuVirt:
if q.state.PCIeRootPort <= 0 {
return fmt.Errorf("Vhost-user-blk device is a PCIe device if machine type is virt. Need to add the PCIe Root Port by setting the pcie_root_port parameter in the configuration for virt")
}
}()
bridgeSlot, err := vcTypes.PciSlotFromInt(bridge.Addr)
if err != nil {
return err
}
devSlot, err := vcTypes.PciSlotFromString(addr)
if err != nil {
return err
}
vAttr.PCIPath, err = vcTypes.PciPathFromSlots(bridgeSlot, devSlot)
//The addr of a dev is corresponding with device:function for PCIe in qemu which starting from 0
//Since the dev is the first and only one on this bus(root port), it should be 0.
addr := "00"
if err = q.qmpMonitorCh.qmp.ExecutePCIVhostUserDevAdd(q.qmpMonitorCh.ctx, driver, devID, vAttr.DevID, addr, bridge.ID); err != nil {
return err
}
bridgeId := fmt.Sprintf("%s%d", pcieRootPortPrefix, len(drivers.AllPCIeDevs))
drivers.AllPCIeDevs[devID] = true
bridgeQomPath := fmt.Sprintf("%s%s", qomPathPrefix, bridgeId)
bridgeSlot, err := q.qomGetSlot(bridgeQomPath)
if err != nil {
return err
}
devSlot, err := vcTypes.PciSlotFromString(addr)
if err != nil {
return err
}
vAttr.PCIPath, err = vcTypes.PciPathFromSlots(bridgeSlot, devSlot)
if err != nil {
return err
}
if err = q.qmpMonitorCh.qmp.ExecutePCIVhostUserDevAdd(q.qmpMonitorCh.ctx, driver, devID, vAttr.DevID, addr, bridgeId); err != nil {
return err
}
default:
addr, bridge, err := q.arch.addDeviceToBridge(ctx, vAttr.DevID, types.PCI)
if err != nil {
return err
}
defer func() {
if err != nil {
q.arch.removeDeviceFromBridge(vAttr.DevID)
}
}()
bridgeSlot, err := vcTypes.PciSlotFromInt(bridge.Addr)
if err != nil {
return err
}
devSlot, err := vcTypes.PciSlotFromString(addr)
if err != nil {
return err
}
vAttr.PCIPath, err = vcTypes.PciPathFromSlots(bridgeSlot, devSlot)
if err = q.qmpMonitorCh.qmp.ExecutePCIVhostUserDevAdd(q.qmpMonitorCh.ctx, driver, devID, vAttr.DevID, addr, bridge.ID); err != nil {
return err
}
}
return nil
}
@@ -1461,8 +1501,13 @@ func (q *qemu) hotplugVhostUserDevice(ctx context.Context, vAttr *config.VhostUs
return fmt.Errorf("Incorrect vhost-user device type found")
}
} else {
if err := q.arch.removeDeviceFromBridge(vAttr.DevID); err != nil {
return err
machineType := q.HypervisorConfig().HypervisorMachineType
if machineType != QemuVirt {
if err := q.arch.removeDeviceFromBridge(vAttr.DevID); err != nil {
return err
}
}
if err := q.qmpMonitorCh.qmp.ExecuteDeviceDel(q.qmpMonitorCh.ctx, devID); err != nil {
@@ -2315,7 +2360,7 @@ func genericAppendPCIeRootPort(devices []govmmQemu.Device, number uint32, machin
addr string
)
switch machineType {
case QemuQ35:
case QemuQ35, QemuVirt:
bus = defaultBridgeBus
chassis = "0"
multiFunction = false

View File

@@ -25,6 +25,11 @@ const cpBinaryName = "cp"
const fileMode0755 = os.FileMode(0755)
// The DefaultRateLimiterRefillTime is used for calculating the rate at
// which a TokenBucket is replinished, in cases where a RateLimiter is
// applied to either network or disk I/O.
const DefaultRateLimiterRefillTimeMilliSecs = 1000
// MibToBytesShift the number to shift needed to convert MiB to Bytes
const MibToBytesShift = 20
@@ -458,3 +463,19 @@ func getAllParentPaths(path string) []string {
// remove the "/" or "." from the return result
return paths[1:]
}
// In Cloud Hypervisor, as well as in Firecracker, the crate used by the VMMs
// accepts the size of rate limiter in scaling factors of 2^10(1024).
// But in kata-defined rate limiter, for better Human-readability, we prefer
// scaling factors of 10^3(1000).
//
// func revertBytes reverts num from scaling factors of 1000 to 1024, e.g.
// 10000000(10MB) to 10485760.
func RevertBytes(num uint64) uint64 {
a := num / 1000
b := num % 1000
if a == 0 {
return num
}
return 1024*RevertBytes(a) + b
}

View File

@@ -569,3 +569,14 @@ func TestGetAllParentPaths(t *testing.T) {
assert.Equal(tc.parents, getAllParentPaths(tc.targetPath))
}
}
func TestRevertBytes(t *testing.T) {
assert := assert.New(t)
//10MB
testNum := uint64(10000000)
expectedNum := uint64(10485760)
num := RevertBytes(testNum)
assert.Equal(expectedNum, num)
}

View File

@@ -4,7 +4,7 @@
The Kata Containers agent control tool (`kata-agent-ctl`) is a low-level test
tool. It allows basic interaction with the Kata Containers agent,
`kata-agent`, that runs inside the virtual machine.
`kata-agent`, that runs inside the virtual machine (VM).
Unlike the Kata Runtime, which only ever makes sequences of correctly ordered
and valid agent API calls, this tool allows users to make arbitrary agent API
@@ -117,7 +117,7 @@ establish the VSOCK guest CID value to connect to the agent.
1. Start a Kata Container
1. Establish the VSOCK guest CID number for the virtual machine:
1. Establish the VSOCK guest CID number for the VM:
```sh
$ guest_cid=$(sudo ss -H --vsock | awk '{print $6}' | cut -d: -f1)
@@ -211,10 +211,12 @@ $ sudo install -o root -g root -m 0755 ~/.cargo/bin/kata-agent-ctl /usr/local/bi
> **Warnings:**
>
> - This method is **only** for testing and development!
> - These methods are **only** for testing and development!
> - Only continue if you are using a non-critical system
> (such as a freshly installed VM environment).
#### Use a Unix abstract domain socket
1. Start the agent, specifying a local socket for it to communicate on:
```sh
@@ -233,3 +235,31 @@ $ sudo install -o root -g root -m 0755 ~/.cargo/bin/kata-agent-ctl /usr/local/bi
>
> The `@` in the server address is required - it denotes an abstract
> socket which the agent requires (see `unix(7)`).
#### Use a VSOCK loopback socket
VSOCK supports a special CID value of `1` (known symbolically as
`VMADDR_CID_LOCAL`) which assumes that the VM is actually
the local environment. This is effectively a `localhost` or loopback
interface which does not require an actual VM to be
running.
1. Start the agent, specifying the local VSOCK socket for it to communicate on:
```sh
$ vsock_loopback_cid=1
$ agent_vsock_port=1024
$ sudo KATA_AGENT_SERVER_ADDR="vsock://${vsock_loopback_cid}:${agent_vsock_port}" target/x86_64-unknown-linux-musl/release/kata-agent
```
> **Note:** This example assumes an Intel x86-64 system.
1. Run the tool in the same environment:
```sh
$ vsock_loopback_cid=1
$ agent_vsock_port=1024
$ cargo run -- -l debug connect --server-address "vsock://${vsock_loopback_cid}:${agent_vsock_port}" --bundle-dir "$bundle_dir" -c Check -c GetGuestDetails
```

View File

@@ -483,10 +483,8 @@ fn create_ttrpc_client(
if path.starts_with('@') {
abstract_socket = true;
// Remove the magic abstract-socket request character ('@')
// and crucially add a trailing nul terminator (required to
// interoperate with the ttrpc crate).
path = path[1..].to_string() + &"\x00".to_string();
// Remove the magic abstract-socket request character ('@').
path = path[1..].to_string();
}
if abstract_socket {

1
src/tools/runk/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
/vendor/

1474
src/tools/runk/Cargo.lock generated Normal file

File diff suppressed because it is too large Load Diff

29
src/tools/runk/Cargo.toml Normal file
View File

@@ -0,0 +1,29 @@
[package]
name = "runk"
version = "0.0.1"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
description = "runk: Kata OCI container runtime based on Kata agent"
license = "Apache-2.0"
edition = "2018"
[dependencies]
libcontainer = { path = "./libcontainer" }
rustjail = { path = "../../agent/rustjail", features = ["standard-oci-runtime"] }
oci = { path = "../../libs/oci" }
logging = { path = "../../libs/logging" }
liboci-cli = "0.0.3"
clap = { version = "3.0.6", features = ["derive", "cargo"] }
libc = "0.2.108"
nix = "0.23.0"
anyhow = "1.0.52"
slog = "2.7.0"
chrono = { version = "0.4.19", features = ["serde"] }
slog-async = "2.7.0"
tokio = { version = "1.15.0", features = ["full"] }
serde = { version = "1.0.133", features = ["derive"] }
serde_json = "1.0.74"
[workspace]
members = [
"libcontainer"
]

60
src/tools/runk/Makefile Normal file
View File

@@ -0,0 +1,60 @@
# Copyright 2021-2022 Sony Group Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
include ../../../utils.mk
TARGET = runk
TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
AGENT_TARGET = oci-kata-agent
AGENT_TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(AGENT_TARGET)
AGENT_SOURCE_PATH = ../../agent
# BINDIR is a directory for installing executable programs
BINDIR := /usr/local/bin
.DEFAULT_GOAL := default
default: build
build: build-agent build-runk
build-agent:
make -C $(AGENT_SOURCE_PATH) STANDARD_OCI_RUNTIME=yes
build-runk:
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE)
install: install-agent install-runk
install-agent:
install -D $(AGENT_SOURCE_PATH)/$(AGENT_TARGET_PATH) $(BINDIR)/$(AGENT_TARGET)
install-runk:
install -D $(TARGET_PATH) $(BINDIR)/$(TARGET)
clean:
cargo clean
vendor:
cargo vendor
test:
cargo test --all --target $(TRIPLE) -- --nocapture
check: standard_rust_check
.PHONY: \
build \
build-agent \
build-runk \
install \
install-agent \
install-runk \
clean \
clippy \
format \
vendor \
test \
check \

284
src/tools/runk/README.md Normal file
View File

@@ -0,0 +1,284 @@
# runk
## Overview
> **Warnings:**
> `runk` is currently an experimental tool.
> Only continue if you are using a non-critical system.
`runk` is a standard OCI container runtime written in Rust based on a modified version of
the [Kata Container agent](https://github.com/kata-containers/kata-containers/tree/main/src/agent), `kata-agent`.
`runk` conforms to the [OCI Container Runtime specifications](https://github.com/opencontainers/runtime-spec).
Unlike the [Kata Container runtime](https://github.com/kata-containers/kata-containers/tree/main/src/agent#features),
`kata-runtime`, `runk` spawns and runs containers on the host machine directly.
The user can run `runk` in the same way as the existing container runtimes such as `runc`,
the most used implementation of the OCI runtime specs.
## Why does `runk` exist?
The `kata-agent` is a process running inside a virtual machine (VM) as a supervisor for managing containers
and processes running within those containers.
In other words, the `kata-agent` is a kind of "low-level" container runtime inside VM because the agent
spawns and runs containers according to the OCI runtime specs.
However, the `kata-agent` does not have the OCI Command-Line Interface (CLI) that is defined in the
[runtime spec](https://github.com/opencontainers/runtime-spec/blob/master/runtime.md).
The `kata-runtime` provides the CLI part of the Kata Containers runtime component,
but the `kata-runtime` is a container runtime for creating hardware-virtualized containers running on the host.
`runk` is a Rust-based standard OCI container runtime that manages normal containers,
not hardware-virtualized containers.
`runk` aims to become one of the alternatives to existing OCI compliant container runtimes.
The `kata-agent` has most of the [features](https://github.com/kata-containers/kata-containers/tree/main/src/agent#features)
needed for the container runtime and delivers high performance with a low memory footprint owing to the
implementation by Rust language.
Therefore, `runk` leverages the mechanism of the `kata-agent` to avoid reinventing the wheel.
## Performance
`runk` is faster than `runc` and has a lower memory footprint.
This table shows the average of the elapsed time and the memory footprint (maximum resident set size)
for running sequentially 100 containers, the containers run `/bin/true` using `run` command with
[detached mode](https://github.com/opencontainers/runc/blob/master/docs/terminals.md#detached)
on 12 CPU cores (`3.8 GHz AMD Ryzen 9 3900X`) and 32 GiB of RAM.
`runk` always runs containers with detached mode currently.
Evaluation Results:
| | `runk` (v0.0.1) | `runc` (v1.0.3) | `crun` (v1.4.2) |
|-----------------------|---------------|---------------|---------------|
| time [ms] | 39.83 | 50.39 | 38.41 |
| memory footprint [MB] | 4.013 | 10.78 | 1.738 |
## Status of `runk`
We drafted the initial code here, and any contributions to `runk` and [`kata-agent`](https://github.com/kata-containers/kata-containers/tree/main/src/agent)
are welcome.
Regarding features compared to `runc`, see the `Status of runk` section in the [issue](https://github.com/kata-containers/kata-containers/issues/2784).
## Building
`runk` uses the modified the `kata-agent` binary, `oci-kata-agent`, which is an agent to be called from `runk`.
Therefore, you also need to build the `oci-kata-agent` to run `runk`.
You can build both `runk` and `oci-kata-agent` as follows.
```bash
$ cd runk
$ make
```
To install `runk` and `oci-kata-agent` into default directory for install executable program (`/usr/local/bin`):
```bash
$ sudo make install
```
## Using `runk` directly
Please note that `runk` is a low level tool not developed with an end user in mind.
It is mostly employed by other higher-level container software like `containerd`.
If you still want to use `runk` directly, here's how.
### Prerequisites
It is necessary to create an OCI bundle to use the tool. The simplest method is:
``` bash
$ bundle_dir="bundle"
$ rootfs_dir="$bundle_dir/rootfs"
$ image="busybox"
$ mkdir -p "$rootfs_dir" && (cd "$bundle_dir" && runk spec)
$ sudo docker export $(sudo docker create "$image") | tar -C "$rootfs_dir" -xf -
```
> **Note:**
> If you use the unmodified `runk spec` template, this should give a `sh` session inside the container.
> However, if you use `runk` directly and run a container with the unmodified template,
> `runk` cannot launch the `sh` session because `runk` does not support terminal handling yet.
> You need to edit the process field in the `config.json` should look like this below
> with `"terminal": false` and `"args": ["sleep", "10"]`.
```json
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"sleep",
"10"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
[...]
}
```
If you want to launch the `sh` session inside the container, you need to run `runk` from `containerd`.
Please refer to the [Using `runk` from containerd](#using-runk-from-containerd) section
### Running a container
Now you can go through the [lifecycle operations](https://github.com/opencontainers/runtime-spec/blob/master/runtime.md)
in your shell.
You need to run `runk` as `root` because `runk` does not have the rootless feature which is the ability
to run containers without root privileges.
```bash
$ cd $bundle_dir
# Create a container
$ sudo runk create test
# View the container is created and in the "created" state
$ sudo runk state test
# Start the process inside the container
$ sudo runk start test
# After 10 seconds view that the container has exited and is now in the "stopped" state
$ sudo runk state test
# Now delete the container
$ sudo runk delete test
```
## Using `runk` from `containerd`
`runk` can run containers with the containerd runtime handler support on `containerd`.
### Prerequisites for `runk` with containerd
* `containerd` v1.2.4 or above
* `cri-tools`
> **Note:**
> [`cri-tools`](https://github.com/kubernetes-sigs/cri-tools) is a set of tools for CRI
> used for development and testing.
Install `cri-tools` from source code:
```bash
$ go get github.com/kubernetes-incubator/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
$ make
$ sudo -E make install
$ popd
```
Write the `crictl` configuration file:
``` bash
$ cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
```
### Configure `containerd` to use `runk`
Update `/etc/containerd/config.toml`:
```bash
$ cat <<EOF | sudo tee /etc/containerd/config.toml
version = 2
[plugins."io.containerd.runtime.v1.linux"]
shim_debug = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runk]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runk.options]
BinaryName = "/usr/local/bin/runk"
EOF
```
Restart `containerd`:
```bash
$ sudo systemctl restart containerd
```
### Running a container with `crictl` command line
You can run containers in `runk` via containerd's CRI.
Pull the `busybox` image:
``` bash
$ sudo crictl pull busybox
```
Create the sandbox configuration:
``` bash
$ cat <<EOF | tee sandbox.json
{
"metadata": {
"name": "busybox-sandbox",
"namespace": "default",
"attempt": 1,
"uid": "hdishd83djaidwnduwk28bcsb"
},
"log_directory": "/tmp",
"linux": {
}
}
EOF
```
Create the container configuration:
``` bash
$ cat <<EOF | tee container.json
{
"metadata": {
"name": "busybox"
},
"image": {
"image": "docker.io/busybox"
},
"command": [
"sh"
],
"envs": [
{
"key": "PATH",
"value": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
},
{
"key": "TERM",
"value": "xterm"
}
],
"log_path": "busybox.0.log",
"stdin": true,
"stdin_once": true,
"tty": true
}
EOF
```
With the `crictl` command line of `cri-tools`, you can specify runtime class with `-r` or `--runtime` flag.
Launch a sandbox and container using the `crictl`:
```bash
# Run a container inside a sandbox
$ sudo crictl run -r runk container.json sandbox.json
f492eee753887ba3dfbba9022028975380739aba1269df431d097b73b23c3871
# Attach to the running container
$ sudo crictl attach --stdin --tty f492eee753887ba3dfbba9022028975380739aba1269df431d097b73b23c3871
/ #
```

View File

@@ -0,0 +1,23 @@
[package]
name = "libcontainer"
version = "0.0.1"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
description = "Library for runk container"
license = "Apache-2.0"
edition = "2018"
[dependencies]
rustjail = { path = "../../../agent/rustjail", features = ["standard-oci-runtime"] }
oci = { path = "../../../libs/oci" }
logging = { path = "../../../libs/logging" }
derive_builder = "0.10.2"
libc = "0.2.108"
nix = "0.23.0"
anyhow = "1.0.52"
slog = "2.7.0"
chrono = { version = "0.4.19", features = ["serde"] }
serde = { version = "1.0.133", features = ["derive"] }
serde_json = "1.0.74"
[dev-dependencies]
tempfile = "3.3.0"

View File

@@ -0,0 +1,121 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::container::{get_config_path, ContainerContext};
use anyhow::{anyhow, Result};
use derive_builder::Builder;
use oci::Spec;
use std::path::{Path, PathBuf};
#[derive(Default, Builder, Debug)]
pub struct Container {
id: String,
bundle: PathBuf,
root: PathBuf,
console_socket: Option<PathBuf>,
}
impl Container {
pub fn create_ctx(self) -> Result<ContainerContext> {
let bundle_canon = self.bundle.canonicalize()?;
let config_path = get_config_path(&bundle_canon);
let mut spec = Spec::load(
config_path
.to_str()
.ok_or_else(|| anyhow!("invalid config path"))?,
)?;
if spec.root.is_some() {
let mut spec_root = spec
.root
.as_mut()
.ok_or_else(|| anyhow!("root config was not present in the spec file"))?;
let rootfs_path = Path::new(&spec_root.path);
// If the rootfs path in the spec file is a relative path,
// convert it into a canonical path to pass validation of rootfs in the agent.
if !&rootfs_path.is_absolute() {
let rootfs_name = rootfs_path
.file_name()
.ok_or_else(|| anyhow!("invalid rootfs name"))?;
spec_root.path = bundle_canon
.join(rootfs_name)
.to_str()
.map(|s| s.to_string())
.ok_or_else(|| anyhow!("failed to convert bundle path"))?;
}
}
Ok(ContainerContext {
id: self.id,
bundle: self.bundle,
state_root: self.root,
spec,
// TODO: liboci-cli does not support --no-pivot option for create and run command.
// After liboci-cli supports the option, we will change the following code.
// no_pivot_root: self.no_pivot,
no_pivot_root: false,
console_socket: self.console_socket,
})
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::container::CONFIG_FILE_NAME;
use oci::Spec;
use std::{fs::File, path::PathBuf};
use tempfile::tempdir;
#[derive(Debug)]
struct TestData {
id: String,
bundle: PathBuf,
root: PathBuf,
console_socket: Option<PathBuf>,
spec: Spec,
no_pivot_root: bool,
}
#[test]
fn test_create_ctx() {
let bundle_dir = tempdir().unwrap();
let config_file = bundle_dir.path().join(CONFIG_FILE_NAME);
let spec = Spec::default();
let file = File::create(config_file).unwrap();
serde_json::to_writer(&file, &spec).unwrap();
let test_data = TestData {
id: String::from("test"),
bundle: PathBuf::from(bundle_dir.into_path()),
root: PathBuf::from("test"),
console_socket: Some(PathBuf::from("test")),
spec: Spec::default(),
no_pivot_root: false,
};
let test_ctx = ContainerContext {
id: test_data.id.clone(),
bundle: test_data.bundle.clone(),
state_root: test_data.root.clone(),
spec: test_data.spec.clone(),
no_pivot_root: test_data.no_pivot_root,
console_socket: test_data.console_socket.clone(),
};
let ctx = ContainerBuilder::default()
.id(test_data.id.clone())
.bundle(test_data.bundle.clone())
.root(test_data.root.clone())
.console_socket(test_data.console_socket.clone())
.build()
.unwrap()
.create_ctx()
.unwrap();
assert_eq!(test_ctx, ctx);
}
}

View File

@@ -0,0 +1,40 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use rustjail::cgroups::fs::Manager as CgroupManager;
use std::{
path::Path,
{fs, thread, time},
};
pub fn destroy_cgroup(cgroup_mg: &CgroupManager) -> Result<()> {
for path in cgroup_mg.paths.values() {
remove_cgroup_dir(Path::new(path))?;
}
Ok(())
}
// Try to remove the provided cgroups path five times with increasing delay between tries.
// If after all there are not removed cgroups, an appropriate error will be returned.
fn remove_cgroup_dir(path: &Path) -> Result<()> {
let mut retries = 5;
let mut delay = time::Duration::from_millis(10);
while retries != 0 {
if retries != 5 {
delay *= 2;
thread::sleep(delay);
}
if !path.exists() || fs::remove_dir(path).is_ok() {
return Ok(());
}
retries -= 1;
}
return Err(anyhow!("failed to remove cgroups paths: {:?}", path));
}

View File

@@ -0,0 +1,151 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::status::Status;
use anyhow::{anyhow, Result};
use nix::unistd::{chdir, unlink, Pid};
use oci::Spec;
use rustjail::{
container::{BaseContainer, LinuxContainer, EXEC_FIFO_FILENAME},
process::Process,
specconv::CreateOpts,
};
use slog::Logger;
use std::{
env::current_dir,
path::{Path, PathBuf},
};
pub const CONFIG_FILE_NAME: &str = "config.json";
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum ContainerAction {
Create,
Run,
}
#[derive(Debug, Clone, PartialEq)]
pub struct ContainerContext {
pub id: String,
pub bundle: PathBuf,
pub state_root: PathBuf,
pub spec: Spec,
pub no_pivot_root: bool,
pub console_socket: Option<PathBuf>,
}
impl ContainerContext {
pub async fn launch(&self, action: ContainerAction, logger: &Logger) -> Result<Pid> {
Status::create_dir(&self.state_root, &self.id)?;
let current_dir = current_dir()?;
chdir(&self.bundle)?;
let create_opts = CreateOpts {
cgroup_name: "".to_string(),
use_systemd_cgroup: false,
no_pivot_root: self.no_pivot_root,
no_new_keyring: false,
spec: Some(self.spec.clone()),
rootless_euid: false,
rootless_cgroup: false,
};
let mut ctr = LinuxContainer::new(
&self.id,
&self
.state_root
.to_str()
.map(|s| s.to_string())
.ok_or_else(|| anyhow!("failed to convert bundle path"))?,
create_opts.clone(),
logger,
)?;
let process = if self.spec.process.is_some() {
Process::new(
logger,
self.spec
.process
.as_ref()
.ok_or_else(|| anyhow!("process config was not present in the spec file"))?,
&self.id,
true,
0,
)?
} else {
return Err(anyhow!("no process configuration"));
};
if let Some(ref csocket_path) = self.console_socket {
ctr.set_console_socket(csocket_path)?;
}
match action {
ContainerAction::Create => {
ctr.start(process).await?;
}
ContainerAction::Run => {
ctr.run(process).await?;
}
}
let oci_state = ctr.oci_state()?;
let status = Status::new(
&self.state_root,
oci_state,
ctr.init_process_start_time,
ctr.created,
ctr.cgroup_manager
.ok_or_else(|| anyhow!("cgroup manager was not present"))?,
create_opts,
)?;
status.save()?;
if action == ContainerAction::Run {
let fifo_path = get_fifo_path(&status);
if fifo_path.exists() {
unlink(&fifo_path)?;
}
}
chdir(&current_dir)?;
Ok(Pid::from_raw(ctr.init_process_pid))
}
}
pub fn get_config_path<P: AsRef<Path>>(bundle: P) -> PathBuf {
bundle.as_ref().join(CONFIG_FILE_NAME)
}
pub fn get_fifo_path(status: &Status) -> PathBuf {
status.root.join(&status.id).join(EXEC_FIFO_FILENAME)
}
#[cfg(test)]
mod tests {
use super::*;
use crate::utils::test_utils::*;
use rustjail::container::EXEC_FIFO_FILENAME;
use std::path::PathBuf;
#[test]
fn test_get_config_path() {
let test_data = PathBuf::from(TEST_BUNDLE_PATH).join(CONFIG_FILE_NAME);
assert_eq!(get_config_path(TEST_BUNDLE_PATH), test_data);
}
#[test]
fn test_get_fifo_path() {
let test_data = PathBuf::from(TEST_BUNDLE_PATH)
.join(TEST_CONTAINER_ID)
.join(EXEC_FIFO_FILENAME);
let status = create_dummy_status();
assert_eq!(get_fifo_path(&status), test_data);
}
}

View File

@@ -0,0 +1,10 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
pub mod builder;
pub mod cgroup;
pub mod container;
pub mod status;
pub mod utils;

View File

@@ -0,0 +1,246 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::container::get_fifo_path;
use crate::utils::*;
use anyhow::{anyhow, Result};
use chrono::{DateTime, Utc};
use libc::pid_t;
use nix::{
errno::Errno,
sys::{signal::kill, stat::Mode},
unistd::Pid,
};
use oci::{ContainerState, State as OCIState};
use rustjail::{cgroups::fs::Manager as CgroupManager, specconv::CreateOpts};
use serde::{Deserialize, Serialize};
use std::{
fs::{self, File, OpenOptions},
path::{Path, PathBuf},
time::SystemTime,
};
const STATUS_FILE: &str = "status.json";
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(rename_all = "camelCase")]
pub struct Status {
pub oci_version: String,
pub id: String,
pub pid: pid_t,
pub root: PathBuf,
pub bundle: PathBuf,
pub rootfs: String,
pub process_start_time: u64,
pub created: DateTime<Utc>,
pub cgroup_manager: CgroupManager,
pub config: CreateOpts,
}
impl Status {
pub fn new(
root: &Path,
oci_state: OCIState,
process_start_time: u64,
created_time: SystemTime,
cgroup_mg: CgroupManager,
config: CreateOpts,
) -> Result<Self> {
let created = DateTime::from(created_time);
let rootfs = config
.clone()
.spec
.ok_or_else(|| anyhow!("spec config was not present"))?
.root
.as_ref()
.ok_or_else(|| anyhow!("root config was not present in the spec"))?
.path
.clone();
Ok(Self {
oci_version: oci_state.version,
id: oci_state.id,
pid: oci_state.pid,
root: root.to_path_buf(),
bundle: PathBuf::from(&oci_state.bundle),
rootfs,
process_start_time,
created,
cgroup_manager: cgroup_mg,
config,
})
}
pub fn save(&self) -> Result<()> {
let state_file_path = Self::get_file_path(&self.root, &self.id);
if !&self.root.exists() {
create_dir_with_mode(&self.root, Mode::S_IRWXU, true)?;
}
let file = OpenOptions::new()
.write(true)
.create(true)
.truncate(true)
.open(state_file_path)?;
serde_json::to_writer(&file, self)?;
Ok(())
}
pub fn load(state_root: &Path, id: &str) -> Result<Self> {
let state_file_path = Self::get_file_path(state_root, id);
if !state_file_path.exists() {
return Err(anyhow!("container \"{}\" does not exist", id));
}
let file = File::open(&state_file_path)?;
let state: Self = serde_json::from_reader(&file)?;
Ok(state)
}
pub fn create_dir(state_root: &Path, id: &str) -> Result<()> {
let state_dir_path = Self::get_dir_path(state_root, id);
if !state_dir_path.exists() {
create_dir_with_mode(state_dir_path, Mode::S_IRWXU, true)?;
} else {
return Err(anyhow!("container with id exists: \"{}\"", id));
}
Ok(())
}
pub fn remove_dir(&self) -> Result<()> {
let state_dir_path = Self::get_dir_path(&self.root, &self.id);
fs::remove_dir_all(state_dir_path)?;
Ok(())
}
pub fn get_dir_path(state_root: &Path, id: &str) -> PathBuf {
state_root.join(id)
}
pub fn get_file_path(state_root: &Path, id: &str) -> PathBuf {
state_root.join(id).join(STATUS_FILE)
}
}
pub fn is_process_running(pid: Pid) -> Result<bool> {
match kill(pid, None) {
Err(errno) => {
if errno != Errno::ESRCH {
return Err(anyhow!("no such process"));
}
Ok(false)
}
Ok(()) => Ok(true),
}
}
pub fn get_current_container_state(status: &Status) -> Result<ContainerState> {
let running = is_process_running(Pid::from_raw(status.pid))?;
let mut has_fifo = false;
if running {
let fifo = get_fifo_path(status);
if fifo.exists() {
has_fifo = true
}
}
if running && !has_fifo {
// TODO: Check paused status.
// runk does not support pause command currently.
}
if !running {
Ok(ContainerState::Stopped)
} else if has_fifo {
Ok(ContainerState::Created)
} else {
Ok(ContainerState::Running)
}
}
pub fn get_all_pid(cgm: &CgroupManager) -> Result<Vec<Pid>> {
let cgroup_path = cgm.paths.get("devices");
match cgroup_path {
Some(v) => {
let path = Path::new(v);
if !path.exists() {
return Err(anyhow!("cgroup devices file does not exist"));
}
let procs_path = path.join("cgroup.procs");
let pids: Vec<Pid> = lines_from_file(&procs_path)?
.into_iter()
.map(|v| {
Pid::from_raw(
v.parse::<pid_t>()
.expect("failed to parse string into pid_t"),
)
})
.collect();
Ok(pids)
}
None => Err(anyhow!("cgroup devices file dose not exist")),
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::utils::test_utils::*;
use chrono::{DateTime, Utc};
use nix::unistd::getpid;
use oci::ContainerState;
use rustjail::cgroups::fs::Manager as CgroupManager;
use std::path::Path;
use std::time::SystemTime;
#[test]
fn test_status() {
let cgm: CgroupManager = serde_json::from_str(TEST_CGM_DATA).unwrap();
let oci_state = create_dummy_oci_state();
let created = SystemTime::now();
let status = Status::new(
Path::new(TEST_BUNDLE_PATH),
oci_state.clone(),
1,
created,
cgm,
create_dummy_opts(),
)
.unwrap();
assert_eq!(status.id, oci_state.id);
assert_eq!(status.pid, oci_state.pid);
assert_eq!(status.process_start_time, 1);
assert_eq!(status.created, DateTime::<Utc>::from(created));
}
#[test]
fn test_is_process_running() {
let pid = getpid();
let ret = is_process_running(pid).unwrap();
assert!(ret);
}
#[test]
fn test_get_current_container_state() {
let status = create_dummy_status();
let state = get_current_container_state(&status).unwrap();
assert_eq!(state, ContainerState::Running);
}
#[test]
fn test_get_all_pid() {
let cgm: CgroupManager = serde_json::from_str(TEST_CGM_DATA).unwrap();
assert!(get_all_pid(&cgm).is_ok());
}
}

View File

@@ -0,0 +1,106 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use nix::sys::stat::Mode;
use std::{
fs::{DirBuilder, File},
io::{prelude::*, BufReader},
os::unix::fs::DirBuilderExt,
path::Path,
};
pub fn lines_from_file<P: AsRef<Path>>(path: P) -> Result<Vec<String>> {
let file = File::open(&path)?;
let buf = BufReader::new(file);
Ok(buf
.lines()
.map(|v| v.expect("could not parse line"))
.collect())
}
pub fn create_dir_with_mode<P: AsRef<Path>>(path: P, mode: Mode, recursive: bool) -> Result<()> {
let path = path.as_ref();
if path.exists() {
return Err(anyhow!("{} already exists", path.display()));
}
Ok(DirBuilder::new()
.recursive(recursive)
.mode(mode.bits())
.create(path)?)
}
#[cfg(test)]
pub(crate) mod test_utils {
use crate::status::Status;
use nix::unistd::getpid;
use oci::State as OCIState;
use oci::{ContainerState, Root, Spec};
use rustjail::cgroups::fs::Manager as CgroupManager;
use rustjail::specconv::CreateOpts;
use std::path::Path;
use std::time::SystemTime;
pub const TEST_CONTAINER_ID: &str = "test";
pub const TEST_BUNDLE_PATH: &str = "/test";
pub const TEST_ANNOTATION: &str = "test";
pub const TEST_CGM_DATA: &str = r#"{
"paths": {
"devices": "/sys/fs/cgroup/devices"
},
"mounts": {
"devices": "/sys/fs/cgroup/devices"
},
"cpath": "test"
}"#;
pub fn create_dummy_opts() -> CreateOpts {
let spec = Spec {
root: Some(Root::default()),
..Default::default()
};
CreateOpts {
cgroup_name: "".to_string(),
use_systemd_cgroup: false,
no_pivot_root: false,
no_new_keyring: false,
spec: Some(spec),
rootless_euid: false,
rootless_cgroup: false,
}
}
pub fn create_dummy_oci_state() -> OCIState {
OCIState {
version: "1.0.0".to_string(),
id: TEST_CONTAINER_ID.to_string(),
status: ContainerState::Running,
pid: getpid().as_raw(),
bundle: TEST_BUNDLE_PATH.to_string(),
annotations: [(TEST_ANNOTATION.to_string(), TEST_ANNOTATION.to_string())]
.iter()
.cloned()
.collect(),
}
}
pub fn create_dummy_status() -> Status {
let cgm: CgroupManager = serde_json::from_str(TEST_CGM_DATA).unwrap();
let oci_state = create_dummy_oci_state();
let created = SystemTime::now();
let status = Status::new(
Path::new(TEST_BUNDLE_PATH),
oci_state.clone(),
1,
created,
cgm,
create_dummy_opts(),
)
.unwrap();
status
}
}

View File

@@ -0,0 +1,37 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::Result;
use libcontainer::{builder::ContainerBuilder, container::ContainerAction};
use liboci_cli::Create;
use nix::unistd::Pid;
use slog::{info, Logger};
use std::{fs, path::Path};
pub async fn run(opts: Create, root: &Path, logger: &Logger) -> Result<()> {
let ctx = ContainerBuilder::default()
.id(opts.container_id)
.bundle(opts.bundle)
.root(root.to_path_buf())
.console_socket(opts.console_socket)
.build()?
.create_ctx()?;
let pid = ctx.launch(ContainerAction::Create, logger).await?;
if let Some(ref pid_file) = opts.pid_file {
create_pid_file(pid_file, pid)?;
}
info!(&logger, "create command finished successfully");
Ok(())
}
fn create_pid_file<P: AsRef<Path>>(pid_file: P, pid: Pid) -> Result<()> {
fs::write(pid_file.as_ref(), format!("{}", pid))?;
Ok(())
}

View File

@@ -0,0 +1,103 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use libcontainer::{
cgroup,
status::{get_current_container_state, Status},
};
use liboci_cli::Delete;
use nix::{
errno::Errno,
sys::signal::{kill, Signal},
unistd::Pid,
};
use oci::{ContainerState, State as OCIState};
use rustjail::container;
use slog::{info, Logger};
use std::{fs, path::Path};
pub async fn run(opts: Delete, root: &Path, logger: &Logger) -> Result<()> {
let container_id = &opts.container_id;
let status_dir = Status::get_dir_path(root, container_id);
if !status_dir.exists() {
return Err(anyhow!("container {} does not exist", container_id));
}
let status = if let Ok(value) = Status::load(root, container_id) {
value
} else {
fs::remove_dir_all(status_dir)?;
return Ok(());
};
let spec = status
.config
.spec
.as_ref()
.ok_or_else(|| anyhow!("spec config was not present in the status"))?;
let oci_state = OCIState {
version: status.oci_version.clone(),
id: status.id.clone(),
status: get_current_container_state(&status)?,
pid: status.pid,
bundle: status
.bundle
.to_str()
.ok_or_else(|| anyhow!("invalid bundle path"))?
.to_string(),
annotations: spec.annotations.clone(),
};
if spec.hooks.is_some() {
let hooks = spec
.hooks
.as_ref()
.ok_or_else(|| anyhow!("hooks config was not present"))?;
for h in hooks.poststop.iter() {
container::execute_hook(logger, h, &oci_state).await?;
}
}
match oci_state.status {
ContainerState::Stopped => {
destroy_container(&status)?;
}
ContainerState::Created => {
kill(Pid::from_raw(status.pid), Some(Signal::SIGKILL))?;
destroy_container(&status)?;
}
_ => {
if opts.force {
match kill(Pid::from_raw(status.pid), Some(Signal::SIGKILL)) {
Err(errno) => {
if errno != Errno::ESRCH {
return Err(anyhow!("{}", errno));
}
}
Ok(()) => {}
}
destroy_container(&status)?;
} else {
return Err(anyhow!(
"cannot delete container {} that is not stopped",
container_id
));
}
}
}
info!(&logger, "delete command finished successfully");
Ok(())
}
fn destroy_container(status: &Status) -> Result<()> {
cgroup::destroy_cgroup(&status.cgroup_manager)?;
status.remove_dir()?;
Ok(())
}

View File

@@ -0,0 +1,82 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use libcontainer::status::{self, get_current_container_state, Status};
use liboci_cli::Kill;
use nix::{
sys::signal::{kill, Signal},
unistd::Pid,
};
use oci::ContainerState;
use slog::{info, Logger};
use std::{convert::TryFrom, path::Path, str::FromStr};
pub fn run(opts: Kill, state_root: &Path, logger: &Logger) -> Result<()> {
let container_id = &opts.container_id;
let status = Status::load(state_root, container_id)?;
let current_state = get_current_container_state(&status)?;
let sig = parse_signal(&opts.signal)?;
// TODO: liboci-cli does not support --all option for kill command.
// After liboci-cli supports the option, we will change the following code.
let all = false;
if all {
let pids = status::get_all_pid(&status.cgroup_manager)?;
for pid in pids {
if !status::is_process_running(pid)? {
continue;
}
kill(pid, sig)?;
}
} else {
if current_state == ContainerState::Stopped {
return Err(anyhow!("container {} not running", container_id));
}
let p = Pid::from_raw(status.pid);
if status::is_process_running(p)? {
kill(p, sig)?;
}
}
info!(&logger, "kill command finished successfully");
Ok(())
}
fn parse_signal(signal: &str) -> Result<Signal> {
if let Ok(num) = signal.parse::<i32>() {
return Ok(Signal::try_from(num)?);
}
let mut signal_upper = signal.to_uppercase();
if !signal_upper.starts_with("SIG") {
signal_upper = "SIG".to_string() + &signal_upper;
}
Ok(Signal::from_str(&signal_upper)?)
}
#[cfg(test)]
mod tests {
use super::*;
use nix::sys::signal::Signal;
#[test]
fn test_parse_signal() {
assert_eq!(Signal::SIGHUP, parse_signal("1").unwrap());
assert_eq!(Signal::SIGHUP, parse_signal("sighup").unwrap());
assert_eq!(Signal::SIGHUP, parse_signal("hup").unwrap());
assert_eq!(Signal::SIGHUP, parse_signal("SIGHUP").unwrap());
assert_eq!(Signal::SIGHUP, parse_signal("HUP").unwrap());
assert_eq!(Signal::SIGKILL, parse_signal("9").unwrap());
assert_eq!(Signal::SIGKILL, parse_signal("sigkill").unwrap());
assert_eq!(Signal::SIGKILL, parse_signal("kill").unwrap());
assert_eq!(Signal::SIGKILL, parse_signal("SIGKILL").unwrap());
assert_eq!(Signal::SIGKILL, parse_signal("KILL").unwrap());
}
}

View File

@@ -0,0 +1,12 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
pub mod create;
pub mod delete;
pub mod kill;
pub mod run;
pub mod spec;
pub mod start;
pub mod state;

View File

@@ -0,0 +1,26 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::Result;
use libcontainer::{builder::ContainerBuilder, container::ContainerAction};
use liboci_cli::Run;
use slog::{info, Logger};
use std::path::Path;
pub async fn run(opts: Run, root: &Path, logger: &Logger) -> Result<()> {
let ctx = ContainerBuilder::default()
.id(opts.container_id)
.bundle(opts.bundle)
.root(root.to_path_buf())
.console_socket(opts.console_socket)
.build()?
.create_ctx()?;
ctx.launch(ContainerAction::Run, logger).await?;
info!(&logger, "run command finished successfully");
Ok(())
}

View File

@@ -0,0 +1,207 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
//use crate::container::get_config_path;
use anyhow::Result;
use libcontainer::container::CONFIG_FILE_NAME;
use liboci_cli::Spec;
use slog::{info, Logger};
use std::{fs::File, io::Write, path::Path};
pub const DEFAULT_SPEC: &str = r#"{
"ociVersion": "1.0.2-dev",
"process": {
"terminal": true,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"ambient": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
"root": {
"path": "rootfs",
"readonly": true
},
"hostname": "runk",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
],
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
]
},
"namespaces": [
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}"#;
pub fn run(_opts: Spec, logger: &Logger) -> Result<()> {
// TODO: liboci-cli does not support --bundle option for spec command.
// After liboci-cli supports the option, we will change the following code.
// let config_path = get_config_path(&opts.bundle);
let config_path = Path::new(".").join(CONFIG_FILE_NAME);
let config_data = DEFAULT_SPEC;
let mut file = File::create(config_path)?;
file.write_all(config_data.as_bytes())?;
info!(&logger, "spec command finished successfully");
Ok(())
}

View File

@@ -0,0 +1,48 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::commands::state::get_container_state_name;
use anyhow::{anyhow, Result};
use libcontainer::{
container::get_fifo_path,
status::{get_current_container_state, Status},
};
use liboci_cli::Start;
use nix::unistd::unlink;
use oci::ContainerState;
use slog::{info, Logger};
use std::{fs::OpenOptions, io::prelude::*, path::Path, time::SystemTime};
pub fn run(opts: Start, state_root: &Path, logger: &Logger) -> Result<()> {
let mut status = Status::load(state_root, &opts.container_id)?;
let state = get_current_container_state(&status)?;
if state != ContainerState::Created {
return Err(anyhow!(
"cannot start a container in the {} state",
get_container_state_name(state)
));
};
let fifo_path = get_fifo_path(&status);
let mut file = OpenOptions::new().write(true).open(&fifo_path)?;
file.write_all("0".as_bytes())?;
info!(&logger, "container started");
status.process_start_time = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)?
.as_secs();
status.save()?;
if fifo_path.exists() {
unlink(&fifo_path)?;
}
info!(&logger, "start command finished successfully");
Ok(())
}

View File

@@ -0,0 +1,79 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::Result;
use chrono::{DateTime, Utc};
use libcontainer::status::{get_current_container_state, Status};
use liboci_cli::State;
use oci::ContainerState;
use serde::{Deserialize, Serialize};
use slog::{info, Logger};
use std::path::{Path, PathBuf};
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct RuntimeState {
pub oci_version: String,
pub id: String,
pub pid: i32,
pub status: String,
pub bundle: PathBuf,
pub created: DateTime<Utc>,
}
impl RuntimeState {
pub fn new(status: Status, state: ContainerState) -> Self {
Self {
oci_version: status.oci_version,
id: status.id,
pid: status.pid,
status: get_container_state_name(state),
bundle: status.bundle,
created: status.created,
}
}
}
pub fn run(opts: State, state_root: &Path, logger: &Logger) -> Result<()> {
let status = Status::load(state_root, &opts.container_id)?;
let state = get_current_container_state(&status)?;
let oci_state = RuntimeState::new(status, state);
let json_state = &serde_json::to_string_pretty(&oci_state)?;
println!("{}", json_state);
info!(&logger, "state command finished successfully");
Ok(())
}
pub fn get_container_state_name(state: ContainerState) -> String {
match state {
ContainerState::Creating => "creating",
ContainerState::Created => "created",
ContainerState::Running => "running",
ContainerState::Stopped => "stopped",
ContainerState::Paused => "paused",
}
.into()
}
#[cfg(test)]
mod tests {
use super::*;
use oci::ContainerState;
#[test]
fn test_get_container_state_name() {
assert_eq!(
"creating",
get_container_state_name(ContainerState::Creating)
);
assert_eq!("created", get_container_state_name(ContainerState::Created));
assert_eq!("running", get_container_state_name(ContainerState::Running));
assert_eq!("stopped", get_container_state_name(ContainerState::Stopped));
assert_eq!("paused", get_container_state_name(ContainerState::Paused));
}
}

111
src/tools/runk/src/main.rs Normal file
View File

@@ -0,0 +1,111 @@
// Copyright 2021-2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use clap::{crate_description, crate_name, Parser};
use liboci_cli::{CommonCmd, GlobalOpts, StandardCmd};
use slog::{o, Logger};
use slog_async::AsyncGuard;
use std::{
fs::OpenOptions,
path::{Path, PathBuf},
process::exit,
};
const DEFAULT_ROOT_DIR: &str = "/run/runk";
const DEFAULT_LOG_LEVEL: slog::Level = slog::Level::Info;
mod commands;
#[derive(Parser, Debug)]
enum SubCommand {
#[clap(flatten)]
Standard(StandardCmd),
#[clap(flatten)]
Common(CommonCmd),
}
#[derive(Parser, Debug)]
#[clap(version, author, about = crate_description!())]
struct Cli {
#[clap(flatten)]
global: GlobalOpts,
#[clap(subcommand)]
subcmd: SubCommand,
}
async fn cmd_run(subcmd: SubCommand, root_path: &Path, logger: &Logger) -> Result<()> {
match subcmd {
SubCommand::Standard(cmd) => match cmd {
StandardCmd::Create(create) => commands::create::run(create, root_path, logger).await,
StandardCmd::Start(start) => commands::start::run(start, root_path, logger),
StandardCmd::Kill(kill) => commands::kill::run(kill, root_path, logger),
StandardCmd::Delete(delete) => commands::delete::run(delete, root_path, logger).await,
StandardCmd::State(state) => commands::state::run(state, root_path, logger),
},
SubCommand::Common(cmd) => match cmd {
CommonCmd::Run(run) => commands::run::run(run, root_path, logger).await,
CommonCmd::Spec(spec) => commands::spec::run(spec, logger),
_ => {
return Err(anyhow!("command is not implemented yet"));
}
},
}
}
fn setup_logger(
log_file: Option<PathBuf>,
log_level: slog::Level,
) -> Result<(Logger, Option<AsyncGuard>)> {
if let Some(ref file) = log_file {
let log_writer = OpenOptions::new()
.write(true)
.read(true)
.create(true)
.truncate(true)
.open(&file)?;
// TODO: Support 'text' log format.
let (logger_local, logger_async_guard_local) =
logging::create_logger(crate_name!(), crate_name!(), log_level, log_writer);
Ok((logger_local, Some(logger_async_guard_local)))
} else {
let logger = slog::Logger::root(slog::Discard, o!());
Ok((logger, None))
}
}
async fn real_main() -> Result<()> {
let cli = Cli::parse();
let root_path = if let Some(path) = cli.global.root {
path
} else {
PathBuf::from(DEFAULT_ROOT_DIR)
};
let log_level = if cli.global.debug {
slog::Level::Debug
} else {
DEFAULT_LOG_LEVEL
};
let (logger, _async_guard) = setup_logger(cli.global.log, log_level)?;
cmd_run(cli.subcmd, &root_path, &logger).await?;
Ok(())
}
#[tokio::main]
async fn main() {
if let Err(e) = real_main().await {
eprintln!("ERROR: {}", e);
exit(1);
}
exit(0);
}

View File

@@ -11,7 +11,41 @@ be utilized to install Kata Containers on a running Kubernetes cluster.
### Install Kata on a running Kubernetes cluster
#### Installing the latest image
#### k3s cluster
For your [k3s](https://k3s.io/) cluster, run:
```sh
$ git clone github.com/kata-containers/kata-containers
```
Check and switch to the stable branch of your choice, if wanted, and then run:
```bash
$ cd kata-containers/kata-containers/tools/packaging/kata-deploy
$ kubectl apply -f kata-rbac/base/kata-rbac.yaml
$ kubectl apply -k kata-deploy/overlays/k3s
```
#### RKE2 cluster
For your [RKE2](https://docs.rke2.io/) cluster, run:
```sh
$ git clone github.com/kata-containers/kata-containers
```
Check and switch to the stable branch of your choice, if wanted, and then run:
```bash
$ cd kata-containers/kata-containers/tools/packaging/kata-deploy
$ kubectl apply -f kata-rbac/base/kata-rbac.yaml
$ kubectl apply -k kata-deploy/overlays/rke2
```
#### Vanilla Kubernetes cluster
##### Installing the latest image
The latest image refers to pre-release and release candidate content. For stable releases, please, use the "stable" instructions.
@@ -20,7 +54,7 @@ $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-contai
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml
```
#### Installing the stable image
##### Installing the stable image
The stable image refers to the last stable releases content.
@@ -32,20 +66,9 @@ $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-contai
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-deploy/base/kata-deploy-stable.yaml
```
#### For your [k3s](https://k3s.io/) cluster, do:
```sh
$ GO111MODULE=auto go get github.com/kata-containers/kata-containers
```
```bash
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kata-deploy
$ kubectl apply -k kata-deploy/overlays/k3s
```
#### Ensure kata-deploy is ready
```bash
kubectl -n kube-system wait --timeout=10m --for=condition=Ready -l name=kata-deploy pod
$ kubectl -n kube-system wait --timeout=10m --for=condition=Ready -l name=kata-deploy pod
```
### Run a sample workload

View File

@@ -0,0 +1,5 @@
bases:
- ../../base
patchesStrategicMerge:
- mount_rke2_conf.yaml

View File

@@ -0,0 +1,17 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kubelet-kata-cleanup
namespace: kube-system
spec:
template:
spec:
containers:
- name: kube-kata-cleanup
volumeMounts:
- name: containerd-conf
mountPath: /etc/containerd/
volumes:
- name: containerd-conf
hostPath:
path: /var/lib/rancher/rke2/agent/etc/containerd/

View File

@@ -0,0 +1,5 @@
bases:
- ../../base
patchesStrategicMerge:
- mount_rke2_conf.yaml

View File

@@ -0,0 +1,12 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kata-deploy
namespace: kube-system
spec:
template:
spec:
volumes:
- name: containerd-conf
hostPath:
path: /var/lib/rancher/rke2/agent/etc/containerd/

View File

@@ -40,7 +40,11 @@ function get_container_runtime() {
die "invalid node name"
fi
if echo "$runtime" | grep -qE 'containerd.*-k3s'; then
if systemctl is-active --quiet k3s-agent; then
if systemctl is-active --quiet rke2-agent; then
echo "rke2-agent"
elif systemctl is-active --quiet rke2-server; then
echo "rke2-server"
elif systemctl is-active --quiet k3s-agent; then
echo "k3s-agent"
else
echo "k3s"
@@ -63,7 +67,7 @@ function configure_cri_runtime() {
crio)
configure_crio
;;
containerd | k3s | k3s-agent)
containerd | k3s | k3s-agent | rke2-agent | rke2-server)
configure_containerd
;;
esac
@@ -241,7 +245,7 @@ function cleanup_cri_runtime() {
crio)
cleanup_crio
;;
containerd | k3s | k3s-agent)
containerd | k3s | k3s-agent | rke2-agent | rke2-server)
cleanup_containerd
;;
esac
@@ -280,7 +284,7 @@ function main() {
# CRI-O isn't consistent with the naming -- let's use crio to match the service file
if [ "$runtime" == "cri-o" ]; then
runtime="crio"
elif [ "$runtime" == "k3s" ] || [ "$runtime" == "k3s-agent" ]; then
elif [ "$runtime" == "k3s" ] || [ "$runtime" == "k3s-agent" ] || [ "$runtime" == "rke2-agent" ] || [ "$runtime" == "rke2-server" ]; then
containerd_conf_tmpl_file="${containerd_conf_file}.tmpl"
if [ ! -f "$containerd_conf_tmpl_file" ]; then
cp "$containerd_conf_file" "$containerd_conf_tmpl_file"
@@ -303,11 +307,10 @@ function main() {
fi
# only install / remove / update if we are dealing with CRIO or containerd
if [[ "$runtime" =~ ^(crio|containerd|k3s|k3s-agent)$ ]]; then
if [[ "$runtime" =~ ^(crio|containerd|k3s|k3s-agent|rke2-agent|rke2-server)$ ]]; then
case "$action" in
install)
install_artifacts
configure_cri_runtime "$runtime"
configure_kata

View File

@@ -100,8 +100,8 @@ assets:
.*/v?(\d\S+)\.tar\.gz
tdx:
description: "VMM that uses KVM and supports TDX"
url: "https://github.com/intel/qemu-tdx"
tag: "tdx-qemu-2021.11.29-v6.0.0-rc1-mvp"
url: "https://github.com/intel/qemu-dcp"
tag: "SPR-BKC-QEMU-v2.2"
qemu-experimental:
description: "QEMU with virtiofs support"