CCv0: Merge main into CCv0 branch

Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4200

Signed-off-by: Megan Wright <megan.wright@.ibm.com>
2026-02-18 13:04:36 +01:00 · 2022-05-04 11:26:50 +01:00
parent 9b27329281 ec250c10e9
commit ef1ae5bc93
64 changed files with 5110 additions and 539 deletions
--- a/1
+++ b/1
@@ -14,6 +14,7 @@ TOOLS =

 TOOLS += agent-ctl
 TOOLS += trace-forwarder
+TOOLS += runk

 STANDARD_TARGETS = build check clean install test vendor

--- a/README.md
+++ b/README.md
@@ -132,6 +132,7 @@ The table below lists the remaining parts of the project:
 | [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
 | [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
 | [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
+| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
 | [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
 | [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |

--- a/docs/design/README.md
+++ b/docs/design/README.md
@@ -11,6 +11,7 @@ Kata Containers design documents:
 - [`Inotify` support](inotify.md)
 - [Metrics(Kata 2.0)](kata-2-0-metrics.md)
 - [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
+- [Design for direct-assigned volume](direct-blk-device-assignment.md)

 ---

--- a/docs/design/direct-blk-device-assignment.md
+++ b/docs/design/direct-blk-device-assignment.md
@@ -0,0 +1,253 @@
+# Motivation
+Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers 
+that prevent them from working together smoothly.
+
+First, it’s cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
+is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is 
+desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
+
+Second, it’s difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS, 
+the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume. 
+Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
+
+# Proposed Solution
+
+Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS 
+as is normally done for runc containers. Instead, they can be done by the Kata Containers agent inside the guest OS,
+but it requires the CSI driver to pass the relevant information to the Kata Containers runtime. 
+An ideal long-term solution would be to have the `kubelet` coordinate the communication between the CSI driver and
+the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files). 
+However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
+
+The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md) 
+described by Eric Ernst. The previous proposal has two gaps:
+
+1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized 
+access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.  
+
+2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for 
+implementing CSI volume resize and volume stat collection APIs.
+
+This document particularly focuses on how to address these two gaps.
+
+## Assumptions and Limitations
+1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe 
+is the most common pattern in Kata Containers use cases. It’s also unsafe to have the same block device attached to more than 
+one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs its `accessModes` set to `ReadWriteOncePod`.
+2. More advanced Kubernetes volume features such as `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
+
+## End User Interface
+
+1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
+There are a few options for reference:
+   1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC 
+   or Pod information from the CSI plugin (as the external provisioner takes care of these). However, all PVs in the storage class with the parameter set
+   will have host mounts skipped.
+   2. Use a PVC annotation. This approach requires the CSI plugin to have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
+   to be able to look up the PVC annotations from the API server. Pro: an API server lookup of annotations is only required during creation of the PV.
+   Con: The CSI plugin will always skip host mounting of the PV.
+   3. The CSI plugin can also look up the Pod's `runtimeClassName` during `NodePublishVolume`. This approach can be found in the [Alibaba Cloud CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
+2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to 
+   be modified to pass volume mount information to, and collect volume information from, the Kata Containers runtime by invoking `kata-runtime` subcommands.
+   * **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]` 
+   to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
+   The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
+   The `mountInfo` is a serialized JSON string. 
+   * **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
+   * **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
+   resize the direct-assigned volume.
+   * **NodeStageVolume/NodeUnstageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
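+
+As a rough illustration of the calls listed above, the following Go sketch shows how a CSI node driver might build and run the `kata-runtime direct-volume` commands with `os/exec`. The helper names (`addArgs`, `runKata`, and so on) are hypothetical, and passing `--size` as a plain byte count is an assumption; the accepted flag formats are defined by `kata-runtime` itself.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"os/exec"
+	"strconv"
+)
+
+// kataRuntime is the CLI binary; assumed to be on $PATH.
+const kataRuntime = "kata-runtime"
+
+// addArgs builds the arguments used by NodePublishVolume.
+func addArgs(volumePath, mountInfoJSON string) []string {
+	return []string{"direct-volume", "add", "--volume-path", volumePath, "--mount-info", mountInfoJSON}
+}
+
+// statsArgs builds the arguments used by NodeGetVolumeStats.
+func statsArgs(volumePath string) []string {
+	return []string{"direct-volume", "stats", "--volume-path", volumePath}
+}
+
+// resizeArgs builds the arguments used by NodeExpandVolume (size in bytes, assumed).
+func resizeArgs(volumePath string, sizeBytes uint64) []string {
+	return []string{"direct-volume", "resize", "--volume-path", volumePath,
+		"--size", strconv.FormatUint(sizeBytes, 10)}
+}
+
+// removeArgs builds the arguments used when the volume is unstaged.
+func removeArgs(volumePath string) []string {
+	return []string{"direct-volume", "remove", "--volume-path", volumePath}
+}
+
+// runKata executes kata-runtime with args and returns its combined output.
+func runKata(ctx context.Context, args []string) ([]byte, error) {
+	out, err := exec.CommandContext(ctx, kataRuntime, args...).CombinedOutput()
+	if err != nil {
+		return out, fmt.Errorf("%s %v failed: %w", kataRuntime, args, err)
+	}
+	return out, nil
+}
+
+func main() {
+	// Only print the command line; calling runKata requires kata-runtime.
+	fmt.Println(kataRuntime, statsArgs("/kubelet/a/b/c/d/sdf"))
+}
+```
+
+In `NodePublishVolume`, a driver could then call something like `runKata(ctx, addArgs(targetPath, mountInfoJSON))` and map a non-nil error to a CSI error status.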
+
+The `mountInfo` object is defined as follows:
+```Golang
+type MountInfo struct {
+    // The type of the volume (i.e. block)
+    VolumeType string `json:"volume-type"`
+    // The device backing the volume.
+    Device string `json:"device"`
+    // The filesystem type to be mounted on the volume.
+    FsType string `json:"fstype"`
+    // Additional metadata to pass to the agent regarding this volume.
+    Metadata map[string]string `json:"metadata,omitempty"`
+    // Additional mount options.
+    Options []string `json:"options,omitempty"`
+}
+```
+Note: given that the `mountInfo` is persisted to the disk by the Kata runtime, it shouldn't contain any secrets (such as an SMB mount password).
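+
+The JSON string passed to `--mount-info` can be produced with Go's standard `encoding/json` package, as in this minimal sketch. The helper names `mountInfoArg` and `parseMountInfo` are hypothetical; the struct tags are the ones defined above. Because `encoding/json` matches object keys case-insensitively when decoding, a key spelled `Device` still populates the `device` field.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// MountInfo repeats the struct defined above so the sketch is self-contained.
+type MountInfo struct {
+	VolumeType string            `json:"volume-type"`
+	Device     string            `json:"device"`
+	FsType     string            `json:"fstype"`
+	Metadata   map[string]string `json:"metadata,omitempty"`
+	Options    []string          `json:"options,omitempty"`
+}
+
+// mountInfoArg serializes mi into the string passed to --mount-info.
+func mountInfoArg(mi MountInfo) (string, error) {
+	b, err := json.Marshal(mi)
+	if err != nil {
+		return "", err
+	}
+	return string(b), nil
+}
+
+// parseMountInfo is the reverse step on the runtime side.
+func parseMountInfo(s string) (MountInfo, error) {
+	var mi MountInfo
+	err := json.Unmarshal([]byte(s), &mi)
+	return mi, err
+}
+
+func main() {
+	arg, err := mountInfoArg(MountInfo{VolumeType: "block", Device: "/dev/sdf", FsType: "ext4"})
+	if err != nil {
+		panic(err)
+	}
+	// Empty Metadata and Options are omitted because of omitempty.
+	fmt.Println(arg) // {"volume-type":"block","device":"/dev/sdf","fstype":"ext4"}
+}
+```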
+
+## Implementation Details
+
+### Kata runtime
+Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root, 
+as described in the original proposal, here we propose that the CSI node driver passes the mount information to 
+the Kata Containers runtime through a new `kata-runtime` command-line subcommand. The `kata-runtime` then writes the mount
+information to a `mountInfo.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
+
+When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
+whether there is a `mountInfo.json` file under the computed Kata `direct-volumes` directory. If there is, the runtime parses the file
+and updates the mount spec with its data. The updated mount spec is then passed to the Kata agent in the guest VM together
+with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
+directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
+Later, when the Kata Containers runtime handles the `stats` and `resize` commands, it uses the sandbox id to identify
+the endpoint of the corresponding `containerd-shim-kata-v2`.
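+
+The directory layout described above can be sketched in Go as follows. This is a minimal illustration, not the runtime's code: the helper names are hypothetical, `root` is a parameter so the sketch can run against a temporary directory instead of `/run/kata-containers/shared/direct-volumes`, and the volume path is cleaned as an absolute path before joining so that `..` components cannot escape `root`.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"os"
+	"path/filepath"
+)
+
+// DefaultRoot is the location named in this proposal.
+const DefaultRoot = "/run/kata-containers/shared/direct-volumes"
+
+const mountInfoFile = "mountInfo.json"
+
+// volumeDir maps a volume path to its metadata directory, e.g.
+// /kubelet/a/b/c/d/sdf -> <root>/kubelet/a/b/c/d/sdf. Cleaning the path as an
+// absolute path first drops leading ".." elements, so it cannot escape root.
+func volumeDir(root, volumePath string) string {
+	return filepath.Join(root, filepath.Clean("/"+volumePath))
+}
+
+// addVolume persists the mount info (direct-volume add).
+func addVolume(root, volumePath string, mountInfoJSON []byte) error {
+	dir := volumeDir(root, volumePath)
+	if err := os.MkdirAll(dir, 0o700); err != nil {
+		return err
+	}
+	return os.WriteFile(filepath.Join(dir, mountInfoFile), mountInfoJSON, 0o600)
+}
+
+// isDirectVolume reports whether a mount source has persisted mount info.
+func isDirectVolume(root, volumePath string) bool {
+	_, err := os.Stat(filepath.Join(volumeDir(root, volumePath), mountInfoFile))
+	return err == nil
+}
+
+// recordSandbox creates an empty file named after the sandbox using the volume.
+func recordSandbox(root, volumePath, sandboxID string) error {
+	f, err := os.Create(filepath.Join(volumeDir(root, volumePath), sandboxID))
+	if err != nil {
+		return err
+	}
+	return f.Close()
+}
+
+// sandboxOf returns the first regular entry other than mountInfo.json.
+func sandboxOf(root, volumePath string) (string, error) {
+	entries, err := os.ReadDir(volumeDir(root, volumePath))
+	if err != nil {
+		return "", err
+	}
+	for _, e := range entries {
+		if !e.IsDir() && e.Name() != mountInfoFile {
+			return e.Name(), nil
+		}
+	}
+	return "", errors.New("volume is not used by any sandbox")
+}
+
+// removeVolume deletes the metadata directory (direct-volume remove).
+func removeVolume(root, volumePath string) error {
+	return os.RemoveAll(volumeDir(root, volumePath))
+}
+
+func main() {
+	root, err := os.MkdirTemp("", "direct-volumes") // stands in for DefaultRoot
+	if err != nil {
+		panic(err)
+	}
+	defer os.RemoveAll(root)
+	vol := "/kubelet/a/b/c/d/sdf"
+	if err := addVolume(root, vol, []byte(`{"device":"/dev/sdf","fstype":"ext4"}`)); err != nil {
+		panic(err)
+	}
+	if err := recordSandbox(root, vol, "sandbox-1234"); err != nil { // hypothetical sandbox id
+		panic(err)
+	}
+	id, err := sandboxOf(root, vol)
+	fmt.Println(isDirectVolume(root, vol), id, err) // true sandbox-1234 <nil>
+}
+```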
+
+### containerd-shim-kata-v2 changes
+`containerd-shim-kata-v2` provides an API for sandbox management through a Unix domain socket. Two new handlers are proposed: `/direct-volume/stats` and `/direct-volume/resize`:
+
+Example:
+
+```bash
+$ curl --unix-socket "$shim_socket_path" -X GET 'http://localhost/direct-volume/stats/[urlSafeVolumePath]'
+$ curl --unix-socket "$shim_socket_path" -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath": "[volumePath]", "Size": "123123" }'
+```
+
+The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation, 
+the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU). 
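+
+A client for these endpoints, such as the one behind `kata-runtime direct-volume stats`, needs an HTTP client that dials the shim's Unix socket instead of TCP. The Go sketch below does that with `net/http`. How `[urlSafeVolumePath]` is derived is not specified in this proposal; the sketch assumes base64url encoding, and the `resizeRequest` JSON field names are likewise hypothetical.
+
+```go
+package main
+
+import (
+	"bytes"
+	"context"
+	"encoding/base64"
+	"encoding/json"
+	"fmt"
+	"io"
+	"net"
+	"net/http"
+)
+
+// newShimClient returns an HTTP client that sends every request over the
+// shim's Unix domain socket; the host part of the URL is ignored.
+func newShimClient(socketPath string) *http.Client {
+	return &http.Client{Transport: &http.Transport{
+		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
+			var d net.Dialer
+			return d.DialContext(ctx, "unix", socketPath)
+		},
+	}}
+}
+
+// urlSafe encodes a host volume path as a single URL path segment.
+// Assumption: base64url; the proposal does not fix the encoding.
+func urlSafe(volumePath string) string {
+	return base64.URLEncoding.EncodeToString([]byte(volumePath))
+}
+
+// resizeRequest is a hypothetical body for POST /direct-volume/resize.
+type resizeRequest struct {
+	VolumePath string `json:"volumePath"`
+	Size       uint64 `json:"size"`
+}
+
+func newStatsRequest(volumePath string) (*http.Request, error) {
+	return http.NewRequest(http.MethodGet, "http://localhost/direct-volume/stats/"+urlSafe(volumePath), nil)
+}
+
+func newResizeRequest(volumePath string, size uint64) (*http.Request, error) {
+	body, err := json.Marshal(resizeRequest{VolumePath: volumePath, Size: size})
+	if err != nil {
+		return nil, err
+	}
+	req, err := http.NewRequest(http.MethodPost, "http://localhost/direct-volume/resize", bytes.NewReader(body))
+	if err != nil {
+		return nil, err
+	}
+	req.Header.Set("Content-Type", "application/json")
+	return req, nil
+}
+
+// send performs req and returns the response body, failing on non-200 codes.
+func send(client *http.Client, req *http.Request) ([]byte, error) {
+	resp, err := client.Do(req)
+	if err != nil {
+		return nil, err
+	}
+	defer resp.Body.Close()
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, err
+	}
+	if resp.StatusCode != http.StatusOK {
+		return body, fmt.Errorf("shim returned %s: %s", resp.Status, body)
+	}
+	return body, nil
+}
+
+func main() {
+	req, err := newStatsRequest("/kubelet/a/b/c/d/sdf")
+	if err != nil {
+		panic(err)
+	}
+	// Sending it would be: send(newShimClient(shimSocketPath), req)
+	fmt.Println(req.Method, req.URL.Path)
+}
+```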
+
+### Kata agent changes
+
+The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object. 
+Two new RPCs, together with their request and response messages, are added to the GRPC protocol between the shim and agent for resizing volumes and getting volume stats:
+```protobuf
+
+rpc GetVolumeStats(VolumeStatsRequest) returns (VolumeStatsResponse);
+rpc ResizeVolume(ResizeVolumeRequest) returns (google.protobuf.Empty);
+
+message VolumeStatsRequest {
+   // The volume path on the guest outside the container
+   string volume_guest_path = 1;
+}
+
+message ResizeVolumeRequest {
+   // Full VM guest path of the volume (outside the container)
+   string volume_guest_path = 1;
+   uint64 size = 2;
+}
+
+// This should be kept in sync with CSI NodeGetVolumeStatsResponse (https://github.com/container-storage-interface/spec/blob/v1.5.0/csi.proto)
+message VolumeStatsResponse {
+   // This field is OPTIONAL.
+   repeated VolumeUsage usage = 1;
+   // Information about the current condition of the volume.
+   // This field is OPTIONAL.
+   // This field MUST be specified if the VOLUME_CONDITION node
+   // capability is supported.
+   VolumeCondition volume_condition = 2;
+}
+message VolumeUsage {
+   enum Unit {
+      UNKNOWN = 0;
+      BYTES = 1;
+      INODES = 2;
+   }
+   // The available capacity in specified Unit. This field is OPTIONAL.
+   // The value of this field MUST NOT be negative.
+   uint64 available = 1;
+
+   // The total capacity in specified Unit. This field is REQUIRED.
+   // The value of this field MUST NOT be negative.
+   uint64 total = 2;
+
+   // The used capacity in specified Unit. This field is OPTIONAL.
+   // The value of this field MUST NOT be negative.
+   uint64 used = 3;
+
+   // Units by which values are measured. This field is REQUIRED.
+   Unit unit = 4;
+}
+
+// VolumeCondition represents the current condition of a volume.
+message VolumeCondition {
+
+   // Normal volumes are available for use and operating optimally.
+   // An abnormal volume does not meet these criteria.
+   // This field is REQUIRED.
+   bool abnormal = 1;
+
+   // The message describing the condition of the volume.
+   // This field is REQUIRED.
+   string message = 2;
+}
+
+```
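+
+Inside the guest, the values in `VolumeUsage` map naturally onto `statfs(2)`, in the same way `df` reports them. The following Go sketch fills one `BYTES` and one `INODES` entry from `syscall.Statfs`. It is only an illustration (the agent is not necessarily implemented this way), the type names are hypothetical, and it targets Linux, as the guest does.
+
+```go
+package main
+
+import (
+	"fmt"
+	"syscall"
+)
+
+// unit mirrors VolumeUsage.Unit.
+type unit int
+
+const (
+	unitUnknown unit = iota
+	unitBytes
+	unitInodes
+)
+
+// volumeUsage mirrors the VolumeUsage message.
+type volumeUsage struct {
+	Available, Total, Used uint64
+	Unit                   unit
+}
+
+// usageFromStatfs turns statfs(2) results into BYTES and INODES entries,
+// as df does: used = total - free, available = blocks free to
+// unprivileged users.
+func usageFromStatfs(st syscall.Statfs_t) []volumeUsage {
+	bsize := uint64(st.Bsize)
+	return []volumeUsage{
+		{Available: st.Bavail * bsize, Total: st.Blocks * bsize, Used: (st.Blocks - st.Bfree) * bsize, Unit: unitBytes},
+		{Available: st.Ffree, Total: st.Files, Used: st.Files - st.Ffree, Unit: unitInodes},
+	}
+}
+
+// volumeStats runs statfs on the volume's path inside the guest.
+func volumeStats(guestPath string) ([]volumeUsage, error) {
+	var st syscall.Statfs_t
+	if err := syscall.Statfs(guestPath, &st); err != nil {
+		return nil, err
+	}
+	return usageFromStatfs(st), nil
+}
+
+func main() {
+	usage, err := volumeStats("/")
+	if err != nil {
+		panic(err)
+	}
+	for _, u := range usage {
+		fmt.Printf("%+v\n", u)
+	}
+}
+```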
+
+### Step by step walk-through
+
+Given the following definition:
+```YAML
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: app
+spec:
+  runtimeClassName: kata-qemu
+  containers:
+  - name: app
+    image: centos
+    command: ["/bin/sh"]
+    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
+    volumeMounts:
+    - name: persistent-storage
+      mountPath: /data
+  volumes:
+  - name: persistent-storage
+    persistentVolumeClaim:
+      claimName: ebs-claim
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  annotations:
+    skip-hostmount: "true"
+  name: ebs-claim
+spec:
+  accessModes:
+    - ReadWriteOncePod
+  volumeMode: Filesystem
+  storageClassName: ebs-sc
+  resources:
+    requests:
+      storage: 4Gi
+---
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: ebs-sc
+provisioner: ebs.csi.aws.com
+volumeBindingMode: WaitForFirstConsumer
+parameters:
+  csi.storage.k8s.io/fstype: ext4
+
+```
+Let’s assume that changes have been made in the `aws-ebs-csi-driver` node driver.
+
+**Node publish volume**
+1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
+2. `kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
+
+**Node unstage volume**
+1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
+2. `kata-runtime` deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
+
+**Use the volume in sandbox**
+1. Upon the request to start a container, the `containerd-shim-kata-v2` examines the container spec,
+and iterates through the mounts. For each mount, if there is a `mountInfo.json` file under `/run/kata-containers/shared/direct-volumes/[mount source path]`,
+it generates a `storage` GRPC object after overwriting the mount spec with the information in `mountInfo.json`.
+2. The shim sends the storage objects to kata-agent through TTRPC.
+3. The shim writes a file with the sandbox id as the name under `/run/kata-containers/shared/direct-volumes/[mount source path]`.
+4. The kata-agent mounts the storage objects for the container.
+
+**Node expand volume**
+1. In the node CSI driver, the `NodeExpandVolume` API invokes: `kata-runtime direct-volume resize --volume-path "/kubelet/a/b/c/d/sdf" --size 8Gi`.
+2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
+3. The Kata runtime identifies the shim instance through the sandbox id, and sends a resize request to the shim's `/direct-volume/resize` endpoint.
+4. The shim handles the request, asks the hypervisor to resize the block device and sends a GRPC request to Kata agent to resize the filesystem.
+5. Kata agent receives the request and resizes the filesystem.
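+
+The walk-through passes `--size 8Gi`, while `ResizeVolumeRequest.size` is a `uint64`, so a Kubernetes-style quantity has to be converted to bytes somewhere along the path. The Go sketch below handles only plain integers and the binary suffixes `Ki` to `Ti`; where the conversion happens is an assumption, and a real CSI driver would more likely use `resource.ParseQuantity` from `k8s.io/apimachinery/pkg/api/resource`, which supports the full quantity syntax.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// binarySuffixes lists the supported suffixes and their multipliers.
+var binarySuffixes = []struct {
+	suffix string
+	mult   uint64
+}{
+	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
+}
+
+// parseSize converts "8Gi" or a plain byte count such as "123123" to bytes.
+func parseSize(s string) (uint64, error) {
+	digits, mult := s, uint64(1)
+	for _, b := range binarySuffixes {
+		if strings.HasSuffix(s, b.suffix) {
+			digits, mult = strings.TrimSuffix(s, b.suffix), b.mult
+			break
+		}
+	}
+	n, err := strconv.ParseUint(digits, 10, 64)
+	if err != nil {
+		return 0, fmt.Errorf("invalid size %q: %w", s, err)
+	}
+	// Reject values whose byte count does not fit in a uint64.
+	if n > ^uint64(0)/mult {
+		return 0, fmt.Errorf("size %q overflows uint64", s)
+	}
+	return n * mult, nil
+}
+
+func main() {
+	for _, s := range []string{"8Gi", "123123"} {
+		n, err := parseSize(s)
+		fmt.Println(s, n, err)
+	}
+}
+```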
+
+**Node get volume stats**
+1. In the node CSI driver, the `NodeGetVolumeStats` API invokes: `kata-runtime direct-volume stats --volume-path "/kubelet/a/b/c/d/sdf"`.
+2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
+3. The Kata runtime identifies the shim instance through the sandbox id, and sends a request to the shim's `/direct-volume/stats` endpoint to get the volume stats.
+4. The shim handles the request and forwards it to the Kata agent.
+5. Kata agent receives the request and returns the filesystem stats.
--- a/docs/design/virtualization.md
+++ b/docs/design/virtualization.md
@@ -39,7 +39,7 @@ Details of each solution and a summary are provided below.
 Kata Containers with QEMU has complete compatibility with Kubernetes.

 Depending on the host architecture, Kata Containers supports various machine types,
-for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
+for example `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
 machine type is `q35`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
 be changed by editing the runtime [`configuration`](architecture/README.md#configuration) file.

@@ -60,9 +60,8 @@ Machine accelerators are architecture specific and can be used to improve the pe
 and enable specific features of the machine types. The following machine accelerators
 are used in Kata Containers:

- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
-`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
-memory device to the Virtual Machine.
+- NVDIMM: This machine accelerator is x86 specific and only supported by the `q35` machine type.
+`nvdimm` is used to provide the root filesystem as a persistent memory device to the Virtual Machine.

 #### Hotplug devices

--- a/docs/install/container-manager/containerd/containerd-install.md
+++ b/docs/install/container-manager/containerd/containerd-install.md
@@ -81,7 +81,7 @@
  - Download the standard `systemd(1)` service file and install to
    `/etc/systemd/system/`:

-    - https://raw.githubusercontent.com/containerd/containerd/master/containerd.service
+    - https://raw.githubusercontent.com/containerd/containerd/main/containerd.service

    > **Notes:**
    >
--- a/docs/use-cases/GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/GPU-passthrough-and-Kata.md
@@ -3,4 +3,4 @@
 Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:

 - [Intel](Intel-GPU-passthrough-and-Kata.md)
- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)
+- [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)
--- a/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/NVIDIA-GPU-passthrough-and-Kata.md
@@ -0,0 +1,372 @@
+# Using NVIDIA GPU device with Kata Containers
+
+An NVIDIA GPU device can be passed to a Kata Containers container using GPU
+passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
+(NVIDIA vGPU mode).
+
+In NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to one
+VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
+is accessed exclusively by the NVIDIA driver running in the VM to which it is
+assigned. The GPU is not shared among VMs.
+
+NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have
+simultaneous, direct access to a single physical GPU, using the same NVIDIA
+graphics drivers that are deployed on non-virtualized operating systems. By
+doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance,
+compute performance, and application compatibility, together with the
+cost-effectiveness and scalability brought about by sharing a GPU among multiple
+workloads. A vGPU can be either time-sliced or Multi-Instance GPU (MIG)-backed
+with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
+
+| Technology | Description | Behavior | Detail |
+| --- | --- | --- | --- |
+| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
+| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
+| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
+
+## Hardware Requirements
+
+NVIDIA GPUs Recommended for Virtualization:
+
+- NVIDIA Tesla (T4, M10, P6, V100 or newer)
+- NVIDIA Quadro RTX 6000/8000
+
+## Host BIOS Requirements
+
+Some hardware, for example the NVIDIA Tesla P100 and K40m, requires a larger PCI
+BAR window:
+
+```sh
+$ lspci -s d0:00.0 -vv | grep Region
+        Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
+        Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
+        Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
+```
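+
+To check which memory regions of a device sit above the 4 GiB boundary, the `Region` lines printed by `lspci -vv` can be parsed. The Go sketch below assumes the exact line format shown in the sample above; other `lspci` versions or region types (for example I/O ports or disabled regions) may print differently and are simply skipped. The `region` type and `parseRegions` function are hypothetical names.
+
+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"regexp"
+	"strconv"
+	"strings"
+)
+
+// regionRE matches memory regions in the format of the sample above.
+var regionRE = regexp.MustCompile(
+	`Region (\d+): Memory at ([0-9a-f]+) \((\d+)-bit, (?:non-)?prefetchable\) \[size=(\w+)\]`)
+
+// region is one memory BAR reported by lspci.
+type region struct {
+	Index   int
+	Addr    uint64
+	Bits    int
+	Size    string
+	Above4G bool
+}
+
+// parseRegions extracts memory regions from `lspci -vv` output.
+func parseRegions(lspciOutput string) ([]region, error) {
+	var regions []region
+	sc := bufio.NewScanner(strings.NewReader(lspciOutput))
+	for sc.Scan() {
+		m := regionRE.FindStringSubmatch(sc.Text())
+		if m == nil {
+			continue // not a memory region line
+		}
+		addr, err := strconv.ParseUint(m[2], 16, 64)
+		if err != nil {
+			return nil, err
+		}
+		// m[1] and m[3] are guaranteed to be digits by the regexp.
+		idx, _ := strconv.Atoi(m[1])
+		bits, _ := strconv.Atoi(m[3])
+		regions = append(regions, region{Index: idx, Addr: addr, Bits: bits, Size: m[4], Above4G: addr >= 1<<32})
+	}
+	return regions, sc.Err()
+}
+
+func main() {
+	sample := "Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]\n" +
+		"Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G]\n"
+	regions, err := parseRegions(sample)
+	if err != nil {
+		panic(err)
+	}
+	for _, r := range regions {
+		fmt.Printf("Region %d: %d-bit, size %s, above 4G: %t\n", r.Index, r.Bits, r.Size, r.Above4G)
+	}
+}
+```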
+
+For devices with large BARs, MMIO mapping above the 4G address space should be
+enabled in the PCI configuration of the BIOS.
+
+Some hardware vendors use a different name for this setting in the BIOS, such as:
+
+- Above 4G Decoding
+- Memory Hole for PCI MMIO
+- Memory Mapped I/O above 4GB
+
+If you are using a GPU based on the Ampere architecture or later, SR-IOV
+additionally needs to be enabled for the vGPU use case.
+
+The following steps outline the workflow for using an NVIDIA GPU with Kata.
+
+## Host Kernel Requirements
+
+The following configurations need to be enabled on your host kernel:
+
+- `CONFIG_VFIO`
+- `CONFIG_VFIO_IOMMU_TYPE1`
+- `CONFIG_VFIO_MDEV`
+- `CONFIG_VFIO_MDEV_DEVICE`
+- `CONFIG_VFIO_PCI`
+
+Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
+line.
+
+## Install and configure Kata Containers
+
+To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
+version 1.3.0 or above. Follow the [Kata Containers setup
+instructions](../install/README.md) to install the latest version of Kata.
+
+To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
+version 1.11.0 or above.
+
+The following settings in the Kata `configuration.toml` file, as shown below,
+should work:
+
+Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
+Hotplug driver):
+
+```toml
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = false
+```
+
+Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
+driver):
+
+```toml
+machine_type = "q35"
+
+hotplug_vfio_on_root_bus = true
+pcie_root_port = 1
+```
+
+## Build Kata Containers kernel with GPU support
+
+The default guest kernel installed with Kata Containers does not provide GPU
+support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
+with the necessary GPU support.
+
+The following kernel config options need to be enabled:
+
+```sh
+# Support PCI/PCIe device hotplug (Required for large BARs device)
+CONFIG_HOTPLUG_PCI_PCIE=y
+
+# Support for loading modules (Required for loading NVIDIA drivers)
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+
+# Enable the MMIO access method for PCIe devices (Required for large BARs device)
+CONFIG_PCI_MMCONFIG=y
+```
+
+The following kernel config options need to be disabled:
+
+```sh
+# Disable Open Source NVIDIA driver nouveau
+# It conflicts with NVIDIA official driver
+CONFIG_DRM_NOUVEAU=n
+```
+
+> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
+It is worth checking that it is not enabled in your kernel configuration to
+prevent any conflicts.
+
+Build the Kata Containers kernel with the previous config options, using the
+instructions described in [Building Kata Containers
+kernel](../../tools/packaging/kernel). For further details on building and
+installing guest kernels, see [the developer
+guide](../Developer-Guide.md#install-guest-kernel-images).
+
+An easier way to build a guest kernel with NVIDIA GPU support is to use the `build-kernel.sh` script:
+
+```sh
+## Build guest kernel with ../../tools/packaging/kernel
+
+# Prepare (download guest kernel source, generate .config)
+$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup
+
+# Build guest kernel
+$ ./build-kernel.sh -v 5.15.23 -g nvidia build
+
+# Install guest kernel
+$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
+```
+
+Building the NVIDIA driver for the Kata guest requires the kernel headers
+(`linux-headers`). One way to generate deb packages for them is:
+
+> **Note**:
+> Run `make rpm-pkg` to build the rpm package.
+> Run `make deb-pkg` to build the deb package.
+>
+
+```sh
+$ cd kata-linux-5.15.23-89
+$ make deb-pkg
+```
+Before using the new guest kernel, update the `kernel` parameter in
+`configuration.toml`:
+
+```toml
+kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
+```
+
+## NVIDIA GPU pass-through mode with Kata Containers
+
+Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:
+
+1. Find the Bus-Device-Function (BDF) for GPU device on host:
+
+   ```sh
+   $ sudo lspci -nn -D | grep -i nvidia
+   0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
+   ```
+
+   > PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
+   > `10de:20b9` is the device ID of the hardware GPU device.
+
+2. Find the IOMMU group for the GPU device:
+
+   ```sh
+   $ BDF="0000:d0:00.0"
+   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
+   ```
+
+   In this example, the GPU belongs to IOMMU group 192 (the last component of the
+   path printed by `readlink`). The next step is to bind the GPU to the VFIO-PCI driver.
+
+   ```sh
+   $ BDF="0000:d0:00.0"
+   $ DEV="/sys/bus/pci/devices/$BDF"
+   $ echo "vfio-pci" > $DEV/driver_override
+   $ echo $BDF > $DEV/driver/unbind
+   $ echo $BDF > /sys/bus/pci/drivers_probe
+   # To return the device to the standard driver, we simply clear the
+   # driver_override and reprobe the device, ex:
+   $ echo > $DEV/driver_override
+   $ echo $BDF > $DEV/driver/unbind
+   $ echo $BDF > /sys/bus/pci/drivers_probe
+   ```
+
+3. Check the IOMMU group number under `/dev/vfio`:
+
+   ```sh
+   $ ls -l /dev/vfio
+   total 0
+   crw------- 1 zvonkok zvonkok 243,   0 Mar 18 03:06 192
+   crw-rw-rw- 1 root    root     10, 196 Mar 18 02:27 vfio
+   ```
+
+4. Start a Kata container with GPU device:
+
+   ```sh
+   # You may need to `modprobe vhost-vsock` if you get
+   # host system doesn't support vsock: stat /dev/vhost-vsock
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2"  --device /dev/vfio/192  --rm -t  "docker.io/library/archlinux:latest" arch uname -r
+   ```
+
+5. Run `lspci` within the container to verify the GPU device is seen in the list
+   of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the `lspci` output.
+
+   ```sh
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2"  --device /dev/vfio/192  --rm -t  "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
+   ```
+
+6. Additionally, you can check the PCI BARs space of the NVIDIA GPU device in the container:
+
+   ```sh
+   $ sudo ctr --debug run --runtime "io.containerd.kata.v2"  --device /dev/vfio/192  --rm -t  "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
+   ```
+
+   > **Note**: If the output lists the memory regions (BARs) of the GPU, the BAR
+   > space of the NVIDIA GPU has been successfully allocated.
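+
+Steps 2 and 3 above can also be scripted. The following Go sketch resolves the IOMMU group of a BDF through sysfs and performs the same `driver_override`, `unbind` and `drivers_probe` writes as the shell commands. The function names are hypothetical, the sysfs root is a parameter so the sketch can be exercised against a fake directory tree, and binding on a real host requires root privileges.
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"path/filepath"
+)
+
+// iommuGroup resolves <sysfs>/bus/pci/devices/<bdf>/iommu_group, a symlink to
+// .../kernel/iommu_groups/<N>, and returns N.
+func iommuGroup(sysfs, bdf string) (string, error) {
+	link := filepath.Join(sysfs, "bus/pci/devices", bdf, "iommu_group")
+	target, err := filepath.EvalSymlinks(link)
+	if err != nil {
+		return "", fmt.Errorf("resolve %s: %w", link, err)
+	}
+	return filepath.Base(target), nil
+}
+
+// vfioDevice is the group device to pass to the runtime, e.g. /dev/vfio/192.
+func vfioDevice(group string) string {
+	return filepath.Join("/dev/vfio", group)
+}
+
+// bindToVFIO mirrors the driver_override, unbind and drivers_probe writes.
+func bindToVFIO(sysfs, bdf string) error {
+	dev := filepath.Join(sysfs, "bus/pci/devices", bdf)
+	if err := os.WriteFile(filepath.Join(dev, "driver_override"), []byte("vfio-pci"), 0o644); err != nil {
+		return err
+	}
+	// Unbind only if the device is currently bound to a driver.
+	unbind := filepath.Join(dev, "driver", "unbind")
+	if _, err := os.Stat(unbind); err == nil {
+		if err := os.WriteFile(unbind, []byte(bdf), 0o644); err != nil {
+			return err
+		}
+	}
+	return os.WriteFile(filepath.Join(sysfs, "bus/pci/drivers_probe"), []byte(bdf), 0o644)
+}
+
+func main() {
+	bdf := "0000:d0:00.0" // the example BDF from step 1
+	group, err := iommuGroup("/sys", bdf)
+	if err != nil {
+		fmt.Println(err)
+		return
+	}
+	fmt.Println("pass", vfioDevice(group), "to the container runtime")
+}
+```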
+
+## NVIDIA vGPU mode with Kata Containers
+
+NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
+is required to enable all vGPU features within the guest VM.
+
+> **TODO**: Will follow up with instructions
+
+## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
+
+Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
+rootfs base image for a distribution of your choice. This is going to be used as
+a base for an NVIDIA-enabled guest OS. Use the `EXTRA_PKGS` variable to install
+all the needed packages to compile the drivers. Also copy the kernel development
+packages from the previous `make deb-pkg` into `$ROOTFS_DIR`.
+
+```sh
+export EXTRA_PKGS="gcc make curl gnupg"
+```
+
+With `$ROOTFS_DIR` exported in the previous step, we can now install all the
+needed parts in the guest OS. In this case we have an Ubuntu-based rootfs.
+
+First of all, mount the special filesystems into the rootfs:
+
+```sh
+$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
+$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
+$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
+$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
+$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
+```
+
+Now we can enter the rootfs with `chroot`:
+
+```sh
+$ sudo chroot ${ROOTFS_DIR}
+```
+
+Inside the rootfs, we install the drivers and toolkit to enable easy creation of
+GPU containers with Kata. This rootfs can also be used for any other container,
+not only for GPU containers.
+
+As a prerequisite, install the copied kernel development packages:
+
+```sh
+$ sudo dpkg -i *.deb
+```
+
+Get the driver `.run` file. Since the driver has to be built against a kernel
+that is not the one running on the host, we need to specify the exact kernel
+version to build against: the version used for building the NVIDIA guest kernel
+(`5.15.23-nvidia-gpu`).
+
+```sh
+$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
+$ chmod +x NVIDIA-Linux-x86_64-510.54.run
+# Extract the source files so we can run the installer with arguments
+$ ./NVIDIA-Linux-x86_64-510.54.run -x
+$ cd NVIDIA-Linux-x86_64-510.54
+$ ./nvidia-installer -k 5.15.23-nvidia-gpu
+```
+With the drivers installed, we need to install the NVIDIA Container Toolkit,
+which takes care of providing the right bits inside the container.
+
+```sh
+$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+$ apt update
+$ apt install nvidia-container-toolkit
+```
+
+Create the hook execution file for Kata at
+`$ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh` and make
+it executable (`chmod +x`). The shebang has to be the first line of the file:
+
+```sh
+#!/bin/bash -x
+
+/usr/bin/nvidia-container-toolkit -debug "$@"
+```
+
+Optionally, clean up files or package caches. Then build the rootfs and
+configure it for use with Kata according to the Developer Guide.
+
+Enable the `guest_hook_path` in Kata's `configuration.toml`:
+
+```toml
+guest_hook_path = "/usr/share/oci/hooks"
+```
+
+With the NVIDIA rootfs and kernel built, we can now run any GPU container
+without installing the drivers into the container. Check the NVIDIA device
+status with `nvidia-smi`:
+
+```sh
+$  sudo ctr --debug run --runtime "io.containerd.kata.v2"  --device /dev/vfio/192  --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
+Fri Mar 18 10:36:59 2022
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  NVIDIA A30X         Off  | 00000000:02:00.0 Off |                    0 |
+| N/A   38C    P0    67W / 230W |      0MiB / 24576MiB |      0%      Default |
+|                               |                      |             Disabled |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+```
+
+Finally, remove the additional packages and files that were added to
+`$ROOTFS_DIR` to keep the rootfs as small as possible.
+
+## References
+
+- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
+- https://gitlab.com/nvidia/container-images/driver/-/tree/master
+- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers
--- a/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md
+++ b/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md
@@ -1,293 +0,0 @@
-# Using Nvidia GPU device with Kata Containers
-
-An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
-(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode). 
-
-Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
-bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
-exclusively by the Nvidia driver running in the VM to which it is assigned.
-The GPU is not shared among VMs.
-
-Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
-direct access to a single physical GPU, using the same Nvidia graphics drivers that are
-deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
-with unparalleled graphics performance, compute performance, and application compatibility,
-together with the cost-effectiveness and scalability brought about by sharing a GPU
-among multiple workloads.
-
-| Technology | Description | Behaviour | Detail |
-| --- | --- | --- | --- |
-| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
-| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
-
-## Hardware Requirements
-Nvidia GPUs Recommended for Virtualization:
-
- Nvidia Tesla (T4, M10, P6, V100 or newer)
- Nvidia Quadro RTX 6000/8000
-
-## Host BIOS Requirements
-
-Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m
-```
-$ lspci -s 04:00.0 -vv | grep Region
-      Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
-      Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
-      Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
-```
-
-For large BARs devices, MMIO mapping above 4G address space should be `enabled`
-in the PCI configuration of the BIOS.
-
-Some hardware vendors use different name in BIOS, such as:
-
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
-
-The following steps outline the workflow for using an Nvidia GPU with Kata.
-
-## Host Kernel Requirements
-The following configurations need to be enabled on your host kernel:
-
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
-
-Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
-
-## Install and configure Kata Containers
-To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
-Follow the [Kata Containers setup instructions](../install/README.md)
-to install the latest version of Kata.
-
-To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
-
-The following configuration in the Kata `configuration.toml` file as shown below can work:
-
-Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
-```
-machine_type = "q35"
-
-hotplug_vfio_on_root_bus = false
-```
-
-Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver):
-```
-machine_type = "q35"
-
-hotplug_vfio_on_root_bus = true
-pcie_root_port = 1
-```
-
-## Build Kata Containers kernel with GPU support
-The default guest kernel installed with Kata Containers does not provide GPU support.
-To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
-necessary GPU support.
-
-The following kernel config options need to be enabled:
-```
-# Support PCI/PCIe device hotplug (Required for large BARs device)
-CONFIG_HOTPLUG_PCI_PCIE=y
-
-# Support for loading modules (Required for load Nvidia drivers)
-CONFIG_MODULES=y
-CONFIG_MODULE_UNLOAD=y
-
-# Enable the MMIO access method for PCIe devices (Required for large BARs device)
-CONFIG_PCI_MMCONFIG=y
-```
-
-The following kernel config options need to be disabled:
-```
-# Disable Open Source Nvidia driver nouveau
-# It conflicts with Nvidia official driver
-CONFIG_DRM_NOUVEAU=n
-```
-> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
-It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
-
-
-Build the Kata Containers kernel with the previous config options,
-using the instructions described in [Building Kata Containers kernel](../../tools/packaging/kernel).
-For further details on building and installing guest kernels,
-see [the developer guide](../Developer-Guide.md#install-guest-kernel-images).
-
-There is an easy way to build a guest kernel that supports Nvidia GPU:
-```
-## Build guest kernel with ../../tools/packaging/kernel
-
-# Prepare (download guest kernel source, generate .config)
-$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
-
-# Build guest kernel
-$ ./build-kernel.sh -v 4.19.86 -g nvidia build
-
-# Install guest kernel
-$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
-/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
-/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
-```
-
-To build Nvidia Driver in Kata container, `kernel-devel` is required.  
-This is a way to generate rpm packages for `kernel-devel`:
-```
-$ cd kata-linux-4.19.86-68
-$ make rpm-pkg
-Output RPMs:
-~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
-```
-> **Note**:
-> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer.
-> - Run `make deb-pkg` to build the deb package.
-
-Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`.
-```
-kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
-```
-
-## Nvidia GPU pass-through mode with Kata Containers
-Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
-
-1. Find the Bus-Device-Function (BDF) for GPU device on host:
-   ```
-   $ sudo lspci -nn -D | grep -i nvidia
-   0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
-   0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
-   ```
-   > PCI address `0000:04:00.0` is assigned to the hardware GPU device.
-   > `10de:15f8` is the device ID of the hardware GPU device.
-
-2. Find the IOMMU group for the GPU device:
-   ```
-   $ BDF="0000:04:00.0"
-   $ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
-   /sys/kernel/iommu_groups/45
-   ```
-   The previous output shows that the GPU belongs to IOMMU group 45.
-
-3. Check the IOMMU group number under `/dev/vfio`:
-   ```
-   $ ls -l /dev/vfio
-   total 0
-   crw------- 1 root root 248,   0 Feb 28 09:57 45
-   crw------- 1 root root 248,   1 Feb 28 09:57 54
-   crw-rw-rw- 1 root root  10, 196 Feb 28 09:57 vfio
-   ```
-
-4. Start a Kata container with GPU device:
-   ```
-   $ sudo docker run -it --runtime=kata-runtime --cap-add=ALL --device /dev/vfio/45 centos /bin/bash
-   ```
-
-5. Run `lspci` within the container to verify the GPU device is seen in the list
-   of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
-   ```
-   $ lspci -nn -D | grep '10de:15f8'
-   0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
-   ```
-
-6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
-   ```
-   $ lspci -s 01:01.0 -vv | grep Region
-		Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
-		Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
-		Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
-   ```
-   > **Note**: If you see a message similar to the above, the BAR space of the Nvidia
-   > GPU has been successfully allocated.
-
-## Nvidia vGPU mode with Kata Containers
-
-Nvidia vGPU is a licensed product on all supported GPU boards. A software license
-is required to enable all vGPU features within the guest VM.
-
-> **Note**: There is no suitable test environment, so it is not written here.
-
-
-## Install Nvidia Driver in Kata Containers
-Download the official Nvidia driver from
-[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
-for example `NVIDIA-Linux-x86_64-418.87.01.run`.
-
-Install the `kernel-devel`(generated in the previous steps) for guest kernel:
-```
-$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm
-```
-
-Here is an example to extract, compile and install Nvidia driver:
-```
-## Extract
-$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
-
-## Compile and install (It will take some time)
-$ cd NVIDIA-Linux-x86_64-418.87.01
-$ sudo ./nvidia-installer -a -q --ui=none \
- --no-cc-version-check \
- --no-opengl-files --no-install-libglvnd \
- --kernel-source-path=/usr/src/kernels/`uname -r`
-```
-
-Or just run one command line:
-```
-$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
- --no-cc-version-check \
- --no-opengl-files --no-install-libglvnd \
- --kernel-source-path=/usr/src/kernels/`uname -r`
-```
-
-To view detailed logs of the installer:
-```
-$ tail -f /var/log/nvidia-installer.log
-```
-
-Load Nvidia driver module manually
-```
-# Optional（generate modules.dep and map files for Nvidia driver）
-$ sudo depmod
-
-# Load module
-$ sudo modprobe nvidia-drm
-
-# Check module
-$ lsmod | grep nvidia
-nvidia_drm             45056  0
-nvidia_modeset       1093632  1 nvidia_drm
-nvidia              18202624  1 nvidia_modeset
-drm_kms_helper        159744  1 nvidia_drm
-drm                   364544  3 nvidia_drm,drm_kms_helper
-i2c_core               65536  3 nvidia,drm_kms_helper,drm
-ipmi_msghandler        49152  1 nvidia
-```
-
-
-Check Nvidia device status with `nvidia-smi`
-```
-$ nvidia-smi
-Tue Mar  3 00:03:49 2020
-+-----------------------------------------------------------------------------+
-| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
-|-------------------------------+----------------------+----------------------+
-| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
-| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
-|===============================+======================+======================|
-|   0  Tesla P100-PCIE...  Off  | 00000000:01:01.0 Off |                    0 |
-| N/A   27C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
-+-------------------------------+----------------------+----------------------+
-
-+-----------------------------------------------------------------------------+
-| Processes:                                                       GPU Memory |
-|  GPU       PID   Type   Process name                             Usage      |
-|=============================================================================|
-|  No running processes found                                                 |
-+-----------------------------------------------------------------------------+
-
-```
-
-## References
-
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers
--- a/src/agent/Cargo.lock
+++ b/src/agent/Cargo.lock
@@ -2256,6 +2256,7 @@ dependencies = [
 "async-trait",
 "capctl",
 "caps",
+ "cfg-if 0.1.10",
 "cgroups-rs",
 "futures",
 "inotify",
--- a/src/agent/Cargo.toml
+++ b/src/agent/Cargo.toml
@@ -82,3 +82,13 @@ lto = true

 [features]
 seccomp = ["rustjail/seccomp"]
+standard-oci-runtime = ["rustjail/standard-oci-runtime"]
+
+[[bin]]
+name = "kata-agent"
+path = "src/main.rs"
+
+[[bin]]
+name = "oci-kata-agent"
+path = "src/main.rs"
+required-features = ["standard-oci-runtime"]
--- a/src/agent/Makefile
+++ b/src/agent/Makefile
@@ -33,6 +33,14 @@ ifeq ($(SECCOMP),yes)
    override EXTRA_RUSTFEATURES += seccomp
 endif

+##VAR STANDARD_OCI_RUNTIME=yes|no define if the agent enables the standard OCI runtime feature
+STANDARD_OCI_RUNTIME := no
+
+# Enable the standard OCI runtime feature in the Rust build
+ifeq ($(STANDARD_OCI_RUNTIME),yes)
+    override EXTRA_RUSTFEATURES += standard-oci-runtime
+endif
+
 ifneq ($(EXTRA_RUSTFEATURES),)
    override EXTRA_RUSTFEATURES := --features "$(EXTRA_RUSTFEATURES)"
 endif
--- a/src/agent/rustjail/Cargo.toml
+++ b/src/agent/rustjail/Cargo.toml
@@ -25,6 +25,7 @@ path-absolutize = "1.2.0"
 anyhow = "1.0.32"
 cgroups = { package = "cgroups-rs", version = "0.2.8" }
 rlimit = "0.5.3"
+cfg-if = "0.1.0"

 tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros"] }
 futures = "0.3.17"
@@ -38,3 +39,4 @@ tempfile = "3.1.0"

 [features]
 seccomp = ["libseccomp"]
+standard-oci-runtime = []
--- a/src/agent/rustjail/src/console.rs
+++ b/src/agent/rustjail/src/console.rs
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: Apache-2.0
+//
+// Copyright 2021 Sony Group Corporation
+//
+
+use anyhow::{anyhow, Result};
+use nix::errno::Errno;
+use nix::pty;
+use nix::sys::{socket, uio};
+use nix::unistd::{self, dup2};
+use std::os::unix::io::{AsRawFd, RawFd};
+use std::path::Path;
+
+pub fn setup_console_socket(csocket_path: &str) -> Result<Option<RawFd>> {
+    if csocket_path.is_empty() {
+        return Ok(None);
+    }
+
+    let socket_fd = socket::socket(
+        socket::AddressFamily::Unix,
+        socket::SockType::Stream,
+        socket::SockFlag::empty(),
+        None,
+    )?;
+
+    match socket::connect(
+        socket_fd,
+        &socket::SockAddr::Unix(socket::UnixAddr::new(Path::new(csocket_path))?),
+    ) {
+        Ok(()) => Ok(Some(socket_fd)),
+        Err(errno) => Err(anyhow!("failed to open console fd: {}", errno)),
+    }
+}
+
+pub fn setup_master_console(socket_fd: RawFd) -> Result<()> {
+    let pseudo = pty::openpty(None, None)?;
+
+    let pty_name: &[u8] = b"/dev/ptmx";
+    let iov = [uio::IoVec::from_slice(pty_name)];
+    let fds = [pseudo.master];
+    let cmsg = socket::ControlMessage::ScmRights(&fds);
+
+    socket::sendmsg(socket_fd, &iov, &[cmsg], socket::MsgFlags::empty(), None)?;
+
+    unistd::setsid()?;
+    let ret = unsafe { libc::ioctl(pseudo.slave, libc::TIOCSCTTY) };
+    Errno::result(ret).map_err(|e| anyhow!(e).context("ioctl TIOCSCTTY"))?;
+
+    dup2(pseudo.slave, std::io::stdin().as_raw_fd())?;
+    dup2(pseudo.slave, std::io::stdout().as_raw_fd())?;
+    dup2(pseudo.slave, std::io::stderr().as_raw_fd())?;
+
+    unistd::close(socket_fd)?;
+
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::skip_if_not_root;
+    use std::fs::File;
+    use std::os::unix::net::UnixListener;
+    use std::path::PathBuf;
+    use tempfile::{self, tempdir};
+
+    const CONSOLE_SOCKET: &str = "console-socket";
+
+    #[test]
+    fn test_setup_console_socket() {
+        let dir = tempdir()
+            .map_err(|e| anyhow!(e).context("tempdir failed"))
+            .unwrap();
+        let socket_path = dir.path().join(CONSOLE_SOCKET);
+
+        let _listener = UnixListener::bind(&socket_path).unwrap();
+
+        let ret = setup_console_socket(socket_path.to_str().unwrap());
+
+        assert!(ret.is_ok());
+    }
+}
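The console-socket contract in `setup_console_socket` can be sketched with the standard library alone. The following is a hypothetical, std-only analogue (`connect_console_socket` is an illustrative name, not part of rustjail). An empty path means no console socket was requested. Otherwise, the child connects to a listener that the caller has already bound. The sketch deliberately stops at the connect step. Passing the pty master with `SCM_RIGHTS`, as `setup_master_console` does through `nix::sys::socket::sendmsg`, needs ancillary-data support that stable `std` does not expose.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Hypothetical std-only analogue of rustjail's `setup_console_socket`.
/// An empty path means no console socket was requested; otherwise connect
/// to a listener that the caller has already bound at `csocket_path`.
fn connect_console_socket(csocket_path: &str) -> io::Result<Option<UnixStream>> {
    if csocket_path.is_empty() {
        return Ok(None);
    }
    // Returns an error (e.g. NotFound) if nothing is bound at the path.
    UnixStream::connect(Path::new(csocket_path)).map(Some)
}
```

As in `test_setup_console_socket`, a bound `UnixListener` is enough for the connect to succeed. The listener does not need to call `accept()` first, because the kernel queues the connection in the listen backlog.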
--- a/src/agent/rustjail/src/container.rs
+++ b/src/agent/rustjail/src/container.rs
@@ -23,6 +23,8 @@ use crate::cgroups::fs::Manager as FsManager;
 #[cfg(test)]
 use crate::cgroups::mock::Manager as FsManager;
 use crate::cgroups::Manager;
+#[cfg(feature = "standard-oci-runtime")]
+use crate::console;
 use crate::log_child;
 use crate::process::Process;
 #[cfg(feature = "seccomp")]
@@ -64,7 +66,7 @@ use tokio::sync::Mutex;

 use crate::utils;

-const EXEC_FIFO_FILENAME: &str = "exec.fifo";
+pub const EXEC_FIFO_FILENAME: &str = "exec.fifo";

 const INIT: &str = "INIT";
 const NO_PIVOT: &str = "NO_PIVOT";
@@ -74,6 +76,10 @@ const CLOG_FD: &str = "CLOG_FD";
 const FIFO_FD: &str = "FIFO_FD";
 const HOME_ENV_KEY: &str = "HOME";
 const PIDNS_FD: &str = "PIDNS_FD";
+const CONSOLE_SOCKET_FD: &str = "CONSOLE_SOCKET_FD";
+
+#[cfg(feature = "standard-oci-runtime")]
+const OCI_AGENT_BINARY: &str = "oci-kata-agent";

 #[derive(Debug)]
 pub struct ContainerStatus {
@@ -82,7 +88,7 @@ pub struct ContainerStatus {
 }

 impl ContainerStatus {
-    fn new() -> Self {
+    pub fn new() -> Self {
        ContainerStatus {
            pre_status: ContainerState::Created,
            cur_status: ContainerState::Created,
@@ -99,6 +105,12 @@ impl ContainerStatus {
    }
 }

+impl Default for ContainerStatus {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
 pub type Config = CreateOpts;
 type NamespaceType = String;

@@ -106,7 +118,7 @@ lazy_static! {
    // This locker ensures the child exit signal will be received by the right receiver.
    pub static ref WAIT_PID_LOCKER: Arc<Mutex<bool>> = Arc::new(Mutex::new(false));

-    static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
+    pub static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
        let mut m = HashMap::new();
        m.insert("user", CloneFlags::CLONE_NEWUSER);
        m.insert("ipc", CloneFlags::CLONE_NEWIPC);
@@ -119,7 +131,7 @@ lazy_static! {
    };

 // type to name hashmap, better to be in NAMESPACES
-    static ref TYPETONAME: HashMap<&'static str, &'static str> = {
+    pub static ref TYPETONAME: HashMap<&'static str, &'static str> = {
        let mut m = HashMap::new();
        m.insert("ipc", "ipc");
        m.insert("user", "user");
@@ -236,6 +248,8 @@ pub struct LinuxContainer {
    pub status: ContainerStatus,
    pub created: SystemTime,
    pub logger: Logger,
+    #[cfg(feature = "standard-oci-runtime")]
+    pub console_socket: PathBuf,
 }

 #[derive(Serialize, Deserialize, Debug)]
@@ -359,7 +373,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
            )));
        }
    }
-
    log_child!(cfd_log, "child process start run");
    let buf = read_sync(crfd)?;
    let spec_str = std::str::from_utf8(&buf)?;
@@ -379,6 +392,9 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {

    let cm: FsManager = serde_json::from_str(cm_str)?;

+    #[cfg(feature = "standard-oci-runtime")]
+    let csocket_fd = console::setup_console_socket(&std::env::var(CONSOLE_SOCKET_FD)?)?;
+
    let p = if spec.process.is_some() {
        spec.process.as_ref().unwrap()
    } else {
@@ -670,10 +686,19 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
    let _ = unistd::close(crfd);
    let _ = unistd::close(cwfd);

-    unistd::setsid().context("create a new session")?;
    if oci_process.terminal {
-        unsafe {
-            libc::ioctl(0, libc::TIOCSCTTY);
+        cfg_if::cfg_if! {
+            if #[cfg(feature = "standard-oci-runtime")] {
+                if let Some(csocket_fd) = csocket_fd {
+                    console::setup_master_console(csocket_fd)?;
+                } else {
+                    return Err(anyhow!("failed to get console master socket fd"));
+                }
+            }
+            else {
+                unistd::setsid().context("create a new session")?;
+                unsafe { libc::ioctl(0, libc::TIOCSCTTY) };
+            }
        }
    }

@@ -926,8 +951,24 @@ impl BaseContainer for LinuxContainer {
            let _ = unistd::close(pid);
        });

-        let exec_path = std::env::current_exe()?;
+        cfg_if::cfg_if! {
+            if #[cfg(feature = "standard-oci-runtime")] {
+                let exec_path = PathBuf::from(OCI_AGENT_BINARY);
+            }
+            else {
+                let exec_path = std::env::current_exe()?;
+            }
+        }
+
        let mut child = std::process::Command::new(exec_path);
+
+        #[allow(unused_mut)]
+        let mut console_name = PathBuf::from("");
+        #[cfg(feature = "standard-oci-runtime")]
+        if !self.console_socket.as_os_str().is_empty() {
+            console_name = self.console_socket.clone();
+        }
+
        let mut child = child
            .arg("init")
            .stdin(child_stdin)
@@ -937,7 +978,8 @@ impl BaseContainer for LinuxContainer {
            .env(NO_PIVOT, format!("{}", self.config.no_pivot_root))
            .env(CRFD_FD, format!("{}", crfd))
            .env(CWFD_FD, format!("{}", cwfd))
-            .env(CLOG_FD, format!("{}", cfd_log));
+            .env(CLOG_FD, format!("{}", cfd_log))
+            .env(CONSOLE_SOCKET_FD, console_name);

        if p.init {
            child = child.env(FIFO_FD, format!("{}", fifofd));
@@ -1419,8 +1461,16 @@ impl LinuxContainer {
                .unwrap()
                .as_secs(),
            logger: logger.new(o!("module" => "rustjail", "subsystem" => "container", "cid" => id)),
+            #[cfg(feature = "standard-oci-runtime")]
+            console_socket: Path::new("").to_path_buf(),
        })
    }
+
+    #[cfg(feature = "standard-oci-runtime")]
+    pub fn set_console_socket(&mut self, console_socket: &Path) -> Result<()> {
+        self.console_socket = console_socket.to_path_buf();
+        Ok(())
+    }
 }

 fn setgroups(grps: &[libc::gid_t]) -> Result<()> {
@@ -1460,7 +1510,7 @@ use std::process::Stdio;
 use std::time::Duration;
 use tokio::io::{AsyncReadExt, AsyncWriteExt};

-async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
+pub async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
    let logger = logger.new(o!("action" => "execute-hook"));

    let binary = PathBuf::from(h.path.as_str());
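The `cfg_if!` block that selects `exec_path` in `LinuxContainer` amounts to a compile-time choice between two binaries. The sketch below expresses the same choice with the `cfg!` macro, so it compiles with or without the `standard-oci-runtime` feature. The function name `init_exec_path` is hypothetical, and the constant is redefined here without a feature gate.

```rust
use std::io;
use std::path::PathBuf;

// Same value as OCI_AGENT_BINARY in container.rs; defined unconditionally here.
const OCI_AGENT_BINARY: &str = "oci-kata-agent";

/// Hypothetical helper: choose the binary that is spawned with the `init`
/// argument to become the container's first process.
fn init_exec_path() -> io::Result<PathBuf> {
    if cfg!(feature = "standard-oci-runtime") {
        // Bare file name: std::process::Command::new resolves it via PATH.
        Ok(PathBuf::from(OCI_AGENT_BINARY))
    } else {
        // Default agent build: re-execute the currently running binary.
        std::env::current_exe()
    }
}
```

Unlike `cfg_if!`, `cfg!` keeps both branches in the build. That only works here because the sketch defines `OCI_AGENT_BINARY` unconditionally. In the diff, the constant itself is gated by `#[cfg(feature = "standard-oci-runtime")]`, which is presumably why the unused branch has to be removed entirely. Also, `oci-kata-agent` is a bare file name, so `Command::new` looks it up on `PATH`. The binary therefore presumably needs to be installed somewhere on the caller's `PATH`.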
--- a/src/agent/rustjail/src/lib.rs
+++ b/src/agent/rustjail/src/lib.rs
@@ -30,6 +30,8 @@ extern crate regex;

 pub mod capabilities;
 pub mod cgroups;
+#[cfg(feature = "standard-oci-runtime")]
+pub mod console;
 pub mod container;
 pub mod mount;
 pub mod pipestream;
@@ -529,6 +531,31 @@ mod tests {
        };
    }

+    // Parameters:
+    //
+    // 1: expected Result
+    // 2: actual Result
+    // 3: string used to identify the test on error
+    #[macro_export]
+    macro_rules! assert_result {
+        ($expected_result:expr, $actual_result:expr, $msg:expr) => {
+            if $expected_result.is_ok() {
+                let expected_value = $expected_result.as_ref().unwrap();
+                let actual_value = $actual_result.unwrap();
+                assert!(*expected_value == actual_value, "{}", $msg);
+            } else {
+                assert!($actual_result.is_err(), "{}", $msg);
+
+                let expected_error = $expected_result.as_ref().unwrap_err();
+                let expected_error_msg = format!("{:?}", expected_error);
+
+                let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
+
+                assert!(expected_error_msg == actual_error_msg, "{}", $msg);
+            }
+        };
+    }
+
    #[test]
    fn test_process_grpc_to_oci() {
        #[derive(Debug)]
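The `assert_result!` macro added to the rustjail tests compares an expected `Result` against an actual one. It compares `Ok` values with `==`, which is why `mount::Info` now derives `PartialEq`, and it compares errors by their `{:?}` rendering. Below is a minimal standalone usage sketch. The macro body is copied from the diff, while `parse_port` and `check_parse_port` are hypothetical functions used only for illustration.

```rust
// Copied from the assert_result! macro added in rustjail/src/lib.rs.
macro_rules! assert_result {
    ($expected_result:expr, $actual_result:expr, $msg:expr) => {
        if $expected_result.is_ok() {
            let expected_value = $expected_result.as_ref().unwrap();
            let actual_value = $actual_result.unwrap();
            assert!(*expected_value == actual_value, "{}", $msg);
        } else {
            assert!($actual_result.is_err(), "{}", $msg);

            let expected_error = $expected_result.as_ref().unwrap_err();
            let expected_error_msg = format!("{:?}", expected_error);

            let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());

            assert!(expected_error_msg == actual_error_msg, "{}", $msg);
        }
    };
}

/// Hypothetical function under test.
fn parse_port(s: &str) -> Result<u16, String> {
    s.parse::<u16>().map_err(|_| format!("invalid port: {}", s))
}

/// Table-driven check in the style of the rustjail tests.
fn check_parse_port() {
    let cases: Vec<(&str, Result<u16, String>)> = vec![
        ("8080", Ok(8080)),
        ("abc", Err("invalid port: abc".to_string())),
    ];

    for (i, (input, expected)) in cases.iter().enumerate() {
        let msg = format!("test[{}]: input {:?}", i, input);
        // Bind the actual result first: the macro expands it twice on errors.
        let result = parse_port(input);
        assert_result!(expected, result, msg);
    }
}
```

The error branch expands `$actual_result` twice, once for `is_err()` and once for `unwrap_err()`. Passing a variable, as the rustjail tests do with `result`, avoids evaluating the call under test more than once.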
--- a/src/agent/rustjail/src/mount.rs
+++ b/src/agent/rustjail/src/mount.rs
@@ -32,16 +32,21 @@ use crate::log_child;

 // Info reveals information about a particular mounted filesystem. This
 // struct is populated from the content in the /proc/<pid>/mountinfo file.
-#[derive(std::fmt::Debug)]
+#[derive(std::fmt::Debug, PartialEq)]
 pub struct Info {
    mount_point: String,
    optional: String,
    fstype: String,
 }

-const MOUNTINFOFORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
+const MOUNTINFO_FORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
+const MOUNTINFO_PATH: &str = "/proc/self/mountinfo";
 const PROC_PATH: &str = "/proc";

+const ERR_FAILED_PARSE_MOUNTINFO: &str = "failed to parse mountinfo file";
+const ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS: &str =
+    "failed to parse final fields in mountinfo file";
+
 // since libc didn't defined this const for musl, thus redefined it here.
 #[cfg(all(target_os = "linux", target_env = "gnu", not(target_arch = "s390x")))]
 const PROC_SUPER_MAGIC: libc::c_long = 0x00009fa0;
@@ -518,7 +523,7 @@ pub fn pivot_rootfs<P: ?Sized + NixPath + std::fmt::Debug>(path: &P) -> Result<(
 }

 fn rootfs_parent_mount_private(path: &str) -> Result<()> {
-    let mount_infos = parse_mount_table()?;
+    let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;

    let mut max_len = 0;
    let mut mount_point = String::from("");
@@ -546,8 +551,8 @@ fn rootfs_parent_mount_private(path: &str) -> Result<()> {

 // Parse /proc/self/mountinfo because comparing Dev and ino does not work from
 // bind mounts
-fn parse_mount_table() -> Result<Vec<Info>> {
-    let file = File::open("/proc/self/mountinfo")?;
+fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
+    let file = File::open(mountinfo_path)?;
    let reader = BufReader::new(file);
    let mut infos = Vec::new();

@@ -569,7 +574,7 @@ fn parse_mount_table() -> Result<Vec<Info>> {

        let (_id, _parent, _major, _minor, _root, mount_point, _opts, optional) = scan_fmt!(
            &line,
-            MOUNTINFOFORMAT,
+            MOUNTINFO_FORMAT,
            i32,
            i32,
            i32,
@@ -578,12 +583,17 @@ fn parse_mount_table() -> Result<Vec<Info>> {
            String,
            String,
            String
-        )?;
+        )
+        .map_err(|_| anyhow!(ERR_FAILED_PARSE_MOUNTINFO))?;

        let fields: Vec<&str> = line.split(" - ").collect();
        if fields.len() == 2 {
-            let (fstype, _source, _vfs_opts) =
-                scan_fmt!(fields[1], "{} {} {}", String, String, String)?;
+            let final_fields: Vec<&str> = fields[1].split_whitespace().collect();
+
+            if final_fields.len() != 3 {
+                return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS));
+            }
+            let fstype = final_fields[0].to_string();

            let mut optional_new = String::new();
            if optional != "-" {
@@ -598,7 +608,7 @@ fn parse_mount_table() -> Result<Vec<Info>> {

            infos.push(info);
        } else {
-            return Err(anyhow!("failed to parse mount info file".to_string()));
+            return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO));
        }
    }

@@ -619,7 +629,7 @@ fn chroot<P: ?Sized + NixPath>(_path: &P) -> Result<(), nix::Error> {

 pub fn ms_move_root(rootfs: &str) -> Result<bool> {
    unistd::chdir(rootfs)?;
-    let mount_infos = parse_mount_table()?;
+    let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;

    let root_path = Path::new(rootfs);
    let abs_root_buf = root_path.absolutize()?;
@@ -1046,10 +1056,12 @@ fn readonly_path(path: &str) -> Result<()> {
 #[cfg(test)]
 mod tests {
    use super::*;
+    use crate::assert_result;
    use crate::skip_if_not_root;
    use std::fs::create_dir;
    use std::fs::create_dir_all;
    use std::fs::remove_dir_all;
+    use std::io;
    use std::os::unix::fs;
    use std::os::unix::io::AsRawFd;
    use tempfile::tempdir;
@@ -1508,6 +1520,121 @@ mod tests {
        }
    }

+    #[test]
+    fn test_parse_mount_table() {
+        #[derive(Debug)]
+        struct TestData<'a> {
+            mountinfo_data: Option<&'a str>,
+            result: Result<Vec<Info>>,
+        }
+
+        let tests = &[
+            TestData {
+                mountinfo_data: Some(
+                    "22 933 0:20 / /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
+                ),
+                result: Ok(vec![Info {
+                    mount_point: "/sys".to_string(),
+                    optional: "shared:2".to_string(),
+                    fstype: "sysfs".to_string(),
+                }]),
+            },
+            TestData {
+                mountinfo_data: Some(
+                    r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
+                       81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
+                ),
+                result: Ok(vec![
+                    Info {
+                        mount_point: "/sys".to_string(),
+                        optional: "".to_string(),
+                        fstype: "sysfs".to_string(),
+                    },
+                    Info {
+                        mount_point: "/tmp/dir".to_string(),
+                        optional: "shared:2".to_string(),
+                        fstype: "tmpfs".to_string(),
+                    },
+                ]),
+            },
+            TestData {
+                mountinfo_data: Some(
+                    "22 933 0:20 /foo\040-\040bar /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
+                ),
+                result: Ok(vec![Info {
+                    mount_point: "/sys".to_string(),
+                    optional: "shared:2".to_string(),
+                    fstype: "sysfs".to_string(),
+                }]),
+            },
+            TestData {
+                mountinfo_data: Some(""),
+                result: Ok(vec![]),
+            },
+            TestData {
+                mountinfo_data: Some("invalid line data - sysfs sysfs rw"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
+            },
+            TestData {
+                mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs sysfs rw rw"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
+            },
+            TestData {
+                mountinfo_data: Some("22 96 0:21 / /sys rw,noexec shared:2 - x - x"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: Some("-"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: Some("--"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: Some("- -"),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: Some(" - "),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: Some(
+                    r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
+                       invalid line
+                       81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
+                ),
+                result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
+            },
+            TestData {
+                mountinfo_data: None,
+                result: Err(anyhow!(io::Error::from_raw_os_error(libc::ENOENT))),
+            },
+        ];
+
+        for (i, d) in tests.iter().enumerate() {
+            let msg = format!("test[{}]: {:?}", i, d);
+
+            let tempdir = tempdir().unwrap();
+            let mountinfo_path = tempdir.path().join("mountinfo");
+
+            if let Some(mountinfo_data) = d.mountinfo_data {
+                std::fs::write(&mountinfo_path, mountinfo_data).unwrap();
+            }
+
+            let result = parse_mount_table(mountinfo_path.to_str().unwrap());
+
+            let msg = format!("{}: result: {:?}", msg, result);
+
+            assert_result!(d.result, result, msg);
+        }
+    }
+
    #[test]
    fn test_dev_rel_path() {
        // Valid device paths
--- a/src/agent/rustjail/src/specconv.rs
+++ b/src/agent/rustjail/src/specconv.rs
@@ -5,7 +5,7 @@

 use oci::Spec;

-#[derive(Debug)]
+#[derive(Serialize, Deserialize, Debug, Default, Clone)]
 pub struct CreateOpts {
    pub cgroup_name: String,
    pub use_systemd_cgroup: bool,
--- a/src/agent/src/main.rs
+++ b/src/agent/src/main.rs
@@ -417,3 +417,59 @@ fn reset_sigpipe() {

 use crate::config::AgentConfig;
 use std::os::unix::io::{FromRawFd, RawFd};
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::test_utils::test_utils::TestUserType;
+
+    #[tokio::test]
+    async fn test_create_logger_task() {
+        #[derive(Debug)]
+        struct TestData {
+            vsock_port: u32,
+            test_user: TestUserType,
+            result: Result<()>,
+        }
+
+        let tests = &[
+            TestData {
+                // non-root user cannot use privileged vsock port
+                vsock_port: 1,
+                test_user: TestUserType::NonRootOnly,
+                result: Err(anyhow!(nix::errno::Errno::from_i32(libc::EACCES))),
+            },
+            TestData {
+                // passing vsock_port 0 causes logger task to write to stdout
+                vsock_port: 0,
+                test_user: TestUserType::Any,
+                result: Ok(()),
+            },
+        ];
+
+        for (i, d) in tests.iter().enumerate() {
+            if d.test_user == TestUserType::RootOnly {
+                skip_if_not_root!();
+            } else if d.test_user == TestUserType::NonRootOnly {
+                skip_if_root!();
+            }
+
+            let msg = format!("test[{}]: {:?}", i, d);
+            let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
+            defer!({
+                // rfd is closed by the use of PipeStream in the create_logger_task function,
+                // but we will attempt to close in case of a failure
+                let _ = unistd::close(rfd);
+                unistd::close(wfd).unwrap();
+            });
+
+            let (shutdown_tx, shutdown_rx) = channel(true);
+
+            shutdown_tx.send(true).unwrap();
+            let result = create_logger_task(rfd, d.vsock_port, shutdown_rx).await;
+
+            let msg = format!("{}, result: {:?}", msg, result);
+            assert_result!(d.result, result, msg);
+        }
+    }
+}
--- a/src/agent/src/mount.rs
+++ b/src/agent/src/mount.rs
@@ -1017,6 +1017,7 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
 #[cfg(test)]
 mod tests {
    use super::*;
+    use crate::test_utils::test_utils::TestUserType;
    use crate::{skip_if_not_root, skip_loop_if_not_root, skip_loop_if_root};
    use protobuf::RepeatedField;
    use protocols::agent::FSGroup;
@@ -1026,13 +1027,6 @@ mod tests {
    use std::path::PathBuf;
    use tempfile::tempdir;

-    #[derive(Debug, PartialEq)]
-    enum TestUserType {
-        RootOnly,
-        NonRootOnly,
-        Any,
-    }
-
    #[test]
    fn test_mount() {
        #[derive(Debug)]
--- a/src/agent/src/rpc.rs
+++ b/src/agent/src/rpc.rs
@@ -1875,7 +1875,7 @@ async fn do_add_swap(sandbox: &Arc<Mutex<Sandbox>>, req: &AddSwapRequest) -> Res
 // - config.json at /<CONTAINER_BASE>/<cid>/config.json
 // - container rootfs bind mounted at /<CONTAINER_BASE>/<cid>/rootfs
 // - modify container spec root to point to /<CONTAINER_BASE>/<cid>/rootfs
-fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
+pub fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
    let spec_root = if let Some(sr) = &spec.root {
        sr
    } else {
--- a/src/agent/src/test_utils.rs
+++ b/src/agent/src/test_utils.rs
@@ -5,7 +5,14 @@
 #![allow(clippy::module_inception)]

 #[cfg(test)]
-mod test_utils {
+pub mod test_utils {
+    #[derive(Debug, PartialEq)]
+    pub enum TestUserType {
+        RootOnly,
+        NonRootOnly,
+        Any,
+    }
+
    #[macro_export]
    macro_rules! skip_if_root {
        () => {
--- a/src/runtime/config/configuration-clh.toml.in
+++ b/src/runtime/config/configuration-clh.toml.in
@@ -180,6 +180,78 @@ block_device_driver = "virtio-blk"
 # but it will not abort container execution.
 #guest_hook_path = "/usr/share/oci/hooks"
 #
+# These options configure the network rate limiter at the VMM level and are
+# based on Cloud Hypervisor's I/O throttling. They are disabled by default,
+# and we strongly advise users to refer to the official Cloud Hypervisor
+# documentation for a better understanding of its internals:
+# https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/io_throttling.md
+#
+# Bandwidth rate limiter options
+#
+# net_rate_limiter_bw_max_rate controls the network I/O bandwidth (in bits/sec
+# for the sandbox/VM).
+# The same value is used for inbound and outbound bandwidth.
+# The default value of 0 means the rate is unlimited.
+#net_rate_limiter_bw_max_rate = 0
+#
+# net_rate_limiter_bw_one_time_burst (in bits) adds an initial extra credit on
+# top of the max rate. This credit does *NOT* affect the overall limit and can
+# be used for an *initial* burst of data.
+# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
+# set to a non-zero value.
+#net_rate_limiter_bw_one_time_burst = 0
+#
+# Operation rate limiter options
+#
+# net_rate_limiter_ops_max_rate controls the network I/O operation rate (in
+# operations/sec for the sandbox/VM).
+# The same value is used for inbound and outbound traffic.
+# The default value of 0 means the rate is unlimited.
+#net_rate_limiter_ops_max_rate = 0
+#
+# net_rate_limiter_ops_one_time_burst adds an initial extra credit on top of
+# the max rate. This credit does *NOT* affect the overall limit and can be used
+# for an *initial* burst of operations.
+# This is *optional* and only takes effect if net_rate_limiter_ops_max_rate is
+# set to a non-zero value.
+#net_rate_limiter_ops_one_time_burst = 0
+#
+# These options configure the disk rate limiter at the VMM level and are
+# based on Cloud Hypervisor's I/O throttling. They are disabled by default,
+# and we strongly advise users to refer to the official Cloud Hypervisor
+# documentation for a better understanding of its internals:
+# https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/io_throttling.md
+#
+# Bandwidth rate limiter options
+#
+# disk_rate_limiter_bw_max_rate controls the disk I/O bandwidth (in bits/sec
+# for the sandbox/VM).
+# The same value is used for inbound and outbound bandwidth.
+# The default value of 0 means the rate is unlimited.
+#disk_rate_limiter_bw_max_rate = 0
+#
+# disk_rate_limiter_bw_one_time_burst (in bits) adds an initial extra credit on
+# top of the max rate. This credit does *NOT* affect the overall limit and can
+# be used for an *initial* burst of data.
+# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
+# set to a non-zero value.
+#disk_rate_limiter_bw_one_time_burst = 0
+#
+# Operation rate limiter options
+#
+# disk_rate_limiter_ops_max_rate controls the disk I/O operation rate (in
+# operations/sec for the sandbox/VM).
+# The same value is used for inbound and outbound traffic.
+# The default value of 0 means the rate is unlimited.
+#disk_rate_limiter_ops_max_rate = 0
+#
+# disk_rate_limiter_ops_one_time_burst adds an initial extra credit on top of
+# the max rate. This credit does *NOT* affect the overall limit and can be used
+# for an *initial* burst of operations.
+# This is *optional* and only takes effect if disk_rate_limiter_ops_max_rate is
+# set to a non-zero value.
+#disk_rate_limiter_ops_one_time_burst = 0
+
 [agent.@PROJECT_TYPE@]
 # If enabled, make the agent display debug-level messages.
 # (default: disabled)
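The option comments above describe each limiter as a max rate plus an optional one-time burst. As a rough mental model only, the sketch below shows a token bucket whose periodic budget is refilled while the one-time burst is drawn down first and never replenished. It is not Cloud Hypervisor's implementation: the `tokenBucket` type, its fields, and the "refill the whole budget at once" policy are assumptions made for illustration (the runtime itself only passes a size, a burst, and `utils.DefaultRateLimiterRefillTimeMilliSecs` to the VMM).

```go
package main

import "fmt"

// tokenBucket is a hypothetical, simplified rate limiter model.
// It restores the full budget on each refill, whereas real
// implementations typically refill continuously over the period.
type tokenBucket struct {
	size         int64 // tokens granted per refill period (the "max rate")
	budget       int64 // tokens left in the current period
	oneTimeBurst int64 // extra tokens usable once; never replenished
}

func newTokenBucket(size, oneTimeBurst int64) *tokenBucket {
	return &tokenBucket{size: size, budget: size, oneTimeBurst: oneTimeBurst}
}

// consume tries to take n tokens, drawing on the one-time burst first.
// It returns false, leaving the bucket unchanged, if n does not fit.
func (b *tokenBucket) consume(n int64) bool {
	if n <= b.oneTimeBurst {
		b.oneTimeBurst -= n
		return true
	}
	remaining := n - b.oneTimeBurst
	if remaining > b.budget {
		return false
	}
	b.oneTimeBurst = 0
	b.budget -= remaining
	return true
}

// refill restores the periodic budget; the one-time burst is not restored.
func (b *tokenBucket) refill() {
	b.budget = b.size
}

func main() {
	b := newTokenBucket(100, 50)
	fmt.Println(b.consume(120)) // true: 50 burst tokens + 70 regular tokens
	fmt.Println(b.consume(40))  // false: only 30 regular tokens left
	b.refill()
	fmt.Println(b.consume(100)) // true: full periodic budget again
	fmt.Println(b.consume(1))   // false: burst is gone and budget is spent
}
```

This is why the burst "does *NOT* affect the overall limit": once the extra credit is spent, throughput is bounded by the periodic budget alone.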
--- a/src/runtime/pkg/katautils/config.go
+++ b/src/runtime/pkg/katautils/config.go
@@ -74,70 +74,78 @@ type factory struct {
 }

 type hypervisor struct {
-	Path                    string   `toml:"path"`
-	JailerPath              string   `toml:"jailer_path"`
-	Kernel                  string   `toml:"kernel"`
-	CtlPath                 string   `toml:"ctlpath"`
-	Initrd                  string   `toml:"initrd"`
-	Image                   string   `toml:"image"`
-	Firmware                string   `toml:"firmware"`
-	FirmwareVolume          string   `toml:"firmware_volume"`
-	MachineAccelerators     string   `toml:"machine_accelerators"`
-	CPUFeatures             string   `toml:"cpu_features"`
-	KernelParams            string   `toml:"kernel_params"`
-	MachineType             string   `toml:"machine_type"`
-	BlockDeviceDriver       string   `toml:"block_device_driver"`
-	EntropySource           string   `toml:"entropy_source"`
-	SharedFS                string   `toml:"shared_fs"`
-	VirtioFSDaemon          string   `toml:"virtio_fs_daemon"`
-	VirtioFSCache           string   `toml:"virtio_fs_cache"`
-	VhostUserStorePath      string   `toml:"vhost_user_store_path"`
-	FileBackedMemRootDir    string   `toml:"file_mem_backend"`
-	GuestHookPath           string   `toml:"guest_hook_path"`
-	GuestMemoryDumpPath     string   `toml:"guest_memory_dump_path"`
-	HypervisorPathList      []string `toml:"valid_hypervisor_paths"`
-	JailerPathList          []string `toml:"valid_jailer_paths"`
-	CtlPathList             []string `toml:"valid_ctlpaths"`
-	VirtioFSDaemonList      []string `toml:"valid_virtio_fs_daemon_paths"`
-	VirtioFSExtraArgs       []string `toml:"virtio_fs_extra_args"`
-	PFlashList              []string `toml:"pflashes"`
-	VhostUserStorePathList  []string `toml:"valid_vhost_user_store_paths"`
-	FileBackedMemRootList   []string `toml:"valid_file_mem_backends"`
-	EntropySourceList       []string `toml:"valid_entropy_sources"`
-	EnableAnnotations       []string `toml:"enable_annotations"`
-	RxRateLimiterMaxRate    uint64   `toml:"rx_rate_limiter_max_rate"`
-	TxRateLimiterMaxRate    uint64   `toml:"tx_rate_limiter_max_rate"`
-	MemOffset               uint64   `toml:"memory_offset"`
-	VirtioFSCacheSize       uint32   `toml:"virtio_fs_cache_size"`
-	DefaultMaxVCPUs         uint32   `toml:"default_maxvcpus"`
-	MemorySize              uint32   `toml:"default_memory"`
-	MemSlots                uint32   `toml:"memory_slots"`
-	DefaultBridges          uint32   `toml:"default_bridges"`
-	Msize9p                 uint32   `toml:"msize_9p"`
-	PCIeRootPort            uint32   `toml:"pcie_root_port"`
-	NumVCPUs                int32    `toml:"default_vcpus"`
-	BlockDeviceCacheSet     bool     `toml:"block_device_cache_set"`
-	BlockDeviceCacheDirect  bool     `toml:"block_device_cache_direct"`
-	BlockDeviceCacheNoflush bool     `toml:"block_device_cache_noflush"`
-	EnableVhostUserStore    bool     `toml:"enable_vhost_user_store"`
-	DisableBlockDeviceUse   bool     `toml:"disable_block_device_use"`
-	MemPrealloc             bool     `toml:"enable_mem_prealloc"`
-	HugePages               bool     `toml:"enable_hugepages"`
-	VirtioMem               bool     `toml:"enable_virtio_mem"`
-	IOMMU                   bool     `toml:"enable_iommu"`
-	IOMMUPlatform           bool     `toml:"enable_iommu_platform"`
-	Debug                   bool     `toml:"enable_debug"`
-	DisableNestingChecks    bool     `toml:"disable_nesting_checks"`
-	EnableIOThreads         bool     `toml:"enable_iothreads"`
-	DisableImageNvdimm      bool     `toml:"disable_image_nvdimm"`
-	HotplugVFIOOnRootBus    bool     `toml:"hotplug_vfio_on_root_bus"`
-	DisableVhostNet         bool     `toml:"disable_vhost_net"`
-	GuestMemoryDumpPaging   bool     `toml:"guest_memory_dump_paging"`
-	ConfidentialGuest       bool     `toml:"confidential_guest"`
-	GuestSwap               bool     `toml:"enable_guest_swap"`
-	Rootless                bool     `toml:"rootless"`
-	DisableSeccomp          bool     `toml:"disable_seccomp"`
-	DisableSeLinux          bool     `toml:"disable_selinux"`
+	Path                           string   `toml:"path"`
+	JailerPath                     string   `toml:"jailer_path"`
+	Kernel                         string   `toml:"kernel"`
+	CtlPath                        string   `toml:"ctlpath"`
+	Initrd                         string   `toml:"initrd"`
+	Image                          string   `toml:"image"`
+	Firmware                       string   `toml:"firmware"`
+	FirmwareVolume                 string   `toml:"firmware_volume"`
+	MachineAccelerators            string   `toml:"machine_accelerators"`
+	CPUFeatures                    string   `toml:"cpu_features"`
+	KernelParams                   string   `toml:"kernel_params"`
+	MachineType                    string   `toml:"machine_type"`
+	BlockDeviceDriver              string   `toml:"block_device_driver"`
+	EntropySource                  string   `toml:"entropy_source"`
+	SharedFS                       string   `toml:"shared_fs"`
+	VirtioFSDaemon                 string   `toml:"virtio_fs_daemon"`
+	VirtioFSCache                  string   `toml:"virtio_fs_cache"`
+	VhostUserStorePath             string   `toml:"vhost_user_store_path"`
+	FileBackedMemRootDir           string   `toml:"file_mem_backend"`
+	GuestHookPath                  string   `toml:"guest_hook_path"`
+	GuestMemoryDumpPath            string   `toml:"guest_memory_dump_path"`
+	HypervisorPathList             []string `toml:"valid_hypervisor_paths"`
+	JailerPathList                 []string `toml:"valid_jailer_paths"`
+	CtlPathList                    []string `toml:"valid_ctlpaths"`
+	VirtioFSDaemonList             []string `toml:"valid_virtio_fs_daemon_paths"`
+	VirtioFSExtraArgs              []string `toml:"virtio_fs_extra_args"`
+	PFlashList                     []string `toml:"pflashes"`
+	VhostUserStorePathList         []string `toml:"valid_vhost_user_store_paths"`
+	FileBackedMemRootList          []string `toml:"valid_file_mem_backends"`
+	EntropySourceList              []string `toml:"valid_entropy_sources"`
+	EnableAnnotations              []string `toml:"enable_annotations"`
+	RxRateLimiterMaxRate           uint64   `toml:"rx_rate_limiter_max_rate"`
+	TxRateLimiterMaxRate           uint64   `toml:"tx_rate_limiter_max_rate"`
+	MemOffset                      uint64   `toml:"memory_offset"`
+	DiskRateLimiterBwMaxRate       int64    `toml:"disk_rate_limiter_bw_max_rate"`
+	DiskRateLimiterBwOneTimeBurst  int64    `toml:"disk_rate_limiter_bw_one_time_burst"`
+	DiskRateLimiterOpsMaxRate      int64    `toml:"disk_rate_limiter_ops_max_rate"`
+	DiskRateLimiterOpsOneTimeBurst int64    `toml:"disk_rate_limiter_ops_one_time_burst"`
+	NetRateLimiterBwMaxRate        int64    `toml:"net_rate_limiter_bw_max_rate"`
+	NetRateLimiterBwOneTimeBurst   int64    `toml:"net_rate_limiter_bw_one_time_burst"`
+	NetRateLimiterOpsMaxRate       int64    `toml:"net_rate_limiter_ops_max_rate"`
+	NetRateLimiterOpsOneTimeBurst  int64    `toml:"net_rate_limiter_ops_one_time_burst"`
+	VirtioFSCacheSize              uint32   `toml:"virtio_fs_cache_size"`
+	DefaultMaxVCPUs                uint32   `toml:"default_maxvcpus"`
+	MemorySize                     uint32   `toml:"default_memory"`
+	MemSlots                       uint32   `toml:"memory_slots"`
+	DefaultBridges                 uint32   `toml:"default_bridges"`
+	Msize9p                        uint32   `toml:"msize_9p"`
+	PCIeRootPort                   uint32   `toml:"pcie_root_port"`
+	NumVCPUs                       int32    `toml:"default_vcpus"`
+	BlockDeviceCacheSet            bool     `toml:"block_device_cache_set"`
+	BlockDeviceCacheDirect         bool     `toml:"block_device_cache_direct"`
+	BlockDeviceCacheNoflush        bool     `toml:"block_device_cache_noflush"`
+	EnableVhostUserStore           bool     `toml:"enable_vhost_user_store"`
+	DisableBlockDeviceUse          bool     `toml:"disable_block_device_use"`
+	MemPrealloc                    bool     `toml:"enable_mem_prealloc"`
+	HugePages                      bool     `toml:"enable_hugepages"`
+	VirtioMem                      bool     `toml:"enable_virtio_mem"`
+	IOMMU                          bool     `toml:"enable_iommu"`
+	IOMMUPlatform                  bool     `toml:"enable_iommu_platform"`
+	Debug                          bool     `toml:"enable_debug"`
+	DisableNestingChecks           bool     `toml:"disable_nesting_checks"`
+	EnableIOThreads                bool     `toml:"enable_iothreads"`
+	DisableImageNvdimm             bool     `toml:"disable_image_nvdimm"`
+	HotplugVFIOOnRootBus           bool     `toml:"hotplug_vfio_on_root_bus"`
+	DisableVhostNet                bool     `toml:"disable_vhost_net"`
+	GuestMemoryDumpPaging          bool     `toml:"guest_memory_dump_paging"`
+	ConfidentialGuest              bool     `toml:"confidential_guest"`
+	GuestSwap                      bool     `toml:"enable_guest_swap"`
+	Rootless                       bool     `toml:"rootless"`
+	DisableSeccomp                 bool     `toml:"disable_seccomp"`
+	DisableSeLinux                 bool     `toml:"disable_selinux"`
 }

 type runtime struct {
@@ -482,6 +490,34 @@ func (h hypervisor) getInitrdAndImage() (initrd string, image string, err error)
 	return
 }

+func (h hypervisor) getDiskRateLimiterBwMaxRate() int64 {
+	return h.DiskRateLimiterBwMaxRate
+}
+
+func (h hypervisor) getDiskRateLimiterBwOneTimeBurst() int64 {
+	if h.DiskRateLimiterBwOneTimeBurst != 0 && h.getDiskRateLimiterBwMaxRate() == 0 {
+		kataUtilsLogger.Warn("The DiskRateLimiterBwOneTimeBurst is set but DiskRateLimiterBwMaxRate is not set, this option will be ignored.")
+
+		h.DiskRateLimiterBwOneTimeBurst = 0
+	}
+
+	return h.DiskRateLimiterBwOneTimeBurst
+}
+
+func (h hypervisor) getDiskRateLimiterOpsMaxRate() int64 {
+	return h.DiskRateLimiterOpsMaxRate
+}
+
+func (h hypervisor) getDiskRateLimiterOpsOneTimeBurst() int64 {
+	if h.DiskRateLimiterOpsOneTimeBurst != 0 && h.getDiskRateLimiterOpsMaxRate() == 0 {
+		kataUtilsLogger.Warn("The DiskRateLimiterOpsOneTimeBurst is set but DiskRateLimiterOpsMaxRate is not set, this option will be ignored.")
+
+		h.DiskRateLimiterOpsOneTimeBurst = 0
+	}
+
+	return h.DiskRateLimiterOpsOneTimeBurst
+}
+
 func (h hypervisor) getRxRateLimiterCfg() uint64 {
 	return h.RxRateLimiterMaxRate
 }
@@ -490,6 +526,34 @@ func (h hypervisor) getTxRateLimiterCfg() uint64 {
 	return h.TxRateLimiterMaxRate
 }

+func (h hypervisor) getNetRateLimiterBwMaxRate() int64 {
+	return h.NetRateLimiterBwMaxRate
+}
+
+func (h hypervisor) getNetRateLimiterBwOneTimeBurst() int64 {
+	if h.NetRateLimiterBwOneTimeBurst != 0 && h.getNetRateLimiterBwMaxRate() == 0 {
+		kataUtilsLogger.Warn("The NetRateLimiterBwOneTimeBurst is set but NetRateLimiterBwMaxRate is not set, this option will be ignored.")
+
+		h.NetRateLimiterBwOneTimeBurst = 0
+	}
+
+	return h.NetRateLimiterBwOneTimeBurst
+}
+
+func (h hypervisor) getNetRateLimiterOpsMaxRate() int64 {
+	return h.NetRateLimiterOpsMaxRate
+}
+
+func (h hypervisor) getNetRateLimiterOpsOneTimeBurst() int64 {
+	if h.NetRateLimiterOpsOneTimeBurst != 0 && h.getNetRateLimiterOpsMaxRate() == 0 {
+		kataUtilsLogger.Warn("The NetRateLimiterOpsOneTimeBurst is set but NetRateLimiterOpsMaxRate is not set, this option will be ignored.")
+
+		h.NetRateLimiterOpsOneTimeBurst = 0
+	}
+
+	return h.NetRateLimiterOpsOneTimeBurst
+}
+
 func (h hypervisor) getIOMMUPlatform() bool {
 	if h.IOMMUPlatform {
 		kataUtilsLogger.Info("IOMMUPlatform is enabled by default.")
@@ -828,52 +892,60 @@ func newClhHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
 	}

 	return vc.HypervisorConfig{
-		HypervisorPath:          hypervisor,
-		HypervisorPathList:      h.HypervisorPathList,
-		KernelPath:              kernel,
-		InitrdPath:              initrd,
-		ImagePath:               image,
-		FirmwarePath:            firmware,
-		MachineAccelerators:     machineAccelerators,
-		KernelParams:            vc.DeserializeParams(strings.Fields(kernelParams)),
-		HypervisorMachineType:   machineType,
-		NumVCPUs:                h.defaultVCPUs(),
-		DefaultMaxVCPUs:         h.defaultMaxVCPUs(),
-		MemorySize:              h.defaultMemSz(),
-		MemSlots:                h.defaultMemSlots(),
-		MemOffset:               h.defaultMemOffset(),
-		VirtioMem:               h.VirtioMem,
-		EntropySource:           h.GetEntropySource(),
-		EntropySourceList:       h.EntropySourceList,
-		DefaultBridges:          h.defaultBridges(),
-		DisableBlockDeviceUse:   h.DisableBlockDeviceUse,
-		SharedFS:                sharedFS,
-		VirtioFSDaemon:          h.VirtioFSDaemon,
-		VirtioFSDaemonList:      h.VirtioFSDaemonList,
-		VirtioFSCacheSize:       h.VirtioFSCacheSize,
-		VirtioFSCache:           h.VirtioFSCache,
-		MemPrealloc:             h.MemPrealloc,
-		HugePages:               h.HugePages,
-		FileBackedMemRootDir:    h.FileBackedMemRootDir,
-		FileBackedMemRootList:   h.FileBackedMemRootList,
-		Debug:                   h.Debug,
-		DisableNestingChecks:    h.DisableNestingChecks,
-		BlockDeviceDriver:       blockDriver,
-		BlockDeviceCacheSet:     h.BlockDeviceCacheSet,
-		BlockDeviceCacheDirect:  h.BlockDeviceCacheDirect,
-		BlockDeviceCacheNoflush: h.BlockDeviceCacheNoflush,
-		EnableIOThreads:         h.EnableIOThreads,
-		Msize9p:                 h.msize9p(),
-		HotplugVFIOOnRootBus:    h.HotplugVFIOOnRootBus,
-		PCIeRootPort:            h.PCIeRootPort,
-		DisableVhostNet:         true,
-		GuestHookPath:           h.guestHookPath(),
-		VirtioFSExtraArgs:       h.VirtioFSExtraArgs,
-		SGXEPCSize:              defaultSGXEPCSize,
-		EnableAnnotations:       h.EnableAnnotations,
-		DisableSeccomp:          h.DisableSeccomp,
-		ConfidentialGuest:       h.ConfidentialGuest,
-		DisableSeLinux:          h.DisableSeLinux,
+		HypervisorPath:                 hypervisor,
+		HypervisorPathList:             h.HypervisorPathList,
+		KernelPath:                     kernel,
+		InitrdPath:                     initrd,
+		ImagePath:                      image,
+		FirmwarePath:                   firmware,
+		MachineAccelerators:            machineAccelerators,
+		KernelParams:                   vc.DeserializeParams(strings.Fields(kernelParams)),
+		HypervisorMachineType:          machineType,
+		NumVCPUs:                       h.defaultVCPUs(),
+		DefaultMaxVCPUs:                h.defaultMaxVCPUs(),
+		MemorySize:                     h.defaultMemSz(),
+		MemSlots:                       h.defaultMemSlots(),
+		MemOffset:                      h.defaultMemOffset(),
+		VirtioMem:                      h.VirtioMem,
+		EntropySource:                  h.GetEntropySource(),
+		EntropySourceList:              h.EntropySourceList,
+		DefaultBridges:                 h.defaultBridges(),
+		DisableBlockDeviceUse:          h.DisableBlockDeviceUse,
+		SharedFS:                       sharedFS,
+		VirtioFSDaemon:                 h.VirtioFSDaemon,
+		VirtioFSDaemonList:             h.VirtioFSDaemonList,
+		VirtioFSCacheSize:              h.VirtioFSCacheSize,
+		VirtioFSCache:                  h.VirtioFSCache,
+		MemPrealloc:                    h.MemPrealloc,
+		HugePages:                      h.HugePages,
+		FileBackedMemRootDir:           h.FileBackedMemRootDir,
+		FileBackedMemRootList:          h.FileBackedMemRootList,
+		Debug:                          h.Debug,
+		DisableNestingChecks:           h.DisableNestingChecks,
+		BlockDeviceDriver:              blockDriver,
+		BlockDeviceCacheSet:            h.BlockDeviceCacheSet,
+		BlockDeviceCacheDirect:         h.BlockDeviceCacheDirect,
+		BlockDeviceCacheNoflush:        h.BlockDeviceCacheNoflush,
+		EnableIOThreads:                h.EnableIOThreads,
+		Msize9p:                        h.msize9p(),
+		HotplugVFIOOnRootBus:           h.HotplugVFIOOnRootBus,
+		PCIeRootPort:                   h.PCIeRootPort,
+		DisableVhostNet:                true,
+		GuestHookPath:                  h.guestHookPath(),
+		VirtioFSExtraArgs:              h.VirtioFSExtraArgs,
+		SGXEPCSize:                     defaultSGXEPCSize,
+		EnableAnnotations:              h.EnableAnnotations,
+		DisableSeccomp:                 h.DisableSeccomp,
+		ConfidentialGuest:              h.ConfidentialGuest,
+		DisableSeLinux:                 h.DisableSeLinux,
+		NetRateLimiterBwMaxRate:        h.getNetRateLimiterBwMaxRate(),
+		NetRateLimiterBwOneTimeBurst:   h.getNetRateLimiterBwOneTimeBurst(),
+		NetRateLimiterOpsMaxRate:       h.getNetRateLimiterOpsMaxRate(),
+		NetRateLimiterOpsOneTimeBurst:  h.getNetRateLimiterOpsOneTimeBurst(),
+		DiskRateLimiterBwMaxRate:       h.getDiskRateLimiterBwMaxRate(),
+		DiskRateLimiterBwOneTimeBurst:  h.getDiskRateLimiterBwOneTimeBurst(),
+		DiskRateLimiterOpsMaxRate:      h.getDiskRateLimiterOpsMaxRate(),
+		DiskRateLimiterOpsOneTimeBurst: h.getDiskRateLimiterOpsOneTimeBurst(),
 	}, nil
 }
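The four `get...OneTimeBurst` getters added to `config.go` all apply one rule: a burst only makes sense on top of a non-zero max rate, so otherwise it is dropped with a warning. One possible way to express that rule once is sketched below. `normalizeBurst` is a hypothetical helper, and the standard `log` package stands in for the runtime's `kataUtilsLogger`; this is an illustrative refactor, not code from the runtime.

```go
package main

import (
	"fmt"
	"log"
)

// normalizeBurst returns the burst value to hand to the hypervisor.
// A one-time burst is ignored (returned as 0) when its max rate is 0,
// mirroring the behaviour of the per-option getters.
func normalizeBurst(name string, maxRate, burst int64) int64 {
	if burst != 0 && maxRate == 0 {
		log.Printf("%s is set but its max rate is not set, ignoring it", name)
		return 0
	}
	return burst
}

func main() {
	// Values mirroring TestNewClhHypervisorConfig: the bandwidth burst is
	// kept (prints 1000), while the ops burst is dropped because the ops
	// max rate is 0 (prints 0).
	fmt.Println(normalizeBurst("NetRateLimiterBwOneTimeBurst", 1000, 1000))
	fmt.Println(normalizeBurst("NetRateLimiterOpsOneTimeBurst", 0, 1000))
}
```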

--- a/src/runtime/pkg/katautils/config_test.go
+++ b/src/runtime/pkg/katautils/config_test.go
@@ -810,6 +810,14 @@ func TestNewClhHypervisorConfig(t *testing.T) {
 	kernelPath := path.Join(tmpdir, "kernel")
 	imagePath := path.Join(tmpdir, "image")
 	virtioFsDaemon := path.Join(tmpdir, "virtiofsd")
+	netRateLimiterBwMaxRate := int64(1000)
+	netRateLimiterBwOneTimeBurst := int64(1000)
+	netRateLimiterOpsMaxRate := int64(0)
+	netRateLimiterOpsOneTimeBurst := int64(1000)
+	diskRateLimiterBwMaxRate := int64(1000)
+	diskRateLimiterBwOneTimeBurst := int64(1000)
+	diskRateLimiterOpsMaxRate := int64(0)
+	diskRateLimiterOpsOneTimeBurst := int64(1000)

 	for _, file := range []string{imagePath, hypervisorPath, kernelPath, virtioFsDaemon} {
 		err := createEmptyFile(file)
@@ -817,11 +825,19 @@ func TestNewClhHypervisorConfig(t *testing.T) {
 	}

 	hypervisor := hypervisor{
-		Path:           hypervisorPath,
-		Kernel:         kernelPath,
-		Image:          imagePath,
-		VirtioFSDaemon: virtioFsDaemon,
-		VirtioFSCache:  "always",
+		Path:                           hypervisorPath,
+		Kernel:                         kernelPath,
+		Image:                          imagePath,
+		VirtioFSDaemon:                 virtioFsDaemon,
+		VirtioFSCache:                  "always",
+		NetRateLimiterBwMaxRate:        netRateLimiterBwMaxRate,
+		NetRateLimiterBwOneTimeBurst:   netRateLimiterBwOneTimeBurst,
+		NetRateLimiterOpsMaxRate:       netRateLimiterOpsMaxRate,
+		NetRateLimiterOpsOneTimeBurst:  netRateLimiterOpsOneTimeBurst,
+		DiskRateLimiterBwMaxRate:       diskRateLimiterBwMaxRate,
+		DiskRateLimiterBwOneTimeBurst:  diskRateLimiterBwOneTimeBurst,
+		DiskRateLimiterOpsMaxRate:      diskRateLimiterOpsMaxRate,
+		DiskRateLimiterOpsOneTimeBurst: diskRateLimiterOpsOneTimeBurst,
 	}
 	config, err := newClhHypervisorConfig(hypervisor)
 	if err != nil {
@@ -852,6 +868,39 @@ func TestNewClhHypervisorConfig(t *testing.T) {
 		t.Errorf("Expected VirtioFSCache %v, got %v", true, config.VirtioFSCache)
 	}

+	if config.NetRateLimiterBwMaxRate != netRateLimiterBwMaxRate {
+		t.Errorf("Expected value for network bandwidth rate limiter %v, got %v", netRateLimiterBwMaxRate, config.NetRateLimiterBwMaxRate)
+	}
+
+	if config.NetRateLimiterBwOneTimeBurst != netRateLimiterBwOneTimeBurst {
+		t.Errorf("Expected value for network bandwidth one time burst %v, got %v", netRateLimiterBwOneTimeBurst, config.NetRateLimiterBwOneTimeBurst)
+	}
+
+	if config.NetRateLimiterOpsMaxRate != netRateLimiterOpsMaxRate {
+		t.Errorf("Expected value for network operations rate limiter %v, got %v", netRateLimiterOpsMaxRate, config.NetRateLimiterOpsMaxRate)
+	}
+
+	// We expect 0 (zero) here as netRateLimiterOpsMaxRate is not set (set to zero).
+	if config.NetRateLimiterOpsOneTimeBurst != 0 {
+		t.Errorf("Expected value for network operations one time burst %v, got %v", 0, config.NetRateLimiterOpsOneTimeBurst)
+	}
+
+	if config.DiskRateLimiterBwMaxRate != diskRateLimiterBwMaxRate {
+		t.Errorf("Expected value for disk bandwidth rate limiter %v, got %v", diskRateLimiterBwMaxRate, config.DiskRateLimiterBwMaxRate)
+	}
+
+	if config.DiskRateLimiterBwOneTimeBurst != diskRateLimiterBwOneTimeBurst {
+		t.Errorf("Expected value for disk bandwidth one time burst %v, got %v", diskRateLimiterBwOneTimeBurst, config.DiskRateLimiterBwOneTimeBurst)
+	}
+
+	if config.DiskRateLimiterOpsMaxRate != diskRateLimiterOpsMaxRate {
+		t.Errorf("Expected value for disk operations rate limiter %v, got %v", diskRateLimiterOpsMaxRate, config.DiskRateLimiterOpsMaxRate)
+	}
+
+	// We expect 0 (zero) here as diskRateLimiterOpsMaxRate is not set (set to zero).
+	if config.DiskRateLimiterOpsOneTimeBurst != 0 {
+		t.Errorf("Expected value for disk operations one time burst %v, got %v", 0, config.DiskRateLimiterOpsOneTimeBurst)
+	}
 }

 func TestHypervisorDefaults(t *testing.T) {
--- a/src/runtime/virtcontainers/clh.go
+++ b/src/runtime/virtcontainers/clh.go
@@ -165,6 +165,7 @@ type cloudHypervisor struct {
 	APIClient      clhClient
 	ctx            context.Context
 	id             string
+	devicesIds     map[string]string
 	vmconfig       chclient.VmConfig
 	state          CloudHypervisorState
 	config         HypervisorConfig
@@ -358,6 +359,7 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net

 	clh.id = id
 	clh.state.state = clhNotReady
+	clh.devicesIds = make(map[string]string)

 	clh.Logger().WithField("function", "CreateVM").Info("creating Sandbox")

@@ -442,6 +444,11 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net
 			disk := chclient.NewDiskConfig(imagePath)
 			disk.SetReadonly(true)

+			diskRateLimiterConfig := clh.getDiskRateLimiterConfig()
+			if diskRateLimiterConfig != nil {
+				disk.SetRateLimiterConfig(*diskRateLimiterConfig)
+			}
+
 			if clh.vmconfig.Disks != nil {
 				*clh.vmconfig.Disks = append(*clh.vmconfig.Disks, *disk)
 			} else {
@@ -665,7 +672,11 @@ func (clh *cloudHypervisor) hotplugAddBlockDevice(drive *config.BlockDrive) erro
 	clhDisk := *chclient.NewDiskConfig(drive.File)
 	clhDisk.Readonly = &drive.ReadOnly
 	clhDisk.VhostUser = func(b bool) *bool { return &b }(false)
-	clhDisk.Id = &driveID
+
+	diskRateLimiterConfig := clh.getDiskRateLimiterConfig()
+	if diskRateLimiterConfig != nil {
+		clhDisk.SetRateLimiterConfig(*diskRateLimiterConfig)
+	}

 	pciInfo, _, err := cl.VmAddDiskPut(ctx, clhDisk)

@@ -673,6 +684,7 @@ func (clh *cloudHypervisor) hotplugAddBlockDevice(drive *config.BlockDrive) erro
 		return fmt.Errorf("failed to hotplug block device %+v %s", drive, openAPIClientError(err))
 	}

+	clh.devicesIds[driveID] = pciInfo.GetId()
 	drive.PCIPath, err = clhPciInfoToPath(pciInfo)

 	return err
@@ -686,11 +698,11 @@ func (clh *cloudHypervisor) hotPlugVFIODevice(device *config.VFIODev) error {
 	// Create the clh device config via the constructor to ensure default values are properly assigned
 	clhDevice := *chclient.NewVmAddDevice()
 	clhDevice.Path = &device.SysfsDev
-	clhDevice.Id = &device.ID
 	pciInfo, _, err := cl.VmAddDevicePut(ctx, clhDevice)
 	if err != nil {
 		return fmt.Errorf("Failed to hotplug device %+v %s", device, openAPIClientError(err))
 	}
+	clh.devicesIds[device.ID] = pciInfo.GetId()

 	// clh doesn't use bridges, so the PCI path is simply the slot
 	// number of the device.  This will break if clh starts using
@@ -751,13 +763,15 @@ func (clh *cloudHypervisor) HotplugRemoveDevice(ctx context.Context, devInfo int
 	ctx, cancel := context.WithTimeout(context.Background(), clhHotPlugAPITimeout*time.Second)
 	defer cancel()

+	originalDeviceID := clh.devicesIds[deviceID]
 	remove := *chclient.NewVmRemoveDevice()
-	remove.Id = &deviceID
+	remove.Id = &originalDeviceID
 	_, err := cl.VmRemoveDevicePut(ctx, remove)
 	if err != nil {
 		err = fmt.Errorf("failed to hotplug remove (unplug) device %+v: %s", devInfo, openAPIClientError(err))
 	}

+	delete(clh.devicesIds, deviceID)
 	return nil, err
 }

@@ -1304,6 +1318,52 @@ func (clh *cloudHypervisor) addVSock(cid int64, path string) {
 	clh.vmconfig.Vsock = chclient.NewVsockConfig(cid, path)
 }

+func (clh *cloudHypervisor) getRateLimiterConfig(bwSize, bwOneTimeBurst, opsSize, opsOneTimeBurst int64) *chclient.RateLimiterConfig {
+	if bwSize == 0 && opsSize == 0 {
+		return nil
+	}
+
+	rateLimiterConfig := chclient.NewRateLimiterConfig()
+
+	if bwSize != 0 {
+		bwTokenBucket := chclient.NewTokenBucket(bwSize, int64(utils.DefaultRateLimiterRefillTimeMilliSecs))
+
+		if bwOneTimeBurst != 0 {
+			bwTokenBucket.SetOneTimeBurst(bwOneTimeBurst)
+		}
+
+		rateLimiterConfig.SetBandwidth(*bwTokenBucket)
+	}
+
+	if opsSize != 0 {
+		opsTokenBucket := chclient.NewTokenBucket(opsSize, int64(utils.DefaultRateLimiterRefillTimeMilliSecs))
+
+		if opsOneTimeBurst != 0 {
+			opsTokenBucket.SetOneTimeBurst(opsOneTimeBurst)
+		}
+
+		rateLimiterConfig.SetOps(*opsTokenBucket)
+	}
+
+	return rateLimiterConfig
+}
+
+func (clh *cloudHypervisor) getNetRateLimiterConfig() *chclient.RateLimiterConfig {
+	return clh.getRateLimiterConfig(
+		int64(utils.RevertBytes(uint64(clh.config.NetRateLimiterBwMaxRate/8))),
+		int64(utils.RevertBytes(uint64(clh.config.NetRateLimiterBwOneTimeBurst/8))),
+		clh.config.NetRateLimiterOpsMaxRate,
+		clh.config.NetRateLimiterOpsOneTimeBurst)
+}
+
+func (clh *cloudHypervisor) getDiskRateLimiterConfig() *chclient.RateLimiterConfig {
+	return clh.getRateLimiterConfig(
+		int64(utils.RevertBytes(uint64(clh.config.DiskRateLimiterBwMaxRate/8))),
+		int64(utils.RevertBytes(uint64(clh.config.DiskRateLimiterBwOneTimeBurst/8))),
+		clh.config.DiskRateLimiterOpsMaxRate,
+		clh.config.DiskRateLimiterOpsOneTimeBurst)
+}
+
 func (clh *cloudHypervisor) addNet(e Endpoint) error {
 	clh.Logger().WithField("endpoint-type", e).Debugf("Adding Endpoint of type %v", e)

@@ -1323,9 +1383,15 @@ func (clh *cloudHypervisor) addNet(e Endpoint) error {
 		"tap": tapPath,
 	}).Info("Adding Net")

+	netRateLimiterConfig := clh.getNetRateLimiterConfig()
+
 	net := chclient.NewNetConfig()
 	net.Mac = &mac
 	net.Tap = &tapPath
+	if netRateLimiterConfig != nil {
+		net.SetRateLimiterConfig(*netRateLimiterConfig)
+	}
+
 	if clh.vmconfig.Net != nil {
 		*clh.vmconfig.Net = append(*clh.vmconfig.Net, *net)
 	} else {
@@ -1435,5 +1501,5 @@ func (clh *cloudHypervisor) vmInfo() (chclient.VmInfo, error) {
 }

 func (clh *cloudHypervisor) IsRateLimiterBuiltin() bool {
-	return false
+	return true
 }
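The rate limiter wiring in `clh.go` follows a token bucket model: a bucket of `size` tokens (bytes or operations) is refilled over a fixed refill time (`utils.DefaultRateLimiterRefillTimeMilliSecs`, 1000 ms), and an optional one-time burst adds initial credit that is never replenished. The Go sketch below mirrors only the decision logic of `getRateLimiterConfig`, using hypothetical stand-in types (`tokenBucket`, `rateLimiter`) instead of the generated `chclient` structs, so it runs without the Cloud Hypervisor client package. It is an illustration under those assumptions, not the actual client code.

```go
package main

import "fmt"

// refillTimeMs mirrors utils.DefaultRateLimiterRefillTimeMilliSecs (1000 ms).
const refillTimeMs = 1000

// tokenBucket is a hypothetical stand-in for chclient.TokenBucket.
type tokenBucket struct {
	Size         int64 // bucket capacity in tokens (bytes or operations)
	OneTimeBurst int64 // extra initial tokens that are never refilled; 0 means unset
	RefillTimeMs int64 // time needed to refill a full bucket
}

// rateLimiter is a hypothetical stand-in for chclient.RateLimiterConfig.
type rateLimiter struct {
	Bandwidth *tokenBucket
	Ops       *tokenBucket
}

// newBucket returns nil when size is 0: a burst without a sustained
// max rate is ignored, just like in getRateLimiterConfig.
func newBucket(size, burst int64) *tokenBucket {
	if size == 0 {
		return nil
	}
	return &tokenBucket{Size: size, OneTimeBurst: burst, RefillTimeMs: refillTimeMs}
}

// buildRateLimiter creates no limiter at all unless at least one max rate is non-zero.
func buildRateLimiter(bwSize, bwBurst, opsSize, opsBurst int64) *rateLimiter {
	if bwSize == 0 && opsSize == 0 {
		return nil
	}
	return &rateLimiter{
		Bandwidth: newBucket(bwSize, bwBurst),
		Ops:       newBucket(opsSize, opsBurst),
	}
}

// refillPerSecond is the constant rate at which tokens replenish.
func (b *tokenBucket) refillPerSecond() int64 {
	return b.Size * 1000 / b.RefillTimeMs
}

func main() {
	// Ops limited to 1000 ops/s; the bandwidth burst is dropped because no
	// bandwidth max rate is set.
	rl := buildRateLimiter(0, 10000, 1000, 0)
	fmt.Println(rl.Bandwidth == nil, rl.Ops.refillPerSecond()) // true 1000
}
```

This matches the expectations encoded in `TestCloudHypervisorNetRateLimiter`: a one-time burst on its own never produces a rate limiter.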
--- a/src/runtime/virtcontainers/clh_test.go
+++ b/src/runtime/virtcontainers/clh_test.go
@@ -52,17 +52,21 @@ func newClhConfig() (HypervisorConfig, error) {
 	}

 	return HypervisorConfig{
-		KernelPath:        testClhKernelPath,
-		ImagePath:         testClhImagePath,
-		HypervisorPath:    testClhPath,
-		NumVCPUs:          defaultVCPUs,
-		BlockDeviceDriver: config.VirtioBlock,
-		MemorySize:        defaultMemSzMiB,
-		DefaultBridges:    defaultBridges,
-		DefaultMaxVCPUs:   uint32(64),
-		SharedFS:          config.VirtioFS,
-		VirtioFSCache:     virtioFsCacheAlways,
-		VirtioFSDaemon:    testVirtiofsdPath,
+		KernelPath:                    testClhKernelPath,
+		ImagePath:                     testClhImagePath,
+		HypervisorPath:                testClhPath,
+		NumVCPUs:                      defaultVCPUs,
+		BlockDeviceDriver:             config.VirtioBlock,
+		MemorySize:                    defaultMemSzMiB,
+		DefaultBridges:                defaultBridges,
+		DefaultMaxVCPUs:               uint32(64),
+		SharedFS:                      config.VirtioFS,
+		VirtioFSCache:                 virtioFsCacheAlways,
+		VirtioFSDaemon:                testVirtiofsdPath,
+		NetRateLimiterBwMaxRate:       int64(0),
+		NetRateLimiterBwOneTimeBurst:  int64(0),
+		NetRateLimiterOpsMaxRate:      int64(0),
+		NetRateLimiterOpsOneTimeBurst: int64(0),
 	}, nil
 }

@@ -191,6 +195,181 @@ func TestCloudHypervisorAddNetCheckEnpointTypes(t *testing.T) {
 	}
 }

+// Check AddNet properly sets up the network rate limiter
+func TestCloudHypervisorNetRateLimiter(t *testing.T) {
+	assert := assert.New(t)
+
+	tapPath := "/path/to/tap"
+
+	validVeth := &VethEndpoint{}
+	validVeth.NetPair.TapInterface.TAPIface.Name = tapPath
+
+	type args struct {
+		bwMaxRate       int64
+		bwOneTimeBurst  int64
+		opsMaxRate      int64
+		opsOneTimeBurst int64
+	}
+
+	//nolint: govet
+	tests := []struct {
+		name                  string
+		args                  args
+		expectsRateLimiter    bool
+		expectsBwBucketToken  bool
+		expectsOpsBucketToken bool
+	}{
+		// Bandwidth
+		{
+			"Bandwidth | max rate with one time burst",
+			args{
+				bwMaxRate:      int64(1000),
+				bwOneTimeBurst: int64(10000),
+			},
+			true,  // expectsRateLimiter
+			true,  // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+		{
+			"Bandwidth | max rate without one time burst",
+			args{
+				bwMaxRate: int64(1000),
+			},
+			true,  // expectsRateLimiter
+			true,  // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+		{
+			"Bandwidth | no max rate with one time burst",
+			args{
+				bwOneTimeBurst: int64(10000),
+			},
+			false, // expectsRateLimiter
+			false, // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+		{
+			"Bandwidth | no max rate and no one time burst",
+			args{},
+			false, // expectsRateLimiter
+			false, // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+
+		// Operations
+		{
+			"Operations | max rate with one time burst",
+			args{
+				opsMaxRate:      int64(1000),
+				opsOneTimeBurst: int64(10000),
+			},
+			true,  // expectsRateLimiter
+			false, // expectsBwBucketToken
+			true,  // expectsOpsBucketToken
+		},
+		{
+			"Operations | max rate without one time burst",
+			args{
+				opsMaxRate: int64(1000),
+			},
+			true,  // expectsRateLimiter
+			false, // expectsBwBucketToken
+			true,  // expectsOpsBucketToken
+		},
+		{
+			"Operations | no max rate with one time burst",
+			args{
+				opsOneTimeBurst: int64(10000),
+			},
+			false, // expectsRateLimiter
+			false, // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+		{
+			"Operations | no max rate and no one time burst",
+			args{},
+			false, // expectsRateLimiter
+			false, // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+
+		// Bandwidth and Operations
+		{
+			"Bandwidth and Operations | max rate with one time burst",
+			args{
+				bwMaxRate:       int64(1000),
+				bwOneTimeBurst:  int64(10000),
+				opsMaxRate:      int64(1000),
+				opsOneTimeBurst: int64(10000),
+			},
+			true, // expectsRateLimiter
+			true, // expectsBwBucketToken
+			true, // expectsOpsBucketToken
+		},
+		{
+			"Bandwidth and Operations | max rate without one time burst",
+			args{
+				bwMaxRate:  int64(1000),
+				opsMaxRate: int64(1000),
+			},
+			true, // expectsRateLimiter
+			true, // expectsBwBucketToken
+			true, // expectsOpsBucketToken
+		},
+		{
+			"Bandwidth and Operations | no max rate with one time burst",
+			args{
+				bwOneTimeBurst:  int64(10000),
+				opsOneTimeBurst: int64(10000),
+			},
+			false, // expectsRateLimiter
+			false, // expectsBwBucketToken
+			false, // expectsOpsBucketToken
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			clhConfig, err := newClhConfig()
+			assert.NoError(err)
+
+			clhConfig.NetRateLimiterBwMaxRate = tt.args.bwMaxRate
+			clhConfig.NetRateLimiterBwOneTimeBurst = tt.args.bwOneTimeBurst
+			clhConfig.NetRateLimiterOpsMaxRate = tt.args.opsMaxRate
+			clhConfig.NetRateLimiterOpsOneTimeBurst = tt.args.opsOneTimeBurst
+
+			clh := &cloudHypervisor{}
+			clh.config = clhConfig
+			clh.APIClient = &clhClientMock{}
+
+			if err := clh.addNet(validVeth); err != nil {
+				t.Errorf("cloudHypervisor.addNet() error = %v", err)
+			} else {
+				netConfig := (*clh.vmconfig.Net)[0]
+
+				assert.Equal(tt.expectsRateLimiter, netConfig.HasRateLimiterConfig())
+				if tt.expectsRateLimiter {
+					rateLimiterConfig := netConfig.GetRateLimiterConfig()
+					assert.Equal(tt.expectsBwBucketToken, rateLimiterConfig.HasBandwidth())
+					assert.Equal(tt.expectsOpsBucketToken, rateLimiterConfig.HasOps())
+
+					if tt.expectsBwBucketToken {
+						bwBucketToken := rateLimiterConfig.GetBandwidth()
+						assert.Equal(int64(utils.RevertBytes(uint64(tt.args.bwMaxRate/8))), bwBucketToken.GetSize())
+						assert.Equal(int64(utils.RevertBytes(uint64(tt.args.bwOneTimeBurst/8))), bwBucketToken.GetOneTimeBurst())
+					}
+
+					if tt.expectsOpsBucketToken {
+						opsBucketToken := rateLimiterConfig.GetOps()
+						assert.Equal(tt.args.opsMaxRate, opsBucketToken.GetSize())
+						assert.Equal(tt.args.opsOneTimeBurst, opsBucketToken.GetOneTimeBurst())
+					}
+				}
+			}
+		})
+	}
+}
+
 func TestCloudHypervisorBootVM(t *testing.T) {
 	clh := &cloudHypervisor{}
 	clh.APIClient = &clhClientMock{}
@@ -384,6 +563,7 @@ func TestCloudHypervisorHotplugAddBlockDevice(t *testing.T) {
 	clh := &cloudHypervisor{}
 	clh.config = clhConfig
 	clh.APIClient = &clhClientMock{}
+	clh.devicesIds = make(map[string]string)

 	clh.config.BlockDeviceDriver = config.VirtioBlock
 	err = clh.hotplugAddBlockDevice(&config.BlockDrive{Pmem: false})
@@ -406,6 +586,7 @@ func TestCloudHypervisorHotplugRemoveDevice(t *testing.T) {
 	clh := &cloudHypervisor{}
 	clh.config = clhConfig
 	clh.APIClient = &clhClientMock{}
+	clh.devicesIds = make(map[string]string)

 	_, err = clh.HotplugRemoveDevice(context.Background(), &config.BlockDrive{}, BlockDev)
 	assert.NoError(err, "Hotplug remove block device expected no error")
--- a/src/runtime/virtcontainers/fc.go
+++ b/src/runtime/virtcontainers/fc.go
@@ -930,7 +930,7 @@ func (fc *firecracker) fcAddNetDevice(ctx context.Context, endpoint Endpoint) {
 	// The implementation of rate limiter is based on TBF.
 	// Rate Limiter defines a token bucket with a maximum capacity (size) to store tokens, and an interval for refilling purposes (refill_time).
 	// The refill-rate is derived from size and refill_time, and it is the constant rate at which the tokens replenish.
-	refillTime := uint64(1000)
+	refillTime := uint64(utils.DefaultRateLimiterRefillTimeMilliSecs)
 	var rxRateLimiter models.RateLimiter
 	rxSize := fc.config.RxRateLimiterMaxRate
 	if rxSize > 0 {
@@ -938,7 +938,7 @@ func (fc *firecracker) fcAddNetDevice(ctx context.Context, endpoint Endpoint) {

 		// kata-defined rxSize is in bits with scaling factors of 1000, but firecracker-defined
 		// rxSize is in bytes with scaling factors of 1024, need reversion.
-		rxSize = revertBytes(rxSize / 8)
+		rxSize = utils.RevertBytes(rxSize / 8)
 		rxTokenBucket := models.TokenBucket{
 			RefillTime: &refillTime,
 			Size:       &rxSize,
@@ -955,7 +955,7 @@ func (fc *firecracker) fcAddNetDevice(ctx context.Context, endpoint Endpoint) {

 		// kata-defined txSize is in bits with scaling factors of 1000, but firecracker-defined
 		// txSize is in bytes with scaling factors of 1024, need reversion.
-		txSize = revertBytes(txSize / 8)
+		txSize = utils.RevertBytes(txSize / 8)
 		txTokenBucket := models.TokenBucket{
 			RefillTime: &refillTime,
 			Size:       &txSize,
@@ -1266,15 +1266,3 @@ func (fc *firecracker) GenerateSocket(id string) (interface{}, error) {
 func (fc *firecracker) IsRateLimiterBuiltin() bool {
 	return true
 }
-
-// In firecracker, it accepts the size of rate limiter in scaling factors of 2^10(1024)
-// But in kata-defined rate limiter, for better Human-readability, we prefer scaling factors of 10^3(1000).
-// func revertByte reverts num from scaling factors of 1000 to 1024, e.g. 10000000(10MB) to 10485760.
-func revertBytes(num uint64) uint64 {
-	a := num / 1000
-	b := num % 1000
-	if a == 0 {
-		return num
-	}
-	return 1024*revertBytes(a) + b
-}
--- a/src/runtime/virtcontainers/fc_test.go
+++ b/src/runtime/virtcontainers/fc_test.go
@@ -50,17 +50,6 @@ func TestFCTruncateID(t *testing.T) {
 	assert.Equal(expectedID, id)
 }

-func TestRevertBytes(t *testing.T) {
-	assert := assert.New(t)
-
-	//10MB
-	testNum := uint64(10000000)
-	expectedNum := uint64(10485760)
-
-	num := revertBytes(testNum)
-	assert.Equal(expectedNum, num)
-}
-
 func TestFCParseVersion(t *testing.T) {
 	assert := assert.New(t)

--- a/src/runtime/virtcontainers/hypervisor.go
+++ b/src/runtime/virtcontainers/hypervisor.go
@@ -380,12 +380,48 @@ type HypervisorConfig struct {
 	// Enable SGX. Hardware-based isolation and memory encryption.
 	SGXEPCSize int64

+	// DiskRateLimiterBwMaxRate is used to control disk I/O bandwidth on VM level.
+	// The same value, defined in bits per second, is used for inbound and outbound bandwidth.
+	DiskRateLimiterBwMaxRate int64
+
+	// DiskRateLimiterBwOneTimeBurst is used to control disk I/O bandwidth on VM level.
+	// This increases the initial max rate and this initial extra credit does *NOT* replenish
+	// and can be used for an *initial* burst of data.
+	DiskRateLimiterBwOneTimeBurst int64
+
+	// DiskRateLimiterOpsMaxRate is used to control disk I/O operations on VM level.
+	// The same value, defined in operations per second, is used for inbound and outbound operations.
+	DiskRateLimiterOpsMaxRate int64
+
+	// DiskRateLimiterOpsOneTimeBurst is used to control disk I/O operations on VM level.
+	// This increases the initial max rate and this initial extra credit does *NOT* replenish
+	// and can be used for an *initial* burst of data.
+	DiskRateLimiterOpsOneTimeBurst int64
+
 	// RxRateLimiterMaxRate is used to control network I/O inbound bandwidth on VM level.
 	RxRateLimiterMaxRate uint64

 	// TxRateLimiterMaxRate is used to control network I/O outbound bandwidth on VM level.
 	TxRateLimiterMaxRate uint64

+	// NetRateLimiterBwMaxRate is used to control network I/O bandwidth on VM level.
+	// The same value, defined in bits per second, is used for inbound and outbound bandwidth.
+	NetRateLimiterBwMaxRate int64
+
+	// NetRateLimiterBwOneTimeBurst is used to control network I/O bandwidth on VM level.
+	// This increases the initial max rate and this initial extra credit does *NOT* replenish
+	// and can be used for an *initial* burst of data.
+	NetRateLimiterBwOneTimeBurst int64
+
+	// NetRateLimiterOpsMaxRate is used to control network I/O operations on VM level.
+	// The same value, defined in operations per second, is used for inbound and outbound operations.
+	NetRateLimiterOpsMaxRate int64
+
+	// NetRateLimiterOpsOneTimeBurst is used to control network I/O operations on VM level.
+	// This increases the initial max rate and this initial extra credit does *NOT* replenish
+	// and can be used for an *initial* burst of data.
+	NetRateLimiterOpsOneTimeBurst int64
+
 	// MemOffset specifies memory space for nvdimm device
 	MemOffset uint64

--- a/src/runtime/virtcontainers/qemu.go
+++ b/src/runtime/virtcontainers/qemu.go
@@ -38,6 +38,7 @@ import (
 	pkgUtils "github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
 	"github.com/kata-containers/kata-containers/src/runtime/pkg/uuid"
 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/config"
+	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/drivers"
 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
 	vcTypes "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
@@ -129,6 +130,8 @@ const (
 	fallbackFileBackedMemDir = "/dev/shm"

 	qemuStopSandboxTimeoutSecs = 15
+
+	qomPathPrefix = "/machine/peripheral/"
 )

 // agnostic list of kernel parameters
@@ -1395,31 +1398,68 @@ func (q *qemu) hotplugAddVhostUserBlkDevice(ctx context.Context, vAttr *config.V
 	}()

 	driver := "vhost-user-blk-pci"
-	addr, bridge, err := q.arch.addDeviceToBridge(ctx, vAttr.DevID, types.PCI)
-	if err != nil {
-		return err
-	}

-	defer func() {
-		if err != nil {
-			q.arch.removeDeviceFromBridge(vAttr.DevID)
+	machineType := q.HypervisorConfig().HypervisorMachineType
+
+	switch machineType {
+	case QemuVirt:
+		if q.state.PCIeRootPort <= 0 {
+			return fmt.Errorf("vhost-user-blk device is a PCIe device when the machine type is virt; add a PCIe Root Port by setting the pcie_root_port parameter in the configuration")
 		}
-	}()

-	bridgeSlot, err := vcTypes.PciSlotFromInt(bridge.Addr)
-	if err != nil {
-		return err
-	}
-	devSlot, err := vcTypes.PciSlotFromString(addr)
-	if err != nil {
-		return err
-	}
-	vAttr.PCIPath, err = vcTypes.PciPathFromSlots(bridgeSlot, devSlot)
+		// In QEMU, the addr of a PCIe device corresponds to its device:function on the bus, starting from 0.
+		// Since the device is the first and only one on this bus (root port), its addr is 0.
+		addr := "00"

-	if err = q.qmpMonitorCh.qmp.ExecutePCIVhostUserDevAdd(q.qmpMonitorCh.ctx, driver, devID, vAttr.DevID, addr, bridge.ID); err != nil {
-		return err
-	}
+		bridgeId := fmt.Sprintf("%s%d", pcieRootPortPrefix, len(drivers.AllPCIeDevs))
+		drivers.AllPCIeDevs[devID] = true

+		bridgeQomPath := fmt.Sprintf("%s%s", qomPathPrefix, bridgeId)
+		bridgeSlot, err := q.qomGetSlot(bridgeQomPath)
+		if err != nil {
+			return err
+		}
+
+		devSlot, err := vcTypes.PciSlotFromString(addr)
+		if err != nil {
+			return err
+		}
+
+		vAttr.PCIPath, err = vcTypes.PciPathFromSlots(bridgeSlot, devSlot)
+		if err != nil {
+			return err
+		}
+
+		if err = q.qmpMonitorCh.qmp.ExecutePCIVhostUserDevAdd(q.qmpMonitorCh.ctx, driver, devID, vAttr.DevID, addr, bridgeId); err != nil {
+			return err
+		}
+
+	default:
+		addr, bridge, err := q.arch.addDeviceToBridge(ctx, vAttr.DevID, types.PCI)
+		if err != nil {
+			return err
+		}
+		defer func() {
+			if err != nil {
+				q.arch.removeDeviceFromBridge(vAttr.DevID)
+			}
+		}()
+
+		bridgeSlot, err := vcTypes.PciSlotFromInt(bridge.Addr)
+		if err != nil {
+			return err
+		}
+
+		devSlot, err := vcTypes.PciSlotFromString(addr)
+		if err != nil {
+			return err
+		}
+		vAttr.PCIPath, err = vcTypes.PciPathFromSlots(bridgeSlot, devSlot)
+		if err != nil {
+			return err
+		}
+
+		if err = q.qmpMonitorCh.qmp.ExecutePCIVhostUserDevAdd(q.qmpMonitorCh.ctx, driver, devID, vAttr.DevID, addr, bridge.ID); err != nil {
+			return err
+		}
+	}
 	return nil
 }

@@ -1461,8 +1501,13 @@ func (q *qemu) hotplugVhostUserDevice(ctx context.Context, vAttr *config.VhostUs
 			return fmt.Errorf("Incorrect vhost-user device type found")
 		}
 	} else {
-		if err := q.arch.removeDeviceFromBridge(vAttr.DevID); err != nil {
-			return err
+
+		machineType := q.HypervisorConfig().HypervisorMachineType
+
+		if machineType != QemuVirt {
+			if err := q.arch.removeDeviceFromBridge(vAttr.DevID); err != nil {
+				return err
+			}
 		}

 		if err := q.qmpMonitorCh.qmp.ExecuteDeviceDel(q.qmpMonitorCh.ctx, devID); err != nil {
@@ -2315,7 +2360,7 @@ func genericAppendPCIeRootPort(devices []govmmQemu.Device, number uint32, machin
 		addr          string
 	)
 	switch machineType {
-	case QemuQ35:
+	case QemuQ35, QemuVirt:
 		bus = defaultBridgeBus
 		chassis = "0"
 		multiFunction = false
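On the `virt` machine type, the vhost-user-blk hotplug path above no longer goes through a PCI bridge: the device sits at `addr` `00` on a dedicated PCIe root port, and its guest PCI path is built from the root port's slot (read through QOM under `/machine/peripheral/`) followed by the device slot. The Go sketch below re-creates that path arithmetic with hypothetical helpers. It assumes that slots are 5-bit values printed as two hex digits joined by `/` (check `vcTypes.PciSlot` and `vcTypes.PciPath` for the exact rules), that `rp` stands in for `pcieRootPortPrefix`, and that the root port slot `0x02` is a made-up value standing in for what `qomGetSlot` would return.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const qomPathPrefix = "/machine/peripheral/"

// maxPCISlot: a PCI slot (device number) is 5 bits wide (assumption mirroring vcTypes).
const maxPCISlot = 0x1f

type pciSlot uint8

// pciSlotFromString parses a hex slot such as "00" (hypothetical helper).
func pciSlotFromString(s string) (pciSlot, error) {
	v, err := strconv.ParseUint(s, 16, 8)
	if err != nil {
		return 0, err
	}
	if v > maxPCISlot {
		return 0, fmt.Errorf("PCI slot 0x%x exceeds maximum 0x%x", v, maxPCISlot)
	}
	return pciSlot(v), nil
}

func (s pciSlot) String() string { return fmt.Sprintf("%02x", uint8(s)) }

// pciPathFromSlots joins slots from the root bus outwards, e.g. "02/00".
func pciPathFromSlots(slots ...pciSlot) string {
	parts := make([]string, len(slots))
	for i, s := range slots {
		parts[i] = s.String()
	}
	return strings.Join(parts, "/")
}

// rootPortQomPath builds the QOM path queried for the root port's slot;
// rootPortPrefix is a hypothetical stand-in for pcieRootPortPrefix.
func rootPortQomPath(rootPortPrefix string, index int) string {
	return fmt.Sprintf("%s%s%d", qomPathPrefix, rootPortPrefix, index)
}

func main() {
	bridgeSlot := pciSlot(0x02) // hypothetical value returned by qomGetSlot
	devSlot, err := pciSlotFromString("00")
	if err != nil {
		panic(err)
	}
	fmt.Println(rootPortQomPath("rp", 0), pciPathFromSlots(bridgeSlot, devSlot))
}
```

Because each root port carries exactly one device, the device half of the path is always `00`; only the root port slot varies.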
--- a/src/runtime/virtcontainers/utils/utils.go
+++ b/src/runtime/virtcontainers/utils/utils.go
@@ -25,6 +25,11 @@ const cpBinaryName = "cp"

 const fileMode0755 = os.FileMode(0755)

+// DefaultRateLimiterRefillTimeMilliSecs is used for calculating the rate at
+// which a TokenBucket is replenished, in cases where a RateLimiter is
+// applied to either network or disk I/O.
+const DefaultRateLimiterRefillTimeMilliSecs = 1000
+
 // MibToBytesShift the number to shift needed to convert MiB to Bytes
 const MibToBytesShift = 20

@@ -458,3 +463,19 @@ func getAllParentPaths(path string) []string {
 	// remove the "/" or "." from the return result
 	return paths[1:]
 }
+
+// RevertBytes converts num from scaling factors of 10^3 (1000) to scaling
+// factors of 2^10 (1024), e.g. 10000000 (10MB) to 10485760.
+//
+// In Cloud Hypervisor, as well as in Firecracker, the crate used by the VMMs
+// expects rate limiter sizes in scaling factors of 1024, whereas the
+// kata-defined rate limiter uses scaling factors of 1000 for better human
+// readability.
+func RevertBytes(num uint64) uint64 {
+	a := num / 1000
+	b := num % 1000
+	if a == 0 {
+		return num
+	}
+	return 1024*RevertBytes(a) + b
+}
--- a/src/runtime/virtcontainers/utils/utils_test.go
+++ b/src/runtime/virtcontainers/utils/utils_test.go
@@ -569,3 +569,14 @@ func TestGetAllParentPaths(t *testing.T) {
 		assert.Equal(tc.parents, getAllParentPaths(tc.targetPath))
 	}
 }
+
+func TestRevertBytes(t *testing.T) {
+	assert := assert.New(t)
+
+	//10MB
+	testNum := uint64(10000000)
+	expectedNum := uint64(10485760)
+
+	num := RevertBytes(testNum)
+	assert.Equal(expectedNum, num)
+}
--- a/src/tools/agent-ctl/README.md
+++ b/src/tools/agent-ctl/README.md
@@ -4,7 +4,7 @@

 The Kata Containers agent control tool (`kata-agent-ctl`) is a low-level test
 tool. It allows basic interaction with the Kata Containers agent,
-`kata-agent`, that runs inside the virtual machine.
+`kata-agent`, that runs inside the virtual machine (VM).

 Unlike the Kata Runtime, which only ever makes sequences of correctly ordered
 and valid agent API calls, this tool allows users to make arbitrary agent API
@@ -117,7 +117,7 @@ establish the VSOCK guest CID value to connect to the agent.

 1. Start a Kata Container

-1. Establish the VSOCK guest CID number for the virtual machine:
+1. Establish the VSOCK guest CID number for the VM:

   ```sh
   $ guest_cid=$(sudo ss -H --vsock | awk '{print $6}' | cut -d: -f1)
@@ -211,10 +211,12 @@ $ sudo install -o root -g root -m 0755 ~/.cargo/bin/kata-agent-ctl /usr/local/bi

 > **Warnings:**
 >
-> - This method is **only** for testing and development!
+> - These methods are **only** for testing and development!
 > - Only continue if you are using a non-critical system
 >   (such as a freshly installed VM environment).

+#### Use a Unix abstract domain socket
+
 1. Start the agent, specifying a local socket for it to communicate on:

   ```sh
@@ -233,3 +235,31 @@ $ sudo install -o root -g root -m 0755 ~/.cargo/bin/kata-agent-ctl /usr/local/bi
   >
   > The `@` in the server address is required - it denotes an abstract
   > socket which the agent requires (see `unix(7)`).
+
+#### Use a VSOCK loopback socket
+
+VSOCK supports a special CID value of `1` (known symbolically as
+`VMADDR_CID_LOCAL`) which refers to the local host rather than a guest.
+This is effectively a `localhost` or loopback interface, so it does not
+require an actual VM to be running.
+
+1. Start the agent, specifying the local VSOCK socket for it to communicate on:
+
+   ```sh
+   $ vsock_loopback_cid=1
+   $ agent_vsock_port=1024
+
+   $ sudo KATA_AGENT_SERVER_ADDR="vsock://${vsock_loopback_cid}:${agent_vsock_port}" target/x86_64-unknown-linux-musl/release/kata-agent
+   ```
+
+   > **Note:** This example assumes an Intel x86-64 system.
+
+1. Run the tool in the same environment:
+
+   ```sh
+   $ vsock_loopback_cid=1
+   $ agent_vsock_port=1024
+
+   $ cargo run -- -l debug connect --server-address "vsock://${vsock_loopback_cid}:${agent_vsock_port}" --bundle-dir "$bundle_dir" -c Check -c GetGuestDetails
+   ```
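Both commands above pass the server address as `vsock://CID:PORT`. As a rough illustration of that format, the Go sketch below splits such an address with the standard `net/url` package. `parseVsockAddr` is a hypothetical helper, not part of `kata-agent-ctl` (which is written in Rust), and `VMADDR_CID_LOCAL` is hard-coded as `1` to match the value described above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// vmaddrCIDLocal is VMADDR_CID_LOCAL (1) from <linux/vm_sockets.h>.
const vmaddrCIDLocal = 1

// parseVsockAddr splits a "vsock://CID:PORT" server address into its CID and port.
func parseVsockAddr(addr string) (cid, port uint32, err error) {
	u, err := url.Parse(addr)
	if err != nil {
		return 0, 0, err
	}
	if u.Scheme != "vsock" {
		return 0, 0, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	c, err := strconv.ParseUint(u.Hostname(), 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid CID: %w", err)
	}
	p, err := strconv.ParseUint(u.Port(), 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid port: %w", err)
	}
	return uint32(c), uint32(p), nil
}

func main() {
	cid, port, err := parseVsockAddr("vsock://1:1024")
	fmt.Println(cid == vmaddrCIDLocal, port, err) // true 1024 <nil>
}
```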
--- a/src/tools/agent-ctl/src/client.rs
+++ b/src/tools/agent-ctl/src/client.rs
@@ -483,10 +483,8 @@ fn create_ttrpc_client(
            if path.starts_with('@') {
                abstract_socket = true;

-                // Remove the magic abstract-socket request character ('@')
-                // and crucially add a trailing nul terminator (required to
-                // interoperate with the ttrpc crate).
-                path = path[1..].to_string() + &"\x00".to_string();
+                // Remove the magic abstract-socket request character ('@').
+                path = path[1..].to_string();
            }

            if abstract_socket {
--- a/src/tools/runk/.gitignore
+++ b/src/tools/runk/.gitignore
@@ -0,0 +1 @@
+/vendor/
--- a/src/tools/runk/Cargo.lock
+++ b/src/tools/runk/Cargo.lock
--- a/src/tools/runk/Cargo.toml
+++ b/src/tools/runk/Cargo.toml
@@ -0,0 +1,29 @@
+[package]
+name = "runk"
+version = "0.0.1"
+authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
+description = "runk: Kata OCI container runtime based on Kata agent"
+license = "Apache-2.0"
+edition = "2018"
+
+[dependencies]
+libcontainer = { path = "./libcontainer" }
+rustjail = { path = "../../agent/rustjail", features = ["standard-oci-runtime"] }
+oci = { path = "../../libs/oci" }
+logging = { path = "../../libs/logging" }
+liboci-cli = "0.0.3"
+clap = { version = "3.0.6", features = ["derive", "cargo"] }
+libc = "0.2.108"
+nix = "0.23.0"
+anyhow = "1.0.52"
+slog = "2.7.0"
+chrono = { version = "0.4.19", features = ["serde"] }
+slog-async = "2.7.0"
+tokio = { version = "1.15.0", features = ["full"] }
+serde = { version = "1.0.133", features = ["derive"] }
+serde_json = "1.0.74"
+
+[workspace]
+members = [
+    "libcontainer"
+]
--- a/src/tools/runk/Makefile
+++ b/src/tools/runk/Makefile
@@ -0,0 +1,60 @@
+# Copyright 2021-2022 Sony Group Corporation
+#
+# SPDX-License-Identifier: Apache-2.0
+#
+
+include ../../../utils.mk
+
+TARGET = runk
+TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
+
+AGENT_TARGET = oci-kata-agent
+AGENT_TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(AGENT_TARGET)
+AGENT_SOURCE_PATH = ../../agent
+
+# BINDIR is a directory for installing executable programs
+BINDIR := /usr/local/bin
+
+.DEFAULT_GOAL := default
+default: build
+
+build: build-agent build-runk
+
+build-agent:
+	$(MAKE) -C $(AGENT_SOURCE_PATH) STANDARD_OCI_RUNTIME=yes
+
+build-runk:
+	@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE)
+
+install: install-agent install-runk
+
+install-agent:
+	install -D $(AGENT_SOURCE_PATH)/$(AGENT_TARGET_PATH) $(BINDIR)/$(AGENT_TARGET)
+
+install-runk:
+	install -D $(TARGET_PATH) $(BINDIR)/$(TARGET)
+
+clean:
+	cargo clean
+
+vendor:
+	cargo vendor
+
+test:
+	cargo test --all --target $(TRIPLE) -- --nocapture
+
+check: standard_rust_check
+
+.PHONY: \
+	build \
+	build-agent \
+	build-runk \
+	install \
+	install-agent \
+	install-runk \
+	clean \
+	clippy \
+	format \
+	vendor \
+	test \
+	check
--- a/src/tools/runk/README.md
+++ b/src/tools/runk/README.md
@@ -0,0 +1,284 @@
+# runk
+
+## Overview
+
+> **Warnings:**
+> `runk` is currently an experimental tool.
+> Only continue if you are using a non-critical system.
+
+`runk` is a standard OCI container runtime written in Rust, based on a modified version of
+the [Kata Containers agent](https://github.com/kata-containers/kata-containers/tree/main/src/agent), `kata-agent`.
+
+`runk` conforms to the [OCI Container Runtime specifications](https://github.com/opencontainers/runtime-spec).
+
+Unlike the [Kata Containers runtime](https://github.com/kata-containers/kata-containers/tree/main/src/runtime),
+`kata-runtime`, `runk` spawns and runs containers directly on the host machine.
+Users can run `runk` in the same way as existing container runtimes such as `runc`,
+the most widely used implementation of the OCI runtime spec.
+
+## Why does `runk` exist?
+
+The `kata-agent` is a process running inside a virtual machine (VM) as a supervisor for managing containers
+and processes running within those containers.
+In other words, the `kata-agent` is a kind of "low-level" container runtime inside the VM, because the agent
+spawns and runs containers according to the OCI runtime spec.
+However, the `kata-agent` does not have the OCI Command-Line Interface (CLI) that is defined in the
+[runtime spec](https://github.com/opencontainers/runtime-spec/blob/master/runtime.md).
+The `kata-runtime` provides the CLI part of the Kata Containers runtime component,
+but the `kata-runtime` is a container runtime for creating hardware-virtualized containers running on the host.
+
+`runk` is a Rust-based standard OCI container runtime that manages normal containers,
+not hardware-virtualized containers.
+`runk` aims to become an alternative to existing OCI-compliant container runtimes.
+The `kata-agent` already has most of the [features](https://github.com/kata-containers/kata-containers/tree/main/src/agent#features)
+needed for a container runtime, and delivers high performance with a low memory footprint owing to its
+Rust implementation.
+Therefore, `runk` leverages the mechanism of the `kata-agent` to avoid reinventing the wheel.
+
+## Performance
+
+In the benchmark below, `runk` is faster than `runc` and has a lower memory footprint.
+
+The table shows the average elapsed time and memory footprint (maximum resident set size)
+for sequentially running 100 containers. Each container runs `/bin/true` using the `run` command in
+[detached mode](https://github.com/opencontainers/runc/blob/master/docs/terminals.md#detached)
+on 12 CPU cores (`3.8 GHz AMD Ryzen 9 3900X`) with 32 GiB of RAM.
+`runk` currently always runs containers in detached mode.
+
+Evaluation Results:
+
+|                       | `runk` (v0.0.1) | `runc` (v1.0.3) | `crun` (v1.4.2) |
+|-----------------------|-----------------|-----------------|-----------------|
+| Time [ms]             | 39.83           | 50.39           | 38.41           |
+| Memory footprint [MB] | 4.013           | 10.78           | 1.738           |
+
+## Status of `runk`
+
+We drafted the initial code here, and any contributions to `runk` and [`kata-agent`](https://github.com/kata-containers/kata-containers/tree/main/src/agent)
+are welcome.
+
+Regarding features compared to `runc`, see the `Status of runk` section in the [issue](https://github.com/kata-containers/kata-containers/issues/2784).
+
+## Building
+
+`runk` uses a modified `kata-agent` binary, `oci-kata-agent`, which is the agent invoked by `runk`.
+Therefore, you also need to build `oci-kata-agent` to run `runk`.
+
+You can build both `runk` and `oci-kata-agent` as follows.
+
+```bash
+$ cd runk
+$ make
+```
+
+To install `runk` and `oci-kata-agent` into the default directory for executable programs (`/usr/local/bin`):
+
+```bash
+$ sudo make install
+```
+
+## Using `runk` directly
+
+Please note that `runk` is a low-level tool not developed with an end user in mind.
+It is mostly employed by other higher-level container software like `containerd`.
+
+If you still want to use `runk` directly, here's how.
+
+### Prerequisites
+
+It is necessary to create an OCI bundle to use the tool. The simplest method is:
+
+``` bash
+$ bundle_dir="bundle"
+$ rootfs_dir="$bundle_dir/rootfs"
+$ image="busybox"
+$ mkdir -p "$rootfs_dir" && (cd "$bundle_dir" && runk spec)
+$ sudo docker export $(sudo docker create "$image") | tar -C "$rootfs_dir" -xf -
+```
+
+> **Note:**
+> If you use the unmodified `runk spec` template, this should give a `sh` session inside the container.
+> However, if you use `runk` directly and run a container with the unmodified template,
+> `runk` cannot launch the `sh` session because `runk` does not support terminal handling yet.
+> To run a container directly with `runk`, edit the `process` field in `config.json`
+> so that it looks like the example below, with `"terminal": false` and `"args": ["sleep", "10"]`.
+
+```json
+"process": {
+    "terminal": false,
+    "user": {
+        "uid": 0,
+        "gid": 0
+    },
+    "args": [
+        "sleep",
+        "10"
+    ],
+    "env": [
+        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
+        "TERM=xterm"
+    ],
+    "cwd": "/",
+    [...]
+}
+```
+
+If you want to launch the `sh` session inside the container, you need to run `runk` from `containerd`.
+
+Please refer to the [Using `runk` from `containerd`](#using-runk-from-containerd) section.
+
+### Running a container
+
+Now you can go through the [lifecycle operations](https://github.com/opencontainers/runtime-spec/blob/master/runtime.md)
+in your shell.
+You need to run `runk` as `root` because `runk` does not have the rootless feature which is the ability
+to run containers without root privileges.
+
+```bash
+$ cd $bundle_dir
+
+# Create a container
+$ sudo runk create test
+
+# Verify that the container is created and in the "created" state
+$ sudo runk state test
+
+# Start the process inside the container
+$ sudo runk start test
+
+# After 10 seconds, verify that the container has exited and is now in the "stopped" state
+$ sudo runk state test
+
+# Now delete the container
+$ sudo runk delete test
+```
+
+## Using `runk` from `containerd`
+
+`runk` can run containers through `containerd` by using its runtime handler support.
+
+### Prerequisites for `runk` with containerd
+
+* `containerd` v1.2.4 or above
+* `cri-tools`
+
+> **Note:**
+> [`cri-tools`](https://github.com/kubernetes-sigs/cri-tools) is a set of tools for CRI
+> used for development and testing.
+
+Install `cri-tools` from source code:
+
+```bash
+$ go get github.com/kubernetes-incubator/cri-tools
+$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
+$ make
+$ sudo -E make install
+$ popd
+```
+
+Write the `crictl` configuration file:
+
+```bash
+$ cat <<EOF | sudo tee /etc/crictl.yaml
+runtime-endpoint: unix:///run/containerd/containerd.sock
+EOF
+```
+
+### Configure `containerd` to use `runk`
+
+Update `/etc/containerd/config.toml`:
+
+```bash
+$ cat <<EOF | sudo tee /etc/containerd/config.toml
+version = 2
+[plugins."io.containerd.runtime.v1.linux"]
+  shim_debug = true
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
+  runtime_type = "io.containerd.runc.v2"
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runk]
+  runtime_type = "io.containerd.runc.v2"
+  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runk.options]
+    BinaryName = "/usr/local/bin/runk"
+EOF
+```
+
+Restart `containerd`:
+
+```bash
+$ sudo systemctl restart containerd
+```
+
+### Running a container with `crictl` command line
+
+You can run containers in `runk` via containerd's CRI.
+
+Pull the `busybox` image:
+
+```bash
+$ sudo crictl pull busybox
+```
+
+Create the sandbox configuration:
+
+```bash
+$ cat <<EOF | tee sandbox.json
+{
+    "metadata": {
+        "name": "busybox-sandbox",
+        "namespace": "default",
+        "attempt": 1,
+        "uid": "hdishd83djaidwnduwk28bcsb"
+    },
+    "log_directory": "/tmp",
+    "linux": {
+    }
+}
+EOF
+```
+
+Create the container configuration:
+
+```bash
+$ cat <<EOF | tee container.json
+{
+    "metadata": {
+        "name": "busybox"
+    },
+    "image": {
+        "image": "docker.io/busybox"
+    },
+    "command": [
+        "sh"
+    ],
+    "envs": [
+        {
+            "key": "PATH",
+            "value": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+        },
+        {
+            "key": "TERM",
+            "value": "xterm"
+        }
+    ],
+    "log_path": "busybox.0.log",
+    "stdin": true,
+    "stdin_once": true,
+    "tty": true
+}
+EOF
+```
+
+With `crictl` from `cri-tools`, you can specify the runtime handler with the `-r` or `--runtime` flag.
+
+Launch a sandbox and container using `crictl`:
+
+```bash
+# Run a container inside a sandbox
+$ sudo crictl run -r runk container.json sandbox.json
+f492eee753887ba3dfbba9022028975380739aba1269df431d097b73b23c3871
+
+# Attach to the running container
+$ sudo crictl attach --stdin --tty f492eee753887ba3dfbba9022028975380739aba1269df431d097b73b23c3871
+/ #
+```
+
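The lifecycle walkthrough in the `runk` README above moves a container through the OCI states: `create` leaves it "created", `start` runs its process, the process exiting makes it "stopped", and `delete` removes it. The following is a hypothetical Rust model of those transitions; `State`, `Op`, and `next_state` are illustrative names, not part of `runk`, and the model assumes no `kill`, no pause, and no `--force`.

```rust
/// Container states reported by `runk state` in the walkthrough.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Created,
    Running,
    Stopped,
}

/// Lifecycle events in the walkthrough (illustrative, not runk's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Create,
    Start,
    /// The container process exits on its own, e.g. `sleep 10` finishes.
    ProcessExit,
    Delete,
}

/// Returns the state after `op`; `None` means "no such container".
pub fn next_state(current: Option<State>, op: Op) -> Result<Option<State>, String> {
    match (current, op) {
        (None, Op::Create) => Ok(Some(State::Created)),
        (Some(State::Created), Op::Start) => Ok(Some(State::Running)),
        (Some(State::Running), Op::ProcessExit) => Ok(Some(State::Stopped)),
        // Without --force, `runk delete` accepts stopped containers and
        // created ones (it kills their init process first).
        (Some(State::Stopped), Op::Delete) | (Some(State::Created), Op::Delete) => Ok(None),
        (state, op) => Err(format!("cannot apply {:?} in state {:?}", op, state)),
    }
}
```

Deleting while `sleep 10` is still running falls into the error arm, which is why the walkthrough waits before running `runk delete`.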
--- a/src/tools/runk/libcontainer/Cargo.toml
+++ b/src/tools/runk/libcontainer/Cargo.toml
@@ -0,0 +1,23 @@
+[package]
+name = "libcontainer"
+version = "0.0.1"
+authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
+description = "Library for runk container"
+license = "Apache-2.0"
+edition = "2018"
+
+[dependencies]
+rustjail = { path = "../../../agent/rustjail", features = ["standard-oci-runtime"] }
+oci = { path = "../../../libs/oci" }
+logging = { path = "../../../libs/logging" }
+derive_builder = "0.10.2"
+libc = "0.2.108"
+nix = "0.23.0"
+anyhow = "1.0.52"
+slog = "2.7.0"
+chrono = { version = "0.4.19", features = ["serde"] }
+serde = { version = "1.0.133", features = ["derive"] }
+serde_json = "1.0.74"
+
+[dev-dependencies]
+tempfile = "3.3.0"
--- a/src/tools/runk/libcontainer/src/builder.rs
+++ b/src/tools/runk/libcontainer/src/builder.rs
@@ -0,0 +1,121 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use crate::container::{get_config_path, ContainerContext};
+use anyhow::{anyhow, Result};
+use derive_builder::Builder;
+use oci::Spec;
+use std::path::{Path, PathBuf};
+
+#[derive(Default, Builder, Debug)]
+pub struct Container {
+    id: String,
+    bundle: PathBuf,
+    root: PathBuf,
+    console_socket: Option<PathBuf>,
+}
+
+impl Container {
+    pub fn create_ctx(self) -> Result<ContainerContext> {
+        let bundle_canon = self.bundle.canonicalize()?;
+        let config_path = get_config_path(&bundle_canon);
+        let mut spec = Spec::load(
+            config_path
+                .to_str()
+                .ok_or_else(|| anyhow!("invalid config path"))?,
+        )?;
+
+        if let Some(spec_root) = spec.root.as_mut() {
+            let rootfs_path = Path::new(&spec_root.path);
+
+            // If the rootfs path in the spec file is relative, replace it with
+            // <canonical bundle>/<file name> so that rootfs validation in the agent passes.
+            if !rootfs_path.is_absolute() {
+                let rootfs_name = rootfs_path
+                    .file_name()
+                    .ok_or_else(|| anyhow!("invalid rootfs name"))?;
+                spec_root.path = bundle_canon
+                    .join(rootfs_name)
+                    .to_str()
+                    .map(|s| s.to_string())
+                    .ok_or_else(|| anyhow!("failed to convert rootfs path"))?;
+            }
+        }
+
+        Ok(ContainerContext {
+            id: self.id,
+            bundle: self.bundle,
+            state_root: self.root,
+            spec,
+            // TODO: liboci-cli does not support --no-pivot option for create and run command.
+            // After liboci-cli supports the option, we will change the following code.
+            // no_pivot_root: self.no_pivot,
+            no_pivot_root: false,
+            console_socket: self.console_socket,
+        })
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::container::CONFIG_FILE_NAME;
+    use oci::Spec;
+    use std::{fs::File, path::PathBuf};
+    use tempfile::tempdir;
+
+    #[derive(Debug)]
+    struct TestData {
+        id: String,
+        bundle: PathBuf,
+        root: PathBuf,
+        console_socket: Option<PathBuf>,
+        spec: Spec,
+        no_pivot_root: bool,
+    }
+
+    #[test]
+    fn test_create_ctx() {
+        let bundle_dir = tempdir().unwrap();
+        let config_file = bundle_dir.path().join(CONFIG_FILE_NAME);
+        let spec = Spec::default();
+        let file = File::create(config_file).unwrap();
+        serde_json::to_writer(&file, &spec).unwrap();
+
+        let test_data = TestData {
+            id: String::from("test"),
+            bundle: bundle_dir.path().to_path_buf(),
+            root: PathBuf::from("test"),
+            console_socket: Some(PathBuf::from("test")),
+            spec: Spec::default(),
+            no_pivot_root: false,
+        };
+
+        let test_ctx = ContainerContext {
+            id: test_data.id.clone(),
+            bundle: test_data.bundle.clone(),
+            state_root: test_data.root.clone(),
+            spec: test_data.spec.clone(),
+            no_pivot_root: test_data.no_pivot_root,
+            console_socket: test_data.console_socket.clone(),
+        };
+
+        let ctx = ContainerBuilder::default()
+            .id(test_data.id.clone())
+            .bundle(test_data.bundle.clone())
+            .root(test_data.root.clone())
+            .console_socket(test_data.console_socket.clone())
+            .build()
+            .unwrap()
+            .create_ctx()
+            .unwrap();
+
+        assert_eq!(test_ctx, ctx);
+    }
+}
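`Container::create_ctx` keeps an absolute `root.path` as-is, but rewrites a relative one to the canonical bundle directory joined with only the last component of that path. A std-only sketch of just that rule, with the hypothetical name `resolve_rootfs`, makes the behaviour easy to check in isolation:

```rust
use std::path::{Path, PathBuf};

/// Mirrors the rootfs handling in `create_ctx`: an absolute path is kept,
/// a relative one becomes `<canonical bundle>/<last path component>`.
pub fn resolve_rootfs(bundle_canon: &Path, root_path: &str) -> Option<PathBuf> {
    let rootfs = Path::new(root_path);
    if rootfs.is_absolute() {
        return Some(rootfs.to_path_buf());
    }
    // `file_name` is None for paths such as "" or "..".
    rootfs.file_name().map(|name| bundle_canon.join(name))
}
```

As a consequence, a bundle that nests its rootfs (for example `images/rootfs`) resolves to `<bundle>/rootfs`; joining the whole relative path would be the alternative if such layouts need to be supported.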
--- a/src/tools/runk/libcontainer/src/cgroup.rs
+++ b/src/tools/runk/libcontainer/src/cgroup.rs
@@ -0,0 +1,40 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::{anyhow, Result};
+use rustjail::cgroups::fs::Manager as CgroupManager;
+use std::{fs, path::Path, thread, time};
+
+pub fn destroy_cgroup(cgroup_mg: &CgroupManager) -> Result<()> {
+    for path in cgroup_mg.paths.values() {
+        remove_cgroup_dir(Path::new(path))?;
+    }
+
+    Ok(())
+}
+
+// Try to remove the provided cgroup directory up to five times, doubling the delay between tries.
+// If the directory still exists after the last try, an error is returned.
+fn remove_cgroup_dir(path: &Path) -> Result<()> {
+    let mut retries = 5;
+    let mut delay = time::Duration::from_millis(10);
+    while retries != 0 {
+        if retries != 5 {
+            delay *= 2;
+            thread::sleep(delay);
+        }
+
+        if !path.exists() || fs::remove_dir(path).is_ok() {
+            return Ok(());
+        }
+
+        retries -= 1;
+    }
+
+    Err(anyhow!("failed to remove cgroup path: {:?}", path))
+}
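`remove_cgroup_dir` retries `fs::remove_dir` with a delay that doubles before every retry. The schedule can be separated from the filesystem work; in this hypothetical sketch (`retry_with_doubling` is not part of `runk`) the attempt and the sleep are injected as closures so the timing can be checked without touching cgroupfs or actually sleeping.

```rust
use std::time::Duration;

/// Retry schedule modelled on `remove_cgroup_dir`: up to `attempts` tries,
/// no wait before the first one, and the delay doubled before each later
/// try. Returns true as soon as `try_once` succeeds.
pub fn retry_with_doubling<F, S>(
    attempts: u32,
    initial: Duration,
    mut try_once: F,
    mut sleep: S,
) -> bool
where
    F: FnMut() -> bool,
    S: FnMut(Duration),
{
    let mut delay = initial;
    for attempt in 0..attempts {
        if attempt > 0 {
            delay *= 2;
            sleep(delay);
        }
        if try_once() {
            return true;
        }
    }
    false
}
```

Passing `std::thread::sleep` as `sleep` and `|| !path.exists() || fs::remove_dir(path).is_ok()` as `try_once` reproduces the original loop: with 5 attempts and a 10 ms initial delay, the waits are 20, 40, 80 and 160 ms, 300 ms in total.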
--- a/src/tools/runk/libcontainer/src/container.rs
+++ b/src/tools/runk/libcontainer/src/container.rs
@@ -0,0 +1,151 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use crate::status::Status;
+use anyhow::{anyhow, Result};
+use nix::unistd::{chdir, unlink, Pid};
+use oci::Spec;
+use rustjail::{
+    container::{BaseContainer, LinuxContainer, EXEC_FIFO_FILENAME},
+    process::Process,
+    specconv::CreateOpts,
+};
+use slog::Logger;
+use std::{
+    env::current_dir,
+    path::{Path, PathBuf},
+};
+
+pub const CONFIG_FILE_NAME: &str = "config.json";
+
+#[derive(Debug, Copy, Clone, PartialEq)]
+pub enum ContainerAction {
+    Create,
+    Run,
+}
+
+#[derive(Debug, Clone, PartialEq)]
+pub struct ContainerContext {
+    pub id: String,
+    pub bundle: PathBuf,
+    pub state_root: PathBuf,
+    pub spec: Spec,
+    pub no_pivot_root: bool,
+    pub console_socket: Option<PathBuf>,
+}
+
+impl ContainerContext {
+    pub async fn launch(&self, action: ContainerAction, logger: &Logger) -> Result<Pid> {
+        Status::create_dir(&self.state_root, &self.id)?;
+
+        let current_dir = current_dir()?;
+        chdir(&self.bundle)?;
+
+        let create_opts = CreateOpts {
+            cgroup_name: "".to_string(),
+            use_systemd_cgroup: false,
+            no_pivot_root: self.no_pivot_root,
+            no_new_keyring: false,
+            spec: Some(self.spec.clone()),
+            rootless_euid: false,
+            rootless_cgroup: false,
+        };
+
+        let mut ctr = LinuxContainer::new(
+            &self.id,
+            &self
+                .state_root
+                .to_str()
+                .map(|s| s.to_string())
+                .ok_or_else(|| anyhow!("failed to convert state root path"))?,
+            create_opts.clone(),
+            logger,
+        )?;
+
+        let process_config = self
+            .spec
+            .process
+            .as_ref()
+            .ok_or_else(|| anyhow!("process config was not present in the spec file"))?;
+        let process = Process::new(logger, process_config, &self.id, true, 0)?;
+
+        if let Some(ref csocket_path) = self.console_socket {
+            ctr.set_console_socket(csocket_path)?;
+        }
+
+        match action {
+            ContainerAction::Create => {
+                ctr.start(process).await?;
+            }
+            ContainerAction::Run => {
+                ctr.run(process).await?;
+            }
+        }
+
+        let oci_state = ctr.oci_state()?;
+        let status = Status::new(
+            &self.state_root,
+            oci_state,
+            ctr.init_process_start_time,
+            ctr.created,
+            ctr.cgroup_manager
+                .ok_or_else(|| anyhow!("cgroup manager was not present"))?,
+            create_opts,
+        )?;
+
+        status.save()?;
+
+        if action == ContainerAction::Run {
+            let fifo_path = get_fifo_path(&status);
+            if fifo_path.exists() {
+                unlink(&fifo_path)?;
+            }
+        }
+
+        chdir(&current_dir)?;
+
+        Ok(Pid::from_raw(ctr.init_process_pid))
+    }
+}
+
+pub fn get_config_path<P: AsRef<Path>>(bundle: P) -> PathBuf {
+    bundle.as_ref().join(CONFIG_FILE_NAME)
+}
+
+pub fn get_fifo_path(status: &Status) -> PathBuf {
+    status.root.join(&status.id).join(EXEC_FIFO_FILENAME)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::utils::test_utils::*;
+    use rustjail::container::EXEC_FIFO_FILENAME;
+    use std::path::PathBuf;
+
+    #[test]
+    fn test_get_config_path() {
+        let test_data = PathBuf::from(TEST_BUNDLE_PATH).join(CONFIG_FILE_NAME);
+        assert_eq!(get_config_path(TEST_BUNDLE_PATH), test_data);
+    }
+
+    #[test]
+    fn test_get_fifo_path() {
+        let test_data = PathBuf::from(TEST_BUNDLE_PATH)
+            .join(TEST_CONTAINER_ID)
+            .join(EXEC_FIFO_FILENAME);
+        let status = create_dummy_status();
+
+        assert_eq!(get_fifo_path(&status), test_data);
+    }
+}
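`ContainerContext::launch` switches into the bundle with `chdir(&self.bundle)` and switches back with `chdir(&current_dir)` only at the end of the success path, so any earlier `?` return leaves the process in the bundle directory. One way to make the restore unconditional is a drop guard. This is a hypothetical std-only sketch (`CwdGuard` is not part of `runk` or `nix`), not the crate's current code:

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Changes the working directory and restores the previous one when
/// dropped, including when the caller returns early through `?`.
pub struct CwdGuard {
    previous: PathBuf,
}

impl CwdGuard {
    pub fn change_to(dir: &Path) -> io::Result<Self> {
        let previous = env::current_dir()?;
        env::set_current_dir(dir)?;
        Ok(CwdGuard { previous })
    }
}

impl Drop for CwdGuard {
    fn drop(&mut self) {
        // Errors cannot be propagated out of `drop`, so they are ignored.
        let _ = env::set_current_dir(&self.previous);
    }
}
```

Inside `launch`, `let _cwd = CwdGuard::change_to(&self.bundle)?;` would then replace both `chdir` calls. Note that the working directory is process-wide state, so the guard does not make concurrent launches in one process safe.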
--- a/src/tools/runk/libcontainer/src/lib.rs
+++ b/src/tools/runk/libcontainer/src/lib.rs
@@ -0,0 +1,10 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+pub mod builder;
+pub mod cgroup;
+pub mod container;
+pub mod status;
+pub mod utils;
--- a/src/tools/runk/libcontainer/src/status.rs
+++ b/src/tools/runk/libcontainer/src/status.rs
@@ -0,0 +1,246 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use crate::container::get_fifo_path;
+use crate::utils::*;
+use anyhow::{anyhow, Result};
+use chrono::{DateTime, Utc};
+use libc::pid_t;
+use nix::{
+    errno::Errno,
+    sys::{signal::kill, stat::Mode},
+    unistd::Pid,
+};
+use oci::{ContainerState, State as OCIState};
+use rustjail::{cgroups::fs::Manager as CgroupManager, specconv::CreateOpts};
+use serde::{Deserialize, Serialize};
+use std::{
+    fs::{self, File, OpenOptions},
+    path::{Path, PathBuf},
+    time::SystemTime,
+};
+
+const STATUS_FILE: &str = "status.json";
+
+#[derive(Serialize, Deserialize, Debug, Clone)]
+#[serde(rename_all = "camelCase")]
+pub struct Status {
+    pub oci_version: String,
+    pub id: String,
+    pub pid: pid_t,
+    pub root: PathBuf,
+    pub bundle: PathBuf,
+    pub rootfs: String,
+    pub process_start_time: u64,
+    pub created: DateTime<Utc>,
+    pub cgroup_manager: CgroupManager,
+    pub config: CreateOpts,
+}
+
+impl Status {
+    pub fn new(
+        root: &Path,
+        oci_state: OCIState,
+        process_start_time: u64,
+        created_time: SystemTime,
+        cgroup_mg: CgroupManager,
+        config: CreateOpts,
+    ) -> Result<Self> {
+        let created = DateTime::from(created_time);
+        let rootfs = config
+            .spec
+            .as_ref()
+            .ok_or_else(|| anyhow!("spec config was not present"))?
+            .root
+            .as_ref()
+            .ok_or_else(|| anyhow!("root config was not present in the spec"))?
+            .path
+            .clone();
+
+        Ok(Self {
+            oci_version: oci_state.version,
+            id: oci_state.id,
+            pid: oci_state.pid,
+            root: root.to_path_buf(),
+            bundle: PathBuf::from(&oci_state.bundle),
+            rootfs,
+            process_start_time,
+            created,
+            cgroup_manager: cgroup_mg,
+            config,
+        })
+    }
+
+    pub fn save(&self) -> Result<()> {
+        let state_file_path = Self::get_file_path(&self.root, &self.id);
+
+        if !self.root.exists() {
+            create_dir_with_mode(&self.root, Mode::S_IRWXU, true)?;
+        }
+
+        let file = OpenOptions::new()
+            .write(true)
+            .create(true)
+            .truncate(true)
+            .open(state_file_path)?;
+
+        serde_json::to_writer(&file, self)?;
+
+        Ok(())
+    }
+
+    pub fn load(state_root: &Path, id: &str) -> Result<Self> {
+        let state_file_path = Self::get_file_path(state_root, id);
+        if !state_file_path.exists() {
+            return Err(anyhow!("container \"{}\" does not exist", id));
+        }
+
+        let file = File::open(&state_file_path)?;
+        let state: Self = serde_json::from_reader(&file)?;
+
+        Ok(state)
+    }
+
+    pub fn create_dir(state_root: &Path, id: &str) -> Result<()> {
+        let state_dir_path = Self::get_dir_path(state_root, id);
+        if !state_dir_path.exists() {
+            create_dir_with_mode(state_dir_path, Mode::S_IRWXU, true)?;
+        } else {
+            return Err(anyhow!("container with id exists: \"{}\"", id));
+        }
+
+        Ok(())
+    }
+
+    pub fn remove_dir(&self) -> Result<()> {
+        let state_dir_path = Self::get_dir_path(&self.root, &self.id);
+        fs::remove_dir_all(state_dir_path)?;
+
+        Ok(())
+    }
+
+    pub fn get_dir_path(state_root: &Path, id: &str) -> PathBuf {
+        state_root.join(id)
+    }
+
+    pub fn get_file_path(state_root: &Path, id: &str) -> PathBuf {
+        state_root.join(id).join(STATUS_FILE)
+    }
+}
+
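+pub fn is_process_running(pid: Pid) -> Result<bool> {
+    // kill(2) with no signal only checks whether the process exists.
+    match kill(pid, None) {
+        Ok(()) => Ok(true),
+        // ESRCH: the process does not exist.
+        Err(Errno::ESRCH) => Ok(false),
+        Err(errno) => Err(anyhow!("failed to check process {}: {}", pid, errno)),
+    }
+}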
+
+pub fn get_current_container_state(status: &Status) -> Result<ContainerState> {
+    let running = is_process_running(Pid::from_raw(status.pid))?;
+    let mut has_fifo = false;
+
+    if running {
+        let fifo = get_fifo_path(status);
+        if fifo.exists() {
+            has_fifo = true
+        }
+    }
+
+    if running && !has_fifo {
+        // TODO: Check paused status.
+        // runk does not support pause command currently.
+    }
+
+    if !running {
+        Ok(ContainerState::Stopped)
+    } else if has_fifo {
+        Ok(ContainerState::Created)
+    } else {
+        Ok(ContainerState::Running)
+    }
+}
+
+pub fn get_all_pid(cgm: &CgroupManager) -> Result<Vec<Pid>> {
+    let cgroup_path = cgm
+        .paths
+        .get("devices")
+        .ok_or_else(|| anyhow!("cgroup devices path does not exist"))?;
+    let path = Path::new(cgroup_path);
+    if !path.exists() {
+        return Err(anyhow!("cgroup devices directory does not exist"));
+    }
+
+    // Each line of `cgroup.procs` holds one PID; report a malformed line
+    // as an error instead of panicking.
+    let procs_path = path.join("cgroup.procs");
+    lines_from_file(&procs_path)?
+        .into_iter()
+        .map(|v| {
+            v.parse::<pid_t>()
+                .map(Pid::from_raw)
+                .map_err(|e| anyhow!("failed to parse {:?} into pid_t: {}", v, e))
+        })
+        .collect()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::utils::test_utils::*;
+    use chrono::{DateTime, Utc};
+    use nix::unistd::getpid;
+    use oci::ContainerState;
+    use rustjail::cgroups::fs::Manager as CgroupManager;
+    use std::path::Path;
+    use std::time::SystemTime;
+
+    #[test]
+    fn test_status() {
+        let cgm: CgroupManager = serde_json::from_str(TEST_CGM_DATA).unwrap();
+        let oci_state = create_dummy_oci_state();
+        let created = SystemTime::now();
+        let status = Status::new(
+            Path::new(TEST_BUNDLE_PATH),
+            oci_state.clone(),
+            1,
+            created,
+            cgm,
+            create_dummy_opts(),
+        )
+        .unwrap();
+
+        assert_eq!(status.id, oci_state.id);
+        assert_eq!(status.pid, oci_state.pid);
+        assert_eq!(status.process_start_time, 1);
+        assert_eq!(status.created, DateTime::<Utc>::from(created));
+    }
+
+    #[test]
+    fn test_is_process_running() {
+        let pid = getpid();
+        let ret = is_process_running(pid).unwrap();
+        assert!(ret);
+    }
+
+    #[test]
+    fn test_get_current_container_state() {
+        let status = create_dummy_status();
+        let state = get_current_container_state(&status).unwrap();
+        assert_eq!(state, ContainerState::Running);
+    }
+
+    #[test]
+    fn test_get_all_pid() {
+        let cgm: CgroupManager = serde_json::from_str(TEST_CGM_DATA).unwrap();
+        assert!(get_all_pid(&cgm).is_ok());
+    }
+}
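`get_current_container_state` combines two observations: whether the init process still exists (`is_process_running`, which calls `kill` without a signal) and whether the exec FIFO in the container's state directory is still present. The decision itself is a pure function of those two booleans. This std-only sketch uses a stand-in `DerivedState` enum instead of `oci::ContainerState` and, like the code above, does not model the paused state.

```rust
/// Stand-in for the subset of `oci::ContainerState` that runk reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DerivedState {
    Created,
    Running,
    Stopped,
}

/// Same ordering of checks as `get_current_container_state`.
pub fn derive_state(process_running: bool, exec_fifo_exists: bool) -> DerivedState {
    if !process_running {
        // The init process is gone, whatever is left on disk.
        DerivedState::Stopped
    } else if exec_fifo_exists {
        // Alive but the FIFO has not been consumed yet: created, not started.
        DerivedState::Created
    } else {
        DerivedState::Running
    }
}
```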
--- a/src/tools/runk/libcontainer/src/utils.rs
+++ b/src/tools/runk/libcontainer/src/utils.rs
@@ -0,0 +1,106 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::{anyhow, Result};
+use nix::sys::stat::Mode;
+use std::{
+    fs::{DirBuilder, File},
+    io::{prelude::*, BufReader},
+    os::unix::fs::DirBuilderExt,
+    path::Path,
+};
+
+pub fn lines_from_file<P: AsRef<Path>>(path: P) -> Result<Vec<String>> {
+    let file = File::open(&path)?;
+    let buf = BufReader::new(file);
+    // Propagate read or UTF-8 errors instead of panicking on a bad line.
+    let lines = buf.lines().collect::<std::io::Result<Vec<String>>>()?;
+
+    Ok(lines)
+}
+
+pub fn create_dir_with_mode<P: AsRef<Path>>(path: P, mode: Mode, recursive: bool) -> Result<()> {
+    let path = path.as_ref();
+    if path.exists() {
+        return Err(anyhow!("{} already exists", path.display()));
+    }
+
+    Ok(DirBuilder::new()
+        .recursive(recursive)
+        .mode(mode.bits())
+        .create(path)?)
+}
+
+#[cfg(test)]
+pub(crate) mod test_utils {
+    use crate::status::Status;
+    use nix::unistd::getpid;
+    use oci::State as OCIState;
+    use oci::{ContainerState, Root, Spec};
+    use rustjail::cgroups::fs::Manager as CgroupManager;
+    use rustjail::specconv::CreateOpts;
+    use std::path::Path;
+    use std::time::SystemTime;
+
+    pub const TEST_CONTAINER_ID: &str = "test";
+    pub const TEST_BUNDLE_PATH: &str = "/test";
+    pub const TEST_ANNOTATION: &str = "test";
+    pub const TEST_CGM_DATA: &str = r#"{
+        "paths": {
+            "devices": "/sys/fs/cgroup/devices"
+        },
+        "mounts": {
+            "devices": "/sys/fs/cgroup/devices"
+        },
+        "cpath": "test"
+    }"#;
+
+    pub fn create_dummy_opts() -> CreateOpts {
+        let spec = Spec {
+            root: Some(Root::default()),
+            ..Default::default()
+        };
+        CreateOpts {
+            cgroup_name: "".to_string(),
+            use_systemd_cgroup: false,
+            no_pivot_root: false,
+            no_new_keyring: false,
+            spec: Some(spec),
+            rootless_euid: false,
+            rootless_cgroup: false,
+        }
+    }
+
+    pub fn create_dummy_oci_state() -> OCIState {
+        OCIState {
+            version: "1.0.0".to_string(),
+            id: TEST_CONTAINER_ID.to_string(),
+            status: ContainerState::Running,
+            pid: getpid().as_raw(),
+            bundle: TEST_BUNDLE_PATH.to_string(),
+            annotations: [(TEST_ANNOTATION.to_string(), TEST_ANNOTATION.to_string())]
+                .iter()
+                .cloned()
+                .collect(),
+        }
+    }
+
+    pub fn create_dummy_status() -> Status {
+        let cgm: CgroupManager = serde_json::from_str(TEST_CGM_DATA).unwrap();
+        let oci_state = create_dummy_oci_state();
+        let created = SystemTime::now();
+        let status = Status::new(
+            Path::new(TEST_BUNDLE_PATH),
+            oci_state.clone(),
+            1,
+            created,
+            cgm,
+            create_dummy_opts(),
+        )
+        .unwrap();
+
+        status
+    }
+}
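`get_all_pid` reads the `cgroup.procs` file of the devices cgroup through `lines_from_file` and turns every line into a PID. A std-only sketch of that conversion, where every I/O or parse failure comes back as an `io::Error` rather than aborting the process, could look like this. `parse_pids` is a hypothetical name, `i32` stands in for `pid_t` (they match on Linux), and skipping blank lines is an assumption.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Parses `cgroup.procs`-style content: one decimal PID per line.
pub fn parse_pids<R: Read>(reader: R) -> io::Result<Vec<i32>> {
    let mut pids = Vec::new();
    for line in BufReader::new(reader).lines() {
        let line = line?;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let pid = trimmed.parse::<i32>().map_err(|e| {
            io::Error::new(io::ErrorKind::InvalidData, format!("bad pid {:?}: {}", trimmed, e))
        })?;
        pids.push(pid);
    }
    Ok(pids)
}
```

Taking any `Read` instead of a path keeps the parser testable with in-memory input; in `runk` the reader would be the opened `cgroup.procs` file.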
--- a/src/tools/runk/src/commands/create.rs
+++ b/src/tools/runk/src/commands/create.rs
@@ -0,0 +1,37 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::Result;
+use libcontainer::{builder::ContainerBuilder, container::ContainerAction};
+use liboci_cli::Create;
+use nix::unistd::Pid;
+use slog::{info, Logger};
+use std::{fs, path::Path};
+
+pub async fn run(opts: Create, root: &Path, logger: &Logger) -> Result<()> {
+    let ctx = ContainerBuilder::default()
+        .id(opts.container_id)
+        .bundle(opts.bundle)
+        .root(root.to_path_buf())
+        .console_socket(opts.console_socket)
+        .build()?
+        .create_ctx()?;
+
+    let pid = ctx.launch(ContainerAction::Create, logger).await?;
+
+    if let Some(ref pid_file) = opts.pid_file {
+        create_pid_file(pid_file, pid)?;
+    }
+
+    info!(&logger, "create command finished successfully");
+
+    Ok(())
+}
+
+fn create_pid_file<P: AsRef<Path>>(pid_file: P, pid: Pid) -> Result<()> {
+    fs::write(pid_file.as_ref(), format!("{}", pid))?;
+
+    Ok(())
+}
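`create_pid_file` stores the init process PID as plain decimal text with no trailing newline, which is what a caller passing `--pid-file` (a containerd shim, for example) typically reads back. A hypothetical std-only round trip of that format, assuming the PID fits in an `i32` like `pid_t` on Linux:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `pid` as decimal text with no trailing newline.
pub fn write_pid_file(path: &Path, pid: i32) -> io::Result<()> {
    fs::write(path, pid.to_string())
}

/// Reads a PID file back, tolerating surrounding whitespace.
pub fn read_pid_file(path: &Path) -> io::Result<i32> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<i32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```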
--- a/src/tools/runk/src/commands/delete.rs
+++ b/src/tools/runk/src/commands/delete.rs
@@ -0,0 +1,103 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::{anyhow, Result};
+use libcontainer::{
+    cgroup,
+    status::{get_current_container_state, Status},
+};
+use liboci_cli::Delete;
+use nix::{
+    errno::Errno,
+    sys::signal::{kill, Signal},
+    unistd::Pid,
+};
+use oci::{ContainerState, State as OCIState};
+use rustjail::container;
+use slog::{info, Logger};
+use std::{fs, path::Path};
+
+pub async fn run(opts: Delete, root: &Path, logger: &Logger) -> Result<()> {
+    let container_id = &opts.container_id;
+    let status_dir = Status::get_dir_path(root, container_id);
+    if !status_dir.exists() {
+        return Err(anyhow!("container {} does not exist", container_id));
+    }
+
+    let status = if let Ok(value) = Status::load(root, container_id) {
+        value
+    } else {
+        fs::remove_dir_all(status_dir)?;
+        return Ok(());
+    };
+
+    let spec = status
+        .config
+        .spec
+        .as_ref()
+        .ok_or_else(|| anyhow!("spec config was not present in the status"))?;
+
+    let oci_state = OCIState {
+        version: status.oci_version.clone(),
+        id: status.id.clone(),
+        status: get_current_container_state(&status)?,
+        pid: status.pid,
+        bundle: status
+            .bundle
+            .to_str()
+            .ok_or_else(|| anyhow!("invalid bundle path"))?
+            .to_string(),
+        annotations: spec.annotations.clone(),
+    };
+
+    if let Some(hooks) = spec.hooks.as_ref() {
+        for h in hooks.poststop.iter() {
+            container::execute_hook(logger, h, &oci_state).await?;
+        }
+    }
+
+    match oci_state.status {
+        ContainerState::Stopped => {
+            destroy_container(&status)?;
+        }
+        ContainerState::Created => {
+            kill(Pid::from_raw(status.pid), Some(Signal::SIGKILL))?;
+            destroy_container(&status)?;
+        }
+        _ => {
+            if opts.force {
+                if let Err(errno) = kill(Pid::from_raw(status.pid), Some(Signal::SIGKILL)) {
+                    // ESRCH means the process has already exited, which is fine here.
+                    if errno != Errno::ESRCH {
+                        return Err(anyhow!("{}", errno));
+                    }
+                }
+                destroy_container(&status)?;
+            } else {
+                return Err(anyhow!(
+                    "cannot delete container {} that is not stopped",
+                    container_id
+                ));
+            }
+        }
+    }
+
+    info!(&logger, "delete command finished successfully");
+
+    Ok(())
+}
+
+fn destroy_container(status: &Status) -> Result<()> {
+    cgroup::destroy_cgroup(&status.cgroup_manager)?;
+    status.remove_dir()?;
+
+    Ok(())
+}
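After running any `poststop` hooks, `delete::run` picks one of three outcomes from the current container state and the `--force` flag. That decision can be written as a pure function. This sketch uses hypothetical `State` and `DeletePlan` types, folds every state that is neither stopped nor created into `Running` (as the `_` arm does), and does not model the detail that the forced path ignores `ESRCH` from `kill`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Created,
    Running,
    Stopped,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeletePlan {
    /// Remove the cgroups and the state directory only.
    Destroy,
    /// Send SIGKILL to the init process, then destroy.
    KillThenDestroy,
    /// Refuse: the container is still running and --force was not given.
    Refuse,
}

pub fn plan_delete(state: State, force: bool) -> DeletePlan {
    match state {
        State::Stopped => DeletePlan::Destroy,
        State::Created => DeletePlan::KillThenDestroy,
        State::Running if force => DeletePlan::KillThenDestroy,
        State::Running => DeletePlan::Refuse,
    }
}
```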
--- a/src/tools/runk/src/commands/kill.rs
+++ b/src/tools/runk/src/commands/kill.rs
@@ -0,0 +1,82 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::{anyhow, Result};
+use libcontainer::status::{self, get_current_container_state, Status};
+use liboci_cli::Kill;
+use nix::{
+    sys::signal::{kill, Signal},
+    unistd::Pid,
+};
+use oci::ContainerState;
+use slog::{info, Logger};
+use std::{convert::TryFrom, path::Path, str::FromStr};
+
+pub fn run(opts: Kill, state_root: &Path, logger: &Logger) -> Result<()> {
+    let container_id = &opts.container_id;
+    let status = Status::load(state_root, container_id)?;
+    let current_state = get_current_container_state(&status)?;
+    let sig = parse_signal(&opts.signal)?;
+
+    // TODO: liboci-cli does not support --all option for kill command.
+    // After liboci-cli supports the option, we will change the following code.
+    let all = false;
+    if all {
+        let pids = status::get_all_pid(&status.cgroup_manager)?;
+        for pid in pids {
+            if !status::is_process_running(pid)? {
+                continue;
+            }
+            kill(pid, sig)?;
+        }
+    } else {
+        if current_state == ContainerState::Stopped {
+            return Err(anyhow!("container {} is not running", container_id));
+        }
+
+        let p = Pid::from_raw(status.pid);
+        if status::is_process_running(p)? {
+            kill(p, sig)?;
+        }
+    }
+
+    info!(&logger, "kill command finished successfully");
+
+    Ok(())
+}
+
+fn parse_signal(signal: &str) -> Result<Signal> {
+    if let Ok(num) = signal.parse::<i32>() {
+        return Ok(Signal::try_from(num)?);
+    }
+
+    let mut signal_upper = signal.to_uppercase();
+    if !signal_upper.starts_with("SIG") {
+        signal_upper = "SIG".to_string() + &signal_upper;
+    }
+
+    Ok(Signal::from_str(&signal_upper)?)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use nix::sys::signal::Signal;
+
+    #[test]
+    fn test_parse_signal() {
+        assert_eq!(Signal::SIGHUP, parse_signal("1").unwrap());
+        assert_eq!(Signal::SIGHUP, parse_signal("sighup").unwrap());
+        assert_eq!(Signal::SIGHUP, parse_signal("hup").unwrap());
+        assert_eq!(Signal::SIGHUP, parse_signal("SIGHUP").unwrap());
+        assert_eq!(Signal::SIGHUP, parse_signal("HUP").unwrap());
+
+        assert_eq!(Signal::SIGKILL, parse_signal("9").unwrap());
+        assert_eq!(Signal::SIGKILL, parse_signal("sigkill").unwrap());
+        assert_eq!(Signal::SIGKILL, parse_signal("kill").unwrap());
+        assert_eq!(Signal::SIGKILL, parse_signal("SIGKILL").unwrap());
+        assert_eq!(Signal::SIGKILL, parse_signal("KILL").unwrap());
+    }
+}
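`parse_signal` accepts either a signal number or a name, case-insensitively and with or without the `SIG` prefix, and delegates the actual lookup to `nix::sys::signal::Signal`. The normalisation step can be shown with std alone. In this hypothetical sketch a four-entry table replaces `nix`, the numbers are the usual Linux values for those signals, and numeric input is only accepted when it appears in the table.

```rust
/// Std-only stand-in for `parse_signal`, returning the signal number.
pub fn parse_signal_number(signal: &str) -> Option<i32> {
    const TABLE: &[(&str, i32)] = &[
        ("SIGHUP", 1),
        ("SIGINT", 2),
        ("SIGKILL", 9),
        ("SIGTERM", 15),
    ];

    // Numeric form, e.g. "9".
    if let Ok(num) = signal.parse::<i32>() {
        return TABLE.iter().find(|(_, n)| *n == num).map(|(_, n)| *n);
    }

    // Name form: upper-case it and add the "SIG" prefix if missing.
    let mut name = signal.to_uppercase();
    if !name.starts_with("SIG") {
        name = format!("SIG{}", name);
    }
    TABLE.iter().find(|(s, _)| *s == name).map(|(_, n)| *n)
}
```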
--- a/src/tools/runk/src/commands/mod.rs
+++ b/src/tools/runk/src/commands/mod.rs
@@ -0,0 +1,12 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+pub mod create;
+pub mod delete;
+pub mod kill;
+pub mod run;
+pub mod spec;
+pub mod start;
+pub mod state;
--- a/src/tools/runk/src/commands/run.rs
+++ b/src/tools/runk/src/commands/run.rs
@@ -0,0 +1,26 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::Result;
+use libcontainer::{builder::ContainerBuilder, container::ContainerAction};
+use liboci_cli::Run;
+use slog::{info, Logger};
+use std::path::Path;
+
+pub async fn run(opts: Run, root: &Path, logger: &Logger) -> Result<()> {
+    let ctx = ContainerBuilder::default()
+        .id(opts.container_id)
+        .bundle(opts.bundle)
+        .root(root.to_path_buf())
+        .console_socket(opts.console_socket)
+        .build()?
+        .create_ctx()?;
+
+    ctx.launch(ContainerAction::Run, logger).await?;
+
+    info!(&logger, "run command finished successfully");
+
+    Ok(())
+}
--- a/src/tools/runk/src/commands/spec.rs
+++ b/src/tools/runk/src/commands/spec.rs
@@ -0,0 +1,207 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::Result;
+use libcontainer::container::CONFIG_FILE_NAME;
+use liboci_cli::Spec;
+use slog::{info, Logger};
+use std::{fs::File, io::Write, path::Path};
+
+pub const DEFAULT_SPEC: &str = r#"{
+	"ociVersion": "1.0.2-dev",
+	"process": {
+		"terminal": true,
+		"user": {
+			"uid": 0,
+			"gid": 0
+		},
+		"args": [
+			"sh"
+		],
+		"env": [
+			"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
+			"TERM=xterm"
+		],
+		"cwd": "/",
+		"capabilities": {
+			"bounding": [
+				"CAP_AUDIT_WRITE",
+				"CAP_KILL",
+				"CAP_NET_BIND_SERVICE"
+			],
+			"effective": [
+				"CAP_AUDIT_WRITE",
+				"CAP_KILL",
+				"CAP_NET_BIND_SERVICE"
+			],
+			"inheritable": [
+				"CAP_AUDIT_WRITE",
+				"CAP_KILL",
+				"CAP_NET_BIND_SERVICE"
+			],
+			"permitted": [
+				"CAP_AUDIT_WRITE",
+				"CAP_KILL",
+				"CAP_NET_BIND_SERVICE"
+			],
+			"ambient": [
+				"CAP_AUDIT_WRITE",
+				"CAP_KILL",
+				"CAP_NET_BIND_SERVICE"
+			]
+		},
+		"rlimits": [
+			{
+				"type": "RLIMIT_NOFILE",
+				"hard": 1024,
+				"soft": 1024
+			}
+		],
+		"noNewPrivileges": true
+	},
+	"root": {
+		"path": "rootfs",
+		"readonly": true
+	},
+	"hostname": "runk",
+	"mounts": [
+		{
+			"destination": "/proc",
+			"type": "proc",
+			"source": "proc"
+		},
+		{
+			"destination": "/dev",
+			"type": "tmpfs",
+			"source": "tmpfs",
+			"options": [
+				"nosuid",
+				"strictatime",
+				"mode=755",
+				"size=65536k"
+			]
+		},
+		{
+			"destination": "/dev/pts",
+			"type": "devpts",
+			"source": "devpts",
+			"options": [
+				"nosuid",
+				"noexec",
+				"newinstance",
+				"ptmxmode=0666",
+				"mode=0620",
+				"gid=5"
+			]
+		},
+		{
+			"destination": "/dev/shm",
+			"type": "tmpfs",
+			"source": "shm",
+			"options": [
+				"nosuid",
+				"noexec",
+				"nodev",
+				"mode=1777",
+				"size=65536k"
+			]
+		},
+		{
+			"destination": "/dev/mqueue",
+			"type": "mqueue",
+			"source": "mqueue",
+			"options": [
+				"nosuid",
+				"noexec",
+				"nodev"
+			]
+		},
+		{
+			"destination": "/sys",
+			"type": "sysfs",
+			"source": "sysfs",
+			"options": [
+				"nosuid",
+				"noexec",
+				"nodev",
+				"ro"
+			]
+		},
+		{
+			"destination": "/sys/fs/cgroup",
+			"type": "cgroup",
+			"source": "cgroup",
+			"options": [
+				"nosuid",
+				"noexec",
+				"nodev",
+				"relatime",
+				"ro"
+			]
+		}
+	],
+	"linux": {
+		"resources": {
+			"devices": [
+				{
+					"allow": false,
+					"access": "rwm"
+				}
+			]
+		},
+		"namespaces": [
+			{
+				"type": "pid"
+			},
+			{
+				"type": "network"
+			},
+			{
+				"type": "ipc"
+			},
+			{
+				"type": "uts"
+			},
+			{
+				"type": "mount"
+			}
+		],
+		"maskedPaths": [
+			"/proc/acpi",
+			"/proc/asound",
+			"/proc/kcore",
+			"/proc/keys",
+			"/proc/latency_stats",
+			"/proc/timer_list",
+			"/proc/timer_stats",
+			"/proc/sched_debug",
+			"/sys/firmware",
+			"/proc/scsi"
+		],
+		"readonlyPaths": [
+			"/proc/bus",
+			"/proc/fs",
+			"/proc/irq",
+			"/proc/sys",
+			"/proc/sysrq-trigger"
+		]
+	}
+}"#;
+
+pub fn run(_opts: Spec, logger: &Logger) -> Result<()> {
+    // TODO: liboci-cli does not support --bundle option for spec command.
+    // After liboci-cli supports the option, we will change the following code.
+    // let config_path = get_config_path(&opts.bundle);
+    let config_path = Path::new(".").join(CONFIG_FILE_NAME);
+    let config_data = DEFAULT_SPEC;
+
+    let mut file = File::create(config_path)?;
+    file.write_all(config_data.as_bytes())?;
+
+    info!(&logger, "spec command finished successfully");
+
+    Ok(())
+}
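The TODO above says `spec` always writes `config.json` into the current directory until liboci-cli supports a `--bundle` option. Below is a minimal sketch of a bundle-aware version. It assumes `CONFIG_FILE_NAME` is `config.json`, the file name used by the OCI bundle format. `config_path_for` and `write_spec` are hypothetical helpers, not the commented-out `get_config_path`.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

// Assumed value of libcontainer's CONFIG_FILE_NAME (the OCI bundle config file).
const CONFIG_FILE_NAME: &str = "config.json";

/// Hypothetical: resolve config.json inside an optional bundle directory,
/// falling back to the current directory as spec::run does today.
fn config_path_for(bundle: Option<&Path>) -> PathBuf {
    bundle.unwrap_or_else(|| Path::new(".")).join(CONFIG_FILE_NAME)
}

/// Write a spec template, truncating any existing file (File::create semantics).
fn write_spec(bundle: Option<&Path>, spec: &str) -> io::Result<PathBuf> {
    let path = config_path_for(bundle);
    let mut file = File::create(&path)?;
    file.write_all(spec.as_bytes())?;
    Ok(path)
}
```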
--- a/src/tools/runk/src/commands/start.rs
+++ b/src/tools/runk/src/commands/start.rs
@@ -0,0 +1,48 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use crate::commands::state::get_container_state_name;
+use anyhow::{anyhow, Result};
+use libcontainer::{
+    container::get_fifo_path,
+    status::{get_current_container_state, Status},
+};
+use liboci_cli::Start;
+use nix::unistd::unlink;
+use oci::ContainerState;
+use slog::{info, Logger};
+use std::{fs::OpenOptions, io::prelude::*, path::Path, time::SystemTime};
+
+pub fn run(opts: Start, state_root: &Path, logger: &Logger) -> Result<()> {
+    let mut status = Status::load(state_root, &opts.container_id)?;
+    let state = get_current_container_state(&status)?;
+    if state != ContainerState::Created {
+        return Err(anyhow!(
+            "cannot start a container in the {} state",
+            get_container_state_name(state)
+        ));
+    };
+
+    let fifo_path = get_fifo_path(&status);
+    let mut file = OpenOptions::new().write(true).open(&fifo_path)?;
+
+    file.write_all(b"0")?;
+
+    info!(&logger, "container started");
+
+    status.process_start_time = SystemTime::now()
+        .duration_since(SystemTime::UNIX_EPOCH)?
+        .as_secs();
+
+    status.save()?;
+
+    if fifo_path.exists() {
+        unlink(&fifo_path)?;
+    }
+
+    info!(&logger, "start command finished successfully");
+
+    Ok(())
+}
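`start` continues only when the recorded state is `created`. It then writes `0` to the container's FIFO and records `process_start_time` as whole seconds since the Unix epoch. The sketch below isolates the precondition and the timestamp using only the standard library. The `ContainerState` enum is a simplified stand-in for `oci::ContainerState`, and `check_startable` and `now_secs` are hypothetical names.

```rust
use std::time::{SystemTime, SystemTimeError, UNIX_EPOCH};

/// Simplified, hypothetical stand-in for oci::ContainerState.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContainerState {
    Creating,
    Created,
    Running,
    Stopped,
    Paused,
}

fn state_name(state: ContainerState) -> &'static str {
    match state {
        ContainerState::Creating => "creating",
        ContainerState::Created => "created",
        ContainerState::Running => "running",
        ContainerState::Stopped => "stopped",
        ContainerState::Paused => "paused",
    }
}

/// Same precondition as start::run: only a created container may be started.
fn check_startable(state: ContainerState) -> Result<(), String> {
    if state != ContainerState::Created {
        return Err(format!(
            "cannot start a container in the {} state",
            state_name(state)
        ));
    }
    Ok(())
}

/// Whole seconds since the Unix epoch, the unit stored in process_start_time.
fn now_secs() -> Result<u64, SystemTimeError> {
    Ok(SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs())
}
```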
--- a/src/tools/runk/src/commands/state.rs
+++ b/src/tools/runk/src/commands/state.rs
@@ -0,0 +1,79 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::Result;
+use chrono::{DateTime, Utc};
+use libcontainer::status::{get_current_container_state, Status};
+use liboci_cli::State;
+use oci::ContainerState;
+use serde::{Deserialize, Serialize};
+use slog::{info, Logger};
+use std::path::{Path, PathBuf};
+
+#[derive(Serialize, Deserialize, Debug)]
+#[serde(rename_all = "camelCase")]
+pub struct RuntimeState {
+    pub oci_version: String,
+    pub id: String,
+    pub pid: i32,
+    pub status: String,
+    pub bundle: PathBuf,
+    pub created: DateTime<Utc>,
+}
+
+impl RuntimeState {
+    pub fn new(status: Status, state: ContainerState) -> Self {
+        Self {
+            oci_version: status.oci_version,
+            id: status.id,
+            pid: status.pid,
+            status: get_container_state_name(state),
+            bundle: status.bundle,
+            created: status.created,
+        }
+    }
+}
+
+pub fn run(opts: State, state_root: &Path, logger: &Logger) -> Result<()> {
+    let status = Status::load(state_root, &opts.container_id)?;
+    let state = get_current_container_state(&status)?;
+    let oci_state = RuntimeState::new(status, state);
+    let json_state = serde_json::to_string_pretty(&oci_state)?;
+
+    println!("{}", json_state);
+
+    info!(&logger, "state command finished successfully");
+
+    Ok(())
+}
+
+pub fn get_container_state_name(state: ContainerState) -> String {
+    match state {
+        ContainerState::Creating => "creating",
+        ContainerState::Created => "created",
+        ContainerState::Running => "running",
+        ContainerState::Stopped => "stopped",
+        ContainerState::Paused => "paused",
+    }
+    .into()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use oci::ContainerState;
+
+    #[test]
+    fn test_get_container_state_name() {
+        assert_eq!(
+            "creating",
+            get_container_state_name(ContainerState::Creating)
+        );
+        assert_eq!("created", get_container_state_name(ContainerState::Created));
+        assert_eq!("running", get_container_state_name(ContainerState::Running));
+        assert_eq!("stopped", get_container_state_name(ContainerState::Stopped));
+        assert_eq!("paused", get_container_state_name(ContainerState::Paused));
+    }
+}
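`RuntimeState` uses `#[serde(rename_all = "camelCase")]`, so `oci_version` appears as `ociVersion` in the output of `state`. The sketch below makes the resulting keys visible without serde. It formats the same six fields as compact JSON and is for illustration only. `RuntimeStateSketch` is hypothetical. Its escaping handles only backslashes and double quotes, and it takes `created` as an already formatted RFC 3339 string instead of a chrono `DateTime<Utc>`. The real command prints pretty-printed JSON through `serde_json::to_string_pretty`.

```rust
/// Hypothetical, dependency-free mirror of RuntimeState.
struct RuntimeStateSketch<'a> {
    oci_version: &'a str,
    id: &'a str,
    pid: i32,
    status: &'a str,
    bundle: &'a str,
    created: &'a str, // assumed to be pre-formatted as RFC 3339
}

/// Minimal JSON string escaping: backslashes first, then double quotes.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

impl RuntimeStateSketch<'_> {
    /// Emit the camelCase keys that serde's rename_all produces.
    fn to_json(&self) -> String {
        format!(
            r#"{{"ociVersion":"{}","id":"{}","pid":{},"status":"{}","bundle":"{}","created":"{}"}}"#,
            escape(self.oci_version),
            escape(self.id),
            self.pid,
            escape(self.status),
            escape(self.bundle),
            escape(self.created)
        )
    }
}
```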
--- a/src/tools/runk/src/main.rs
+++ b/src/tools/runk/src/main.rs
@@ -0,0 +1,111 @@
+// Copyright 2021-2022 Sony Group Corporation
+//
+// SPDX-License-Identifier: Apache-2.0
+//
+
+use anyhow::{anyhow, Result};
+use clap::{crate_description, crate_name, Parser};
+use liboci_cli::{CommonCmd, GlobalOpts, StandardCmd};
+use slog::{o, Logger};
+use slog_async::AsyncGuard;
+use std::{
+    fs::OpenOptions,
+    path::{Path, PathBuf},
+    process::exit,
+};
+
+const DEFAULT_ROOT_DIR: &str = "/run/runk";
+const DEFAULT_LOG_LEVEL: slog::Level = slog::Level::Info;
+
+mod commands;
+
+#[derive(Parser, Debug)]
+enum SubCommand {
+    #[clap(flatten)]
+    Standard(StandardCmd),
+    #[clap(flatten)]
+    Common(CommonCmd),
+}
+
+#[derive(Parser, Debug)]
+#[clap(version, author, about = crate_description!())]
+struct Cli {
+    #[clap(flatten)]
+    global: GlobalOpts,
+    #[clap(subcommand)]
+    subcmd: SubCommand,
+}
+
+async fn cmd_run(subcmd: SubCommand, root_path: &Path, logger: &Logger) -> Result<()> {
+    match subcmd {
+        SubCommand::Standard(cmd) => match cmd {
+            StandardCmd::Create(create) => commands::create::run(create, root_path, logger).await,
+            StandardCmd::Start(start) => commands::start::run(start, root_path, logger),
+            StandardCmd::Kill(kill) => commands::kill::run(kill, root_path, logger),
+            StandardCmd::Delete(delete) => commands::delete::run(delete, root_path, logger).await,
+            StandardCmd::State(state) => commands::state::run(state, root_path, logger),
+        },
+        SubCommand::Common(cmd) => match cmd {
+            CommonCmd::Run(run) => commands::run::run(run, root_path, logger).await,
+            CommonCmd::Spec(spec) => commands::spec::run(spec, logger),
+            _ => {
+                Err(anyhow!("command is not implemented yet"))
+            }
+        },
+    }
+}
+
+fn setup_logger(
+    log_file: Option<PathBuf>,
+    log_level: slog::Level,
+) -> Result<(Logger, Option<AsyncGuard>)> {
+    if let Some(ref file) = log_file {
+        let log_writer = OpenOptions::new()
+            .write(true)
+            .read(true)
+            .create(true)
+            .truncate(true)
+            .open(file)?;
+
+        // TODO: Support 'text' log format.
+        let (logger_local, logger_async_guard_local) =
+            logging::create_logger(crate_name!(), crate_name!(), log_level, log_writer);
+
+        Ok((logger_local, Some(logger_async_guard_local)))
+    } else {
+        let logger = slog::Logger::root(slog::Discard, o!());
+        Ok((logger, None))
+    }
+}
+
+async fn real_main() -> Result<()> {
+    let cli = Cli::parse();
+
+    let root_path = if let Some(path) = cli.global.root {
+        path
+    } else {
+        PathBuf::from(DEFAULT_ROOT_DIR)
+    };
+
+    let log_level = if cli.global.debug {
+        slog::Level::Debug
+    } else {
+        DEFAULT_LOG_LEVEL
+    };
+
+    let (logger, _async_guard) = setup_logger(cli.global.log, log_level)?;
+
+    cmd_run(cli.subcmd, &root_path, &logger).await?;
+
+    Ok(())
+}
+
+#[tokio::main]
+async fn main() {
+    if let Err(e) = real_main().await {
+        eprintln!("ERROR: {}", e);
+        exit(1);
+    }
+
+    exit(0);
+}
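`real_main` falls back to `/run/runk` when `--root` is not given. It raises the log level from `Info` to `Debug` when `--debug` is set. The same resolution can be written more compactly with `Option::unwrap_or_else`, as in this sketch. `Level` is a stand-in for `slog::Level`, and `resolve_root` and `resolve_log_level` are hypothetical helpers.

```rust
use std::path::PathBuf;

// Mirrors the constant defined in runk's main.rs.
const DEFAULT_ROOT_DIR: &str = "/run/runk";

/// Stand-in for the two slog::Level values real_main chooses between.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Info,
    Debug,
}

/// Use the --root value if present, otherwise the default state directory.
fn resolve_root(root: Option<PathBuf>) -> PathBuf {
    root.unwrap_or_else(|| PathBuf::from(DEFAULT_ROOT_DIR))
}

/// --debug selects Debug; otherwise Info (runk's DEFAULT_LOG_LEVEL).
fn resolve_log_level(debug: bool) -> Level {
    if debug {
        Level::Debug
    } else {
        Level::Info
    }
}
```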
--- a/tools/packaging/kata-deploy/README.md
+++ b/tools/packaging/kata-deploy/README.md
@@ -11,7 +11,41 @@ be utilized to install Kata Containers on a running Kubernetes cluster.

 ### Install Kata on a running Kubernetes cluster

-#### Installing the latest image
+#### k3s cluster
+
+For your [k3s](https://k3s.io/) cluster, run:
+
+```sh
+$ git clone https://github.com/kata-containers/kata-containers.git
+```
+
+Optionally, switch to the stable branch of your choice, then run:
+
+```bash
+$ cd kata-containers/tools/packaging/kata-deploy
+$ kubectl apply -f kata-rbac/base/kata-rbac.yaml
+$ kubectl apply -k kata-deploy/overlays/k3s
+```
+
+#### RKE2 cluster
+
+For your [RKE2](https://docs.rke2.io/) cluster, run:
+
+```sh
+$ git clone https://github.com/kata-containers/kata-containers.git
+```
+
+Optionally, switch to the stable branch of your choice, then run:
+
+```bash
+$ cd kata-containers/tools/packaging/kata-deploy
+$ kubectl apply -f kata-rbac/base/kata-rbac.yaml
+$ kubectl apply -k kata-deploy/overlays/rke2
+```
+
+#### Vanilla Kubernetes cluster
+
+##### Installing the latest image

 The latest image refers to pre-release and release candidate content.  For stable releases, please, use the "stable" instructions.

@@ -20,7 +54,7 @@ $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-contai
 $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml
 ```

-#### Installing the stable image
+##### Installing the stable image

 The stable image refers to the last stable releases content.

@@ -32,20 +66,9 @@ $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-contai
 $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-deploy/base/kata-deploy-stable.yaml
 ```

-#### For your [k3s](https://k3s.io/) cluster, do:
-
-```sh
-$ GO111MODULE=auto go get github.com/kata-containers/kata-containers
-```
-
-```bash
-$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kata-deploy
-$ kubectl apply -k kata-deploy/overlays/k3s
-```
-
 #### Ensure kata-deploy is ready
 ```bash
-kubectl -n kube-system wait --timeout=10m --for=condition=Ready -l name=kata-deploy pod
+$ kubectl -n kube-system wait --timeout=10m --for=condition=Ready -l name=kata-deploy pod
 ```

 ### Run a sample workload
--- a/tools/packaging/kata-deploy/kata-cleanup/overlays/rke2/kustomization.yaml
+++ b/tools/packaging/kata-deploy/kata-cleanup/overlays/rke2/kustomization.yaml
@@ -0,0 +1,5 @@
+bases:
+- ../../base
+
+patchesStrategicMerge:
+- mount_rke2_conf.yaml
--- a/tools/packaging/kata-deploy/kata-cleanup/overlays/rke2/mount_rke2_conf.yaml
+++ b/tools/packaging/kata-deploy/kata-cleanup/overlays/rke2/mount_rke2_conf.yaml
@@ -0,0 +1,17 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: kubelet-kata-cleanup
+  namespace: kube-system
+spec:
+  template:
+    spec:
+      containers:
+      - name: kube-kata-cleanup
+        volumeMounts:
+        - name: containerd-conf
+          mountPath: /etc/containerd/
+      volumes:
+        - name: containerd-conf
+          hostPath:
+            path: /var/lib/rancher/rke2/agent/etc/containerd/
--- a/tools/packaging/kata-deploy/kata-deploy/overlays/rke2/kustomization.yaml
+++ b/tools/packaging/kata-deploy/kata-deploy/overlays/rke2/kustomization.yaml
@@ -0,0 +1,5 @@
+bases:
+- ../../base
+
+patchesStrategicMerge:
+- mount_rke2_conf.yaml
--- a/tools/packaging/kata-deploy/kata-deploy/overlays/rke2/mount_rke2_conf.yaml
+++ b/tools/packaging/kata-deploy/kata-deploy/overlays/rke2/mount_rke2_conf.yaml
@@ -0,0 +1,12 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: kata-deploy
+  namespace: kube-system
+spec:
+  template:
+    spec:
+      volumes:
+        - name: containerd-conf
+          hostPath:
+            path: /var/lib/rancher/rke2/agent/etc/containerd/
--- a/tools/packaging/kata-deploy/scripts/kata-deploy.sh
+++ b/tools/packaging/kata-deploy/scripts/kata-deploy.sh
@@ -40,7 +40,11 @@ function get_container_runtime() {
                die "invalid node name"
 	fi
 	if echo "$runtime" | grep -qE 'containerd.*-k3s'; then
-		if systemctl is-active --quiet k3s-agent; then
+		if systemctl is-active --quiet rke2-agent; then
+			echo "rke2-agent"
+		elif systemctl is-active --quiet rke2-server; then
+			echo "rke2-server"
+		elif systemctl is-active --quiet k3s-agent; then
 			echo "k3s-agent"
 		else
 			echo "k3s"
@@ -63,7 +67,7 @@ function configure_cri_runtime() {
 	crio)
 		configure_crio
 		;;
-	containerd | k3s | k3s-agent)
+	containerd | k3s | k3s-agent | rke2-agent | rke2-server)
 		configure_containerd
 		;;
 	esac
@@ -241,7 +245,7 @@ function cleanup_cri_runtime() {
 	crio)
 		cleanup_crio
 		;;
-	containerd | k3s | k3s-agent)
+	containerd | k3s | k3s-agent | rke2-agent | rke2-server)
 		cleanup_containerd
 		;;
 	esac
@@ -280,7 +284,7 @@ function main() {
 	# CRI-O isn't consistent with the naming -- let's use crio to match the service file
 	if [ "$runtime" == "cri-o" ]; then
 		runtime="crio"
-	elif [ "$runtime" == "k3s" ] || [ "$runtime" == "k3s-agent" ]; then
+	elif [ "$runtime" == "k3s" ] || [ "$runtime" == "k3s-agent" ] || [ "$runtime" == "rke2-agent" ] || [ "$runtime" == "rke2-server" ]; then
 		containerd_conf_tmpl_file="${containerd_conf_file}.tmpl"
 		if [ ! -f "$containerd_conf_tmpl_file" ]; then
 			cp "$containerd_conf_file" "$containerd_conf_tmpl_file"
@@ -303,11 +307,10 @@ function main() {
 	fi

 	# only install / remove / update if we are dealing with CRIO or containerd
-	if [[ "$runtime" =~ ^(crio|containerd|k3s|k3s-agent)$ ]]; then
+	if [[ "$runtime" =~ ^(crio|containerd|k3s|k3s-agent|rke2-agent|rke2-server)$ ]]; then

 		case "$action" in
 		install)
-
 			install_artifacts
 			configure_cri_runtime "$runtime"
 			configure_kata
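After this change, a node whose containerd version string matches `containerd.*-k3s` is classified by probing systemd units in a fixed order: `rke2-agent`, `rke2-server`, then `k3s-agent`. If none is active, it falls back to `k3s`. All five containerd-based names then use `configure_containerd`. The Rust sketch below is for illustration only. `detect_k3s_flavor` and `uses_containerd` are hypothetical names, and the `is_active` closure stands in for `systemctl is-active --quiet <unit>`.

```rust
/// Hypothetical port of the k3s/RKE2 branch of get_container_runtime.
/// `is_active` stands in for `systemctl is-active --quiet <unit>`.
fn detect_k3s_flavor(is_active: impl Fn(&str) -> bool) -> &'static str {
    // Order matters: RKE2 units are probed before k3s-agent.
    for unit in ["rke2-agent", "rke2-server", "k3s-agent"] {
        if is_active(unit) {
            return unit;
        }
    }
    // No agent/server unit is active: treat the node as a k3s server.
    "k3s"
}

/// Runtimes for which kata-deploy calls configure_containerd / cleanup_containerd.
fn uses_containerd(runtime: &str) -> bool {
    matches!(
        runtime,
        "containerd" | "k3s" | "k3s-agent" | "rke2-agent" | "rke2-server"
    )
}
```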
--- a/tools/packaging/qemu/patches/tag_patches/tdx-qemu-2021.11.29-v6.0.0-rc1-mvp/no_patches.txt
+++ b/tools/packaging/qemu/patches/tag_patches/tdx-qemu-2021.11.29-v6.0.0-rc1-mvp/no_patches.txt
--- a/versions.yaml
+++ b/versions.yaml
@@ -100,8 +100,8 @@ assets:
        .*/v?(\d\S+)\.tar\.gz
      tdx:
        description: "VMM that uses KVM and supports TDX"
-        url: "https://github.com/intel/qemu-tdx"
-        tag: "tdx-qemu-2021.11.29-v6.0.0-rc1-mvp"
+        url: "https://github.com/intel/qemu-dcp"
+        tag: "SPR-BKC-QEMU-v2.2"

    qemu-experimental:
      description: "QEMU with virtiofs support"