Pulling an image is the most time-consuming step in the container lifecycle. This PR introduces nydus to Kata Containers so that an image can be pulled lazily when the container starts, which speeds up Kata container creation and startup. Fixes #2724 Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
Background
Research shows that pulling the image accounts for 76% of container startup time, yet only 6.4% of the pulled data is actually read. Fetching data on demand (lazy loading) can therefore speed up container start. Nydus is a project that builds images in a new format and fetches data on demand when the container starts.
Benchmarking results show the performance improvement over OCI images in cold-startup elapsed time on containerd. As the OCI image size increases, the startup time of a container using a nydus image remains very short. See the nydus design document for details.
Proposal - Bring lazyload ability to Kata Containers
Nydusd is a FUSE/virtiofs daemon provided by the nydus project. It natively supports PassthroughFS and RAFS (Registry Acceleration File System), so Kata Containers can use nydusd in place of virtiofsd and mount the nydus image into the guest at the same time.
The process of creating/starting Kata Containers with virtiofsd:
- When creating a sandbox, the Kata Containers Containerd v2 shim launches virtiofsd before the VM starts and shares directories with the VM.
- When creating a container, the Kata Containers Containerd v2 shim mounts the rootfs to kataShared (/run/kata-containers/shared/sandboxes/<SANDBOX>/mounts/<CONTAINER>/rootfs), so it can be seen at the path /run/kata-containers/shared/containers/shared/<CONTAINER>/rootfs in the guest and used as the container's rootfs.
The process of creating/starting Kata Containers with nydusd:
- When creating a sandbox, the Kata Containers Containerd v2 shim launches the nydusd daemon before the VM starts. After the VM starts, kata-agent mounts virtiofs at the path /run/kata-containers/shared, and the Kata Containers Containerd v2 shim mounts a passthroughfs filesystem at the path /run/kata-containers/shared/containers.
# start nydusd
$ sandbox_id=my-test-sandbox
$ sudo /usr/local/bin/nydusd --log-level info --sock /run/vc/vm/${sandbox_id}/vhost-user-fs.sock --apisock /run/vc/vm/${sandbox_id}/api.sock
# source: the host sharedir which will pass through to guest
$ sudo curl -v --unix-socket /run/vc/vm/${sandbox_id}/api.sock \
-X POST "http://localhost/api/v1/mount?mountpoint=/containers" -H "accept: */*" \
-H "Content-Type: application/json" \
-d '{
"source":"/path/to/sharedir",
"fs_type":"passthrough_fs",
"config":""
}'
- When creating a normal container, the Kata Containers Containerd v2 shim sends a request to nydusd to mount rafs at the path /run/kata-containers/shared/rafs/<container_id>/lowerdir in the guest.
# source: the metafile of nydus image
# config: the config of this image
$ sudo curl --unix-socket /run/vc/vm/${sandbox_id}/api.sock \
-X POST "http://localhost/api/v1/mount?mountpoint=/rafs/<container_id>/lowerdir" -H "accept: */*" \
-H "Content-Type: application/json" \
-d '{
"source":"/path/to/bootstrap",
"fs_type":"rafs",
"config":"{\"device\":{\"backend\":{\"type\":\"localfs\",\"config\":{\"dir\":\"blobs\"}},\"cache\":{\"type\":\"blobcache\",\"config\":{\"work_dir\":\"cache\"}}},\"mode\":\"direct\",\"digest_validate\":true}"
}'
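The "config" field in the rafs request is a JSON document carried as a string inside another JSON document, which is easy to get wrong when escaping by hand. The following is a minimal sketch of how the two request bodies above could be generated. The helper name `build_mount_request` and the container ID `abc123` are hypothetical; only the endpoint, field names, and config values come from the curl examples.

```python
import json
from urllib.parse import urlencode

# Same rafs config as in the curl example, written as a plain dict.
RAFS_CONFIG = {
    "device": {
        "backend": {"type": "localfs", "config": {"dir": "blobs"}},
        "cache": {"type": "blobcache", "config": {"work_dir": "cache"}},
    },
    "mode": "direct",
    "digest_validate": True,
}


def build_mount_request(mountpoint, source, fs_type, config=None):
    """Return (url_path, body) for a mount call shaped like the curl examples.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    # "config" must be a string; for rafs that string is itself JSON,
    # so it is serialized once here and again as part of the body.
    config_str = "" if config is None else json.dumps(config, separators=(",", ":"))
    body = json.dumps({"source": source, "fs_type": fs_type, "config": config_str})
    url_path = "/api/v1/mount?" + urlencode({"mountpoint": mountpoint}, safe="/")
    return url_path, body


if __name__ == "__main__":
    print(build_mount_request("/containers", "/path/to/sharedir", "passthrough_fs"))
    print(build_mount_request("/rafs/abc123/lowerdir", "/path/to/bootstrap", "rafs", RAFS_CONFIG))
```

Serializing the inner config with `json.dumps` and letting the outer `json.dumps` escape it avoids hand-written backslashes such as those in the curl payload.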
The Kata Containers Containerd v2 shim will also bind mount the snapshotdir that nydus-snapshotter assigns into sharedir.
So in the guest, container rootfs=overlay(lowerdir=rafs, upperdir=snapshotdir/fs, workdir=snapshotdir/work)
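As an illustration of the overlay layout just described, the sketch below assembles the overlay option string from a rafs lowerdir and a snapshotdir. The function name and the example snapshotdir path are hypothetical; the `fs` and `work` subdirectories and the rafs mount path follow the text above.

```python
import posixpath


def guest_overlay_options(rafs_lowerdir, snapshotdir):
    """Build overlay mount options: rafs is the read-only lower layer,
    snapshotdir/fs the writable upper layer, snapshotdir/work the workdir."""
    upperdir = posixpath.join(snapshotdir, "fs")
    workdir = posixpath.join(snapshotdir, "work")
    return f"lowerdir={rafs_lowerdir},upperdir={upperdir},workdir={workdir}"


if __name__ == "__main__":
    # The snapshotdir location below is an assumed example, not a documented path.
    print(guest_overlay_options(
        "/run/kata-containers/shared/rafs/abc123/lowerdir",
        "/run/kata-containers/shared/containers/abc123/snapshotdir",
    ))
```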
How to transfer the rafs info from nydus-snapshotter to the Kata Containers Containerd v2 shim?
By default, when creating a container from an OCI image, nydus-snapshotter returns the following slice of Mount structs to containerd, and containerd uses them to mount the rootfs:
[
{
Type: "overlay",
Source: "overlay",
Options: [lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_A>/mnt,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/work],
}
]
We could append the rafs info to Options, but then containerd's mount would fail, because containerd cannot interpret the rafs info. Instead, we can rely on the containerd mount helper mechanism and provide a binary called nydus-overlayfs. The Mount slice returned by nydus-snapshotter becomes:
[
{
Type: "fuse.nydus-overlayfs",
Source: "overlay",
Options: [lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_A>/mnt,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/work,extraoption=base64({source:xxx,config:xxx,snapshotdir:xxx})],
}
]
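The `extraoption` value above is described only as base64 over a structure with `source`, `config`, and `snapshotdir`. A minimal sketch of producing such an option follows, assuming the structure is serialized as JSON before encoding; that serialization and the helper name `encode_extraoption` are assumptions, not something the text specifies.

```python
import base64
import json


def encode_extraoption(source, config, snapshotdir):
    """Pack the rafs info into a single 'extraoption=<base64>' mount option.

    Assumption: the payload is JSON; the text only shows base64({...}).
    """
    payload = json.dumps(
        {"source": source, "config": config, "snapshotdir": snapshotdir}
    )
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return "extraoption=" + encoded


if __name__ == "__main__":
    print(encode_extraoption("/path/to/bootstrap", "{}", "/path/to/snapshotdir"))
```

Standard base64 output contains no commas, so the encoded value can sit in a comma-separated mount option list without further escaping.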
When containerd finds that Type is fuse.nydus-overlayfs:
- containerd calls the mount.fuse command;
- mount.fuse calls nydus-overlayfs;
- nydus-overlayfs ignores the extraoption and does the overlay mount.
Finally, the Kata Containers Containerd v2 shim parses extraoption to get the rafs info and mounts the image in the guest.
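Both consumers of the Options list can be sketched with one function: the overlay side keeps every option except `extraoption`, and the shim side decodes `extraoption` to recover the rafs info. This is a rough illustration only, assuming the base64 payload is JSON; `split_mount_options` is a hypothetical name and not the actual nydus-overlayfs or shim code.

```python
import base64
import json

EXTRA_PREFIX = "extraoption="


def split_mount_options(options):
    """Split options into (overlay_options, rafs_info).

    overlay_options: everything a plain overlay mount understands.
    rafs_info: the decoded extraoption payload, or None if it is absent.
    """
    overlay_options = []
    rafs_info = None
    for opt in options:
        if opt.startswith(EXTRA_PREFIX):
            # Assumption: the payload is base64-encoded JSON.
            rafs_info = json.loads(base64.b64decode(opt[len(EXTRA_PREFIX):]))
        else:
            overlay_options.append(opt)
    return overlay_options, rafs_info


if __name__ == "__main__":
    payload = json.dumps({"source": "/path/to/bootstrap", "config": "{}",
                          "snapshotdir": "/path/to/snapshotdir"})
    extra = EXTRA_PREFIX + base64.b64encode(payload.encode("utf-8")).decode("ascii")
    overlay, info = split_mount_options(
        ["lowerdir=/lower", "upperdir=/upper", "workdir=/work", extra]
    )
    print(",".join(overlay))
    print(info["source"])
```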

