Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.
Fixes#2724
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.
While at it, ensure we run gofmt on the changed files.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.
Fixes: #3031
Signed-off-by: bin <bin@hyper.sh>
Current handling of read-only mounts is a little tricky.
However, a clearer solution can be used here:
1. make a private ro bind mount at privateDest to the mount source
2. make a bind mount at mountDest to the mount created in step 1
3. umount the private bind mount created in step 1
One important aspect is that the mount in step 2 is duplicated from
the one we created in step 1. So the MS_RDONLY flag is properly
preserved in all mounts created in the propagtion.
Fixes: #2205
Depends-on: github.com/kata-containers/tests#4106
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.
The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.
Fixes#2665
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
The only process we are adding there is the container host one, and
there is no such thing anymore.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Where possible, move attributes added with AddTag() to Trace() call to
reduce the amount of code used for tracing.
Fixes#2512
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Tracing tags are stored inconsistently throughout the runtime. Change
all instances of tracing tags to variables.
Fixes#2512
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
The swappiness is not right if just set
io.katacontainers.container.resource.swappiness:
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
name: busybox-killed-vmm
annotations:
io.katacontainers.container.resource.swappiness: "100"
image:
image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
crictl exec $cid cat /sys/fs/cgroup/memory/memory.swappiness
60
The cause of this issue is there are two elements store the resources
infomation. They are c.config.Resources for calculateSandboxMemory and
c.GetPatchedOCISpec() for agent.
This add initConfigResourcesMemory to Container and call it in
newContainer to handle the issue.
Fixes: #2372
Signed-off-by: Hui Zhu <teawater@antfin.com>
This commit add code to handle the annotations
"io.katacontainers.container.resource.swappiness" and
"io.katacontainers.container.resource.swap_in_bytes".
It will set the value of "io.katacontainers.resource.swappiness" to
c.config.Resources.Memory.Swappiness and set the value of
"io.katacontainers.resource.swap_in_bytes" to
c.config.Resources.Memory.Swap.
Fixes: #2201
Signed-off-by: Hui Zhu <teawater@antfin.com>
Removes custom trace functions defined across the repo and creates
a single trace function in a new katatrace package. Also moves
span tag management into this package and provides a function to
dynamically add a tag at runtime, such as a container id, etc.
Fixes#1162
Signed-off-by: Benjamin Porter <bporter816@gmail.com>
To workaround virtiofs' lack of inotify support, we'll special case
particular mounts which are typically watched, and pass on information
to the agent so it can ensure that the mount presented to the container
is indeed watchable (see applicable agent commit).
This commit will:
- identify watchable mounts based on file count and mount source
- create a watchable-bind storage object for these mounts to
communicate intent to the agent
- update the OCI spec to take the updated watchable mount source into account
Unit tests added and updated for the newly introduced
functionality/functions.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
There's no reason to pass the paths; they can be
determined when they are actually used.
Let's make the return values more comparable to the other mount handling
functions (we'll add storage object in future commit), and pass the mount maps as
function parameters.
...No functional changes here...
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add sandbox, container, and hypervisor IDs to trace spans. Note that
some spans in sandbox.go are created with a trace() call from api.go.
These spans have additional attributes set after span creation to
overwrite the api attributes.
Fixes#1878
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Span attributes (tags) are not consistent in runtime tracing, so
designate and use core attributes such source, package, subsystem, and
type as span metadata for more understandable output.
Use WithAttributes() during span creation to reduce calls to
SetAttributes().
Modify Trace() in katautils to accept slice of attributes so multiple
functions using different attributes can use it.
Fixes#1852
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
cgroupsCreate will just keep the CPU resources infomation but not the
others.
Set it to c.config.Resources will clean most of resources of the
container.
This commit remove it to handle the issue.
Fixes: #1758
Signed-off-by: Hui Zhu <teawater@antfin.com>
A wrong path was being used for container directory when
virtiofs is utilized. This resulted in a warning message in
logs when a container is killed, or completes:
level=warning msg="Could not remove container share dir"
Without proper removal, they'd later be cleaned up when the shared
path is removed as part of stopping the sandbox.
Fixes: #1559
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Right now we rely heavily on mount propagation to share host
files/directories to the guest. However, because virtiofsd
pivots and moves itself to a separate mount namespace, the remount
mount is not present in virtiofsd's mount. And it causes guest to be
able to write to the host RO volume.
To fix it, create a private RO mount and then move it to the host mounts
dir so that it will be present readonly in the host-guest shared dir.
Fixes: #1552
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
It turns out we have managed to break the static checker in many
difference places with the absence of static checker in github action.
Let's fix them while enabling static checker in github actions...
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.
Fixes#1355
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This commit includes two changes:
- migrate from opentracing to opentelemetry
- add jaeger configuration items
Fixes: #1351
Signed-off-by: bin <bin@hyper.sh>
bindmount remount events are not propagated through mount subtrees,
so we have to remount the shared dir mountpoint directly.
E.g.,
```
mkdir -p source dest foo source/foo
mount -o bind --make-shared source dest
mount -o bind foo source/foo
echo bind mount rw
mount | grep foo
echo remount ro
mount -o remount,bind,ro source/foo
mount | grep foo
```
would result in:
```
bind mount rw
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
remount ro
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
```
The reason is that bind mount creats new mount structs and attaches them to different mount subtrees.
However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the
change to mount structs in other subtrees.
Fixes: #1061
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Kata doesn't map any numa topologies in the guest. Let's make sure we
clear the Cpuset fields before passing container updates to the
guest.
Note, in the future we may want to have a vCPU to guest CPU mapping and
still include the cpuset.Cpus. Until we have this support, clear this as
well.
Fixes: #932
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Allow for constraining the cpuset as well as the devices-whitelist . Revert
sandbox constraints for cpu/memory, as they break the K8S use case. Can
re-add behind a non-default flag in the future.
The sandbox CPUSet should be updated every time a container is created,
updated, or removed.
To facilitate this without rewriting the 'non constrained cgroup'
handling, let's add to the Sandbox's cgroupsUpdate function.
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
We do not need the vc types translation for network data structures.
Just use the protocol buffer definitions.
Fixes: #415
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
We need to make sure containers cannot modify host path unless it is explicitly shared to it. Right now we expose an additional top level shared directory to the guest and allow it to be modified. This is less ideal and can be enhanced by following method:
1. create two directories for each sandbox:
-. /run/kata-containers/shared/sandboxes/$sbx_id/mounts/, a directory to hold all host/guest shared mounts
-. /run/kata-containers/shared/sandboxes/$sbx_id/shared/, a host/guest shared directory (9pfs/virtiofs source dir)
2. /run/kata-containers/shared/sandboxes/$sbx_id/mounts/ is bind mounted readonly to /run/kata-containers/shared/sandboxes/$sbx_id/shared/, so guest cannot modify it
3. host-guest shared files/directories are mounted one-level under /run/kata-containers/shared/sandboxes/$sbx_id/mounts/ and thus present to guest at one level under /run/kata-containers/shared/sandboxes/$sbx_id/shared/
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
To use the kata-containers repo path.
Most of the change is generated by script:
find . -type f -name "*.go" |xargs sed -i -e \
's|github.com/kata-containers/runtime|github.com/kata-containers/kata-containers/src/runtime|g'
Fixes: #201
Signed-off-by: Peng Tao <bergwolf@hyper.sh>