The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore
Fixes: #7381
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to do proper sandbox sizing when we're doing cold-plug introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass-through annotations for accumulated CPU,Memory and now CDI
devices. With that information sandbox sizing can be derived correctly.
Fixes: #7331
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.
Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
For remote hypervisor, the configmap, secrets, downward-api or project-volumes are
copied from host to guest. This patch watches for changes to the host files
and copies the changes to the guest.
Note that configmap updates takes significantly longer than updates via downward-api.
This is similar across runc and Kata runtimes.
Fixes: #6341
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Julien Ropé <jrope@redhat.com>
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.
Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.
And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.
Fixes: #3874
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Switching to the generic FilesystemSharer brings 2 majors improvements:
1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
container files and root filesystems with the Kata Linux guest.
Fixes#3622
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.
For CRI-O runtime, we can log the events to log file.
Fixes: #3733
Signed-off-by: bin <bin@hyper.sh>
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.
Fixes#2724
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
In the confidential computing scenario, there is no Image
information on the host, so skip handling Rootfs at
CreateContainer.
Fixes#3009
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.
While at it, ensure we run gofmt on the changed files.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.
Fixes: #3031
Signed-off-by: bin <bin@hyper.sh>
Current handling of read-only mounts is a little tricky.
However, a clearer solution can be used here:
1. make a private ro bind mount at privateDest to the mount source
2. make a bind mount at mountDest to the mount created in step 1
3. umount the private bind mount created in step 1
One important aspect is that the mount in step 2 is duplicated from
the one we created in step 1. So the MS_RDONLY flag is properly
preserved in all mounts created in the propagtion.
Fixes: #2205
Depends-on: github.com/kata-containers/tests#4106
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.
The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.
Fixes#2665
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
The only process we are adding there is the container host one, and
there is no such thing anymore.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Where possible, move attributes added with AddTag() to Trace() call to
reduce the amount of code used for tracing.
Fixes#2512
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Tracing tags are stored inconsistently throughout the runtime. Change
all instances of tracing tags to variables.
Fixes#2512
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
The swappiness is not right if just set
io.katacontainers.container.resource.swappiness:
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
name: busybox-killed-vmm
annotations:
io.katacontainers.container.resource.swappiness: "100"
image:
image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
crictl exec $cid cat /sys/fs/cgroup/memory/memory.swappiness
60
The cause of this issue is there are two elements store the resources
infomation. They are c.config.Resources for calculateSandboxMemory and
c.GetPatchedOCISpec() for agent.
This add initConfigResourcesMemory to Container and call it in
newContainer to handle the issue.
Fixes: #2372
Signed-off-by: Hui Zhu <teawater@antfin.com>
This commit add code to handle the annotations
"io.katacontainers.container.resource.swappiness" and
"io.katacontainers.container.resource.swap_in_bytes".
It will set the value of "io.katacontainers.resource.swappiness" to
c.config.Resources.Memory.Swappiness and set the value of
"io.katacontainers.resource.swap_in_bytes" to
c.config.Resources.Memory.Swap.
Fixes: #2201
Signed-off-by: Hui Zhu <teawater@antfin.com>
Removes custom trace functions defined across the repo and creates
a single trace function in a new katatrace package. Also moves
span tag management into this package and provides a function to
dynamically add a tag at runtime, such as a container id, etc.
Fixes#1162
Signed-off-by: Benjamin Porter <bporter816@gmail.com>
To workaround virtiofs' lack of inotify support, we'll special case
particular mounts which are typically watched, and pass on information
to the agent so it can ensure that the mount presented to the container
is indeed watchable (see applicable agent commit).
This commit will:
- identify watchable mounts based on file count and mount source
- create a watchable-bind storage object for these mounts to
communicate intent to the agent
- update the OCI spec to take the updated watchable mount source into account
Unit tests added and updated for the newly introduced
functionality/functions.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
There's no reason to pass the paths; they can be
determined when they are actually used.
Let's make the return values more comparable to the other mount handling
functions (we'll add storage object in future commit), and pass the mount maps as
function parameters.
...No functional changes here...
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add sandbox, container, and hypervisor IDs to trace spans. Note that
some spans in sandbox.go are created with a trace() call from api.go.
These spans have additional attributes set after span creation to
overwrite the api attributes.
Fixes#1878
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Span attributes (tags) are not consistent in runtime tracing, so
designate and use core attributes such source, package, subsystem, and
type as span metadata for more understandable output.
Use WithAttributes() during span creation to reduce calls to
SetAttributes().
Modify Trace() in katautils to accept slice of attributes so multiple
functions using different attributes can use it.
Fixes#1852
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
cgroupsCreate will just keep the CPU resources infomation but not the
others.
Set it to c.config.Resources will clean most of resources of the
container.
This commit remove it to handle the issue.
Fixes: #1758
Signed-off-by: Hui Zhu <teawater@antfin.com>