Commit Graph

73 Commits

Author SHA1 Message Date
Fabiano Fidêncio
7164ced4dc CCv0: Merge from main -- August 1st
Conflicts:
	src/runtime/pkg/katautils/config.go
	src/runtime/virtcontainers/container.go
	src/runtime/virtcontainers/hypervisor.go
	src/runtime/virtcontainers/qemu_arch_base.go
	src/runtime/virtcontainers/sandbox.go
	tests/integration/kubernetes/gha-run.sh
	tests/integration/kubernetes/setup.sh
	tools/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml
	tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh
	tools/packaging/kata-deploy/scripts/kata-deploy.sh
	tools/packaging/kernel/kata_config_version
	versions.yaml

Fixes: #7433

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-04 22:15:09 +02:00
Zvonko Kaiser
62aa6750ec vfio: Added better handling of VFIO Control Devices
Depending on the vfio_mode we need to mount the
VFIO control device additionally into the container.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:42 +00:00
Zvonko Kaiser
114542e2ba s390x: Fixing device.Bus assignment
The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore

Fixes: #7381

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:24:26 +00:00
Zvonko Kaiser
3b9f8fdbcb CCv0: Adding CDI support for cold and hot-plug of VFIO devices
We need to do proper sandbox sizing when we're doing cold-plug introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass-through annotations for accumulated CPU,Memory and now CDI
devices. With that information sandbox sizing can be derived correctly.

Fixes: #7331

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-19 06:55:58 +00:00
stevenhorsman
33143eb342 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: kata-containers#5645
Depends-on: github.com/kata-containers/kata-containers#6885

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-25 16:17:59 +01:00
Feng Wang
205909fbed runtime: Fix virtiofs fd leak
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.

Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
2023-04-26 15:53:39 -07:00
Pradipta Banerjee
3081cd5f8e runtime: propagate configmap/secrets etc changes for remote-hyp
For remote hypervisor, the configmap, secrets, downward-api or project-volumes are
copied from host to guest. This patch watches for changes to the host files
and copies the changes to the guest.

Note that configmap updates takes significantly longer than updates via downward-api.
This is similar across runc and Kata runtimes.

Fixes: #6341

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Julien Ropé <jrope@redhat.com>
2023-03-30 03:14:52 +00:00
Georgina Kinge
bda68b16f1 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4555
Signed-off-by: Georgina Kinge <georgina.kinge@ibm.com>
2022-06-29 13:22:22 +01:00
Eric Ernst
f9e96c6506 runtime: device: move to top level package
Let's move device package to runtime/pkg instead of being buried under
virtcontainers.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Rafael Fonseca
4d5e446643 runtime: remove duplicate 'types' import
Fallout of 09f7962ff

Fixes #4285

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-30 16:02:29 +02:00
Rafael Fonseca
ce2e521a0f runtime: remove duplicate 'types' import
Fallout of 09f7962ff

Fixes #4285

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-19 13:49:47 +02:00
Megan Wright
738ae8c60e CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4115
Signed-off-by: Megan-Wright <megan.wright.ibm.com>
2022-04-20 11:32:31 +01:00
Yibo Zhuang
532d53977e runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.

Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
Megan Wright
ef8ba4bbec CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #3931
Signed-off-by: Megan Wright megan.wright@ibm.com
2022-03-23 17:01:38 +00:00
Feng Wang
aa5ae6b17c runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-14 11:03:05 -07:00
stevenhorsman
75e2e5ab46 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #3843
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2022-03-07 11:09:12 +00:00
Feng Wang
e76519af83 runtime: small refactor to improve readability
Remove some confusing/duplicate code so it's more readable

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-04 10:00:52 -08:00
Feng Wang
f905161bbb runtime: mount direct-assigned block device fs only once
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
c39281ad65 runtime: update container creation to work with direct assigned volumes
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
stevenhorsman
4decf30b3e CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #3807
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2022-03-02 15:02:54 +00:00
Eric Ernst
e355a71860 container: file is not linux specific
This should not be linux specific -- drop restriction.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
ad0449195d virtcontainers: Convert stats dev_t to uint64
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
1103f5a4d4 virtcontainers: Use FilesystemSharer for sharing the containers files
Switching to the generic FilesystemSharer brings 2 majors improvements:

1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
   container files and root filesystems with the Kata Linux guest.

Fixes #3622

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
bin
f6fc1621f7 shim: log events for CRI-O
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.

For CRI-O runtime, we can log the events to log file.

Fixes: #3733

Signed-off-by: bin <bin@hyper.sh>
2022-02-22 11:02:50 +08:00
stevenhorsman
e7e4ba9fc4 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #3738
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2022-02-21 15:22:18 +00:00
Samuel Ortiz
77c29bfd3b container: Remove VFIO lazy attach handling
With the recently added VFIO fixes and support, we should not need that
anymore.

Fixes #3108

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-17 08:39:44 +01:00
bin
81a8baa5e5 runtime: add hugepages support
Add hugepages support, port from:
b486387cba

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:53 +08:00
stevenhorsman
9f3b2aaf6a CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #3573
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2022-02-14 16:25:09 +00:00
luodaowen.backend
2d9f89aec7 feature(nydusd): add nydusd support to introduse lazyload ability
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.

Fixes #2724

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-11 21:41:17 +08:00
bin
85f5ae190e runtime: close span before return from function in case of error
Return before closing span will cause invalid spans, so span should
be closed before function return.

Fixes: #3424

Signed-off-by: bin <bin@hyper.sh>
2022-01-11 19:45:41 +08:00
stevenhorsman
ea34b30839 Merge remote-tracking branch 'upstream/main' into CCv0
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2021-11-19 14:37:33 +00:00
wllenyj
5691e66e1b shim: Fix CreateContainer for the confidential computing case
In the confidential computing scenario, there is no Image
information on the host, so skip handling Rootfs at
CreateContainer.

Fixes #3009

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2021-11-18 22:41:05 +01:00
Eric Ernst
860f30882a virtcontainers: move oci, uuid packages top level
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.

While at it, ensure we run gofmt on the changed files.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
bin
09f7962ff1 runtime: merge virtcontainers/pkg/types into virtcontainers/types
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.

Fixes: #3031

Signed-off-by: bin <bin@hyper.sh>
2021-11-12 15:06:39 +08:00
Yujia Qiao
e66d0473be virtcontainers: simplify read-only mount handling
Current handling of read-only mounts is a little tricky.
However, a clearer solution can be used here:
  1. make a private ro bind mount at privateDest to the mount source
  2. make a bind mount at mountDest to the mount created in step 1
  3. umount the private bind mount created in step 1
One important aspect is that the mount in step 2 is duplicated from
the one we created in step 1. So the MS_RDONLY flag is properly
preserved in all mounts created in the propagtion.

Fixes: #2205

Depends-on: github.com/kata-containers/tests#4106

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-28 15:48:41 +08:00
Manohar Castelino
4d47aeef2e hypervisor: Export generic interface methods
This is in preparation for creating a seperate hypervisor package.
Non functional change.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Chelsea Mafrica
077b77c178 runtime: tracing: Fix logger passed in newContainer
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.

The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.

Fixes #2665

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-17 11:41:04 -07:00
Samuel Ortiz
f17752b0dc virtcontainers: container: Do not create and manage container host cgroups
The only process we are adding there is the container host one, and
there is no such thing anymore.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:09:33 +02:00
Chelsea Mafrica
8f0f949abf tracing: Move dynamically added attributes to Trace()
Where possible, move attributes added with AddTag() to Trace() call to
reduce the amount of code used for tracing.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-27 08:26:40 -07:00
Chelsea Mafrica
8058e97212 tracing: Change runtime tracing tags to vars
Tracing tags are stored inconsistently throughout the runtime. Change
all instances of tracing tags to variables.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-26 15:55:32 -07:00
Hui Zhu
e6408fe670 Container: Add initConfigResourcesMemory and call it in newContainer
The swappiness is not right if just set
io.katacontainers.container.resource.swappiness:
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
  name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
  name: busybox-killed-vmm
annotations:
  io.katacontainers.container.resource.swappiness: "100"
image:
  image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
crictl exec $cid cat /sys/fs/cgroup/memory/memory.swappiness
60

The cause of this issue is there are two elements store the resources
infomation.  They are c.config.Resources for calculateSandboxMemory and
c.GetPatchedOCISpec() for agent.
This add initConfigResourcesMemory to Container and call it in
newContainer to handle the issue.

Fixes: #2372

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-02 16:02:12 +08:00
Hui Zhu
ee90affc18 newContainer: Initialize c.config.Resources.Memory if it is nil
container start fail if
io.katacontainers.container.resource.swap_in_bytes and
memory_limit_in_bytes are not set.
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
  name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
  name: busybox-killed-vmm
annotations:
  io.katacontainers.container.resource.swappiness: "60"
image:
  image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
DEBU[0000] get runtime connection
DEBU[0000] connect using endpoint
'unix:///var/run/containerd/containerd.sock' with '10s' timeout
DEBU[0000] connected successfully using endpoint:
unix:///var/run/containerd/containerd.sock
DEBU[0000] StartContainerRequest:
&StartContainerRequest{ContainerId:4fea91d16f661931fe33acd247efe831ef9e571588ba18b5a16f04c278fd61b8,}
DEBU[0000] StartContainerResponse: nil
FATA[0000] starting the container
"4fea91d16f661931fe33acd247efe831ef9e571588ba18b5a16f04c278fd61b8": rpc
error: code = Unknown desc = failed to create containerd task: failed to
create shim: ttrpc: closed: unknown

The cause of fail if if c.config.Resources.Memory is nil, values of
io.katacontainers.container.resource.swappiness and
io.katacontainers.container.resource.swap_in_bytes will be store in
newContainer.

This commit initialize c.config.Resources.Memory if it is nil in
newContainer.

Fixes: #2367

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-01 10:03:27 +08:00
Julio Montes
47d95dc1c6 runtime: virtcontainers: fix govet fieldalignment
Fix structures alignment

fixes #2271

Depends-on: github.com/kata-containers/tests#3727

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Hui Zhu
a733f537e5 runtime: newContainer: Handle the annotations of SWAP
This commit add code to handle the annotations
"io.katacontainers.container.resource.swappiness" and
"io.katacontainers.container.resource.swap_in_bytes".
It will set the value of "io.katacontainers.resource.swappiness" to
c.config.Resources.Memory.Swappiness and set the value of
"io.katacontainers.resource.swap_in_bytes" to
c.config.Resources.Memory.Swap.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:46 +08:00
Benjamin Porter
b10e3e22b5 tracing: Consolidate tracing into a new katatrace package
Removes custom trace functions defined across the repo and creates
a single trace function in a new katatrace package. Also moves
span tag management into this package and provides a function to
dynamically add a tag at runtime, such as a container id, etc.

Fixes #1162

Signed-off-by: Benjamin Porter <bporter816@gmail.com>
2021-07-11 14:19:51 -05:00
Eric Ernst
064dfb164b runtime: Add "watchable-mounts" concept for inotify support
To workaround virtiofs' lack of inotify support, we'll special case
particular mounts which are typically watched, and pass on information
to the agent so it can ensure that the mount presented to the container
is indeed watchable (see applicable agent commit).

This commit will:
 - identify watchable mounts based on file count and mount source
 - create a watchable-bind storage object for these mounts to
   communicate intent to the agent
 - update the OCI spec to take the updated watchable mount source into account

Unit tests added and updated for the newly introduced
functionality/functions.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-06-24 10:07:06 -07:00
Eric Ernst
57c0cee0a5 runtime: Cleanup mountSharedDirMounts, shareFile parameters
There's no reason to pass the paths; they can be
determined when they are actually used.

Let's make the return values more comparable to the other mount handling
functions (we'll add storage object in future commit), and pass the mount maps as
function parameters.

...No functional changes here...

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-06-24 10:07:06 -07:00
Chelsea Mafrica
8ca0207281 tracing: Add sandbox and container ID to trace spans
Add sandbox, container, and hypervisor IDs to trace spans. Note that
some spans in sandbox.go are created with a trace() call from api.go.
These spans have additional attributes set after span creation to
overwrite the api attributes.

Fixes #1878

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-06-02 21:53:54 -07:00
Chelsea Mafrica
05a46fede0 tracing: Make runtime span attributes more consistent
Span attributes (tags) are not consistent in runtime tracing, so
designate and use core attributes such source, package, subsystem, and
type as span metadata for more understandable output.

Use WithAttributes() during span creation to reduce calls to
SetAttributes().

Modify Trace() in katautils to accept slice of attributes so multiple
functions using different attributes can use it.

Fixes #1852

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-05-27 10:07:11 -07:00
Hui Zhu
0787ea8073 cgroupsCreate: not set resources to c.config.Resources
cgroupsCreate will just keep the CPU resources infomation but not the
others.
Set it to c.config.Resources will clean most of resources of the
container.

This commit remove it to handle the issue.

Fixes: #1758

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-04-27 16:44:30 +08:00