Commit Graph

1329 Commits

Author SHA1 Message Date
Fabiano Fidêncio
ebc23df752 Merge pull request #2714 from egernst/watcher-fixup-backport
stable-2.2 | watcher: ensure we create target mount point for storage
2021-09-24 09:32:29 +02:00
Fabiano Fidêncio
e58fabfc20 Merge pull request #2598 from c3d/backport/2589-virtiofsd-perms-perms
stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
2021-09-24 09:16:59 +02:00
Peng Tao
feb06dad8a Merge pull request #2623 from Bevisy/stable-2.2-2615-bp
[backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
2021-09-24 14:04:36 +08:00
Eric Ernst
d9b41fc583 watcher: ensure we create target mount point for storage
We would only create the target when updating files. We need to make
sure that we create the target if the source is a directory. Without
this, we'll fail to start a container that utilizes an empty configmap,
for example.

Add unit tests for this.

Fixes: #2638

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-23 15:45:57 -07:00
Chelsea Mafrica
83f219577d Merge pull request #2668 from cmaf/tracing-newContainer-logger-bp-2.2
stable-2.2 | runtime: tracing: Fix logger passed in newContainer
2021-09-23 09:58:14 -07:00
Chelsea Mafrica
97421afe17 Merge pull request #2664 from cmaf/tracing-stop-rootctx-bp-2.2
stable-2.2 | runtime: tracing: Use root context to stop tracing
2021-09-23 09:57:57 -07:00
Chelsea Mafrica
484af1a559 Merge pull request #2678 from nubificus/stable-2.2-fix_fc_vcpu_thread
stable-2.2 | virtcontainers: fc: parse vcpuID correctly
2021-09-20 09:46:07 -07:00
Snir Sheriber
2ca867da7b runtime: Add container field to logs
and unified field naming

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 0c7789fad6
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:04:09 +02:00
Snir Sheriber
f4da502c4f shimv2: add information to method comment
add a comment to explicitly mentioned method is a binary call

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 72e3538e36
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:03:45 +02:00
Snir Sheriber
16164241df shimv2: add logging to shimv2 api calls
and also fetch and log container id from the request

Fixes: #2527
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 8dadca9cd1
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:02:35 +02:00
Christophe de Dinechin
25c7e1181a virtiofs: Create shared directory with 0700 mode, not 0750
A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no real good reason for a non-root user to access
the shared directory, and this is potentially dangerous.

Fixes: #2589

[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 10:54:18 +02:00
Anastassios Nanos
4c5bf0576b virtcontainers: fc: parse vcpuID correctly
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any go variant of string to int
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".

This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.

Fixes: #2592

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-09-18 08:10:13 +00:00
Chelsea Mafrica
b3e620dbcf runtime: tracing: Fix logger passed in newContainer
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.

The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.

Backport of #2666
Fixes #2667

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 16:30:29 -07:00
Chelsea Mafrica
98c2ca13c1 runtime: tracing: Use root context to stop tracing
Call StopTracing with s.rootCtx, which is the root context for tracing,
instead of s.ctx, which is parent to a subset of trace spans.

Backport of #2662

Fixes #2663

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 11:19:40 -07:00
Binbin Zhang
56920bc943 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.

Fixes: #2615
Backport: #2616

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-14 10:33:49 +08:00
Bo Chen
a1874ccd62 virtcontainers: clh: Revert the workaround incorrect default values
Given the fix to the bugs of the openapi spec file is included in the
Cloud Hypervisor v18.0 [1], this patch reverts the workaround we carried
in the CLH driver.

This reverts commit 932ee41b3f.

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3029

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f785ff0bf2)
2021-09-13 14:17:58 -07:00
Bo Chen
c2c650500b virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v18.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 0e0e59dc5f)
2021-09-13 14:17:58 -07:00
Samuel Ortiz
eedf139076 Merge pull request #2608 from Bevisy/main-2539-bp
[backport]sandbox: Add device permissions such as /dev/null to cgroup
2021-09-13 19:07:17 +02:00
Samuel Ortiz
1792a9fe11 runtime: Fix README link
The LICENSE file lives in the project's root.

Fixes #2612

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-09-11 09:57:49 +02:00
Binbin Zhang
807cc8a3a5 sandbox: Add device permissions such as /dev/null to cgroup
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec

Fixes: #2539
Backports: #2603

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-10 17:33:26 +08:00
Peng Tao
0bdfdad236 runtime: drop qemu-lite support
As the project is not maintained and we have not been testing against it
for a long time.

Fixes: #2529
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:06 +08:00
Peng Tao
60155756f3 runtime: fix default hypervisor path
Should not be qemu-lite.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:16:57 +08:00
Bo Chen
938b01aedc virtcontainers: clh: Workaround incorrect default values
Two default values defined in the 'cloud-hypervisor.yaml' have typo, and this
patch manually overwrites them with the correct value as a workaround
before the corresponding fix is landed to Cloud Hypervisor upstream.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 932ee41b3f)
2021-08-27 13:37:47 -07:00
Bo Chen
abd708e814 virtcontainers: clh: Fix the unit test
This patch fixes the unit tests over clh.go with the updated client code.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit bff38e4f4d)
2021-08-27 13:37:47 -07:00
Bo Chen
61babd45ed virtcontainers: clh: Use constructors to ensure proper default value
With the updated openapi-generator, the client code now handles optional
attributes correctly, and ensures to assign the right default
values. This patch enables to use those constructors to make sure the
proper default values being used.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d967d3cb37)
2021-08-27 13:37:47 -07:00
Bo Chen
59c51f6201 virtcontainers: clh: Migrate to use the updated client APIs
The client code (and APIs) for Cloud Hypervisor has been changed
dramatically due to the upgrade to `openapi-generator` v5.2.1. This
patch migrate the Cloud Hypervisor driver in the kata-runtime to use
those updated APIs.

The main change from the client code is that it now uses "pointer" type
to represent "optional" attributes from the input openapi specification
file.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit a6a2e525de)
2021-08-27 13:37:47 -07:00
Bo Chen
c1f260cc40 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor with the
updated `openapi-generator` v5.2.1.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 46eb07e14f)
2021-08-27 13:37:47 -07:00
Bo Chen
4cd6909f18 virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
To improve the quality and correctness of the auto-generated code, this
patch upgrade the `openapi-generator` to its latest stable release
v5.2.1.

Fixes: #2487

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 80fba4d637)
2021-08-27 13:37:47 -07:00
Fabiano Fidêncio
348795e282 Merge pull request #2233 from fgiudici/kata-monitor_liubin_cri
use CRI in kata-monitor
2021-08-20 13:58:12 +02:00
Jack Rieck
7a5ffd4a0f config: Enable jailer by default when using firecracker
Now that we have enabled CI tests for jailed firecracker and we have
fixed the  issue with removing the block storage device #2387, we
should leverage the full power of firecracker and enable jailer by
default.

Fixes: #2455
Signed-off-by: Jack Rieck <jack.rieck@sendgrid.com>
2021-08-17 19:22:09 -04:00
Chelsea Mafrica
9586d48254 tracing: Return context in runHooks() span creation
The call to Trace() in runHooks() should return a context so that
subsequent calls to runHook() produce properly ordered trace spans.

Fixes #2423

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-12 10:09:56 -07:00
Eric Ernst
71f304ce17 agent: watcher: cleanup mount if needed when container is removed
If a bind mount was created for watchable storage, make sure we remove
when removing a container.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-08-11 08:53:28 -07:00
Samuel Ortiz
f1a505dbfe agent: Temporarily allow unknown linters
Bump thiserror to 1.0.26 for vsock-exporter and work around
a bug in Clippy nonstandard_macro_braces lint.
(See https://github-redirect.dependabot.com/rust-lang/rust-clippy/issues/7422)

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-08-11 08:53:28 -07:00
Eric Ernst
961aaff004 agent: watcher: fixes to make more robust
inotify/watchable-mount changes...

- Allow up to 16 files. It isn't that uncommon to have 3 files in a secret.
In Kubernetes, this results in 9 files in the mount (the presented files,
which are symlinks to the latest files, which are symlinks to actual files
which are in a seperate hidden directoy on the mount). Bumping from eight to 16 will
help ensure we can support "most" secret/tokens, and is still a pretty
small number to scan...

- Now we will only replace the watched storage with a bindmount if we observe
that there are too many files or if its too large. Since the scanning/updating is racy,
we should expect that we'll occassionally run into errors (ie, a file
deleted between scan / update). Rather than stopping and making a bind
mount, continue updating, as the changes will be updated the next time
check is called for that entry (every 2 seconds today).

To facilitate the 'oversized' handling, we create specific errors for too large
or too many files, and handle these specific errors when scanning the storage entry.

- When handling an oversided mount, do not remove the prior files -- we'll just
overwrite them with the bindmount. This'll help avoid the files
disappearing from the user, avoid racy cleanup and simplifies the flow.
Similarly, only mark it as a non-watched storage device after the
bindmount is created successfully.

- When creating bind mount, make sure destination exists. If we hadn't
had a successful scan before, this wouldn't exist and the mount would
fail. Update logic and unit test to cover this.

- In several spots, we were returning when there was an error (both in
scan and update). For update case, let's just log an warning and continue;
since the scan/update is racy, we should expect that we'll have
transient errors which should resolve the next time the watcher runs.

Fixes: #2402

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-08-11 08:52:51 -07:00
Fabiano Fidêncio
2aa686a0f5 Merge pull request #2409 from sameo/topic/agent
agent: Fix cargo 1.54 clippy warning
2021-08-10 23:03:00 +02:00
wangyongchao.bj
99ab91df3d docs: update the docs project url from kata 1.x to 2.x
changed the document project url in the using-vpp-and-kata.md and
runtime experimental README.md files.

Fixes: #2418

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2021-08-10 13:51:54 +08:00
Fabiano Fidêncio
e1e6827a2c Merge pull request #2388 from nubificus/fix_jailed_fc
virtcontainers: fc: properly remove jailed block device
2021-08-10 00:17:18 +02:00
Samuel Ortiz
233b53c048 agent: Fix cargo 1.54 clippy warning
Mostly the needless borrow one, plus a few others that are now enforced.

Fixes #2408

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-08-05 18:41:55 +02:00
Francesco Giudici
2d8386ea52 kata-monitor: add few unit tests
Add cri.go unit tests

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2021-08-05 11:41:54 +02:00
Francesco Giudici
8714a35063 kata-monitor: make code to identify kata pods simpler
just search for the "kata" substring in the runtime value and log at
info level when the runtime name/type is not found.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2021-08-05 11:41:54 +02:00
Francesco Giudici
68a6f011b5 kata-monitor: drop the runtime info from the sandbox cache
We keep the container engine info in the sandbox cache map, as the value
associated to the pod id (the key). Since we used that in
getMonitorAddress() only (which is gone) we can avoid storing that
information. Let's drop it.
Keep the map structure and the [put,delete]IfExists functions as we may
want to move to an event based cache update process sooner or later, and
we will need those.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2021-08-05 11:41:54 +02:00
Francesco Giudici
97dcc5f78a kata-monitor: drop getMonitorAddress()
since the shim socket path is statically defined in the containerd-shimv2
code, we don't need to retrieve the socket name from the filesystem:
construct the socket name using the containerd-shimv2 code.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2021-08-05 11:41:54 +02:00
Francesco Giudici
0b03d97d0b vendor: update vendors for kata-monitor
kata-monitor switched from containerd client to CRI. Update the
dependencies and vendored code.

go mod tidy
go mod vendor

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2021-08-05 11:41:54 +02:00
Francesco Giudici
c2f03e8993 kata-monitor: talk to the container engine via the CRI
kata-monitor uses containerd client to retrieve information from the
container engine. This makes kata-monitor work with the containerd
container engine only.
Bin Liu (bin <bin@hyper.sh>) worked on a kata-monitor version able
to talk to any container engine leveraging the standard CRI[1].
Here, the original work of Bin Lui has been adapted on the current
kata-monitor to make it container engine independent.

[1] https://github.com/liubin/kata-containers/tree/fix/1030-use-cri-in-kata-monitor

Fixes: #1030
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2021-08-05 11:41:54 +02:00
Chelsea Mafrica
eac05ad6d6 Merge pull request #2375 from sameo/upstream/topic/process-cwd
agent: Create the process CWD when it does not exist
2021-08-04 11:35:11 -07:00
Anastassios Nanos
64dd35ba4f virtcontainers: fc: properly remove jailed block device
When running a firecracker instance jailed, block devices
are not removed correctly, as the jailerRoot path is not
stripped from the PATCH command sent to the FC API.

This patch differentiates the jailed case from the non-jailed
one and allows the firecracker instance to be properly
terminated.

Fixes #2387

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-08-04 16:31:56 +00:00
David Gibson
d5f85698e1 vendor: Update govmm
Update to commit 3c64244cbb, in particular to get these fixes which
are needed to work with qemu-6.0 and later:

https://github.com/kata-containers/govmm/pull/192
https://github.com/kata-containers/govmm/pull/194

Git log

d27256f (qmp: Don't use deprecated 'props' field for object-add, 2021-08-03)
d8cdf9a (qemu: Drop support for versions older than 5.0, 2021-08-03)
1b02192 (Use 'host_device' driver for blockdev backends, 2021-07-29)
9518675 (add support for "sandbox" feature to qemu, 2021-07-20)
335fa81 (qemu: fix golangci-lint errors, 2021-07-21)
61b6378 (.github/workflows: reimplement github actions CI, 2021-07-21)
9d6e797 (go: support go modules, 2021-07-21)
0d21263 (qemu: support read-only nvdimm, 2021-07-21)

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-04 15:04:30 +10:00
David Gibson
3165095669 runtime/qemu: Use explicit "on" for kernel_irqchip parameter
Kata uses the 'kernel_irqchip' machine option to qemu.  By default it
uses it in what qemu calls the "short-form boolean" with no parameter.
That style was deprecated by qemu between 5.2 and 6.0 (commit
ccd3b3b8112b) and effectively removed entirely between 6.0 and 6.1
(commit d8fb7d0969d5).

Update ourselves for newer qemus by using an explicit
"kernel_irqchip=on".

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-04 14:34:11 +10:00
Carlos Venegas
27b9a68189 Merge pull request #2365 from sameo/topic/clh-tracing
virtcontainers: clh: Do not use the default HTTP client
2021-08-03 12:54:09 -05:00
Hui Zhu
e6408fe670 Container: Add initConfigResourcesMemory and call it in newContainer
The swappiness is not right if just set
io.katacontainers.container.resource.swappiness:
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
  name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
  name: busybox-killed-vmm
annotations:
  io.katacontainers.container.resource.swappiness: "100"
image:
  image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
crictl exec $cid cat /sys/fs/cgroup/memory/memory.swappiness
60

The cause of this issue is there are two elements store the resources
infomation.  They are c.config.Resources for calculateSandboxMemory and
c.GetPatchedOCISpec() for agent.
This add initConfigResourcesMemory to Container and call it in
newContainer to handle the issue.

Fixes: #2372

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-02 16:02:12 +08:00