Commit Graph

2192 Commits

Author SHA1 Message Date
David Gibson
ed08980fc1 agent: Remove many "panic message is not string literal" warnings
Rust 1.51 appears to have added a new warning in anticipation of Rust 2021,
which requires the format string for panic!()s (including via the various
assert!() macros) to be a string literal.  This triggers quite a few times
in the agent code.  This patch fixes them.

fixes #1626

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 11:51:34 +10:00
Snir Sheriber
13653e7b55 runtime: increase dial timeout
On some setups, starting multiple kata pods (qemu) simultaneously on the same node
might cause kata VMs booting time to increase and the pods to fail with:
Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed
out connecting to vsock 1358662990:1024: unknown

Increasing default dialing timeout to 30s should cover most cases.

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Fixes: #1543
2021-04-04 09:37:38 +03:00
Chelsea Mafrica
17b1452c2a Merge pull request #1607 from fidencio/wip/only-keep-one-VERSION-file
Only keep one VERSION file
2021-04-02 11:14:12 -07:00
Bo Chen
1511d966aa Merge pull request #1616 from egernst/dechat-deruntime
Dechat deruntime
2021-04-01 11:02:27 -07:00
Chelsea Mafrica
4a3282cf1a Merge pull request #1608 from likebreath/0331/go_fmt_clh_clinet_code
runtime: Format auto-generated client code for cloud-hypervisor API
2021-04-01 10:39:02 -07:00
Eric Ernst
a4c125a8b9 trace: move gRPC requests from debug to trace
There are many requests to the agent that happen with relatively
high frequency when a workload is running (checkRequest, as an example).

Let's move from Debug to Trace to avoid bombarding journal.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-04-01 09:03:26 -07:00
Eric Ernst
50fff97753 trace: move trace span chatter to trace rather than info
No human should ever read that ouptut. Let's at least move it to trace for now.

Fixes: #1615

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-04-01 09:02:56 -07:00
Fupan Li
5524bc806b Merge pull request #1612 from liubin/1610/use-concrete-kata-agent-config-type
runtime: use concrete KataAgentConfig instead of interface type
2021-04-01 21:26:38 +08:00
bin
6fe48329b5 runtime: use concrete KataAgentConfig instead of interface type
Kata Containers 2.0 only have one type of agent, so there is no
need to use interface as config's type

Fixes: #1610

Signed-off-by: bin <bin@hyper.sh>
2021-04-01 13:44:45 +08:00
fupan.lfp
6493942568 mount: fix the issue of missing set fsGroup
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus guest should get this fsGroup
from runtime and set it properly on the emptyDir volume
in guest.

Fixes: #1580

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-01 11:33:26 +08:00
fupan.lfp
88e58a4f4b agent: fix the issue of missing pass fsGroup
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus runtime should pass this fsGroup
to guest and set it properly on the emptyDir volume
in guest.

Fixes: #1580

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-01 11:33:18 +08:00
Fabiano Fidêncio
572aff53e8 build: Only keep one VERSION file
Instead of having different VERSION files spread accross the project,
let's always use the one in the topsrcdir and remove all the others,
keeping only a synlink to the topsrcdir one.

Fixes: #1579

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-03-31 23:51:20 +02:00
Bo Chen
0c38d9ecc4 runtime: Fix the format of the client code of cloud-hypervisor APIs
Regenerate the client code with the added `go-fmt` step. No functional
changes.

Fixes: #1606

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-31 14:41:44 -07:00
Bo Chen
52cacf8838 runtime: Format auto-generated client code for cloud-hypervisor API
This patch extends the current process of generating client code for
cloud-hypervisor API with an additional step, `go-fmt`, which will remove
the generated `client/go.mod` file and format all auto-generated code.

Fixes: #1606

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-31 14:36:24 -07:00
Eric Ernst
c0c7bef2b8 Merge pull request #1592 from likebreath/0330/versions_clh_v0.14.0
versions: Update cloud-hypervisor to release v0.14.1
2021-03-31 12:39:35 -07:00
Fabiano Fidêncio
a3d8554ab9 Merge pull request #1577 from liubin/feature/1576-import-runc-v2-options-types
runtime: import runtime/v2/runc/options to decode request from Docker
2021-03-31 20:35:24 +02:00
Bo Chen
84b62dc3b1 versions: Update cloud-hypervisor to release v0.14.1
Highlights for cloud-hypervisor version 0.14.0 include: 1) Structured
event monitoring; 2) MSHV improvements; 3) Improved aarch64 platform; 4)
Updated hotplug documentation; 6) PTY control for serial and
virtio-console; 7) Block device rate limiting; 8) Plan to deprecate the
support of "LinuxBoot" protocol and support PVH protocol only.

Highlights for cloud-hypervisor version 0.13.0 include: 1) Wider VFIO
device support; 2) Improve huge page support; 3) MACvTAP support; 4) VHD
disk image support; 5) Improved Virtio device threading; 6) Clean
shutdown support via synthetic power button.

Details can be found:
https://github.com/cloud-hypervisor/cloud-hypervisor/releases

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the latest version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #1591

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-31 11:09:47 -07:00
Orestis Lagkas Nikolos
6255cc1959 virtcontainers/fc: Upgrade Firecracker to v0.23.1
This patch upgrades Firecracker version from v0.21.1 to v0.23.1

* Generate swagger models for v0.23.1 (from firecracker.yaml)
* Change uint64 types in TokenBucket object according to rate-limiter
implementation (introduced in commit #cfeb966)
* Update Firecracker Logger/Metrics to support the new API
* Update payload in fc.vmRunning to support the new API
* Add Metrics type to fcConfig

Fixes: #1518

Signed-off-by: Orestis Lagkas Nikolos <olagkasn@nubificus.co.uk>
2021-03-31 04:55:40 -05:00
Chelsea Mafrica
e5aa4e7eb4 Merge pull request #1563 from Jakob-Naucke/s390x-missing-contexts
virtcontainers: Fix missing contexts in s390x
2021-03-30 09:38:28 -07:00
Carlos Venegas
c748a9c278 Merge pull request #1549 from jcvenegas/2021-03-24/makefile-enable-dax-env-var
runtime: makefile allow override DAX value
2021-03-30 10:06:16 -06:00
bin
09d454ac74 runtime: import runtime/v2/runc/options to decode request from Docker
Shimv2 protocol CreateTaskRequest.Options has a type of *google_protobuf.Any.
If the call is from Docker, to decode the request,
the proto types(github.com/containerd/containerd/runtime/v2/runc/options)
should be imported.

Fixes: #1576

Signed-off-by: bin <bin@hyper.sh>
2021-03-30 19:44:00 +08:00
Tim Zhang
b58fb25d88 Merge pull request #1555 from liubin/fix/1554-install-hook-before-test
test: install mock hook binary before test
2021-03-30 14:01:56 +08:00
Eric Ernst
05680b86c4 Merge pull request #1537 from lifupan/main
cgroups: fix the issue of get wrong online cpus
2021-03-29 15:56:03 -07:00
Eric Ernst
460117a1a6 Merge pull request #1510 from littlejawa/issue_1003
build: remove unused variables from Makefile
2021-03-29 14:54:09 -07:00
Carlos Venegas
0b502d15b2 runtime: makefile allow override DAX value
Allow enable DAX using env variable

Fixes: #1547

Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
2021-03-29 21:28:22 +00:00
Eric Ernst
24214a536a Merge pull request #1560 from egernst/fix-1559
container: on cleanup, rm container directory for mounts path
2021-03-29 14:14:52 -07:00
GabyCT
17840cb573 Merge pull request #1546 from devimc/2021-03-24/supportQEMU6
runtime: add support for QEMU 6
2021-03-29 14:33:16 -06:00
Eric Ernst
9a4e866654 container: on cleanup, rm container directory for mounts path
A wrong path was being used for container directory when
virtiofs is utilized. This resulted in a warning message in
logs when a container is killed, or completes:

level=warning msg="Could not remove container share dir"

Without proper removal, they'd later be cleaned up when the shared
path is removed as part of stopping the sandbox.

Fixes: #1559

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-03-29 11:39:39 -07:00
Jakob Naucke
1366f0fb9c cli: Use genericGetExpectedHostDetails on s390x
getExpectedHostDetails did not offload any work to
genericGetExpectedHostDetails on s390x. By using that function, much
redundant code can be saved. This also resolves 2 issues with the
previous version:

- The number of CPUs was not calculated.
- vcUtils.SupportsVsocks() still used the Kata v1 signature.

Fixes: #1564

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:58:16 +02:00
Jakob Naucke
31ced01eba virtcontainers: Fix missing contexts in s390x
#1389 has added a context for many signatures to improve trace spans.
Functions specific to s390x lack this. Add context where required. This
affects some common code signatures, since some functions that do not
require context on other architectures do require it on s390x.
Also remove an unnecessary import in test_qemu_s390x.go.

Fixes: #1562

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:49:27 +02:00
Jakob Naucke
52a276fbdb agent: Fix type for PROC_SUPER_MAGIC on s390x
statfs f_types are long on most architectures, but not on s390x, where
they are uint. Following the fix in rust-lang/libc at
https://github.com/rust-lang/libc/pull/1999, the custom defined
PROC_SUPER_MAGIC must be updated in a similar way.

Fixes: #1204

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:25:19 +02:00
Jakob Naucke
5b7c8b7d26 agent: Update cgroups-rs to 0.2.5
to pull in the chain of https://github.com/rust-lang/libc/pull/1999,
https://github.com/nix-rust/nix/pull/1372, and
https://github.com/kata-containers/cgroups-rs/pull/38. This adds statfs
constants on s390x. cgroups-rs 0.2.4 also contains this fix, but let's
move to the latest 0.2.5 right away.

Fixes: #1204

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:25:14 +02:00
bin
48e5e4f2f3 test: install mock hook binary before test
`make test` depends mock hook in virtcontainers directory,
before test, install it first.

And also run test as normal user and root in GitHub actions.

Fixes: #1554

Signed-off-by: bin <bin@hyper.sh>
2021-03-29 22:40:45 +08:00
James O. D. Hunt
1d448813a1 uevent: Add shutdown channel for task
Allow the uevent task to shutdown on request.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
d8d5b4cd1d signal: Move to a new module
Move the signal handling code into a new module and refactor into the
main handler and a new SIGCHLD handling function to make the code
simpler and easier to understand.

Also added a unit test for shutdown.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
011f7d785a logging: Rework for shutdown
Make changes to logger thread to allow the logger to be replaced with
a NOP logger (required for agent shutdown).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
7d5f88c0ad agent: Enable clean shutdown
The agent doesn't normally shutdown: it doesn't need to be as it is
killed *after* the workload has finished. However, a clean and ordered
shutdown sequence is required to support agent tracing, since all trace
spans need to be completed to ensure a valid trace transaction.

Enable a controlled shutdown by allowing the main threads (tasks) to be
stopped.

To allow this to happen, each thread is now passed a shutdown channel
which it must listen to asynchronously, and shut down the thread if
activity is detected on that channel.

Since some threads are created for I/O and since the standard `io::copy`
cannot be stopped, added a new `interruptable_io_copier()` function
which shares the same semantics as `io::copy()`, but which is also
passed a shutdown channel to allow asynchronous I/O operations to be
stopped cleanly.

Fixes: #1531.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
dcb39c61f1 main: Create logger task
Encapsulate the logic for handling the task that displays logger output
into a new function to simplify the code and remove another anonymous
async block.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
2cf2897d31 main: Use task list for stopping tasks
Maintain a list of tasks and wait on them all before main returns.

This is preparatory work for the agent shutdown: all tasks that are
started need to be added to the list. This aggregation makes it easier
to identify what needs to stop before the agent can exit cleanly.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
039df1d727 main: Refactor main logic into new async function
Move most of the main logic into a separate async function. This makes
the code clearer and avoids the anonymous async block.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
2a648fa760 logging: Use guard to make threaded logging safe
Return a guard variable from `create_logger()` which the caller can
implicitly drop to guarantee that all threads started by the async log
drain are stopped.

This fixes a long-standing bug [1] whereby the agent could panic with
the following error, generated by the `slog` logging crate:

```
slog::Fuse Drain: Custom { kind: Other, error: "serde serialization error: Bad file descriptor (os error 9)" }
```

[1] - See https://github.com/kata-containers/kata-containers/issues/171.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
38f0d8d3ce config: Fix assert_error testing macro
Fixed the `assert_error!()` test macro so that it correctly handles the
scenario where the test expects an error, but the actual result was `Ok`
(no error).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
Bin Liu
594c47ab6c Merge pull request #1553 from bergwolf/ro-volumes
runtime: fix virtiofsd RO volume sharing
2021-03-29 20:43:34 +08:00
fupan.lfp
3f46e6379d cgroups: fix the issue of getting wrong online cpus
It's better to get the online cpus from
"/sys/devices/system/cpu/online" instead of from
cpuset cgroup, cause there would be an latency
between one cpu online and present in the root
cpuset cgroup.

Fixes: #1536

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-03-29 15:49:15 +08:00
Peng Tao
e34924488b runtime: fix virtiofsd RO volume sharing
Right now we rely heavily on mount propagation to share host
files/directories to the guest. However, because virtiofsd
pivots and moves itself to a separate mount namespace, the remount
mount is not present in virtiofsd's mount. And it causes guest to be
able to write to the host RO volume.

To fix it, create a private RO mount and then move it to the host mounts
dir so that it will be present readonly in the host-guest shared dir.

Fixes: #1552
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-29 13:54:25 +08:00
bin
532ff7c909 runtime: update virtcontainers API documentation
Virtcontainers API documentation is outdated, update documentation from the latest
source.

Fixes: #1455

Signed-off-by: bin <bin@hyper.sh>
2021-03-29 11:50:53 +08:00
Chelsea Mafrica
6fcfea8dcf runtime: Fix static check errors
Fix comment formatting and unused variable to make static checks pass.

Fixes #1550

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-25 12:59:01 -07:00
Chelsea Mafrica
f3ebbb1f1a runtime: Fix trace span ordering
Return ctx in trace() functions to correct span ordering.

Fixes #1550

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-25 11:43:04 -07:00
Bin Liu
5b5b5cc611 Merge pull request #1539 from bergwolf/ut
fix runtime UTs and enable static check
2021-03-25 16:29:45 +08:00
Julio Montes
1555bfd8b5 runtime: add support for QEMU 6
Use `on` and `off` to enable or disable features,
`no` prefix is deprecated

fixes #1545

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-03-24 10:55:35 -06:00