Commit Graph

64 Commits

Author SHA1 Message Date
shuochen0311
27fb490228 agent: add get volume stats handler in agent
retrieve the stats of direct-assigned volumes from the guest

Fixes: #3454

Signed-off-by: shuochen0311 <shuo.chen@databricks.com>
2022-03-03 18:57:02 -08:00
Bin Liu
f622d9491f Merge pull request #3253 from stevenhorsman/agent-config-cmdline
agent: Refactor command line parsing to use a framework
2022-01-05 20:25:57 +08:00
Fupan Li
615224e993 agent: move the protocols to upper libs
move the protocols to upper libs thus it can
be shared between agent and other rust runtime.

Depends-on: github.com/kata-containers/tests#4306

Fixes: #3348

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2022-01-05 16:58:06 +08:00
stevenhorsman
1c4edb9619 agent: Refactor arg parsing to use clap
Fixes: #3284

Co-authored-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2022-01-04 09:14:08 +00:00
James O. D. Hunt
b1f4e945b3 security: Update rust crate versions
Update the rust dependencies that have upstream security fixes. Issues
fixed by this change:

- [`RUSTSEC-2020-0002`](https://rustsec.org/advisories/RUSTSEC-2020-0002) (`prost` crate)
- [`RUSTSEC-2020-0036`](https://rustsec.org/advisories/RUSTSEC-2020-0036) (`failure` crate)
- [`RUSTSEC-2021-0073`](https://rustsec.org/advisories/RUSTSEC-2021-0073) (`prost-types` crate)
- [`RUSTSEC-2021-0119`](https://rustsec.org/advisories/RUSTSEC-2021-0119) (`nix` crate)

This change also includes:

- Minor code changes for the new version of `prometheus` for the agent.

- A *downgrade* of the version of the `futures` crate to the (new)
  latest version (`0.3.17`) since version `0.3.18` was removed [1].

Fixes: #3296.

[1] - See https://crates.io/crates/futures/versions

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-12-22 07:41:16 +00:00
James O. D. Hunt
4a2be13c60 agent: Upgrade nix version for security fix
Running `cargo audit` showed that the `nix` package for the agent and
the `rustjail` and `vsock-exporter` local crates need to be updated to
resolve rust security issue
[RUSTSEC-2021-0119](https://rustsec.org/advisories/RUSTSEC-2021-0119).
Hence, bumped `nix` to the latest version (which required changes to
work with the new, simpler `errno` handling).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-11-30 12:58:15 +00:00
James O. D. Hunt
256d5008dc agent: Update crate versions
Run `cargo update` to update to the latest crate dependency versions.

The agent is an application so this includes expanding the partially
specified semvers to full semver values for the following crates,
which makes those crates consistent with the other agent dependencies:

- `futures`
- `regex`
- `scan_fmt`
- `tokio`

Fixes: #3124.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-11-30 12:58:15 +00:00
James O. D. Hunt
8ab90e1068 agent-ctl: Allow API specification in JSON format
Update the `agent-ctl` tool to allow API fields to be specified in JSON
format, either directly on the command-line, or via a file URI.

This feature is made possible by enabling `serde` support in the agent
`protocols` crate. Careful use of the `serde` macros allows the
`agent-ctl` tool to accept _partially_ specified API objects in JSON
format; fields that are not specified are set to the default value for
their respective types.

`build.rs` changes based on work by Fupan.

Fixes: #2978.

Contributions-by: Fupan Li <lifupan@gmail.com>
Contributions-by: Bin Liu <bin@hyper.sh>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-11-10 10:16:04 +00:00
Manabu Sugimoto
3be50adab9 agent: Add support for Seccomp
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
Jianyong Wu
57c0f93f54 agent: fix race condition when test watcher
create_tmpfs won't pass as the race condition in watcher umount. quote
James's words here:

1. Rust runs all tests in parallel.
2. Mounts are a process-wide, not a per-thread resource.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.

To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-24 17:31:53 +08:00
Samuel Ortiz
0ea2e3af07 agent: config: Allow for building the configuration from a file
When the kernel command line includes a agent.config_file=<path> entry,
then we will try to override the default confiuguration values with the
ones we parse from a TOML file at <path>.

As the configuration file overrides the default values, we need to go
through a simplified builder that convert a set of Option<> fields into
the actual AgentConfig structure.

Fixes: #1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-10-07 00:37:40 +02:00
Eric Ernst
961aaff004 agent: watcher: fixes to make more robust
inotify/watchable-mount changes...

- Allow up to 16 files. It isn't that uncommon to have 3 files in a secret.
In Kubernetes, this results in 9 files in the mount (the presented files,
which are symlinks to the latest files, which are symlinks to actual files
which are in a seperate hidden directoy on the mount). Bumping from eight to 16 will
help ensure we can support "most" secret/tokens, and is still a pretty
small number to scan...

- Now we will only replace the watched storage with a bindmount if we observe
that there are too many files or if its too large. Since the scanning/updating is racy,
we should expect that we'll occassionally run into errors (ie, a file
deleted between scan / update). Rather than stopping and making a bind
mount, continue updating, as the changes will be updated the next time
check is called for that entry (every 2 seconds today).

To facilitate the 'oversized' handling, we create specific errors for too large
or too many files, and handle these specific errors when scanning the storage entry.

- When handling an oversided mount, do not remove the prior files -- we'll just
overwrite them with the bindmount. This'll help avoid the files
disappearing from the user, avoid racy cleanup and simplifies the flow.
Similarly, only mark it as a non-watched storage device after the
bindmount is created successfully.

- When creating bind mount, make sure destination exists. If we hadn't
had a successful scan before, this wouldn't exist and the mount would
fail. Update logic and unit test to cover this.

- In several spots, we were returning when there was an error (both in
scan and update). For update case, let's just log an warning and continue;
since the scan/update is racy, we should expect that we'll have
transient errors which should resolve the next time the watcher runs.

Fixes: #2402

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-08-11 08:52:51 -07:00
Bin Liu
6b00806bb8 Merge pull request #2243 from egernst/bump-tokio
agent/agent-ctl: update tokio to 1.8.1
2021-07-20 13:56:32 +08:00
Eric Ernst
acf6932863 agent: update tokio to 1.8.1
Update to latest tokio to address RUSTSEC-2021-0072:
 Task dropped in wrong thread when aborting `LocalSet` task

Update the toml to specify just 1.x for the tokio version.

Fixes: #2165

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-07-14 17:18:21 -07:00
Tim Zhang
73d3798cb1 vsock-exporter: switch to tokio runtime
Make the vsock-exporter async totally using tokio runtime.
And delay the timing of the connection to trace-forwarder so that
it is easy to reconnect when the connection was broken.

Fixes: #2234

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-07-14 20:16:05 +08:00
Tim Zhang
7960689ef7 tracing: replace SimpleSpanProcessor with BatchSpanProcessor
This change make tokio could be use in vsock-exporter.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-07-14 15:59:52 +08:00
bin
aa264f915f agent: update netlink libraries
Update rtnetlink to use crate.io to make cargo vendor work.
Add vendor/ to .gitignore.

Fixes: #2111

Signed-off-by: bin <bin@hyper.sh>
2021-06-30 22:39:50 +08:00
Fabiano Fidêncio
7d37fbfdfb Merge pull request #2115 from sameo/topic/rust-nix
cargo: Use latest nix crate for all Rust code bases
2021-06-28 08:18:53 +02:00
Samuel Ortiz
f6294226e8 cargo: Use latest nix crate for all Rust code bases
Our dependencies already bring several versions of nix, we should avoid
adding even more fragementation.

Fixes #2114

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-06-25 03:38:37 +02:00
Maksym Pavlenko
6a93e5d593 agent: Initial watchable-bind implementation
Add support for watchable-bind storage driver. When watchable-bind storage
is present, the agent will create a watchable path in a tmpfs, and poll the
watchable-bind source to keep this new mount-point up to date.

This poll will allow the agent to present the mount-point to the
container, allowing for inotify usage by the container workload.

If a mount becomes too large, either in file count or in overall size,
we want to stop treating it as watchable, and instead just treat as a
bindmount. This'll help avoid DoS by growing tmpfs too large, as well
as limiting time spent scanning files. If a watchable-bind grows beyond
8 files (arbitrary sane number for certs/secrets) or 1MB (limit on ConfigMap size),
we treat it as a normal bind.

Fixes: #1879

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>

agent: watcher: SandboxStorages check loop cleanup
2021-06-24 10:07:06 -07:00
Tim Zhang
799cb27234 agent: Upgrade mio to v0.7.13 to fix epoll_fd leak problem
Fixes: #2035
Fixes: tokio-rs/tokio/#3809

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-06-15 11:35:49 +08:00
Manabu Sugimoto
a1247bc0bb agent: Conform to the latest nix version (0.21.0)
We need to fix some agent's code to conform to the latest nix crate
to be able to use new features of the nix.

Fixes: #1987

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-06-10 16:58:51 +09:00
Tim Zhang
9e3349c18e agent: Fix fd leak caused by netlink
See also: little-dude/netlink#165

Fixes: #1952

Because the author of netlink has no time to maintain the crate
(https://github.com/little-dude/netlink/issues/161), so we
need to switch the dependency to github temporarily.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-06-03 17:23:37 +08:00
Chelsea Mafrica
33c12b6d08 Merge pull request #1929 from jodh-intel/add-agent-tracing
tracing: Add basic VSOCK tracing
2021-06-02 11:45:41 -07:00
James O. D. Hunt
a9a0eccf33 tracing: Add basic VSOCK tracing
Implement an openTelemetry custom exporter that sends trace spans to a
VSOCK socket. A VSOCK-to-span converter (such as the Kata trace
forwarder) needs to be running on the host to allow systems like Jaeger
to capture the trace spans.

By default, tracing is not enabled (meaning a NOP tracer is used). To
activate tracing, set the `agent.kata.enable_tracing=true` in the
configuration file.

The type of tracing this change introduces is "static isolated"
tracing. See [1] for further details.

> **Note:**
>
> This change only provides the foundational changes for agent
> tracing work. The feature is _not_ yet complete since it does
> not yet show the correct trace hierarchy.

Fixes: #60.

[1] - https://github.com/kata-containers/agent/blob/master/TRACING.md

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-06-02 18:00:05 +01:00
Tim Zhang
9bf781d704 agent: Upgrade tokio-vsock to fix fd leak of vsock socket
Fixes: #1950

The further information: rust-vsock/vsock-rs#15

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-06-02 16:03:09 +08:00
James O. D. Hunt
45f02227b2 tracing: Add trace points
Use the tracing crate to create automatic trace spans for the _majority_
of top-level modules.

Note that not all functions in the top-level modules can be traced:

- Some functions cannot be traced due to the requirement that all
  function parameters implement the `Debug` trait. In some cases (such
  as `netlink.rs`), objects are being passed that are defined in
  different crates and which do not implement `Debug`.
- Some functions may never return (`signal.rs`).
- Some functions are inlined.
- Some functions are very simple getter/setter functions.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-05-27 10:42:58 +01:00
Fabiano Fidêncio
f52468bea7 agent/agent-ctl: Replace prctl crate by the capctl one
While evaluating the possibility of having kata-agent statically linked
to the GNU libc, we've ended up facing some issues with prctl.

When debugging the issues, we figured out that the crate hasn't been
maintained since 2015 and that the capctl one is a good 1:1 replacement
for what we need.

Fixes: #1844

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-19 20:16:26 +02:00
Fabiano Fidêncio
8aefc79314 agent: Perform a cargo update
While in the beginning of the development cycle, let's perform a `cargo
update`.

Fixes: #1883

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-19 09:43:17 +02:00
GabyCT
aac852a0bc Merge pull request #1561 from Jakob-Naucke/s390x-statfs-constants
agent: s390x statfs constants
2021-04-06 11:11:40 -05:00
Jakob Naucke
5b7c8b7d26 agent: Update cgroups-rs to 0.2.5
to pull in the chain of https://github.com/rust-lang/libc/pull/1999,
https://github.com/nix-rust/nix/pull/1372, and
https://github.com/kata-containers/cgroups-rs/pull/38. This adds statfs
constants on s390x. cgroups-rs 0.2.4 also contains this fix, but let's
move to the latest 0.2.5 right away.

Fixes: #1204

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:25:14 +02:00
James O. D. Hunt
7d5f88c0ad agent: Enable clean shutdown
The agent doesn't normally shutdown: it doesn't need to be as it is
killed *after* the workload has finished. However, a clean and ordered
shutdown sequence is required to support agent tracing, since all trace
spans need to be completed to ensure a valid trace transaction.

Enable a controlled shutdown by allowing the main threads (tasks) to be
stopped.

To allow this to happen, each thread is now passed a shutdown channel
which it must listen to asynchronously, and shut down the thread if
activity is detected on that channel.

Since some threads are created for I/O and since the standard `io::copy`
cannot be stopped, added a new `interruptable_io_copier()` function
which shares the same semantics as `io::copy()`, but which is also
passed a shutdown channel to allow asynchronous I/O operations to be
stopped cleanly.

Fixes: #1531.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
David Gibson
d5a9d56e79 agent: Update Cargo.lock for earlier dependency change
Commit 81607e34 updated src/agent/rustjail/Cargo.toml, to remove an
unneeded dependency.  That causes cargo to update src/agent/Cargo.lock
on each build.  However, the change to Cargo.lock wasn't checked in
meaning anyone working on the agent code will get bogus diffs with every
build.  Check in the missing file to fix this.

fixes #1505

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-12 16:51:30 +11:00
Tim Zhang
02079dbb4f agent: upgrade tokio to 1.0
Fixes: #1257

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-02-25 14:38:32 +08:00
Manabu Sugimoto
e1dce3a369 rustjail: use rlimit crate
The current implementation of rustjail uses the specific setrlimit.
This patch uses rlimit crate for maintainability.

Fixes: #1372

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-02-08 18:43:56 +09:00
Manabu Sugimoto
a252d861e3 rustjail: get all capabilities dynamically
The runtime determines the kernel capability set at runtime.

Fixes: #1370

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-02-07 16:39:14 +09:00
Tim Zhang
b25575b430 agent: remove crate signal-hook which are no longer used
Had replaced by tokio::signal.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-02-03 18:30:18 +08:00
fupan.lfp
448771f53d rustjail: fix the issue of container's cgroup root path
We should create the container's cgroup under the system's
cgroup default path such as "/sys/fs/cgroup/<sub system>",
instead of under the kata-agnet's process's cgroup path,
which would under the systemd's cgroup such as
"/sys/fs/cgroup/systemd/system.slice/kata-agent.service"

Fixes: #1319

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-01-27 15:38:45 +08:00
Maksym Pavlenko
96762ab7ab agent: Remove old netlink crate
Cleans up unused code.

Fixes: #1294

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-01-19 09:58:50 -08:00
Maksym Pavlenko
23f3aefa1d agent: Implement new netlink module
This PR adds new netlink module (based on `rtnetlink` crate), so we don’t have to
write a low level code to interact with netlink sockets, but use a high level API.

As a side effect, `rtnetlink` crate got full IPv6 support, so it fixes #1171

Fixes: #1294

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-01-19 09:44:50 -08:00
Tim Zhang
9f79ddb9df agent: use tokio Notify instead of epoll to fix #1160
Fixes: #1160

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-01-18 15:38:19 +08:00
Tim Zhang
332fa4c65f agent: switch to async runtime
Fixes: #1209

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-01-18 15:38:15 +08:00
Maksym Pavlenko
5561755e3c agent: Initial switch to async runtime
This commit includes minimal changes in order to switch to Tokio:
- Update protocol crate to generate async server code
- Adds async entry point to the Agent
- Updates agent services signatures in rpc.rs

Fixes: #1209

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-01-11 16:50:53 +08:00
Tim Zhang
91c6ba74fa Merge pull request #1225 from Tim-Zhang/update-cgroup-to-0.2.0
agent: upgrade cgroups to 0.2.0
2021-01-05 19:50:05 +08:00
Tim Zhang
157e055fdd agent: upgrade crate cgroups to 0.2.0
Fixes: #1224

35ecd6f (origin/change-name, change-name) Update readme
eb6577e Change package name to cgroups-rs
8f6a7e0 Merge pull request #19 from Tim-Zhang/0.2.0
9baa065 (origin/0.2.0, 0.2.0) release: v0.2.0
e160df0 Make read_i64_from private and merge read_str_from to its caller
e1e05d3 Make new_with_relative_paths=new and load_with_relative_paths=new in v2
a89f4a0 Support set notify_on_release & release_agent
61a0957 Fix set_swappiness in cgroup v2
0592045 Ignore kmem in cgroup v2
c254fff Update readme
438d774 Fix test
42ee1ba Make Cgroup can be stored in struct
b6bb5ae docs: Hide Re-exports
d2882b1 Print cause when println!("{}")
abcb5ed Add more logs for create_dir error in controller.create
1f188be Detect subsystems and get root from /proc/self/mountinfo
fbd7164 Fix warnings in tests
f342254 Remove Box wrap of Cgroup.hire
cd998f3 Do not place cgroup under relative path read from cgroup by default
1ac76b6 Make function find_v1_mount pub
121f78d Expose deletion error
0f76570 Avoid exception caused by cgroup writeback feature
10650e2 Update tests to adapt new type of fields in resource
567cdb4 Use Option as resource fields, remove the update switch: update_values
0c18b08 Support customized attributes for CpuController and MemController
ca610bb add add_task_by_tgid

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-01-05 11:35:34 +08:00
Liu Jiang
406a91ffdd agent: consume ttrpc crate from crates.io
The ttrpc v0.3.0 has been published to crates.io, so consume from
crates.io.

Fixes: #1213

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-22 09:46:41 +08:00
bin liu
1ca415d87e agent: exit from exec hangs if background process is present
This is the Rust porting of https://github.com/kata-containers/agent/pull/371

`read_stdout`/`read_stderr` is blocking rpc calls, if exec process
exited, these calls is on blocking state for reading on process's
term master fd, and can't get a chance to break the wait.

In this PR, `read_stdout`/`read_stderr` will not read directly from
a term master of a process, instead, it will first have to get
an fd to read from newly added `epoller.poll()`. `epoller.poll()` may returns:

- the term master fd of exec process, if the process is running.
- a fd(piped fd) will return EOF when reading to indicate that th process is exited.

Fixes: #1160

Signed-off-by: bin liu <bin@hyper.sh>
2020-12-07 10:52:44 +08:00
James O. D. Hunt
8907a33907 agent: Only show ttrpc logs for trace log level
Only display the `ttrpc` crate log output when full logging
(trace level) is enabled.

This is a slight abuse of log levels but provides developers and testers
what they need whilst also keeping the logs relatively quiet for the
default info log level (the `ttrpc` crate logging is a bit "chatty").

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:45:05 +00:00
James O. D. Hunt
21cd7ad172 agent: Log ttrpc messages
The `ttrpc` crate uses the `log` crate for logging. But the agent uses
the `slog` crate. This means that currently, all `ttrpc` log messages
are being discarded.

Use the `slog-stdlog` create to redirect `log` crate logging calls into
`slog` so they are visible in the agents log output.

Fixes: #978.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:05:02 +00:00
Tim Zhang
fdc33fb7bf agent/protocols: Generate proto files programmatically
Build proto with build.rs

Fixes: #1019

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-22 16:12:15 +08:00