Move the signal handling code into a new module and refactor it into a
main handler and a new SIGCHLD handling function to make the code
simpler and easier to understand.
Also add a unit test for shutdown.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Make changes to the logger thread to allow the logger to be replaced with
a NOP logger (required for agent shutdown).
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The agent doesn't normally shut down: it doesn't need to, as it is
killed *after* the workload has finished. However, a clean and ordered
shutdown sequence is required to support agent tracing, since all trace
spans need to be completed to ensure a valid trace transaction.
Enable a controlled shutdown by allowing the main threads (tasks) to be
stopped.
To allow this, each thread is now passed a shutdown channel, which it
must listen on asynchronously; if activity is detected on that channel,
the thread shuts down.
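To illustrate the shutdown-channel pattern, here is a minimal, synchronous sketch using only `std` threads and an `mpsc` channel. The agent itself uses async tasks; `run_worker` and its tick interval are hypothetical names for illustration, and treating a closed channel as a shutdown request is an assumption of the sketch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical worker loop: does one unit of "work" per tick and stops as
/// soon as a shutdown message arrives or every sender has been dropped.
/// Returns the number of work iterations completed.
fn run_worker(shutdown: Receiver<()>, tick: Duration) -> usize {
    let mut iterations = 0;
    loop {
        match shutdown.recv_timeout(tick) {
            // Explicit shutdown request, or the sending side has gone away.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            // No shutdown request within `tick`: do some work and loop again.
            Err(RecvTimeoutError::Timeout) => iterations += 1,
        }
    }
    iterations
}
```

In an async agent the same shape could be expressed with `tokio::select!` over the work future and a shutdown receiver.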
Since some threads are created for I/O and the standard `io::copy`
cannot be stopped, add a new `interruptable_io_copier()` function
which has the same semantics as `io::copy()` but is also passed a
shutdown channel, allowing asynchronous I/O operations to be stopped
cleanly.
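A rough, `std`-only approximation of the idea: copy in fixed-size chunks and check a shutdown channel between chunks. This is not the agent's `interruptable_io_copier()`; the name `interruptible_copy` and the 8 KiB buffer are assumptions, and unlike an async version it cannot interrupt a read that is already blocked.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical stoppable copy with `io::copy`-like semantics: returns the
/// number of bytes written. A pending shutdown message (or a dropped sender)
/// ends the copy early, returning the bytes copied so far.
fn interruptible_copy<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    shutdown: &Receiver<()>,
) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total: u64 = 0;
    loop {
        // Check for a shutdown request before each chunk.
        match shutdown.try_recv() {
            Ok(()) | Err(TryRecvError::Disconnected) => return Ok(total),
            Err(TryRecvError::Empty) => {}
        }
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(total), // EOF
            Ok(n) => n,
            // Retry on EINTR, as io::copy does.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```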
Fixes: #1531.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Encapsulate the logic for handling the task that displays logger output
into a new function to simplify the code and remove another anonymous
async block.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Maintain a list of tasks and wait on them all before main returns.
This is preparatory work for the agent shutdown: all tasks that are
started need to be added to the list. This aggregation makes it easier
to identify what needs to stop before the agent can exit cleanly.
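A minimal sketch of the task-list idea with plain `std` threads (the agent uses async tasks): every spawned handle goes into one `Vec`, and `join_all`, a hypothetical helper, waits for each of them. It collects every result rather than stopping at the first failure, so no task is left unjoined.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical helper: wait for every task, in order, and return each
/// task's result (a task that panicked yields `Err`).
fn join_all<T>(tasks: Vec<JoinHandle<T>>) -> Vec<thread::Result<T>> {
    tasks.into_iter().map(JoinHandle::join).collect()
}
```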
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move most of the main logic into a separate async function. This makes
the code clearer and avoids the anonymous async block.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Return a guard variable from `create_logger()` which the caller can
implicitly drop to guarantee that all threads started by the async log
drain are stopped.
This fixes a long-standing bug [1] whereby the agent could panic with
the following error, generated by the `slog` logging crate:
```
slog::Fuse Drain: Custom { kind: Other, error: "serde serialization error: Bad file descriptor (os error 9)" }
```
[1] - See https://github.com/kata-containers/kata-containers/issues/171.
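To show the guard idea in isolation, here is a `std`-only sketch in which dropping the returned guard flushes and joins the background drain thread. This is not the `slog` API: `create_logger_sketch`, `Logger` and `LoggerGuard` are hypothetical stand-ins, and the drain writes to an in-memory sink so the behaviour can be checked.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Messages sent to the hypothetical drain thread.
enum Msg {
    Line(String),
    Stop,
}

/// Cheap handle used to emit log lines.
struct Logger {
    tx: Sender<Msg>,
}

impl Logger {
    fn log(&self, line: &str) {
        // Ignore send errors: they only occur once the drain has stopped.
        let _ = self.tx.send(Msg::Line(line.to_string()));
    }
}

/// Dropping the guard asks the drain thread to stop and waits for it.
struct LoggerGuard {
    tx: Sender<Msg>,
    handle: Option<JoinHandle<()>>,
}

impl Drop for LoggerGuard {
    fn drop(&mut self) {
        let _ = self.tx.send(Msg::Stop);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

fn create_logger_sketch(sink: Arc<Mutex<Vec<String>>>) -> (Logger, LoggerGuard) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        // Messages are received in send order, so `Stop` is only seen
        // after every line that was sent before it.
        for msg in rx {
            match msg {
                Msg::Line(line) => sink.lock().unwrap().push(line),
                Msg::Stop => break,
            }
        }
    });
    (
        Logger { tx: tx.clone() },
        LoggerGuard { tx, handle: Some(handle) },
    )
}
```

Binding the guard to a named variable such as `_guard` (not `_`) keeps it alive until the end of the scope, which is what makes the implicit drop useful.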
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Fix the `assert_error!()` test macro so that it correctly handles the
scenario where the test expects an error but the actual result was `Ok`
(no error).
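The failure mode being fixed is a macro that silently passes when the result is `Ok`. A minimal sketch of a macro that handles both arms explicitly; `assert_expected_error!` is a hypothetical name, not the agent's actual `assert_error!()` definition.

```rust
/// Hypothetical test macro: passes only if `$result` is an `Err` whose
/// message equals `$expected`. An unexpected `Ok` fails the test instead
/// of slipping through.
macro_rules! assert_expected_error {
    ($result:expr, $expected:expr) => {
        match $result {
            Ok(v) => panic!("expected error {:?}, but got Ok({:?})", $expected, v),
            Err(e) => assert_eq!(e.to_string(), $expected),
        }
    };
}
```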
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
It's better to get the online CPUs from
"/sys/devices/system/cpu/online" instead of from
the cpuset cgroup, because there can be a delay
between a CPU coming online and it appearing in the
root cpuset cgroup.
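The `online` file uses the kernel's CPU list format: comma-separated CPU numbers and inclusive ranges, for example `0-3,5`. A small sketch of reading and expanding it; the helper names are hypothetical and error handling is deliberately simple.

```rust
use std::fs;
use std::io;

/// Expand a kernel CPU list such as "0-3,5\n" into [0, 1, 2, 3, 5].
fn parse_cpu_list(s: &str) -> Result<Vec<u32>, String> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        let parse = |v: &str| v.parse::<u32>().map_err(|e| format!("bad CPU '{}': {}", v, e));
        match part.split_once('-') {
            // Inclusive range, e.g. "0-3".
            Some((lo, hi)) => {
                let (lo, hi) = (parse(lo)?, parse(hi)?);
                if lo > hi {
                    return Err(format!("bad CPU range '{}'", part));
                }
                cpus.extend(lo..=hi);
            }
            // Single CPU, e.g. "5".
            None => cpus.push(parse(part)?),
        }
    }
    Ok(cpus)
}

/// Hypothetical helper: read the online CPUs from sysfs (Linux only).
fn online_cpus() -> io::Result<Vec<u32>> {
    let raw = fs::read_to_string("/sys/devices/system/cpu/online")?;
    parse_cpu_list(&raw).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```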
Fixes: #1536
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Make use of the `const` values for error messages that were previously
only used for the unit tests. This guarantees consistency.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Validate the container ID, since we cannot (and should not) rely on the
container manager or runtime to do this.
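A sketch of such a check that also uses a shared `const` error message, in the spirit of the commit above it. The validity rule here (first character alphanumeric, remaining characters alphanumeric or `.`, `_`, `-`, at least two characters) is an assumption modelled on a common container ID convention, not necessarily the exact rule the agent enforces; `ERR_INVALID_CONTAINER_ID` and `verify_container_id` are hypothetical names.

```rust
/// Shared message, used by both the code and its tests.
const ERR_INVALID_CONTAINER_ID: &str = "invalid container ID";

/// Hypothetical validator; the accepted character set is an assumption.
fn verify_container_id(id: &str) -> Result<(), String> {
    let mut chars = id.chars();
    let valid = match chars.next() {
        Some(first) if first.is_ascii_alphanumeric() => {
            let rest = chars.as_str();
            !rest.is_empty()
                && rest
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
        }
        _ => false,
    };
    if valid {
        Ok(())
    } else {
        Err(format!("{}: {:?}", ERR_INVALID_CONTAINER_ID, id))
    }
}
```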
Fixes: #1520.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
To make debugging and testing easier, allow the ttRPC server address to
be specified via `/proc/cmdline` as `agent.server_addr=`.
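A minimal sketch of extracting the option from the kernel command line text (which the caller would first read from `/proc/cmdline`). The helper name is hypothetical, "last occurrence wins" is an assumption of this sketch, and quoted values containing spaces are not handled.

```rust
/// Hypothetical helper: return the value of `agent.server_addr=` from a
/// kernel command line, or `None` if the option is absent.
fn server_addr_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .filter_map(|param| param.strip_prefix("agent.server_addr="))
        .last()
}
```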
Fixes: #1516.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Some variables are initialized in the Makefile but never used.
Remove them to clean up the Makefile.
Fixes: #1003
Signed-off-by: Julien Ropé <jrope@redhat.com>
Commit 81607e34 updated src/agent/rustjail/Cargo.toml to remove an
unneeded dependency. That causes cargo to update src/agent/Cargo.lock
on each build. However, the change to Cargo.lock wasn't checked in,
meaning anyone working on the agent code will get bogus diffs with every
build. Check in the missing file to fix this.
Fixes: #1505
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since the `dirs` crate's `home_dir` function depends on the
libc API `getpwuid_r`, which doesn't work properly when
statically linked with glibc, it's better to find an
alternative way to get the home dir, from /etc/passwd.
For more information about this glibc issue, please see:
https://sourceware.org/bugzilla/show_bug.cgi?id=19341.
This commit reads and parses "/etc/passwd" directly and
fetches the home dir of the corresponding uid.
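A sketch of that lookup, using the standard seven colon-separated `/etc/passwd` fields (`name:password:UID:GID:GECOS:home:shell`). The helper names are hypothetical; malformed entries and comment lines are simply skipped.

```rust
use std::fs;

/// Hypothetical helper: find the home directory of `uid` in the text of an
/// /etc/passwd file.
fn home_dir_for_uid(passwd: &str, uid: u32) -> Option<String> {
    passwd
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            let fields: Vec<&str> = line.split(':').collect();
            if fields.len() != 7 {
                return None; // malformed entry; skip it
            }
            match fields[2].parse::<u32>() {
                Ok(entry_uid) if entry_uid == uid => Some(fields[5].to_string()),
                _ => None,
            }
        })
        .next()
}

/// Hypothetical wrapper that reads the real file.
fn home_dir(uid: u32) -> Option<String> {
    let contents = fs::read_to_string("/etc/passwd").ok()?;
    home_dir_for_uid(&contents, uid)
}
```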
Fixes: #675
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Add a target to run a codecov report locally.
This is useful to identify which lines are still
missing unit test coverage.
Fixes: #1487
Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
When passing guest device files to a container, the source
file isn't a regular file, but we still need to create a
corresponding destination file to bind mount the source file
onto. Thus, it's better to check whether the source file is
a directory, rather than whether it is a regular file.
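A sketch of the resulting check: anything that is not a directory (a regular file or a device node such as `/dev/null`) gets an empty destination file as its bind-mount target. `ensure_bind_target` is a hypothetical name, and the mount itself is not shown.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: create the bind-mount target for `src` at `dst`.
/// Directories get a directory; everything else gets an empty file.
fn ensure_bind_target(src: &Path, dst: &Path) -> io::Result<()> {
    // `fs::metadata` follows symlinks, so a link to a device is treated as the device.
    if fs::metadata(src)?.is_dir() {
        fs::create_dir_all(dst)
    } else {
        if let Some(parent) = dst.parent() {
            fs::create_dir_all(parent)?;
        }
        // Create the file if missing; an existing file is left untouched.
        OpenOptions::new().write(true).create(true).open(dst).map(|_| ())
    }
}
```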
Fixes: #1477
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
The agent sends a PID of -1 when invoking OCI hooks.
The OCI state struct is initialized before the PID is obtained, so this PR
moves the `oci_state` call down, right after we get the PID.
Fixes: #1458
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Port kata-containers/agent#883 to the Rust Agent.
In the event that the virtiofs device is already mounted at the
requested destination, don't error out. We'll check before attempting to
mount to see if the destination is already a mount point. If so, skip
doing the mount in the agent.
This facilitates mounting the sharedfs automatically in the guest before
the agent service starts.
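One way to perform the "is it already a mount point?" check is to look for the destination in the mount table. The sketch below parses text in the `/proc/self/mounts` format (source, target, fstype, options, ...); it is an assumption that this approximates the agent's check, `is_mount_point` is a hypothetical name, and octal escapes such as `\040` in mount paths are not decoded.

```rust
/// Hypothetical helper: report whether `path` appears as a mount target in
/// the contents of /proc/self/mounts.
fn is_mount_point(mounts: &str, path: &str) -> bool {
    // Normalise a trailing slash, but keep "/" itself intact.
    let wanted = path.trim_end_matches('/');
    let wanted = if wanted.is_empty() { "/" } else { wanted };
    mounts
        .lines()
        .filter_map(|line| line.split_whitespace().nth(1))
        .any(|target| target == wanted)
}
```

A caller would read `/proc/self/mounts` and skip the mount when this returns `true`.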
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Fixes: #1398
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
If the container has exited, the sender in the notifier watching OOM events
will be dropped after the loop exits, and recv() on the corresponding
receiver will return None.
This leads to two problems for the get_oom_event RPC call from the agent:
- it returns a wrong OOM event.
- it continuously returns OOM events.
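The fix amounts to treating a closed channel as the end of the event stream rather than as an event. A `std` sketch of that distinction; `next_oom_event` is hypothetical, and with `std::sync::mpsc` a closed channel shows up as `Err(RecvError)` rather than the `None` returned by the async receiver described above.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical helper: wait for the next OOM event (a container ID).
/// Once every sender has been dropped, report an error instead of
/// producing an event, so callers stop polling.
fn next_oom_event(rx: &Receiver<String>) -> Result<String, String> {
    rx.recv()
        .map_err(|_| "OOM event channel closed: container has exited".to_string())
}
```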
Fixes: #1369
Signed-off-by: bin <bin@hyper.sh>
Currently pcipath_to_sysfs() generates the path to the root bus node in
sysfs via create_pci_root_bus_path(). This is inconvenient for testing,
though, so instead make it take this as a parameter and generate the path
in the (single) caller. As a bonus this will make life a bit easier when
we want to support machines with multiple PCI roots.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pcipath_to_sysfs takes a PCI path with a particular format. A number of
places implicitly need strings in that format, and many of them repeat the
description. To make things safer and briefer, use the pci::Path type for
this purpose more widely, and describe the string format only once, at
the type definition.
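A hedged sketch of such a type. A PCI path is assumed here to be a `/`-separated list of two-digit hexadecimal slot numbers (each at most `0x1f`), one per bridge level and ending with the device's own slot, e.g. `02/00`. The `PciPath` name and the parsing details are illustrative, not the agent's actual `pci::Path` definition.

```rust
use std::fmt;
use std::str::FromStr;

/// Largest valid PCI slot (device) number: slots are 5 bits wide.
const MAX_SLOT: u8 = 0x1f;

/// Hypothetical PCI path: the slot number at each level, from the root bus down.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PciPath(Vec<u8>);

impl FromStr for PciPath {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let slots = s
            .split('/')
            .map(|part| -> Result<u8, String> {
                // Require exactly two hex digits per component.
                if part.len() != 2 || !part.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return Err(format!("bad slot {:?} in PCI path {:?}", part, s));
                }
                let slot = u8::from_str_radix(part, 16).map_err(|e| e.to_string())?;
                if slot > MAX_SLOT {
                    return Err(format!("slot {:?} out of range in PCI path {:?}", part, s));
                }
                Ok(slot)
            })
            .collect::<Result<Vec<u8>, String>>()?;
        Ok(PciPath(slots))
    }
}

impl fmt::Display for PciPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let parts: Vec<String> = self.0.iter().map(|slot| format!("{:02x}", slot)).collect();
        write!(f, "{}", parts.join("/"))
    }
}
```

Describing the format once, at the type, means every function taking a `PciPath` gets validated input for free.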
Then, update variable names and comments throughout to call things in
this format "PCI path", rather than "PCI identifier", which is vague,
or "PCI address", which is just plain wrong. Likewise, change names and
comments which incorrectly refer to sysfs paths as a "PCI address".
This changes the grpc proto definitions, but because it's just
changing the name of a field without changing the field number, it
shouldn't change the actual protocol.
A loose forward port of
da4bc1d184
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>