Commit Graph

477 Commits

Author SHA1 Message Date
Eric Ernst
324b026a77 Merge pull request #1604 from wainersm/agent_mount-1
agent: log the mount point if it is already mounted
2021-04-08 08:26:12 -07:00
Tim Zhang
24b0703fda agent: fix test for the debug console
Fix test for the debug console.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-04-08 14:57:40 +08:00
Tim Zhang
790332575b agent: async the debug console
Make the debug console in this commit.
Finish the rework of debug console.

Fixes: #1647

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-04-08 14:57:36 +08:00
James O. D. Hunt
9017e1100b agent: start to rework the debug console
It's the first commit of the rework.

Fixes: #1647

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-08 09:57:48 +08:00
Eric Ernst
15c2d7ed30 Merge pull request #1400 from ManaSugi/update-oci-seccomp
oci: Update seccomp configuration
2021-04-07 15:18:19 -07:00
GabyCT
d922070c50 Merge pull request #1644 from lifupan/fix_env
rustjail: fix the issue of missing default home env
2021-04-07 10:16:07 -05:00
GabyCT
81bcded9a3 Merge pull request #1492 from dgibson/uevent
Make uevent watching mechanism more flexible
2021-04-07 10:15:33 -05:00
fupan.lfp
a938d90310 rustjail: fix the issue of missing default home env
first get the "HOME" env from "/etc/passwd", if
there's no corresponding uid entry in /etc/passwd,
then set "/" as the home env.

Fixes: #1643

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-07 15:11:28 +08:00
Wainer dos Santos Moschetta
49eec92038 agent: log the tag and mount point if it is already mounted
On commit 17e9a2cff5 it was introduced a guard for the case the mount point is already
mounted. Instead of log only the mount tag ("kataShared") with this change it will print
both tag and mount point path.

Fixes: #1398
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-04-06 14:14:59 -04:00
GabyCT
aac852a0bc Merge pull request #1561 from Jakob-Naucke/s390x-statfs-constants
agent: s390x statfs constants
2021-04-06 11:11:40 -05:00
David Gibson
0828f9ba70 agent/uevent: Introduce wait_for_uevent() helper
get_device_name() contains logic to wait for a specific uevent, then
extract the /dev node name from it.  In future we're going to want similar
logic to wait on uevents, but using different match criteria, or getting
different information out.

To simplify this, add a wait_for_uevent() helper in the uevent module,
which takes an explicit UeventMatcher object and returns the whole uevent
found.

To make testing easier, we also extract the cut down uevent watcher from
test_get_device_name() into a new spawn_test_watcher() helper.  Its used
for both test_get_device_name() and a new test_wait_for_uevent() amd will
be useful for more tests in future.

fixes #1484

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 21:14:52 +10:00
David Gibson
16ed55e440 agent/device: Use consistent matching for past and future uevents
get_device_name() looks at kernel uevents to work out the device name for
a given PCI (usually) address.  However, when we call it we can't know if
the uevent we're interested in has already happened (in which case it will
have been recorded in Sandbox::uevent_map) or yet to come, in which case
we need to register to watch it.

However, we currently match differently against past and future events.
For past events we simply look for a sysfs path including the address, but
for future events we use a complex bit of logic in the is_match() closure.
Change it to use the exact same matching logic in both cases.

fixes #1397

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 21:14:33 +10:00
David Gibson
4b16681d87 agent/uevent: Put matcher object rather than "device address" in watch list
Currently, Sandbox::uevent_watchers lists uevents to watch for by a
"device address" string.  This is not very clearly defined, and is
matched against events with a rather complex closure created in
Uevent::process_add().

That closure makes a bunch of fragile assumptions about what sort of
events we could ever be interested in.  In some ways it is too
restrictive (requires everything to be a block device), but in others
is not restrictive enough (allows things matching NVDIMM paths, even
if we're looking for a PCI block device).

To allow the clients more precise control over uevent matching, we
define a new UeventMatcher trait with a method to match uevents.  We
then have the atchers list include UeventMatcher trait objects which
are used directly by Uevent::process_add(), instead of constructing
our match directly from dev_addr.

For now we don't actually change the matching function, or even use
multiple different trait implementations, but we'll refine that in
future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 21:14:18 +10:00
David Gibson
b8b322482c agent/uevent: Consolidate event matching logic
The event matching logic in Uevent::process_add() is split into two parts.
The first checks if we care about the event at all, the second checks
whether the event is relevant to a particular watcher.

However, we're going to be adding more types of watchers in future, which
will make the global filter too restrictive.  Fold the two bits of logic
together into a per-watcher filter function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:59:43 +10:00
David Gibson
d2caff6c55 agent: Re-organize uevent processing
Uevent::process() is a bit oddly organized.  It treats the onlining of
hotplugged memory as the "default" case, although that's quite specific,
while treating the handling of hotplugged block devices more like a special
case, although that's pretty close to being very general.

Furthermore splitting Uevent::is_block_add_event() from
Uevent::handle_block_add_event() doesn't make a lot of sense, since their
logic is intimately related to each other.

Alter the code to be a bit more sensible: first split on the "action" type
since that's the most fundamental difference, then handle the memory
onlining special case, then the block device add (which will become a lot
more general in future changes).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:59:20 +10:00
David Gibson
55ed2ddd07 agent: Store uevent watchers in Vec rather than HashMap
Sandbox:dev_watcher is a HashMap from a "device address" to a channel used
to notify get_device_name() that a suitable uevent has been found.
However, "device address" isn't well defined, having somewhat different
meanings for different device/event types.  We never actually look up this
HashMap by key, except to remove entries.

Not looking up by key suggests that a map is not the appropriate data
structure here.  Furthermore, HashMap imposes limitations on the types
which will prevent some future extensions we want.

So, replace the HashMap with a Vec<Option<>>.  We need the Option<> so that
we can remove entries by index (removing them from the Vec completely would
hange the indices of other entries, possibly breaking concurrent work.

This does mean that the vector will keep growing as we watch for different
events during startup.  However, we don't expect the number of device
events we watch for during a run to be very large, so that shouldn't be
a problem.  We can optimize this later if it becomes a problem.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:59:19 +10:00
David Gibson
91e0ef5c90 agent/uevent: Report whole Uevents to device watchers
Currently, when Uevent::handle_block_add_event() receives an event matching
a registered watcher, it reports the /dev node name from the event back
to the watcher.

This changes it to report the entire uevent, not just the /dev node name.
This will allow various future extensions.  It also makes the client side
of the uevent watching - get_device_name() - more consistent between its
two paths: finding a past uevent in Sandbox::uevent_map() or waiting for
a new uevent via a watcher.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:58:47 +10:00
David Gibson
3642005479 agent: Store whole Uevent in map, rather than just /dev name
Sandbox::pci_device_map contains a mapping from sysfs paths to /dev entries
which is used by get_device_name() to look up the right /dev node.  But,
the map only supplies the answer if the uevent for the device has already
been received, otherwise get_device_name() has to wait for it.

However the matching for already-received and yet-to-come uevents isn't
quite the same which makes the whole system fragile.

In order to make sure the matching for both cases is identical, we need the
already-received side to store the whole uevent to match against, not just
the sysfs path and device name.

So, rename pci_device_map to uevent_map and store the whole uevent there
verbatim.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:58:47 +10:00
David Gibson
0616202580 agent/device: Move GLOBAL_DEVICE_WATCHER into Sandbox
In Kata 1.x, both the sysToDevMap and the deviceWatchers are in the sandbox
structure.  For some reason in Kata 2.x, the device watchers have moved to
a separate global variable, GLOBAL_DEVICE_WATCHER.

This is a bad idea: apart from introducing an extra global variable
unnecessarily, it means that Sandbox::pci_device_map and
GLOBAL_DEVICE_WATCHER are protected by separate mutexes.  Since the
information in these two structures has to be kept in sync with each other,
it makes much more sense to keep them both under the same single Sandbox
mutex.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:58:45 +10:00
David Gibson
11ae32e3c0 agent/device: Fix path matching for PCI devices
For the case of virtio-blk PCI devices, when matching uevents we create
a pci_p temporary.  However, we build it incorrectly: the dev_addr values
we use for PCI devices are a relative sysfs paths from the PCI root to the
device in question *including an initial /*.  But when we construct pci_p
we add an extra /, meaning the resulting path will *not* match properly.

AFAICT the only reason we got away with this is because in practice the
virtio-blk devices where discovered by the kernel before we looked for them
meaning the loosed matching in get_device_name() was used, rather than the
pci_p logic in handle_block_add_event().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:58:06 +10:00
David Gibson
4f60880414 agent/device: Update test_get_device_name()
The current test_get_device_name(), ported from Kata 1.x doesn't really
reflect how the function is used in practice.  The example path appears
to be for a virtio-blk device, but it's an s390 specific variant, not a
PCI device.  The s390 form isn't actually supported by any of the existing
users of get_device_name().

Change it to a plausible virtio-blk-pci style path to better test how
get_device_name() will actually be used in practice.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 20:49:48 +10:00
Bin Liu
117c59150d Merge pull request #1613 from Tim-Zhang/pipestream-shutdown-do-nothing
Don't do anything in Pipestream::shutdown
2021-04-06 14:03:00 +08:00
Tim Zhang
ee6a590db1 agent: add test test_pipestream_shutdown
Make sure PipeStream::shutdown() do not close the inner fd.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-04-06 11:44:56 +08:00
Tim Zhang
4a2d437043 agent: don't do anything in Pipestream::shutdown
The only right way to shutdown pipe is drop it
Otherwise PipeStream will conflict with its twins
Because they both have the same fd, and both registered.

Fixes: #1614

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-04-06 11:44:38 +08:00
Peng Tao
d5600641dd Merge pull request #1603 from lifupan/fix_fsgroup
Fix fsgroup
2021-04-06 11:35:03 +08:00
David Gibson
e3e670c56f agent/device: Forward port test for get_device_name() from Kata 1.x
Kata 1.x had a testcase for the equivalent getDeviceName function in Go,
this adapts it to Rust and adds it to Kata 2.x.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 13:29:37 +10:00
David Gibson
ed08980fc1 agent: Remove many "panic message is not string literal" warnings
Rust 1.51 appears to have added a new warning in anticipation of Rust 2021,
which requires the format string for panic!()s (including via the various
assert!() macros) to be a string literal.  This triggers quite a few times
in the agent code.  This patch fixes them.

fixes #1626

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-06 11:51:34 +10:00
fupan.lfp
6493942568 mount: fix the issue of missing set fsGroup
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus guest should get this fsGroup
from runtime and set it properly on the emptyDir volume
in guest.

Fixes: #1580

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-01 11:33:26 +08:00
Fabiano Fidêncio
572aff53e8 build: Only keep one VERSION file
Instead of having different VERSION files spread accross the project,
let's always use the one in the topsrcdir and remove all the others,
keeping only a synlink to the topsrcdir one.

Fixes: #1579

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-03-31 23:51:20 +02:00
Eric Ernst
05680b86c4 Merge pull request #1537 from lifupan/main
cgroups: fix the issue of get wrong online cpus
2021-03-29 15:56:03 -07:00
Eric Ernst
460117a1a6 Merge pull request #1510 from littlejawa/issue_1003
build: remove unused variables from Makefile
2021-03-29 14:54:09 -07:00
Jakob Naucke
52a276fbdb agent: Fix type for PROC_SUPER_MAGIC on s390x
statfs f_types are long on most architectures, but not on s390x, where
they are uint. Following the fix in rust-lang/libc at
https://github.com/rust-lang/libc/pull/1999, the custom defined
PROC_SUPER_MAGIC must be updated in a similar way.

Fixes: #1204

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:25:19 +02:00
Jakob Naucke
5b7c8b7d26 agent: Update cgroups-rs to 0.2.5
to pull in the chain of https://github.com/rust-lang/libc/pull/1999,
https://github.com/nix-rust/nix/pull/1372, and
https://github.com/kata-containers/cgroups-rs/pull/38. This adds statfs
constants on s390x. cgroups-rs 0.2.4 also contains this fix, but let's
move to the latest 0.2.5 right away.

Fixes: #1204

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:25:14 +02:00
James O. D. Hunt
1d448813a1 uevent: Add shutdown channel for task
Allow the uevent task to shutdown on request.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
d8d5b4cd1d signal: Move to a new module
Move the signal handling code into a new module and refactor into the
main handler and a new SIGCHLD handling function to make the code
simpler and easier to understand.

Also added a unit test for shutdown.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
011f7d785a logging: Rework for shutdown
Make changes to logger thread to allow the logger to be replaced with
a NOP logger (required for agent shutdown).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
7d5f88c0ad agent: Enable clean shutdown
The agent doesn't normally shutdown: it doesn't need to be as it is
killed *after* the workload has finished. However, a clean and ordered
shutdown sequence is required to support agent tracing, since all trace
spans need to be completed to ensure a valid trace transaction.

Enable a controlled shutdown by allowing the main threads (tasks) to be
stopped.

To allow this to happen, each thread is now passed a shutdown channel
which it must listen to asynchronously, and shut down the thread if
activity is detected on that channel.

Since some threads are created for I/O and since the standard `io::copy`
cannot be stopped, added a new `interruptable_io_copier()` function
which shares the same semantics as `io::copy()`, but which is also
passed a shutdown channel to allow asynchronous I/O operations to be
stopped cleanly.

Fixes: #1531.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:12 +01:00
James O. D. Hunt
dcb39c61f1 main: Create logger task
Encapsulate the logic for handling the task that displays logger output
into a new function to simplify the code and remove another anonymous
async block.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
2cf2897d31 main: Use task list for stopping tasks
Maintain a list of tasks and wait on them all before main returns.

This is preparatory work for the agent shutdown: all tasks that are
started need to be added to the list. This aggregation makes it easier
to identify what needs to stop before the agent can exit cleanly.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
039df1d727 main: Refactor main logic into new async function
Move most of the main logic into a separate async function. This makes
the code clearer and avoids the anonymous async block.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
2a648fa760 logging: Use guard to make threaded logging safe
Return a guard variable from `create_logger()` which the caller can
implicitly drop to guarantee that all threads started by the async log
drain are stopped.

This fixes a long-standing bug [1] whereby the agent could panic with
the following error, generated by the `slog` logging crate:

```
slog::Fuse Drain: Custom { kind: Other, error: "serde serialization error: Bad file descriptor (os error 9)" }
```

[1] - See https://github.com/kata-containers/kata-containers/issues/171.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
James O. D. Hunt
38f0d8d3ce config: Fix assert_error testing macro
Fixed the `assert_error!()` test macro so that it correctly handles the
scenario where the test expects an error, but the actual result was `Ok`
(no error).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-29 14:32:11 +01:00
fupan.lfp
3f46e6379d cgroups: fix the issue of getting wrong online cpus
It's better to get the online cpus from
"/sys/devices/system/cpu/online" instead of from
cpuset cgroup, cause there would be an latency
between one cpu online and present in the root
cpuset cgroup.

Fixes: #1536

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-03-29 15:49:15 +08:00
bin
532ff7c909 runtime: update virtcontainers API documentation
Virtcontainers API documentation is outdated, update documentation from the latest
source.

Fixes: #1455

Signed-off-by: bin <bin@hyper.sh>
2021-03-29 11:50:53 +08:00
Bin Liu
5b5b5cc611 Merge pull request #1539 from bergwolf/ut
fix runtime UTs and enable static check
2021-03-25 16:29:45 +08:00
James O. D. Hunt
2fc7f75724 Merge pull request #1521 from jodh-intel/verify-cid
Verify container ID
2021-03-24 13:27:58 +00:00
Peng Tao
fc0f93aef9 actions: enable unit tests in PR check
Right now we only run UTs for agent. We need to run it for *ALL*
components.

Fixes: #1538
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 20:12:14 +08:00
Bin Liu
018454be44 Merge pull request #1534 from Tim-Zhang/rework-execute_hook
rustjail: rework execute_hook
2021-03-24 14:09:09 +08:00
Tim Zhang
40861fbab5 Merge pull request #1517 from jodh-intel/agent-server-address-cmdline
agent: Allow server address to be specified on kernel command-line
2021-03-23 19:33:25 +08:00
Tim Zhang
0e4b28e838 rustjail: rework execute_hook
Fixes: #1532

Signed-off-by: Tim Zhang <tim@hyper.sh>
2021-03-22 20:20:30 +08:00