Commit Graph

634 Commits

Author SHA1 Message Date
bin
3458073d09 agent: create directories for watchable-bind mounts
In function `update_target`, if the updated source is a directory,
we should create the corresponding directory.

Fixes: #3140

Signed-off-by: bin <bin@hyper.sh>
2021-12-03 14:32:08 +08:00
Eric Ernst
db9cd1078f watcher: tests: ensure there is 20ms delay between fs writes
We noticed s390x test failures on several of the watcher unit tests.

Discovered that on s390 in particular, if we update a file in quick
sucecssion, the time stampe on the file would not be unique between the
writes. Through testing, we observe that a 20 millisecond delay is very
reliable for being able to observe the timestamp update. Let's ensure we
have this delay between writes for our tests so our tests are more
reliable.

In "the real world" we'll be polling for changes every 2 seconds, and
frequency of filesystem updates will be on order of minutes and days,
rather that microseconds.

Fixes: #2946

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 13:04:26 -08:00
Eric Ernst
a51a1f6d06 watchers: handle symlinked directories, dir removal
- Even a directory could be a symlink - check for this. This is very
common when using configmaps/secrets
- Add unit test to better mimic a configmap, configmap update
- We would never remove directories before. Let's ensure that these are
added to the watched_list, and verify in unit tests
- Update unit tests which exercise maximum number of files per entry. There's a change
in behavior now that we consider directories/symlinks watchable as well.
For these tests, it means we support one less file in a watchable mount.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 13:04:26 -08:00
Eric Ernst
5bc1c209b2 watchers: don't dereference symlinks when copying files
The current implementation just copies the file, dereferencing any
simlinks in the process. This results in symlinks no being preserved,
and a change in layout relative to the mount that we are making
watchable.

What we want is something like "cp -d"

This isn't available in a crate, so let's go ahead and introduce a copy
function which will create a symlink with same relative path if the
source file is a symlink. Regular files are handled with the standard
fs::copy.

Introduce a unit test to verify symlinks are now handled appropriately.

Fixes: #2950

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 13:04:24 -08:00
GabyCT
f611785fdc Merge pull request #2967 from jodh-intel/enable-debug-logs
logging: Enable agent debug output for release builds
2021-11-04 10:04:59 -06:00
GabyCT
86b5bb5801 Merge pull request #2940 from ManaSugi/seccomp-aarch64
agent: "Revert agent: Disable seccomp feature on aarch64 temporarily"
2021-11-04 09:38:45 -06:00
James O. D. Hunt
bcf3e82cf0 logging: Enable agent debug output for release builds
Raise the `slog` maximum log level feature for release code from `info`
to `debug` by changing the `slog` maximum level features in the shared
`logging` crate. This allows the consumers of the `logging` crate (the
agent, the `trace-forwarder` and the `agent-ctl` tool) to produce debug
output when their debug options are enabled. Currently, those options
will essentially be a NOP (unless using a debug version of the code).

Testing showed that setting the `slog` maximum level features in the
rust manifest files for the consumers of the `logging` crate has no
impact: those values are ignored, so they have been removed and replaced
with a comment stating the levels are set in the `logging` crate.

Fixes: #2966.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-11-04 11:42:47 +00:00
Manabu Sugimoto
b468dc500a agent: Use dup3 system call in unit tests of seccomp
Use `dup3` system call instead of `dup2` in unit tests of seccomp
because `dup2` is obsolete on aarch64.

Fixes: #2939

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-11-03 15:49:23 +09:00
Manabu Sugimoto
1aaa0599d9 agent: "Revert agent: Disable seccomp feature on aarch64 temporarily"
Re-enable seccomp feature on aarch64 because CI is ready
by https://github.com/kata-containers/tests/pull/4124.

This reverts commit 42add7f201.

Fixes: #2939

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-11-02 22:53:38 +09:00
bin
1e331f7542 agent: refactor process IO processing
Move closing IO into process.rs and use macro
to reduce codes.

Fixes: #2944

Signed-off-by: bin <bin@hyper.sh>
2021-11-02 15:49:11 +08:00
GabyCT
7b406d5561 Merge pull request #2037 from c3d/issue/2036-is-not-exist
agent: Make wording of error message match CRI-O test suite
2021-10-29 17:25:06 -05:00
Samuel Ortiz
4280415149 agent: Fix the configuration sample file
All endpoint names share the `Request` suffix.
Also, the current list is based on functions, not requests.

Fixes #2916

Reported-by: Jakob Naucke <jakob.naucke@ibm.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-27 06:02:33 +02:00
Manabu Sugimoto
42add7f201 agent: Disable seccomp feature on aarch64 temporarily
In order to pass CI test of aarch64, it is necessary to run
`ci/install_libseccomp.sh` before ruuning unit tests in
`jenkins_job_build.sh`.
However, `ci/install_libseccomp.sh` is not available
until PR #1788 including this commit is merged in the mainline.
Therefore, we disable seccomp feature on aarch64 temporarily.
After #1788 lands and CI is fixed, this commit will be reverted.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
Manabu Sugimoto
3be50adab9 agent: Add support for Seccomp
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
Bin Liu
8d8604e10f Merge pull request #2893 from liubin/fix/2892-print-error-instead-of-return
agent: do not return error but print it if task wait failed
2021-10-26 17:48:17 +08:00
bin
5f5eca6b8e agent: do not return error but print it if task wait failed
Do not return error but print it if task wait failed
and let program continue to run the next code.

Fixes: #2892

Signed-off-by: bin <bin@hyper.sh>
2021-10-26 11:43:39 +08:00
Fupan Li
3d0fe433c6 Merge pull request #2889 from lht/handle-uevent-remove-actions
agent: Handle uevent remove actions
2021-10-25 19:08:20 +08:00
James O. D. Hunt
ec3aa1694b Merge pull request #2844 from jongwu/unit_test
enable unit test on arm
2021-10-25 10:58:21 +01:00
Bin Liu
01fdeb7641 Merge pull request #2891 from ManaSugi/fix/unify-form
rustjail: Consistent coding style of LinuxDevice type
2021-10-25 14:03:03 +08:00
Haitao Li
a13e2f77b8 agent: Handle uevent remove actions
uevents with action=remove was ignored causing the agent to reuse stale
data in the device map. This patch adds handling of such uevents.

Fixes #2405

Signed-off-by: Haitao Li <lihaitao@gmail.com>
2021-10-25 14:41:32 +11:00
David Gibson
730b9c433f agent/device: Create device nodes for VFIO devices
Add and adjust the vfio devices in the inner container spec so that
rustjail will create device nodes for them.

In order to do that, we also need to make sure the VFIO device node is
ready within the guest VM first.  That may take (slightly) longer than
just the underlying PCI device(s) being ready, because vfio-pci needs
to initialize.  So, add a helper function that will wait for a
specific VFIO device node to be ready, using the existing uevent
listening mechanism.  It also returns the device node name for the
device (though in practice it will always /dev/vfio/NN where NN is the
group number).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
175f9b06e9 rustjail: Allow container devices in subdirectories
Many device nodes go directly under /dev, however some are conventionally
placed in subdirectories under /dev.  For example /dev/vfio/vfio or
/dev/pts/ptmx.

Currently, attempting to pass such a device into a Kata container will fail
because mknod() will get an ENOENT because the parent directory is missing
(or an equivalent error for bind_dev()).

Correct that by making subdirectories as necessary in create_devices().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
9891efc61f rustjail: Correct sanity checks on device path
For each user supplied device, create_devices() checks that the given path
actually is in /dev, by checking that its path starts with /dev and does
not contain "..".

However, this has subtle errors because it's interpreting the path as a raw
string without considering separators.  It will accept the path /devfoo
which it should not, while it will not accept the valid (though weird)
paths /dev/... and /dev/a..b.

Correct this by using std::path::Path methods designed for the purpose.
Having done this, it's trivial to also generate the relative path that
mknod_dev() or bind_dev() will need, so do that at the same time.

We also move this logic into a helper function so that we can add some unit
tests for it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
d6b62c029e rustjail: Change mknod_dev() and bind_dev() to take relative device path
Both these functions take the absolute path from LinuxDevice and drop the
leading '/' to make a relative path.  They do that with a simple
&dev.path[1..].  That can be technically incorrect in some edge cases such
as a path with redundant /s like "//dev//sda".

To handle cases like that, have the explicit relative path passed into
these functions.  For now we calculate it in the same buggy way, but we'll
fix that shortly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
2680c0bfee rustjail: Provide useful context on device node creation errors
create_devices() within the rustjail module is responsible for creating
device nodes within the (inner) containers.  Errors that occur here will
be propagated up, but are likely to be low level failures of mknod() - e.g.
ENOENT or EACCESS - which won't be very useful without context when
reported all the way up to the runtime without the context of what we were
trying to do.

Add some anyhow context information giving the details of the device we
were trying to create when it failed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
42b92b2b05 agent/device: Allow container devname to differ from the host
Currently, update_spec_device() assumes that the proper device path in the
(inner) container is the same as the device path specified in the outer OCI
spec on the host.

Usually that's correct.  However for VFIO group devices we actually need
the container to see the VM's device path, since it's normal to correlate
that with IOMMU group information from sysfs which will be different in the
guest and which we can't namespace away.

So, add an extra "final_path" parameter to update_spec_device() to allow
callers to chose the device path that should be used for the inner
container.  All current callers pass the same thing as container_path, but
that will change in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
827a41f973 agent/device: Refactor update_spec_device_list()
update_spec_device_list() is used to update the container configuration to
change device major/minor numbers configured by the Kata client based on
host details to values suitable for the sandbox VM, which may differ.  It
takes a 'device' object, but the only things it actually uses from there
are container_path and vm_path.

Refactor this as update_spec_device(), taking the host and guest paths to
the device as explicit parameters.  This makes the function more
self-contained and will enable some future extensions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
8ceadcc5a9 agent/device: Sanity check guest IOMMU groups
Each VFIO device passed into the guest could represent a whole IOMMU group
of devices on the host.  Since these devices aren't DMA isolated from each
other, they must appear as the same IOMMU group in the guest as well.

The VMM should enforce that for us, but double check it, since things can't
work otherwise.  This also means we determine the guest IOMMU group for the
VFIO device, which we'll be needing later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
ff59db7534 agent/device: Add function to get IOMMU group for a PCI device
For upcoming VFIO extensions we'll need to work with the IOMMU groups of
VFIO devices.  This helps us towards that by adding pci_iommu_group() to
retrieve the IOMMU group (if any) of a given PCI device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
13b06a35d5 agent/device: Rebind VFIO devices to VFIO driver inside guest
VFIO devices can be added to a Kata container and they will be passed
through to the sandbox guest.  However, inside the guest those devices
will bind to a native guest driver, so they will no longer appear as VFIO
devices within the guest.  This behaviour differs from runc or other
conventional container runtimes.

This code allows the agent to match the behaviour of other runtimes,
if instructed to by kata-runtime.  VFIO devices it's informed about
with the "vfio" type instead of the existing "vfio-gk" type will be
rebound to the vfio-pci driver within the guest.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
e22bd78249 agent/device: Add helper function for binding a guest device to a driver
For better VFIO support, we're going to need to take control of which guest
driver controls specific guest devices.  To assist with that, add the
pci_driver_override() function to force a specific guest device to be
bound to a specific guest driver.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
Manabu Sugimoto
b40eedc9f7 rustjail: Consistent coding style of LinuxDevice type
Use `"c".to_string` in the device type of `dev/full`
in order to consistent with the coding style of other devices

Fixes: #2890

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-25 09:15:59 +09:00
Jianyong Wu
57c0f93f54 agent: fix race condition when test watcher
create_tmpfs won't pass as the race condition in watcher umount. quote
James's words here:

1. Rust runs all tests in parallel.
2. Mounts are a process-wide, not a per-thread resource.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.

To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-24 17:31:53 +08:00
Bin Liu
1cb38ecbe7 Merge pull request #2843 from zhaojizhuang/fixroute
agent: Do not fail when trying to adding existing routes
2021-10-18 15:52:29 +08:00
James O. D. Hunt
321be0f794 tracing: Remove trace mode and trace type
Remove the `trace_mode` and `trace_type` agent tracing options as
decided in the Architecture Committee meeting.

See:

- https://github.com/kata-containers/kata-containers/pull/2062

Fixes: #2352.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 10:09:38 +01:00
zhaojizhuang
7d0b616cf3 agent: Do not fail when trying to adding existing routes
Adding a route that already exists should not be a reason for the agent to fail
booting and thus preventing the sandbox to start.

Fixes #2712

Signed-off-by: zhaojizhuang <571130360@qq.com>
2021-10-14 18:38:26 +02:00
Peng Tao
176dee6f37 agent: exec should inherit container process capabilities
Otherwise rustjail would not set its capabilities and it ends up getting
all capabilities.

Fixes: #2828
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-10-13 17:24:52 +08:00
Bin Liu
b7cd4ca2b8 Merge pull request #2813 from liubin/fix/2812-flush-root-span
agent: flush root span before process finish
2021-10-11 18:46:09 +08:00
bin
2d7b65e8eb agent: flush root span before process finish
Variables in rust will be dropped at the end of the function.

In function real_main the trace will be shut down by `tracer::end_tracing()`,
but at this time the root span is in an active state, so this root span
will not be sent to the trace collector.

This can be fixed by dropping the root span manually.

Fixes: #2812

Signed-off-by: bin <bin@hyper.sh>
2021-10-11 17:14:37 +08:00
David Gibson
72044180e4 agent/device: Return PCI address from wait_for_pci_device()
wait_for_pci_device() waits for the PCI device at the given path to become
ready, but it doesn't currently give you any meaningful handle on that
device.

Change the signature, so that it returns the PCI address of the device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-08 16:52:49 +11:00
David Gibson
e50b05d93c agent/pci: Add type to represent PCI addresses
Add a new pci::Address type which represents a guest PCI address in
DDDD:BB:SS.F form.

fixes #2745

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-08 16:52:49 +11:00
David Gibson
8528157b9b agent/pci: Extend Slot type to represent PCI function as well
pci::Slot represents a PCI slot.  However, in all cases where we use it, we
actually care about addressing a specific PCI function.  So, at the moment
we can only refer to function 0 in each slot.

Replace pci::Slot with pci::SlotFn to represent both the slot and function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-08 16:52:49 +11:00
Samuel Ortiz
08360c981d agent: Add an agent configutation file example
With all endpoints allowed.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-07 04:04:52 +02:00
Samuel Ortiz
8a4e69d237 agent: rpc: Return UNIMPLEMENTED for not allowed endpoints
From the endpoints string described through the configuration file, we
build a hash set of allowed enpoints. If a configuration files does not
include an endpoints section, we assume all endpoints are not allowed.
If there is no configuration file, then all endpoints are allowed.

Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is not, then we return ttrcp::UNIMPLEMENTED.

Fixes: #1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-10-07 04:04:32 +02:00
Samuel Ortiz
0ea2e3af07 agent: config: Allow for building the configuration from a file
When the kernel command line includes a agent.config_file=<path> entry,
then we will try to override the default confiuguration values with the
ones we parse from a TOML file at <path>.

As the configuration file overrides the default values, we need to go
through a simplified builder that convert a set of Option<> fields into
the actual AgentConfig structure.

Fixes: #1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-10-07 00:37:40 +02:00
Samuel Ortiz
63539dc9fd agent: config: Add allowed endpoints
They will define the list of endpoints that an agent supports.
They're empty and non actionable for now.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-10-07 00:37:40 +02:00
Samuel Ortiz
a953fea324 agent: config: Simplify configuration creation
We dont need a constructor and derive directly from the command line
parsing.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-10-07 00:37:40 +02:00
Samuel Ortiz
b888edc2fc agent: config: Implement Default
A single constructor setting default value is a typical pattern for a
Default implementation.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-10-07 00:37:40 +02:00
Samuel Ortiz
a44cde7e8d agent: netlink: Use the grpc IP family field when updating the route
Not all routes have either a gateway or a destination IP.
Interface routes, where the source, destination and gateway are undefined,
will default to IP v4 with the current is_ipv6() check even when they
are v6 routes.

We use the provided gRPC Route.Family field instead. This field is built
from the host netlink messages, and is a reliable way of finding out
a route's IP family.

Fixes: #2768

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:39:46 +02:00
Samuel Ortiz
99450bd1f7 agent: protos: Add a Family field to the Route payload
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:17 +02:00