kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2026-01-09 17:34:25 +01:00

Author	SHA1	Message	Date
bin	3458073d09	agent: create directories for watchable-bind mounts In function `update_target`, if the updated source is a directory, we should create the corresponding directory. Fixes: #3140 Signed-off-by: bin <bin@hyper.sh>	2021-12-03 14:32:08 +08:00
Eric Ernst	db9cd1078f	watcher: tests: ensure there is 20ms delay between fs writes We noticed s390x test failures on several of the watcher unit tests. Discovered that on s390 in particular, if we update a file in quick succession, the timestamp on the file would not be unique between the writes. Through testing, we observe that a 20 millisecond delay is very reliable for being able to observe the timestamp update. Let's ensure we have this delay between writes for our tests so our tests are more reliable. In "the real world" we'll be polling for changes every 2 seconds, and frequency of filesystem updates will be on the order of minutes and days, rather than microseconds. Fixes: #2946 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 13:04:26 -08:00
Eric Ernst	a51a1f6d06	watchers: handle symlinked directories, dir removal - Even a directory could be a symlink - check for this. This is very common when using configmaps/secrets - Add unit test to better mimic a configmap, configmap update - We would never remove directories before. Let's ensure that these are added to the watched_list, and verify in unit tests - Update unit tests which exercise maximum number of files per entry. There's a change in behavior now that we consider directories/symlinks watchable as well. For these tests, it means we support one less file in a watchable mount. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 13:04:26 -08:00
Eric Ernst	5bc1c209b2	watchers: don't dereference symlinks when copying files The current implementation just copies the file, dereferencing any symlinks in the process. This results in symlinks not being preserved, and a change in layout relative to the mount that we are making watchable. What we want is something like "cp -d". This isn't available in a crate, so let's go ahead and introduce a copy function which will create a symlink with same relative path if the source file is a symlink. Regular files are handled with the standard fs::copy. Introduce a unit test to verify symlinks are now handled appropriately. Fixes: #2950 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 13:04:24 -08:00
GabyCT	f611785fdc	Merge pull request #2967 from jodh-intel/enable-debug-logs logging: Enable agent debug output for release builds	2021-11-04 10:04:59 -06:00
GabyCT	86b5bb5801	Merge pull request #2940 from ManaSugi/seccomp-aarch64 agent: "Revert agent: Disable seccomp feature on aarch64 temporarily"	2021-11-04 09:38:45 -06:00
James O. D. Hunt	bcf3e82cf0	logging: Enable agent debug output for release builds Raise the `slog` maximum log level feature for release code from `info` to `debug` by changing the `slog` maximum level features in the shared `logging` crate. This allows the consumers of the `logging` crate (the agent, the `trace-forwarder` and the `agent-ctl` tool) to produce debug output when their debug options are enabled. Currently, those options will essentially be a NOP (unless using a debug version of the code). Testing showed that setting the `slog` maximum level features in the rust manifest files for the consumers of the `logging` crate has no impact: those values are ignored, so they have been removed and replaced with a comment stating the levels are set in the `logging` crate. Fixes: #2966. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-11-04 11:42:47 +00:00
Manabu Sugimoto	b468dc500a	agent: Use dup3 system call in unit tests of seccomp Use `dup3` system call instead of `dup2` in unit tests of seccomp because `dup2` is obsolete on aarch64. Fixes: #2939 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-11-03 15:49:23 +09:00
Manabu Sugimoto	1aaa0599d9	agent: "Revert agent: Disable seccomp feature on aarch64 temporarily" Re-enable seccomp feature on aarch64 because CI is ready by https://github.com/kata-containers/tests/pull/4124. This reverts commit `42add7f201`. Fixes: #2939 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-11-02 22:53:38 +09:00
bin	1e331f7542	agent: refactor process IO processing Move closing IO into process.rs and use a macro to reduce code. Fixes: #2944 Signed-off-by: bin <bin@hyper.sh>	2021-11-02 15:49:11 +08:00
GabyCT	7b406d5561	Merge pull request #2037 from c3d/issue/2036-is-not-exist agent: Make wording of error message match CRI-O test suite	2021-10-29 17:25:06 -05:00
Samuel Ortiz	4280415149	agent: Fix the configuration sample file All endpoint names share the `Request` suffix. Also, the current list is based on functions, not requests. Fixes #2916 Reported-by: Jakob Naucke <jakob.naucke@ibm.com> Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-27 06:02:33 +02:00
Manabu Sugimoto	42add7f201	agent: Disable seccomp feature on aarch64 temporarily In order to pass CI test of aarch64, it is necessary to run `ci/install_libseccomp.sh` before running unit tests in `jenkins_job_build.sh`. However, `ci/install_libseccomp.sh` is not available until PR #1788 including this commit is merged in the mainline. Therefore, we disable seccomp feature on aarch64 temporarily. After #1788 lands and CI is fixed, this commit will be reverted. Fixes: #1476 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-10-27 19:06:13 +09:00
Manabu Sugimoto	3be50adab9	agent: Add support for Seccomp The kata-agent supports seccomp feature based on the OCI runtime specification. This seccomp capability in the kata-agent is enabled by default. However, it is not enforced by default: users need to enable that by setting `disable_guest_seccomp` to `false` in the main configuration file. Fixes: #1476 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-10-27 19:06:13 +09:00
Bin Liu	8d8604e10f	Merge pull request #2893 from liubin/fix/2892-print-error-instead-of-return agent: do not return error but print it if task wait failed	2021-10-26 17:48:17 +08:00
bin	5f5eca6b8e	agent: do not return error but print it if task wait failed Do not return error but print it if task wait failed and let program continue to run the next code. Fixes: #2892 Signed-off-by: bin <bin@hyper.sh>	2021-10-26 11:43:39 +08:00
Fupan Li	3d0fe433c6	Merge pull request #2889 from lht/handle-uevent-remove-actions agent: Handle uevent remove actions	2021-10-25 19:08:20 +08:00
James O. D. Hunt	ec3aa1694b	Merge pull request #2844 from jongwu/unit_test enable unit test on arm	2021-10-25 10:58:21 +01:00
Bin Liu	01fdeb7641	Merge pull request #2891 from ManaSugi/fix/unify-form rustjail: Consistent coding style of LinuxDevice type	2021-10-25 14:03:03 +08:00
Haitao Li	a13e2f77b8	agent: Handle uevent remove actions uevents with action=remove were ignored, causing the agent to reuse stale data in the device map. This patch adds handling of such uevents. Fixes #2405 Signed-off-by: Haitao Li <lihaitao@gmail.com>	2021-10-25 14:41:32 +11:00
David Gibson	730b9c433f	agent/device: Create device nodes for VFIO devices Add and adjust the vfio devices in the inner container spec so that rustjail will create device nodes for them. In order to do that, we also need to make sure the VFIO device node is ready within the guest VM first. That may take (slightly) longer than just the underlying PCI device(s) being ready, because vfio-pci needs to initialize. So, add a helper function that will wait for a specific VFIO device node to be ready, using the existing uevent listening mechanism. It also returns the device node name for the device (though in practice it will always be /dev/vfio/NN where NN is the group number). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	175f9b06e9	rustjail: Allow container devices in subdirectories Many device nodes go directly under /dev, however some are conventionally placed in subdirectories under /dev. For example /dev/vfio/vfio or /dev/pts/ptmx. Currently, attempting to pass such a device into a Kata container will fail because mknod() will get an ENOENT because the parent directory is missing (or an equivalent error for bind_dev()). Correct that by making subdirectories as necessary in create_devices(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	9891efc61f	rustjail: Correct sanity checks on device path For each user supplied device, create_devices() checks that the given path actually is in /dev, by checking that its path starts with /dev and does not contain "..". However, this has subtle errors because it's interpreting the path as a raw string without considering separators. It will accept the path /devfoo which it should not, while it will not accept the valid (though weird) paths /dev/... and /dev/a..b. Correct this by using std::path::Path methods designed for the purpose. Having done this, it's trivial to also generate the relative path that mknod_dev() or bind_dev() will need, so do that at the same time. We also move this logic into a helper function so that we can add some unit tests for it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	d6b62c029e	rustjail: Change mknod_dev() and bind_dev() to take relative device path Both these functions take the absolute path from LinuxDevice and drop the leading '/' to make a relative path. They do that with a simple &dev.path[1..]. That can be technically incorrect in some edge cases such as a path with redundant /s like "//dev//sda". To handle cases like that, have the explicit relative path passed into these functions. For now we calculate it in the same buggy way, but we'll fix that shortly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	2680c0bfee	rustjail: Provide useful context on device node creation errors create_devices() within the rustjail module is responsible for creating device nodes within the (inner) containers. Errors that occur here will be propagated up, but are likely to be low level failures of mknod() - e.g. ENOENT or EACCES - which won't be very useful when reported all the way up to the runtime without the context of what we were trying to do. Add some anyhow context information giving the details of the device we were trying to create when it failed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	42b92b2b05	agent/device: Allow container devname to differ from the host Currently, update_spec_device() assumes that the proper device path in the (inner) container is the same as the device path specified in the outer OCI spec on the host. Usually that's correct. However for VFIO group devices we actually need the container to see the VM's device path, since it's normal to correlate that with IOMMU group information from sysfs which will be different in the guest and which we can't namespace away. So, add an extra "final_path" parameter to update_spec_device() to allow callers to choose the device path that should be used for the inner container. All current callers pass the same thing as container_path, but that will change in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	827a41f973	agent/device: Refactor update_spec_device_list() update_spec_device_list() is used to update the container configuration to change device major/minor numbers configured by the Kata client based on host details to values suitable for the sandbox VM, which may differ. It takes a 'device' object, but the only things it actually uses from there are container_path and vm_path. Refactor this as update_spec_device(), taking the host and guest paths to the device as explicit parameters. This makes the function more self-contained and will enable some future extensions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	8ceadcc5a9	agent/device: Sanity check guest IOMMU groups Each VFIO device passed into the guest could represent a whole IOMMU group of devices on the host. Since these devices aren't DMA isolated from each other, they must appear as the same IOMMU group in the guest as well. The VMM should enforce that for us, but double check it, since things can't work otherwise. This also means we determine the guest IOMMU group for the VFIO device, which we'll be needing later. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	ff59db7534	agent/device: Add function to get IOMMU group for a PCI device For upcoming VFIO extensions we'll need to work with the IOMMU groups of VFIO devices. This helps us towards that by adding pci_iommu_group() to retrieve the IOMMU group (if any) of a given PCI device. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	13b06a35d5	agent/device: Rebind VFIO devices to VFIO driver inside guest VFIO devices can be added to a Kata container and they will be passed through to the sandbox guest. However, inside the guest those devices will bind to a native guest driver, so they will no longer appear as VFIO devices within the guest. This behaviour differs from runc or other conventional container runtimes. This code allows the agent to match the behaviour of other runtimes, if instructed to by kata-runtime. VFIO devices it's informed about with the "vfio" type instead of the existing "vfio-gk" type will be rebound to the vfio-pci driver within the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	e22bd78249	agent/device: Add helper function for binding a guest device to a driver For better VFIO support, we're going to need to take control of which guest driver controls specific guest devices. To assist with that, add the pci_driver_override() function to force a specific guest device to be bound to a specific guest driver. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
Manabu Sugimoto	b40eedc9f7	rustjail: Consistent coding style of LinuxDevice type Use `"c".to_string` in the device type of `dev/full` in order to be consistent with the coding style of other devices. Fixes: #2890 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-10-25 09:15:59 +09:00
Jianyong Wu	57c0f93f54	agent: fix race condition when test watcher create_tmpfs won't pass as the race condition in watcher umount. quote James's words here: 1. Rust runs all tests in parallel. 2. Mounts are a process-wide, not a per-thread resource. The only test that calls watcher.mount() is create_tmpfs(). However, other tests create BindWatcher objects. 3. BindWatcher's drop() implementation calls self.cleanup(), which calls unmount for the mountpoint create_tmpfs() asserts. 4. The other tests are calling unmount whenever a BindWatcher goes out of scope. To avoid that issue, let the tests using BindWatcher in watcher and sandbox.rs run sequentially. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-24 17:31:53 +08:00
Bin Liu	1cb38ecbe7	Merge pull request #2843 from zhaojizhuang/fixroute agent: Do not fail when trying to adding existing routes	2021-10-18 15:52:29 +08:00
James O. D. Hunt	321be0f794	tracing: Remove trace mode and trace type Remove the `trace_mode` and `trace_type` agent tracing options as decided in the Architecture Committee meeting. See: - https://github.com/kata-containers/kata-containers/pull/2062 Fixes: #2352. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-15 10:09:38 +01:00
zhaojizhuang	7d0b616cf3	agent: Do not fail when trying to adding existing routes Adding a route that already exists should not be a reason for the agent to fail booting and thus preventing the sandbox to start. Fixes #2712 Signed-off-by: zhaojizhuang <571130360@qq.com>	2021-10-14 18:38:26 +02:00
Peng Tao	176dee6f37	agent: exec should inherit container process capabilities Otherwise rustjail would not set its capabilities and it ends up getting all capabilities. Fixes: #2828 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-10-13 17:24:52 +08:00
Bin Liu	b7cd4ca2b8	Merge pull request #2813 from liubin/fix/2812-flush-root-span agent: flush root span before process finish	2021-10-11 18:46:09 +08:00
bin	2d7b65e8eb	agent: flush root span before process finish Variables in rust will be dropped at the end of the function. In function real_main the trace will be shut down by `tracer::end_tracing()`, but at this time the root span is in an active state, so this root span will not be sent to the trace collector. This can be fixed by dropping the root span manually. Fixes: #2812 Signed-off-by: bin <bin@hyper.sh>	2021-10-11 17:14:37 +08:00
David Gibson	72044180e4	agent/device: Return PCI address from wait_for_pci_device() wait_for_pci_device() waits for the PCI device at the given path to become ready, but it doesn't currently give you any meaningful handle on that device. Change the signature, so that it returns the PCI address of the device. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-08 16:52:49 +11:00
David Gibson	e50b05d93c	agent/pci: Add type to represent PCI addresses Add a new pci::Address type which represents a guest PCI address in DDDD:BB:SS.F form. fixes #2745 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-08 16:52:49 +11:00
David Gibson	8528157b9b	agent/pci: Extend Slot type to represent PCI function as well pci::Slot represents a PCI slot. However, in all cases where we use it, we actually care about addressing a specific PCI function. So, at the moment we can only refer to function 0 in each slot. Replace pci::Slot with pci::SlotFn to represent both the slot and function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-08 16:52:49 +11:00
Samuel Ortiz	08360c981d	agent: Add an agent configutation file example With all endpoints allowed. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-07 04:04:52 +02:00
Samuel Ortiz	8a4e69d237	agent: rpc: Return UNIMPLEMENTED for not allowed endpoints From the endpoints string described through the configuration file, we build a hash set of allowed endpoints. If a configuration file does not include an endpoints section, we assume all endpoints are not allowed. If there is no configuration file, then all endpoints are allowed. Then for every ttrpc request, we check if the name of the endpoint is part of the hash set. If it is not, then we return ttrpc::UNIMPLEMENTED. Fixes: #1837 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-10-07 04:04:32 +02:00
Samuel Ortiz	0ea2e3af07	agent: config: Allow for building the configuration from a file When the kernel command line includes an agent.config_file=<path> entry, then we will try to override the default configuration values with the ones we parse from a TOML file at <path>. As the configuration file overrides the default values, we need to go through a simplified builder that converts a set of Option<> fields into the actual AgentConfig structure. Fixes: #1837 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-10-07 00:37:40 +02:00
Samuel Ortiz	63539dc9fd	agent: config: Add allowed endpoints They will define the list of endpoints that an agent supports. They're empty and non actionable for now. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-10-07 00:37:40 +02:00
Samuel Ortiz	a953fea324	agent: config: Simplify configuration creation We don't need a constructor and derive directly from the command line parsing. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-10-07 00:37:40 +02:00
Samuel Ortiz	b888edc2fc	agent: config: Implement Default A single constructor setting default value is a typical pattern for a Default implementation. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-10-07 00:37:40 +02:00
Samuel Ortiz	a44cde7e8d	agent: netlink: Use the grpc IP family field when updating the route Not all routes have either a gateway or a destination IP. Interface routes, where the source, destination and gateway are undefined, will default to IP v4 with the current is_ipv6() check even when they are v6 routes. We use the provided gRPC Route.Family field instead. This field is built from the host netlink messages, and is a reliable way of finding out a route's IP family. Fixes: #2768 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:39:46 +02:00
Samuel Ortiz	99450bd1f7	agent: protos: Add a Family field to the Route payload Our check for the IP family is working as long as we have either a gateway or a destination IP. Some routes are missing both. The RT netlink messages provide the IP family information for each route, so we can carry that piece of information up to the guest. That will allow for a more reliable route IP family determination. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:17 +02:00
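
The device path check described in commit 9891efc61f above (validate with std::path::Path methods instead of raw string matching, and derive the relative path at the same time) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the commit: the function name `dev_rel_path`, its signature, and the choice to reject a bare `/dev` are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (name and signature are assumptions, not taken from
/// the commit): returns the device path relative to `/` when `path` is a
/// well-formed path under /dev, and None otherwise.
fn dev_rel_path(path: &str) -> Option<PathBuf> {
    let path = Path::new(path);

    // Path::starts_with compares whole components, so "/devfoo" is rejected
    // even though the raw string begins with "/dev". A bare "/dev" is also
    // rejected here (an assumption of this sketch).
    if !path.starts_with("/dev") || path == Path::new("/dev") {
        return None;
    }

    // Reject genuine ".." components only; names that merely contain dots,
    // such as "..." or "a..b", are ordinary components and pass.
    if path.components().any(|c| c == Component::ParentDir) {
        return None;
    }

    // Drop the leading root to get the relative path that a mknod or bind
    // mount step under the container rootfs would need.
    path.strip_prefix("/").ok().map(Path::to_path_buf)
}

fn main() {
    let samples = [
        "/dev/sda",
        "/devfoo",
        "/dev/...",
        "/dev/a..b",
        "/dev/../etc/passwd",
        "//dev//sda",
    ];
    for p in samples {
        println!("{:?} -> {:?}", p, dev_rel_path(p));
    }
}
```

Because the comparison is component-based, redundant separators such as those in "//dev//sda" (the edge case named in commit d6b62c029e) are handled without slicing the string by hand.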


634 Commits