Rust drops variables at the end of the scope they are declared in.
In real_main(), tracing is shut down by `tracer::end_tracing()`, but at
that point the root span is still active, so it is never sent to the
trace collector.
Fix this by dropping the root span manually before shutting down tracing.
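A minimal, self-contained sketch of the ordering problem, assuming a toy collector in place of the real tracing stack. `Span`, `Collector`, `end_tracing()` and both `real_main` variants are hypothetical stand-ins, not the agent's tracer API. A span only reports itself when it is dropped, so it has to be dropped before tracing is shut down.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical collector: records the names of spans closed before shutdown.
#[derive(Default)]
struct Collector {
    exported: Vec<String>,
    shut_down: bool,
}

// Hypothetical span: reports itself on drop, but only while tracing is running.
struct Span {
    name: String,
    collector: Rc<RefCell<Collector>>,
}

impl Drop for Span {
    fn drop(&mut self) {
        let mut c = self.collector.borrow_mut();
        if !c.shut_down {
            c.exported.push(self.name.clone());
        }
    }
}

// Stand-in for tracer::end_tracing(): nothing is exported after this.
fn end_tracing(collector: &Rc<RefCell<Collector>>) {
    collector.borrow_mut().shut_down = true;
}

// The bug: the span is still alive when tracing ends and drops afterwards.
fn real_main_without_drop(collector: &Rc<RefCell<Collector>>) {
    let _root_span = Span { name: "root".to_string(), collector: Rc::clone(collector) };
    end_tracing(collector);
    // _root_span is dropped here, after shutdown, so it is never exported.
}

// The fix: drop the root span explicitly, then shut down tracing.
fn real_main(collector: &Rc<RefCell<Collector>>) {
    let root_span = Span { name: "root".to_string(), collector: Rc::clone(collector) };
    // ... work happens while root_span is active ...
    drop(root_span);
    end_tracing(collector);
}
```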
Fixes: #2812
Signed-off-by: bin <bin@hyper.sh>
wait_for_pci_device() waits for the PCI device at the given path to become
ready, but it doesn't currently give you any meaningful handle on that
device.
Change the signature so that it returns the PCI address of the device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a new pci::Address type which represents a guest PCI address in
DDDD:BB:SS.F form.
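A possible shape for such a type, sketched under assumptions: the field names, the `String` error type and the `FromStr`/`Display` implementations are illustrative, not the agent's actual pci module. The slot is limited to 5 bits and the function to 3 bits, as in PCI addressing.

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical guest PCI address in DDDD:BB:SS.F form.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address {
    domain: u16,
    bus: u8,
    slot: u8,     // 5 bits: 0x00..=0x1f
    function: u8, // 3 bits: 0..=7
}

impl Address {
    pub fn new(domain: u16, bus: u8, slot: u8, function: u8) -> Result<Self, String> {
        if slot > 0x1f {
            return Err(format!("PCI slot {:#x} out of range", slot));
        }
        if function > 7 {
            return Err(format!("PCI function {} out of range", function));
        }
        Ok(Address { domain, bus, slot, function })
    }
}

impl fmt::Display for Address {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:04x}:{:02x}:{:02x}.{}", self.domain, self.bus, self.slot, self.function)
    }
}

impl FromStr for Address {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // "DDDD:BB:SS.F" -> ["DDDD", "BB", "SS.F"]
        let parts: Vec<&str> = s.split(':').collect();
        if parts.len() != 3 {
            return Err(format!("malformed PCI address {:?}", s));
        }
        let (slot_str, func_str) = parts[2]
            .split_once('.')
            .ok_or_else(|| format!("missing function in {:?}", s))?;
        let domain = u16::from_str_radix(parts[0], 16).map_err(|e| e.to_string())?;
        let bus = u8::from_str_radix(parts[1], 16).map_err(|e| e.to_string())?;
        let slot = u8::from_str_radix(slot_str, 16).map_err(|e| e.to_string())?;
        let function = u8::from_str_radix(func_str, 16).map_err(|e| e.to_string())?;
        Address::new(domain, bus, slot, function)
    }
}
```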
fixes #2745
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pci::Slot represents a PCI slot. However, in all cases where we use it, we
actually care about addressing a specific PCI function. So, at the moment
we can only refer to function 0 in each slot.
Replace pci::Slot with pci::SlotFn to represent both the slot and function.
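One way to model a slot plus function, assuming the usual PCI "devfn" packing (slot in the upper 5 bits, function in the lower 3). The `SlotFn` layout and method names are a hypothetical sketch rather than the agent's implementation.

```rust
use std::fmt;

// Hypothetical combined slot + function, packed as devfn = (slot << 3) | function.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotFn(u8);

impl SlotFn {
    pub fn new(slot: u8, function: u8) -> Result<Self, String> {
        if slot > 0x1f {
            return Err(format!("PCI slot {:#x} out of range", slot));
        }
        if function > 7 {
            return Err(format!("PCI function {} out of range", function));
        }
        Ok(SlotFn((slot << 3) | function))
    }

    pub fn slot(self) -> u8 {
        self.0 >> 3
    }

    pub fn function(self) -> u8 {
        self.0 & 0x7
    }
}

// Formats as "SS.F", e.g. "02.1".
impl fmt::Display for SlotFn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:02x}.{}", self.slot(), self.function())
    }
}
```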
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
From the endpoints string described in the configuration file, we
build a hash set of allowed endpoints. If a configuration file does not
include an endpoints section, we assume that no endpoints are allowed.
If there is no configuration file, then all endpoints are allowed.
Then, for every ttrpc request, we check whether the name of the endpoint is
part of the hash set. If it is not, we return a ttrpc UNIMPLEMENTED error.
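The three cases can be sketched with a `HashSet`, assuming simplified, hypothetical types (`AgentConfigFile`, `EndpointFilter`) in place of the agent's real configuration and ttrpc plumbing. An `Err` marks the point where the agent would return UNIMPLEMENTED.

```rust
use std::collections::HashSet;

// Hypothetical parsed configuration file: `endpoints` is None when the
// file has no endpoints section.
struct AgentConfigFile {
    endpoints: Option<Vec<String>>,
}

struct EndpointFilter {
    // None means "no configuration file": every endpoint is allowed.
    allowed: Option<HashSet<String>>,
}

impl EndpointFilter {
    fn new(config: Option<&AgentConfigFile>) -> Self {
        // A config file without an endpoints section yields an empty set,
        // i.e. nothing is allowed.
        let allowed: Option<HashSet<String>> = config
            .map(|c| c.endpoints.clone().unwrap_or_default().into_iter().collect());
        EndpointFilter { allowed }
    }

    // Called for every request; Err is where UNIMPLEMENTED would be returned.
    fn check(&self, endpoint: &str) -> Result<(), String> {
        match &self.allowed {
            None => Ok(()),
            Some(set) if set.contains(endpoint) => Ok(()),
            Some(_) => Err(format!("{} is blocked", endpoint)),
        }
    }
}
```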
Fixes: #1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
When the kernel command line includes an agent.config_file=<path> entry,
we try to override the default configuration values with the
ones we parse from a TOML file at <path>.
As the configuration file overrides the default values, we need to go
through a simplified builder that converts a set of Option<> fields into
the actual AgentConfig structure.
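A minimal sketch of such a builder, assuming a hypothetical three-field subset of `AgentConfig` with made-up default values. The TOML deserialization step (typically done with serde) is omitted to keep the example dependency-free; only fields that are `Some` in the parsed file override the defaults.

```rust
// Hypothetical subset of the agent configuration.
#[derive(Debug, PartialEq)]
struct AgentConfig {
    debug_console: bool,
    log_level: String,
    hotplug_timeout_secs: u64,
}

// Default values here are placeholders, not the agent's real defaults.
impl Default for AgentConfig {
    fn default() -> Self {
        AgentConfig {
            debug_console: false,
            log_level: "info".to_string(),
            hotplug_timeout_secs: 3,
        }
    }
}

// What the TOML file would deserialize into: every key is optional.
#[derive(Default)]
struct AgentConfigBuilder {
    debug_console: Option<bool>,
    log_level: Option<String>,
    hotplug_timeout_secs: Option<u64>,
}

impl AgentConfigBuilder {
    // Start from the defaults and override only what the file specified.
    fn build(self) -> AgentConfig {
        let mut config = AgentConfig::default();
        if let Some(v) = self.debug_console {
            config.debug_console = v;
        }
        if let Some(v) = self.log_level {
            config.log_level = v;
        }
        if let Some(v) = self.hotplug_timeout_secs {
            config.hotplug_timeout_secs = v;
        }
        config
    }
}
```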
Fixes: #1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
They will define the list of endpoints that an agent supports.
They're empty and non-actionable for now.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
A single constructor that only sets default values is the typical
pattern for a Default implementation.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Not all routes have either a gateway or a destination IP.
Interface routes, where the source, destination and gateway are undefined,
are classified as IPv4 by the current is_ipv6() check even when they
are IPv6 routes.
We use the provided gRPC Route.Family field instead. This field is built
from the host netlink messages, and is a reliable way of finding out
a route's IP family.
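The difference between guessing and using the reported family, sketched with hypothetical types (`Route`, `IpFamily`, `is_ipv6_guess`) rather than the agent's generated protobuf structs. An interface route with no addresses is misclassified by the guess but not by the explicit family.

```rust
use std::net::IpAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IpFamily {
    V4,
    V6,
}

// Hypothetical mirror of the gRPC route; `family` comes from the host's
// netlink message, so it is set even for interface routes.
struct Route {
    dest: String,
    gateway: String,
    family: IpFamily,
}

// Old approach: infer the family from the addresses. With no gateway and
// no destination there is nothing to inspect, so it falls back to IPv4.
fn is_ipv6_guess(route: &Route) -> bool {
    [&route.gateway, &route.dest]
        .iter()
        .filter(|s| !s.is_empty())
        .any(|s| {
            // Destinations may carry a prefix length, e.g. "fd00::/64".
            let addr = s.split('/').next().unwrap_or("");
            matches!(addr.parse::<IpAddr>(), Ok(IpAddr::V6(_)))
        })
}

// New approach: trust the family reported by the host.
fn is_ipv6(route: &Route) -> bool {
    route.family == IpFamily::V6
}
```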
Fixes: #2768
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Our check for the IP family works as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
The agent initiates a PCI rescan from two places. One is triggered
for each virtio-blk PCI device, and one is triggered unconditionally
when we start a new container.
The PCI bus rescan code was added a long time ago in Clear Containers due to
lack of ACPI support in QEMU 2.9 + q35. Since Kata routinely plugs devices
under a PCIe-to-PCI bridge, that left SHPC as the only available hotplug
mechanism.
However, while Kata was using SHPC on the qemu side, it wasn't actually
using it on the guest side. Due to a quirk of our guest kernel
configuration, the SHPC driver never bound to the bridge, and *no* hotplug
was working at all. To work around that, Kata was forcing the rescan
manually, which would discover the new device. That was very fragile (we
were arguably relying on a kernel bug). Even if we were using SHPC
properly, it includes a mandatory 5s delay during plug operations
(designed for physical cards and human operators), which makes it
unsuitable for quick startup.
Worse, the forced PCI rescans could race with either SHPC or PCIe native
hotplug sequences, causing several problems. In some cases this could put
the device into an entirely broken state where it wouldn't respond to
config space accesses at all.
Since pull request #2323 was merged, we have instead used ACPI hotplug,
which is both fast and more solid in terms of semantics and races. So,
the forced PCI rescans are no longer necessary. Remove them all.
fixes #683
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
do_add_swap() has some mildly complex code to translate the PCI path of
a virtio-blk device (where the swap will reside) into a /dev path. However,
the device module already has get_virtio_blk_pci_device_name() which does
exactly that. The existing helper has some further advantages: it uses
more precise matching of the sysfs paths, and if necessary it will wait for
the device to be added to the guest.
While we're there, remove an unnecessary 'as u8' from the PCI path
construction: pci::Path::new() already accepts anything which implements
TryInto<u8>, which u32 certainly does.
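For illustration only: a hypothetical `slot_from()` helper with the same kind of generic bound described for pci::Path::new(). Because the bound is `TryInto<u8>`, a `u32` can be passed directly, and out-of-range values surface as an error instead of being silently truncated as `as u8` would do.

```rust
use std::convert::TryInto;

// Hypothetical helper: accepts anything convertible to u8 (u8, u16, u32, ...).
fn slot_from<T: TryInto<u8>>(value: T) -> Result<u8, String> {
    value
        .try_into()
        .map_err(|_| "value does not fit in u8".to_string())
}
```

By contrast, `300u32 as u8` silently yields 44.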
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, VFIO devices attached to a Kata container aren't described to
the agent at all. We essentially just hope they're ready by the time
we've entered the container proper, which is usually the case because of
the PCI rescan - but that causes other problems.
This adds a new device type to the agent representing VFIO devices. The
agent will use its existing uevent watching mechanisms to wait for the
associated guest PCI device to appear before proceeding.
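A simplified sketch of waiting for a device to appear, assuming uevents are delivered over a `std::sync::mpsc` channel and matched by a sysfs devpath prefix. The `Uevent` struct and `wait_for_device()` are hypothetical; the real agent parses netlink uevents instead.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Hypothetical, reduced uevent.
#[derive(Debug, Clone)]
struct Uevent {
    action: String,
    devpath: String,
}

// Block until an "add" uevent for a device under `pci_path_prefix` arrives,
// or give up once `timeout` has elapsed.
fn wait_for_device(
    rx: &Receiver<Uevent>,
    pci_path_prefix: &str,
    timeout: Duration,
) -> Result<Uevent, String> {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(ev) if ev.action == "add" && ev.devpath.starts_with(pci_path_prefix) => {
                return Ok(ev)
            }
            Ok(_) => continue, // unrelated event, keep waiting
            Err(RecvTimeoutError::Timeout) => {
                return Err(format!("timeout waiting for {}", pci_path_prefix))
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("uevent source closed".to_string())
            }
        }
    }
}
```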
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently the constants giving the names for each device/driver type in
the protocol are in mount.rs, and used in device.rs. Since these constants
are inherently related to, well, devices, it makes more sense to put them
in device.rs and use them from mount.rs.
This will become even more true with planned extensions, which will add some
device types that will not be used in mount.rs at all.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Previously we only created the target when updating files. We also need
to create the target when the source is a directory. Without this, we
fail to start a container that uses an empty configmap, for example.
Add unit tests for this.
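A minimal sketch of creating the target according to the kind of source, assuming a hypothetical `ensure_target()` helper (not the agent's actual function). A directory source, such as an empty ConfigMap, gets a directory target; anything else gets an empty file.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: make `target` exist with the same kind as `source`.
fn ensure_target(source: &Path, target: &Path) -> io::Result<()> {
    if fs::metadata(source)?.is_dir() {
        // Directory source: create the target directory (no-op if present).
        fs::create_dir_all(target)
    } else {
        // File source: create parent directories, then an empty file
        // (an existing file is left untouched, not truncated).
        if let Some(parent) = target.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::OpenOptions::new()
            .create(true)
            .write(true)
            .open(target)
            .map(|_| ())
    }
}
```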
Fixes: #2638
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The only remaining callers of ensure_destination_exists() are in its own
unit tests. So, just remove it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
mount_storage() first makes sure the mount point for the storage volume
exists. It uses fs::create_dir_all() for 9p or virtiofs volumes and
ensure_destination_exists() otherwise. But ensure_destination_exists()
boils down to fs::create_dir_all() in most cases anyway. The only case
where it doesn't is a bind fstype, where it creates a file instead of a
directory. But that's not correct anyway, because we need to create either
a file or a directory depending on the source of the bind mount, which
ensure_destination_exists() doesn't know.
The 9p/virtiofs paths also check if the mountpoint exists before calling
fs::create_dir_all(), which is unnecessary (fs::create_dir_all already
handles that case).
mount_storage() does have the information to know what we need to create,
so have it explicitly call ensure_destination_file_exists() for the bind
mount to a non-directory case, and fs::create_dir_all() in all other cases.
fixes #2390
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ensure_destination_exists() can create either a directory or a regular file
depending on the arguments. This patch extracts the regular-file-specific
logic into its own helper: ensure_destination_file_exists(). This:
- Avoids doing some steps in the directory case (they're already handled
by create_dir_all())
- Enables some further future cleanups
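A sketch of what a file-only helper could look like. The name comes from the text, but the body is an assumption rather than the agent's code. Directories are left to fs::create_dir_all(), which already succeeds when the directory exists.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Sketch: ensure `path` exists as a regular file, creating parents as needed.
fn ensure_destination_file_exists(path: &Path) -> io::Result<()> {
    if path.is_file() {
        return Ok(()); // already there, leave its contents alone
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Fails (e.g. EISDIR) if `path` exists but is a directory.
    fs::File::create(path).map(|_| ())
}
```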
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
struct BareMount contains the information necessary to make a new mount.
As a data structure, however, it's pointless, since every user just
constructs it, immediately calls the BareMount::mount() method then
discards the structure.
Simplify the code by making this a direct function call baremount().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BareMount::mount does some complicated marshalling and uses unsafe code to
call into the mount(2) system call. However, we're already using the nix
crate which provides a more Rust-like wrapper for mount(2). We're even
already using nix::mount::umount and nix::mount::MsFlags from the same
module.
In the same way, we can replace the direct usage of libc::umount() with
nix::mount::umount() in one of the tests.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rust 1.47.0, which is the latest version we note as tested in versions.yaml,
is now getting fairly old - many current distros have newer versions (e.g.
Rust 1.54.0 in Fedora 34). Bring this more up to date.
Note that this is only updating the 'newest-version', not the minimum
required version.
The new version changes the name of the 'clippy::unknown_clippy_lints'
option to simply 'unknown_lints' so we need to change that as well to avoid
warnings.
fixes #2633
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no need to keep multiple copies of the license file in
different directories. We can just use the top level one for the project.
Fixes: #2553
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
inotify/watchable-mount changes...
- Allow up to 16 files. It isn't that uncommon to have 3 files in a secret.
In Kubernetes, this results in 9 files in the mount (the presented files,
which are symlinks to the latest files, which are symlinks to actual files
which are in a separate hidden directory on the mount). Bumping from eight to 16 will
help ensure we can support "most" secret/tokens, and is still a pretty
small number to scan...
- Now we will only replace the watched storage with a bindmount if we observe
that there are too many files or if it's too large. Since the scanning/updating is racy,
we should expect that we'll occasionally run into errors (e.g., a file
deleted between scan / update). Rather than stopping and making a bind
mount, continue updating, as the changes will be updated the next time
check is called for that entry (every 2 seconds today).
To facilitate the 'oversized' handling, we create specific errors for too large
or too many files, and handle these specific errors when scanning the storage entry.
- When handling an oversized mount, do not remove the prior files -- we'll just
overwrite them with the bindmount. This'll help avoid the files
disappearing from the user, avoid racy cleanup and simplifies the flow.
Similarly, only mark it as a non-watched storage device after the
bindmount is created successfully.
- When creating bind mount, make sure destination exists. If we hadn't
had a successful scan before, this wouldn't exist and the mount would
fail. Update logic and unit test to cover this.
- In several spots, we were returning when there was an error (both in
scan and update). For the update case, let's just log a warning and continue;
since the scan/update is racy, we should expect that we'll have
transient errors which should resolve the next time the watcher runs.
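A minimal sketch of the error handling described above, assuming hypothetical `WatcherError`, `scan()` and `handle()` definitions. The 16-file limit comes from the text; the 1 MiB size limit is a placeholder assumption. Only the two "oversized" errors trigger the bind-mount fallback; any other error is logged and retried on the next check.

```rust
// The file limit is from the text; the size limit is an assumed placeholder.
const MAX_ENTRIES_PER_STORAGE: usize = 16;
const MAX_SIZE_PER_STORAGE: u64 = 1024 * 1024;

#[derive(Debug, PartialEq)]
enum WatcherError {
    TooManyFiles { count: usize },
    MountTooLarge { size: u64 },
    Transient(String), // e.g. a file deleted between scan and update
}

#[derive(Debug, PartialEq)]
enum Action {
    Updated(usize),
    FallBackToBindMount,
    RetryLater,
}

// Hypothetical scan over (path, size) pairs gathered from the mount.
fn scan(entries: &[(&str, u64)]) -> Result<Vec<String>, WatcherError> {
    if entries.len() > MAX_ENTRIES_PER_STORAGE {
        return Err(WatcherError::TooManyFiles { count: entries.len() });
    }
    let size: u64 = entries.iter().map(|(_, s)| s).sum();
    if size > MAX_SIZE_PER_STORAGE {
        return Err(WatcherError::MountTooLarge { size });
    }
    Ok(entries.iter().map(|(p, _)| p.to_string()).collect())
}

// Decide what the watcher does with a scan result.
fn handle(result: Result<Vec<String>, WatcherError>) -> Action {
    match result {
        Ok(files) => Action::Updated(files.len()),
        // Only the specific "oversized" errors switch to a bind mount.
        Err(WatcherError::TooManyFiles { .. }) | Err(WatcherError::MountTooLarge { .. }) => {
            Action::FallBackToBindMount
        }
        // Anything else is probably a race: warn and try again next time.
        Err(e) => {
            eprintln!("warning: {:?}", e);
            Action::RetryLater
        }
    }
}
```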
Fixes: #2402
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Although the OCI specification does not explicitly require it, we
should create the process CWD if it does not exist, before chdir'ing
to it. Without this fix, the kata-agent fails to create a container
and returns a gRPC error when it tries to change the container's
working directory to a non-existent folder.
runc, the OCI runtime reference implementation, also creates the process
CWD when it's not part of the container rootfs.
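A minimal sketch of the fix using only the standard library, assuming a hypothetical `enter_cwd()` helper. In the agent this happens inside the container's mount namespace; here it simply changes the current process's directory.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

// Create the working directory if needed (as runc does), then chdir into it.
fn enter_cwd(cwd: &Path) -> io::Result<()> {
    fs::create_dir_all(cwd)?; // no-op if it already exists
    env::set_current_dir(cwd)
}
```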
Fixes #2374
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Add a new function, AddSwap. When the agent receives an AddSwap request, it
gets the device name from the PCIPath and sets that device up as swap.
Fixes: #2201
Signed-off-by: Hui Zhu <teawater@antfin.com>
The 'FLAGS' hash map has a bool indicating whether the flag should be
cleared or not. But parse_mount_flags_and_options() sets the flag even when
'clear' is true. This results in a 'rw' mount being mounted as 'MS_RDONLY'.
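A dependency-free sketch of the corrected logic, assuming a hypothetical flag table keyed by option name; the constant values match Linux's MS_RDONLY, MS_NOSUID and MS_NODEV. A "clear" entry must remove the bit rather than set it, and options that are not flags are passed through as mount data.

```rust
use std::collections::HashMap;

// Same values as Linux's MS_RDONLY / MS_NOSUID / MS_NODEV.
const MS_RDONLY: u64 = 1;
const MS_NOSUID: u64 = 2;
const MS_NODEV: u64 = 4;

// Hypothetical table: option -> (clear?, flag).
fn flag_table() -> HashMap<&'static str, (bool, u64)> {
    let mut m = HashMap::new();
    m.insert("ro", (false, MS_RDONLY));
    m.insert("rw", (true, MS_RDONLY));
    m.insert("nosuid", (false, MS_NOSUID));
    m.insert("suid", (true, MS_NOSUID));
    m.insert("nodev", (false, MS_NODEV));
    m.insert("dev", (true, MS_NODEV));
    m
}

// Returns (mount flags, comma-separated data options).
fn parse_mount_flags_and_options(options: &[&str]) -> (u64, String) {
    let table = flag_table();
    let mut flags = 0u64;
    let mut data = Vec::new();
    for opt in options {
        match table.get(*opt) {
            Some(&(true, f)) => flags &= !f, // clear: must NOT set the bit
            Some(&(false, f)) => flags |= f,
            None => data.push(*opt), // not a flag, pass through as data
        }
    }
    (flags, data.join(","))
}
```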
Fixes: #2262
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
It's better to check whether the destination file exists
before creating it; if it already exists, then return
directly.
Fixes: #2247
Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
Update to latest tokio to address RUSTSEC-2021-0072:
Task dropped in wrong thread when aborting `LocalSet` task
Update Cargo.toml to specify just 1.x for the tokio version.
Fixes: #2165
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Make the vsock-exporter fully async using the tokio runtime.
Also delay connecting to the trace-forwarder so that it is
easy to reconnect when the connection is broken.
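A synchronous, hypothetical simplification of the lazy-connect idea (the commit itself uses tokio): `Exporter` opens its connection on first use and drops it after a write error, so the next export reconnects. The connector closure stands in for opening the vsock connection to the trace-forwarder.

```rust
use std::io::{self, Write};

// Hypothetical exporter; `connect` opens a fresh connection when called.
struct Exporter<W: Write, C: FnMut() -> io::Result<W>> {
    connect: C,
    conn: Option<W>,
}

impl<W: Write, C: FnMut() -> io::Result<W>> Exporter<W, C> {
    fn new(connect: C) -> Self {
        // Do not connect yet; wait until there is something to export.
        Exporter { connect, conn: None }
    }

    fn export(&mut self, data: &[u8]) -> io::Result<()> {
        if self.conn.is_none() {
            // Connect on first use, or again after a previous failure.
            self.conn = Some((self.connect)()?);
        }
        let result = self.conn.as_mut().unwrap().write_all(data);
        if result.is_err() {
            // Drop the broken connection so the next export reconnects.
            self.conn = None;
        }
        result
    }
}
```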
Fixes: #2234
Signed-off-by: Tim Zhang <tim@hyper.sh>
This has a similar intent to the Go code, but is not quite the same. For
the Go code we want to ensure that the vendored code is up to date,
while here we want to ensure that `cargo vendor` actually works.
We happened to release a few tarballs where `cargo vendor` didn't work,
and that caused some pain for downstream maintainers.
Related: #2159
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
get_mounts() parses /proc/self/mountinfo in order to get the mountpoints
for various cgroup filesystems. One of the entries in mountinfo is the
"device" for each filesystem, but for virtual filesystems like /proc, /sys
and cgroups, the device entry is arbitrary. Depending on the exact rootfs
setup, it can end up being "-".
This breaks get_mounts() because it uses " - " as a separator. There
really is a " - " separator in mountinfo, but in this case the device entry
shows up as a second one. Fix this by changing a split() to a splitn(),
which effectively only considers the first " - " in the line.
While we're there, make the warning message more useful, by having it
actually show which line it wasn't able to parse.
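A sketch of the splitn()-based parsing, using a hypothetical `parse_fstype_and_mountpoint()` helper rather than the agent's get_mounts(). Only the first " - " separates the optional fields from "fstype source super-options", so a source of "-" no longer produces an extra piece.

```rust
// Returns (fstype, mount point) for one /proc/self/mountinfo line.
fn parse_fstype_and_mountpoint(line: &str) -> Option<(&str, &str)> {
    // Split on the first " - " only; a "-" source later on is kept intact.
    let mut halves = line.splitn(2, " - ");
    let pre = halves.next()?;
    let post = halves.next()?;
    // Field 5 (index 4) before the separator is the mount point.
    let mount_point = pre.split_whitespace().nth(4)?;
    // The first field after the separator is the filesystem type.
    let fstype = post.split_whitespace().next()?;
    Some((fstype, mount_point))
}
```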
fixes #2182
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>