In file src/agent/rustjail/src/validator.rs,
these two functions are not used:
- get_namespace_path
- check_host_ns
Fixes: #1783
Signed-off-by: bin <bin@hyper.sh>
For a k8s emptyDir volume, a specific fsGroup may be set. The guest
should therefore get this fsGroup from the runtime and set it properly
on the EphemeralStorage volume in the guest.
Fixes: #1580
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Currently we implement the Default trait for NamespaceType. It doesn't
really make sense to have a default for this type though - you really need
to know what type of namespace you're setting. In fact the Default
implementation is never used, so we can just drop it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We had some code that initialized a Uevent to the default value, then set
specific fields to various values. This can be accomplished inside the one
initializer using the ..Default::default() syntax. Making this change
stops clippy from complaining.
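A minimal sketch of the before and after shapes. The Uevent struct here is a
hypothetical, cut-down stand-in; the real type's fields may differ.

```rust
// Hypothetical, cut-down stand-in for the agent's Uevent type.
#[derive(Debug, Default, PartialEq)]
struct Uevent {
    action: String,
    devpath: String,
    devname: String,
    subsystem: String,
}

// Before: create a default value, then overwrite selected fields.
fn build_with_assignments() -> Uevent {
    let mut uev = Uevent::default();
    uev.action = "add".to_string();
    uev.devname = "vda".to_string();
    uev
}

// After: a single initializer; fields not named come from Default::default().
fn build_with_struct_update() -> Uevent {
    Uevent {
        action: "add".to_string(),
        devname: "vda".to_string(),
        ..Default::default()
    }
}
```

Both functions build the same value; only the second avoids the mutable
temporary.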
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We have one place where we create an empty vector then immediately push
something into it. We can do this in one step using the vec![] macro,
which stops clippy complaining.
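As an illustration only (the function names and element type are made up for
this sketch), the two forms look like this:

```rust
// Before: an empty vector followed by an immediate push.
fn with_push(item: &str) -> Vec<String> {
    let mut v = Vec::new();
    v.push(item.to_string());
    v
}

// After: build the one-element vector in a single step.
fn with_macro(item: &str) -> Vec<String> {
    vec![item.to_string()]
}
```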
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The various types implementing the UeventMatcher trait have new() methods
which return a Result<>, however none of them can actually fail. This is
a leftover from their development where some versions could fail to
initialize. Remove the unnecessary wrappers to silence clippy.
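A small sketch of a constructor that cannot fail and so returns Self directly.
PrefixMatcher is a hypothetical type used only for illustration; it is not one
of the agent's matchers.

```rust
// Hypothetical matcher, not one of the agent's real types.
struct PrefixMatcher {
    prefix: String,
}

impl PrefixMatcher {
    // Nothing here can fail, so return Self rather than Result<Self>.
    fn new(prefix: &str) -> Self {
        PrefixMatcher {
            prefix: prefix.to_string(),
        }
    }

    fn is_match(&self, devpath: &str) -> bool {
        devpath.starts_with(&self.prefix)
    }
}
```

Callers then no longer need a `?` or unwrap() on construction.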
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently these are in all-caps, to match typical capitalization of IPC,
UTS and PID in the world at large. However, this violates Rust's
capitalization conventions and makes clippy complain.
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Clippy (in Rust 1.51 at least) has some complaints about this closure
inside execute_hook() because it uses explicit returns in some places
where it doesn't need them, because they're the last expression in the
function.
That isn't necessarily obvious from a glance, but we can make clippy happy
and also make things a little clearer: first we replace a somewhat verbose
'match' using Option::ok_or_else(), then rearrange the remaining code to
put all the error paths first with explicit returns, then the "happy" path
as the straight-line exit with an implicit return.
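The shape described above can be sketched as follows. hook_binary() and its
String error type are hypothetical; this is not the actual closure from
execute_hook().

```rust
// Hypothetical helper: pick the binary name out of a hook's argument list.
fn hook_binary(args: &[String]) -> Result<String, String> {
    // Option::ok_or_else() replaces a verbose match on Some/None.
    let first = args
        .first()
        .ok_or_else(|| "hook has no arguments".to_string())?;

    // Error path first, with an explicit return.
    if first.is_empty() {
        return Err("hook binary name is empty".to_string());
    }

    // Happy path is the straight-line exit with an implicit return.
    Ok(first.clone())
}
```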
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PathBuf is an owned, mutable Path. We don't need those properties in
get_value_from_cgroup() so we can use a Path instead. This may be slightly
safer, and definitely stops clippy (version 1.51 at least) from
complaining.
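A sketch of a function that borrows a &Path instead of requiring a &PathBuf.
cgroup_file() is a made-up helper for illustration, not the real
get_value_from_cgroup().

```rust
use std::path::Path;

// Hypothetical helper: build the path of a control file inside a cgroup
// directory. Taking &Path accepts both a borrowed PathBuf and Path::new(..).
fn cgroup_file(dir: &Path, file: &str) -> String {
    dir.join(file).to_string_lossy().into_owned()
}
```

A caller holding a PathBuf can still pass `&buf`, since &PathBuf derefs to
&Path.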
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DEFAULT_ALLOWED_DEVICES and DEFAULT_DEVICES are essentially global
constant lists. They're implemented as a lazy_static! initialized Vec
values.
The code to initialize them creates an empty Vec then pushes values
onto it. We can simplify this a bit by using the vec! macro. This
might be slightly more efficient, and it definitely stops recent
clippy versions (e.g. 1.51) from complaining about it.
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Recent versions of clippy (e.g. in Rust 1.51) complain about a number
of names in the oci crate, which don't obey Rust's normal CamelCasing
conventions.
It's pretty clear that these don't obey the usual rules because they
are attempting to preserve conventional casing of existing acronyms
they incorporate ("VM", "POSIX", etc.). However, it's been my
experience that matching the case and name conventions of your
environs is more important than matching case with external norms.
Therefore, this patch changes all the identifiers in the oci crate to
match Rust conventions. Their users in the rustjail crate are updated
to match.
fixes #1611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This comment appears to be connected specifically with this function, but
has some other items separating it for no particular reason. It also has
a typo. Correct both.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Functions in rustjail deal with both the local oci module's data structure
and the protocols::oci module's data structure. Since these both cover the
OCI container config they are quite similar and have many identically named
types.
To avoid conflicts, we import many things from those modules with altered
names. However, the names we use (oci* and grpc*) don't fit the normal Rust
capitalization convention for types.
However by renaming the import of the 'protocols::oci' module itself to
'grpc', we can actually get rid of the many renames by just qualifying at
each use site with only a very small increase in verbosity. As a bonus
this gets rid of multiple 'use' items scattered through the file.
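A sketch of the idea with inline stand-in modules. The real oci and
protocols::oci modules are separate and much larger; the Process type and the
conversion function shown here are simplified assumptions.

```rust
// Stand-ins for the two modules, each defining an identically named type.
mod oci {
    #[derive(Debug, Default)]
    pub struct Process {
        pub cwd: String,
    }
}

mod protocols {
    pub mod oci {
        #[derive(Debug, Default)]
        pub struct Process {
            pub cwd: String,
        }
    }
}

// One renamed module import instead of one renamed import per type.
use protocols::oci as grpc;

// Each use site is qualified, so the two Process types cannot be confused.
fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
    oci::Process { cwd: p.cwd.clone() }
}
```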
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is not an IPC scenario, so pipe(2) is heavier than needed.
Now that tokio has been introduced, tokio::sync::watch::channel is available.
The channel performs better and is easier to use.
Fixes: #1721
Signed-off-by: Tim Zhang <tim@hyper.sh>
Update:
- Make the type of errnoRet in oci.proto oneof
- Update seccomp_grpc_to_oci so that it sets errnoRet to EPERM if the
value is empty.
- Update the oci.pb.go based on the above fixes
- Add seccomp errnoRet and flags option to configs in rustjail
Fixes: #1719
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
register_memory_event_v2() includes a closure spawned as an async task
with tokio. At the end of that closure, there's a test for a closed fd,
exiting if so. But this is right at the end of the closure when it was
about to exit anyway, so this does nothing.
This code was originally an explicit thread, converted to a tokio task
by 332fa4c "agent: switch to async runtime". It looks like there was an
error during conversion, where this logic was accidentally moved out of the
while loop above, where it makes a lot more sense.
Put it back into the loop.
fixes #1702
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.
In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address. That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about. clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.
Until https://github.com/kata-containers/kata-containers/pull/1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.
With #1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us. We can use that to remove this special case.
fixes #1431
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DevAddrMatcher existed purely as a transitional step as we refined the
uevent matching logic for each of the different device types we care about.
We've now done that, so it can be removed along with several related
pieces.
fixes #1628
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the new uevent matching infrastructure to refine the matching for pmem
devices to something more pinned down to that device type. While we're
there, fix a few ancillary problems with get_pmem_device_name():
- The name is poor - the *input* to this function is the expected device
name, so the result isn't helpful, except that it needs to wait for the
device to be ready in the guest. Change it to wait_for_pmem_device() and
explicitly check that the returned device name matches the one expected.
- Remove an incorrect comment in nvdimm_storage_handler() (the only caller)
which appears to have been copied from the virtio-blk path, but then
become stale.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Current get_scsi_device_name() uses the legacy uevent matching which
isn't very precise. This refines it to use a specific matcher
implementation. While we're at it:
- No longer insist on the SCSI controller being under the PCI root.
It generally will be, but there's no particular reason to require
it.
The matcher still has a problem in that it won't work sensibly if
there are multiple SCSI busses in the guest. Fixing that requires
changes on the runtime side as well, though, so it's beyond scope for
this change.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are some problems with get_pci_device_name():
1) It's misnamed: in fact it is only used for handling virtio-blk PCI
devices. It's also only correct for virtio-blk devices: the event
matching doesn't locate the "raw" PCI device, but rather the block
device created by virtio-blk as a child of the PCI device itself.
2) The uevent matching is imprecise. Like everything using the legacy
DevAddrMatcher, it matches on a bunch of conditions used across several
different device types, not all of which make sense for virtio-blk PCI
devices specifically.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
First get the "HOME" env from "/etc/passwd". If there's no
corresponding uid entry in /etc/passwd, then set "/" as
the home env.
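The lookup and fallback can be sketched as below. home_for_uid() is a
hypothetical helper that works on passwd-formatted text passed in as a string;
how the agent actually reads and parses /etc/passwd is not shown here.

```rust
// Hypothetical helper: resolve a home directory from passwd-formatted text.
fn home_for_uid(passwd: &str, uid: u32) -> String {
    for line in passwd.lines() {
        // Fields are name:password:uid:gid:gecos:home:shell
        let fields: Vec<&str> = line.split(':').collect();
        if fields.len() >= 6 && fields[2].parse::<u32>().ok() == Some(uid) {
            return fields[5].to_string();
        }
    }
    // No entry for this uid: fall back to "/".
    "/".to_string()
}
```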
Fixes: #1643
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Commit 17e9a2cff5 introduced a guard for the case where the mount point is
already mounted. Instead of logging only the mount tag ("kataShared"), with
this change it prints both the tag and the mount point path.
Fixes: #1398
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
get_device_name() contains logic to wait for a specific uevent, then
extract the /dev node name from it. In future we're going to want similar
logic to wait on uevents, but using different match criteria, or getting
different information out.
To simplify this, add a wait_for_uevent() helper in the uevent module,
which takes an explicit UeventMatcher object and returns the whole uevent
found.
To make testing easier, we also extract the cut down uevent watcher from
test_get_device_name() into a new spawn_test_watcher() helper. It's used
for both test_get_device_name() and a new test_wait_for_uevent() and will
be useful for more tests in future.
fixes #1484
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
get_device_name() looks at kernel uevents to work out the device name for
a given PCI (usually) address. However, when we call it we can't know if
the uevent we're interested in has already happened (in which case it will
have been recorded in Sandbox::uevent_map) or yet to come, in which case
we need to register to watch it.
However, we currently match differently against past and future events.
For past events we simply look for a sysfs path including the address, but
for future events we use a complex bit of logic in the is_match() closure.
Change it to use the exact same matching logic in both cases.
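A sketch of sharing one predicate between the two paths. The Uevent struct is
cut down, and the matching rule (a substring test on devpath) is an assumption
made for illustration, not the agent's actual logic.

```rust
// Hypothetical, cut-down uevent.
#[derive(Debug, Clone, PartialEq)]
struct Uevent {
    devpath: String,
    devname: String,
}

// Single predicate used by both paths; the rule itself is assumed.
fn is_match(dev_addr: &str, uev: &Uevent) -> bool {
    uev.devpath.contains(dev_addr)
}

// Past events: scan what has already been recorded.
fn find_past<'a>(seen: &'a [Uevent], dev_addr: &str) -> Option<&'a Uevent> {
    seen.iter().find(|uev| is_match(dev_addr, uev))
}

// Future events: the same predicate decides whether a new event ends the wait.
fn satisfies_wait(dev_addr: &str, incoming: &Uevent) -> bool {
    is_match(dev_addr, incoming)
}
```

With this structure an event cannot match when seen early but fail to match
when seen late, or the reverse.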
fixes #1397
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, Sandbox::uevent_watchers lists uevents to watch for by a
"device address" string. This is not very clearly defined, and is
matched against events with a rather complex closure created in
Uevent::process_add().
That closure makes a bunch of fragile assumptions about what sort of
events we could ever be interested in. In some ways it is too
restrictive (requires everything to be a block device), but in others
is not restrictive enough (allows things matching NVDIMM paths, even
if we're looking for a PCI block device).
To allow the clients more precise control over uevent matching, we
define a new UeventMatcher trait with a method to match uevents. We
then have the watchers list include UeventMatcher trait objects which
are used directly by Uevent::process_add(), instead of constructing
our match directly from dev_addr.
For now we don't actually change the matching function, or even use
multiple different trait implementations, but we'll refine that in
future.
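A minimal sketch of a matcher trait and a list of trait objects. Only the
trait name comes from the text above; the method name and signature, the
cut-down Uevent, and BlockPathMatcher are assumptions for illustration.

```rust
// Hypothetical, cut-down uevent; the real struct has more fields.
#[derive(Debug, Clone, Default, PartialEq)]
struct Uevent {
    subsystem: String,
    devpath: String,
    devname: String,
}

// The method name and signature are assumed.
trait UeventMatcher {
    fn is_match(&self, uev: &Uevent) -> bool;
}

// Hypothetical implementation: block devices under a given sysfs path.
struct BlockPathMatcher {
    path: String,
}

impl UeventMatcher for BlockPathMatcher {
    fn is_match(&self, uev: &Uevent) -> bool {
        uev.subsystem == "block" && uev.devpath.contains(&self.path)
    }
}

// Return the indices of all watchers whose matcher accepts the event.
fn matching_watchers(watchers: &[Box<dyn UeventMatcher>], uev: &Uevent) -> Vec<usize> {
    watchers
        .iter()
        .enumerate()
        .filter(|(_, m)| m.is_match(uev))
        .map(|(i, _)| i)
        .collect()
}
```

Because the list holds trait objects, different device types can later supply
their own matcher without changing the code that walks the list.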
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The event matching logic in Uevent::process_add() is split into two parts.
The first checks if we care about the event at all, the second checks
whether the event is relevant to a particular watcher.
However, we're going to be adding more types of watchers in future, which
will make the global filter too restrictive. Fold the two bits of logic
together into a per-watcher filter function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Uevent::process() is a bit oddly organized. It treats the onlining of
hotplugged memory as the "default" case, although that's quite specific,
while treating the handling of hotplugged block devices more like a special
case, although that's pretty close to being very general.
Furthermore splitting Uevent::is_block_add_event() from
Uevent::handle_block_add_event() doesn't make a lot of sense, since their
logic is intimately related to each other.
Alter the code to be a bit more sensible: first split on the "action" type
since that's the most fundamental difference, then handle the memory
onlining special case, then the block device add (which will become a lot
more general in future changes).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Sandbox::dev_watcher is a HashMap from a "device address" to a channel used
to notify get_device_name() that a suitable uevent has been found.
However, "device address" isn't well defined, having somewhat different
meanings for different device/event types. We never actually look up this
HashMap by key, except to remove entries.
Not looking up by key suggests that a map is not the appropriate data
structure here. Furthermore, HashMap imposes limitations on the types
which will prevent some future extensions we want.
So, replace the HashMap with a Vec<Option<>>. We need the Option<> so that
we can remove entries by index (removing them from the Vec completely would
change the indices of other entries, possibly breaking concurrent work).
This does mean that the vector will keep growing as we watch for different
events during startup. However, we don't expect the number of device
events we watch for during a run to be very large, so that shouldn't be
a problem. We can optimize this later if it becomes a problem.
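A sketch of the slot-clearing approach. The Watchers type is hypothetical, and
String stands in for whatever the real watcher entries hold.

```rust
// Hypothetical watcher list; String stands in for the real entry type.
struct Watchers {
    slots: Vec<Option<String>>,
}

impl Watchers {
    fn new() -> Self {
        Watchers { slots: Vec::new() }
    }

    // Register an entry and return its index, which stays valid afterwards.
    fn add(&mut self, w: &str) -> usize {
        self.slots.push(Some(w.to_string()));
        self.slots.len() - 1
    }

    // Clear the slot instead of calling Vec::remove(), so that the indices
    // of the other entries do not shift.
    fn remove(&mut self, idx: usize) -> Option<String> {
        self.slots.get_mut(idx).and_then(|slot| slot.take())
    }

    fn get(&self, idx: usize) -> Option<&str> {
        self.slots.get(idx).and_then(|slot| slot.as_deref())
    }
}
```

The vector never shrinks, which is the growth trade-off mentioned above.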
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, when Uevent::handle_block_add_event() receives an event matching
a registered watcher, it reports the /dev node name from the event back
to the watcher.
This changes it to report the entire uevent, not just the /dev node name.
This will allow various future extensions. It also makes the client side
of the uevent watching - get_device_name() - more consistent between its
two paths: finding a past uevent in Sandbox::uevent_map() or waiting for
a new uevent via a watcher.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Sandbox::pci_device_map contains a mapping from sysfs paths to /dev entries
which is used by get_device_name() to look up the right /dev node. But,
the map only supplies the answer if the uevent for the device has already
been received, otherwise get_device_name() has to wait for it.
However the matching for already-received and yet-to-come uevents isn't
quite the same which makes the whole system fragile.
In order to make sure the matching for both cases is identical, we need the
already-received side to store the whole uevent to match against, not just
the sysfs path and device name.
So, rename pci_device_map to uevent_map and store the whole uevent there
verbatim.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In Kata 1.x, both the sysToDevMap and the deviceWatchers are in the sandbox
structure. For some reason in Kata 2.x, the device watchers have moved to
a separate global variable, GLOBAL_DEVICE_WATCHER.
This is a bad idea: apart from introducing an extra global variable
unnecessarily, it means that Sandbox::pci_device_map and
GLOBAL_DEVICE_WATCHER are protected by separate mutexes. Since the
information in these two structures has to be kept in sync with each other,
it makes much more sense to keep them both under the same single Sandbox
mutex.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For the case of virtio-blk PCI devices, when matching uevents we create
a pci_p temporary. However, we build it incorrectly: the dev_addr values
we use for PCI devices are relative sysfs paths from the PCI root to the
device in question *including an initial /*. But when we construct pci_p
we add an extra /, meaning the resulting path will *not* match properly.
AFAICT the only reason we got away with this is because in practice the
virtio-blk devices were discovered by the kernel before we looked for them,
meaning the looser matching in get_device_name() was used, rather than the
pci_p logic in handle_block_add_event().
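The mismatch can be reproduced in a few lines. PCI_ROOT and both helper
functions are assumptions for illustration; the real construction of pci_p may
look different.

```rust
// Assumed root prefix, for illustration only.
const PCI_ROOT: &str = "/devices/pci0000:00";

// Buggy construction: dev_addr already starts with '/', so this doubles it.
fn pci_p_buggy(dev_addr: &str) -> String {
    format!("{}/{}", PCI_ROOT, dev_addr)
}

// Fixed construction: rely on the leading '/' already present in dev_addr.
fn pci_p_fixed(dev_addr: &str) -> String {
    format!("{}{}", PCI_ROOT, dev_addr)
}
```

With dev_addr "/0000:00:02.0", the buggy form contains "//" and so cannot be
a prefix of a devpath reported by the kernel.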
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>