Not all routes have either a gateway or a destination IP.
Interface routes, where the source, destination and gateway are undefined,
will default to IP v4 with the current is_ipv6() check even when they
are v6 routes.
We use the provided gRPC Route.Family field instead. This field is built
from the host netlink messages, and is a reliable way of finding out
a route's IP family.
Fixes: #2768
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We need to be able to get the IP family from the netlink route meesages,
and the Route.Family field only got recently added to the netlink
package.
The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor
debug is enabled. This is required since `Debug` level:
- Is overkill for debugging hypervisor failures.
- Effectively hides the output from the guest kernel and userland: CLH
generates so much output that the output from the guest gets "lost in
the noise" (experiments show that for each full CLH debug message, at most
1 _byte_ of guest output is displayed).
Fixes: #2726.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
modify the make script of the check-go-static, changing the `./cli` path to `./cmd/kata-runtime`
Fixes: #2765
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Run tests sometimes generate pkg/containerd-shim-v2/monitor_address,
and `git status` will treat it as a new file.
Package containerd-shim-v2 has moved to pkg/containerd-shim-v2,
the monitor_address in .gitignore should be updated too.
Fixes: #2762
Signed-off-by: bin <bin@hyper.sh>
It seems the client (crio) can send multiple requests to stop the Kata VM,
resulting a nil reference if the uid has already been cleaned up by a different thread.
Fixes#2743
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Since we now have "unix://" kind of socket returned by the
SocketAddress() function, there is no more need to build the sandbox
storage path dynamically to keep OS compatibility.
Fixes: #2738
Suggested-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Rectify the values of testModuleData with the correct
types in TestCCCheckCLiFunction in kata-check_(!x86)_test.go
Fixes: #2735
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
The agent initiates a PCI rescan from two places. One is triggered
for each virtio-blk PCI device, and one is triggered unconditionally
when we start a new container.
The PCI bus rescan code was added long time ago in Clear Containers due to
lack of ACPI support in QEMU 2.9 + q35. Since Kata routinely plugs devices
under a PCIe-to-PCI bridge, that left SHPC as the only available hotplug
mechanism.
However, while Kata was using SHPC on the qemu side, it wasn't actually
using it on the guest side. Due to a quirk of our guest kernel
configuration, the SHPC driver never bound to the bridge, and *no* hotplug
was working at all. To work around that, Kata was forcing the rescan
manually, which would discover the new device. That was very fragile (we
were arguably relying on a kernel bug). Even if we were using SHPC
propertly, it includes a mandatory 5s delay during plug operations
(designed for physical cards and human operators), which makes it
unsuitable quick start up.
Worse, the forced PCI rescans could race with either SHPC or PCIe native
hotplug sequences, causing several problems. In some cases this could put
the device into an entirely broken state where it wouldn't respond to
config space accesses at all.
Since pull request #2323 was merged, we have instead used ACPI hotplug
which is both fast, and more solid in terms of semantics and races. So,
the forced PCI rescans are no longer necessary. Remove them all.
fixes#683
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
do_add_swap() has some mildly complex code to translate the PCI path of
a virtio-blk device (where the swap will reside) into a /dev path. However,
the device module already has get_virtio_blk_pci_device_name() which does
exactly that. The existing code has some further advantages: it uses
more precise matching of the sysfs paths, and if necessary it will wait for
the device to be added to the guest.
While we're there, remove an unnecessary 'as u8' from the PCI path
construction: pci::Path::new() already accepts anything which implements
TryInfo<u8>, which u32 certainly does.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We send information about several kinds of devices to the agent so
that it can apply specific handling. We don't currently do this with
VFIO devices. However we need to do that so that the agent can
properly wait for VFIO devices to be ready (previously it did that
using a PCI rescan which may not be reliable and has some very bad
side effects).
This patch collates and sends the relevant information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Both appendBlockDevice and appendVhostUserBlkDevice start by using
GetDeviceByID to lookup the api.Device object corresponding to their
ContainerDevice object. However their common caller, appendDevices() has
already done this.
This changes it so the looked up api.Device is passed to the individual
append*Device() functions. This slightly reduces duplicated work, but more
importantly it makes it clearer that append*Device() don't need to check
for a nil result from GetDeviceByID, since the caller has already done
that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For several device types which correspond to a PCI device in the guest
we record the device's PCI path in the guest. We don't currently do
that for VFIO devices, but we're going to need to for better handling
of SR-IOV devices.
To accomplish this, we have to determine the guest PCI path from the
information the VMM gives us:
For qemu, we query the slot of the device and its bridge from QMP.
For cloud-hypervisor, the device add interface gives us a guest PCI
address. In fact this represents a design error in the clh API -
there's no way it can really know the guest PCI address in general.
It works in this case, because clh doesn't use PCI bridges, so the
device will always be on the root bus. Based on that, the PCI path is
simply the device's slot number.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
hotplugVFIODevice() has several different paths depending if we're
plugging into a root port or a PCIE<->PCI bridge and if we're using a
regular or mediated VFIO device.
We're going to want some common code on the successful exit path here,
so refactor the function to allow that without duplication.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, VFIO devices attached to a Kata container aren't described to
the agent at all. We essentially just hope they're ready by the time
we've entered the container proper, which is usually the case because of
the PCI rescan - but that causes other problems.
This adds a new device type to the agent representing VFIO devices. The
agent will use its existing uevent watching mechanisms to wait for the
associated guest PCI device to appear before proceeding.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently the constants giving the names for each device/driver type in
the protocol are in mount.rs, and used in device.rs. Since these constants
are inherently related to, well, devices, it makes more sense to put them
in device.rs and use them from mount.rs.
This will become even more so with planned extensions which will add some
device types that will not be used in mount.rs at all.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A few "go fmt" errors appear to have crept it. Clean them up with
"go fmt ./..." in the src/runtime directory.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We would only create the target when updating files. We need to make
sure that we create the target if the source is a directory. Without
this, we'll fail to start a container that utilizes an empty configmap,
for example.
Add unit tests for this.
Fixes: #2638
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We now support any container engine CRI compliant. Let's bump the
kata-monitor version to 0.2.0.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
This commit stops the container engine polling in favor of
the kata sandbox storage path monitoring.
The pod cache list is now refreshed based on fs events and synced with
the container engine only when needed.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
When the container engine is different than containerd or CRI-O we
lack proper detection of kata workloads and consider all the pods as
kata ones.
Instead of querying the container engine for the lower level runtime
used in each pod, check if a directory matching the pod exists in
the virtualcontainers sandboxes storage path.
This provides a container engine independent way to check for kata pods.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Under certain circumstances[0] Kata will attempt to use SHPC hotplug
for PCI devices on the guest. In fact we explicitly enable SHPC on
our PCI to PCI bridges, regardless of the qemu default.
SHPC was designed a long, long time ago for physical hotplugging and
works very poorly for a virtual environment. In particular it has a
mandatory 5s delay to allow a (real, human) operator to back out the
operation if they press a button by mistake. This alone makes it
unusable for a fast start up application like Kata.
Worse, the agent forces a PCI rescan during startup. That will race
with the SHPC hotplug operation causing the device to go into a bad
state where config space can't be accessed from the guest at all.
The only reason we've sort of gotten away with this is that our
default guest kernel configuration triggers what's arguably a kernel
bug effectively disabling SHPC. That makes the agent rescan the only
reason we see the new device.
Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on
the q35 machine, we can explicitly disable SHPC in all cases. It's
nothing but trouble.
fixes#2174
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
qemuArchBase.appendBridges is never actually used, because the bare
qemuArchBase type is itself never used (outside of unit tests). Instead
*all* the subclasses of qemuArchBase override appendBridges() to call
the very similar, but not identical genericAppendBridges. So, we can
remove the qemuArchBase.appendBridges implementation.
Furthermore, all those subclasses override appendBridges() in exactly
the same way, and so we can remove *those* definitions and replace the
base class qemuArchBase appendBridges() with that version, calling
genericAppendBridges().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>