Currently, update_spec_device() assumes that the proper device path in the
(inner) container is the same as the device path specified in the outer OCI
spec on the host.
Usually that's correct. However for VFIO group devices we actually need
the container to see the VM's device path, since it's normal to correlate
that with IOMMU group information from sysfs which will be different in the
guest and which we can't namespace away.
So, add an extra "final_path" parameter to update_spec_device() to allow
callers to chose the device path that should be used for the inner
container. All current callers pass the same thing as container_path, but
that will change in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_device_list() is used to update the container configuration to
change device major/minor numbers configured by the Kata client based on
host details to values suitable for the sandbox VM, which may differ. It
takes a 'device' object, but the only things it actually uses from there
are container_path and vm_path.
Refactor this as update_spec_device(), taking the host and guest paths to
the device as explicit parameters. This makes the function more
self-contained and will enable some future extensions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each VFIO device passed into the guest could represent a whole IOMMU group
of devices on the host. Since these devices aren't DMA isolated from each
other, they must appear as the same IOMMU group in the guest as well.
The VMM should enforce that for us, but double check it, since things can't
work otherwise. This also means we determine the guest IOMMU group for the
VFIO device, which we'll be needing later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For upcoming VFIO extensions we'll need to work with the IOMMU groups of
VFIO devices. This helps us towards that by adding pci_iommu_group() to
retrieve the IOMMU group (if any) of a given PCI device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VFIO devices can be added to a Kata container and they will be passed
through to the sandbox guest. However, inside the guest those devices
will bind to a native guest driver, so they will no longer appear as VFIO
devices within the guest. This behaviour differs from runc or other
conventional container runtimes.
This code allows the agent to match the behaviour of other runtimes,
if instructed to by kata-runtime. VFIO devices it's informed about
with the "vfio" type instead of the existing "vfio-gk" type will be
rebound to the vfio-pci driver within the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For better VFIO support, we're going to need to take control of which guest
driver controls specific guest devices. To assist with that, add the
pci_driver_override() function to force a specific guest device to be
bound to a specific guest driver.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This PR includes these optimize changes:
- Remove the dependency on the container engine.
The old code uses runc to generate config.json and
Docker to export rootfs, that will be heavy and need
additional dependency.
Using a fixed config for busybox image can avoid
the heavy processing above.
- Moved duplicate code to pkg/katatestutils package
Fixes: #2752
Signed-off-by: bin <bin@hyper.sh>
cri-containerd project has been merged into containerd repo, and
we should not reference it any more in code and docs.
This commit will use containerd package instead of cri-containerd
package.
Fixes: #2791
Signed-off-by: bin <bin@hyper.sh>
The tracing tags for api.go contain `"packages"` as a tag name,
whereas all other tags contain `"package"`.
Fixes: #2847
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.
The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.
This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.
Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add support for Hybrid VSOCK. Unlike standard vsock (`vsock(7)`), under
hybrid VSOCK, the hypervisor creates a "master" *UNIX* socket on the
host. For guest-initiated VSOCK connections (such as the Kata agent uses
for agent tracing), the hypervisor will then attempt to open a VSOCK
port-specific variant of the socket which it expects a server to be
listening on. Running the trace forwarder with the new `--socket-path`
option and passing it the Hypervisor specific master UNIX socket path,
the trace forwarder will listen on the VSOCK port-specific socket path
to handle Kata agent traces.
For further details and examples, see the README or run the
trace forwarder with `--help`.
Fixes: #2786.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Adding a route that already exists should not be a reason for the agent to fail
booting and thus preventing the sandbox to start.
Fixes#2712
Signed-off-by: zhaojizhuang <571130360@qq.com>
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.
Fix: #2820
Signed-off-by: Bo Chen <chen.bo@intel.com>
We will need to have console output from the guest only for debugging
purposes. As a result, we can turn-off both the serial and
virtio-console devices by default for better boot time.
Fixes: #2820
Signed-off-by: Bo Chen <chen.bo@intel.com>
Variables in rust will be dropped at the end of the function.
In function real_main the trace will be shut down by `tracer::end_tracing()`,
but at this time the root span is in an active state, so this root span
will not be sent to the trace collector.
This can be fixed by dropping the root span manually.
Fixes: #2812
Signed-off-by: bin <bin@hyper.sh>
The variable for 'name' in config-settings.go.in was previously
hardcoded as "kata". In e7c42fb it was changed to the runtime name,
which is "kata-runtime". Add a variable to specify a syslog identifier
for consistency for tests and documentation that use it.
Fixes#2806
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Update the sandbox dir clean up logic to be more appropriate
Add different seeds for randInt() method
Fixes#2770
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This patch adds an option "disable_seccomp" to the config
hypervisor.clh, from which users can disable the `seccomp`
feature from Cloud Hypervisor when needed (for debugging purposes).
Fixes: #2782
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch enables the `seccomp` feature from Cloud Hypervisor which
provides fine-grained allowed syscalls for each of its worker
threads. It brings important security benefits, while would increase
memory footprint.
Fixes: #2782
Signed-off-by: Bo Chen <chen.bo@intel.com>
Shim management server is running in a go routine, in test mode
this will cause the directory where the listen socket
file(/run/vc/sbs/777-77-77777777/shim-monitor.sock) in leak
after the tests finished.
Fixes: #2805
Signed-off-by: bin <bin@hyper.sh>
wait_for_pci_device() waits for the PCI device at the given path to become
ready, but it doesn't currently give you any meaningful handle on that
device.
Change the signature, so that it returns the PCI address of the device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a new pci::Address type which represents a guest PCI address in
DDDD:BB:SS.F form.
fixes#2745
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pci::Slot represents a PCI slot. However, in all cases where we use it, we
actually care about addressing a specific PCI function. So, at the moment
we can only refer to function 0 in each slot.
Replace pci::Slot with pci::SlotFn to represent both the slot and function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This commit does two chagnes:
- move code for managing temp users to rootless.go.
- use common function in qemu.go when shutdown the VM.
Fixes: #2759
Signed-off-by: bin <bin@hyper.sh>
From the endpoints string described through the configuration file, we
build a hash set of allowed enpoints. If a configuration files does not
include an endpoints section, we assume all endpoints are not allowed.
If there is no configuration file, then all endpoints are allowed.
Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is not, then we return ttrcp::UNIMPLEMENTED.
Fixes: #1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
When the kernel command line includes a agent.config_file=<path> entry,
then we will try to override the default confiuguration values with the
ones we parse from a TOML file at <path>.
As the configuration file overrides the default values, we need to go
through a simplified builder that convert a set of Option<> fields into
the actual AgentConfig structure.
Fixes: #1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
They will define the list of endpoints that an agent supports.
They're empty and non actionable for now.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
A single constructor setting default value is a typical pattern for a
Default implementation.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>