Remove earlier functionality that tries to assign PCI path to vfio
devices from the host assuming pci slots to start from 1.
Get this from the hypervisor instead.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Block(virtio-blk) and vfio devices are currently not handled correctly
by the agent as the agent is not provided with correct PCI paths for
these devices.
The PCI paths for these devices can be inferred from the PCI information
provided by the hypervisor when the device is added.
Hence changing the add_device trait function to return a device copy
with PCI info potentially provided by the hypervisor. This can then be
provided to the agent to correctly detect devices within the VM.
This commit includes implementation for PCI info update for
cloud-hupervisor for virtio-blk devices with stubs provided for other
hypervisors.
Removing Vsock from the DeviceType enum as Vsock currently does not
implement the Device Trait, it has no attach and detach trait functions
among others. Part of the reason is because these functions require Vsock
to implement Clone trait as these functions need cloned copies to be
passed down the hypervisor.
The change introduced for returning a device copy from the add_device
hypervisor trait explicitly requires a device to implement
Copy trait. Hence removing Vsock from the DeviceType enum for now, as
its implementation is incomplete and not currently used.
Note, one of the blockers for adding the Clone trait to Vsock is that it
currently includes a file handle which cannot be cloned. For Clone and
Device Traits to be implemented for Vsock, it requires an implementation
change in the future for it to be cloneable.
Fixes: #8283
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
By modifying RuntimeLevelFilter drain to improve logging control,
enabling isolation of change effect of the loggers between components,
tuning clh logs to be logged according to their log levels
given by cloud-hypervisor.
Fixes: #8310
Signed-off-by: Ruoqing He <linuxwatcher@outlook.com>
If you attempt to create a container (a TD) on a TDX system using a
custom build of Cloud Hypervisor (CH) that was not built with the `tdx`
CH feature, Kata will report the following, somewhat cryptic, CH error:
```
ApiError(VmBoot(InvalidPayload))
```
Newer versions of CH now report their build-time features in the ping
API response message so we now use that, if available, to detect this
scenario and generate a user-friendly error message instead.
This changes improves the readability of `handle_guest_protection()` and
adds a couple of additional tests for that method.
Fixes: #8152.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Improve the way `handle_guest_protection()` is structured by inverting
the logic and checking the value of the `confidential_guest` setting
before checking the guest protection. This makes the code easier to
understand.
> **Notes:**
>
> - This change also unconditionally saves the available guest protection
> (where previously it was only saved when `confidential_guest=true`).
> This explains the minor unit test fix.
>
> - This changes also errors if the CH driver finds an unexpected
> protection (since only Intel TDX is currently tested).
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Improve the `GuestProtection` handling to detect the version of
Intel TDX available.
The TDX version is now logged by the Cloud Hypervisor driver.
Fixes: #8147.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This change adds support for adding and removing vfio devices for
cloud-hypervisor.
Fixes: #6691
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Enable the Cloud Hypervisor driver (the `cloud-hypervisor` build feature) for the rust runtime.
Fixes: #6264.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Allow Cloud Hypervisor to create a confidential guest (a TD or
"Trust Domain") rather than a VM (Virtual Machine) on Intel systems
that provide TDX functionality.
> **Notes:**
>
> - At least currently, when built with the `tdx` feature, Cloud Hypervisor
> cannot create a standard VM on a TDX capable system: it can only create
> a TD. This implies that on TDX capable systems, the Kata Configuration
> option `confidential_guest=` must be set to `true`. If it is not, Kata
> will detect this and display the following error:
>
> ```
> TDX guest protection available and must be used with Cloud Hypervisor (set 'confidential_guest=true')
> ```
>
> - This change expands the scope of the protection code, changing
> Intel TDX specific booleans to more generic "available guest protection"
> code that could be "none" or "TDX", or some other form of guest
> protection.
Fixes: #6448.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Introduce a few new constants (for PCI segment count and FS queues) and
move the disk queue constants to `convert.rs` to allow them to be used
there too.
> **Note:**
>
> This change gives the `ShareFs` code it's own set of values rather
> than relying on the disk queue constants.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Modify the Cloud Hypervisor `add_device()` method to add `ShareFs` and
`Network` devices to the list of pending devices since only these two
device types need to be cached before VM startup. Full details in the
comments.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.
Support for adding and removing network devices is not really added to
the resource manager, so supporting this for cloud-hypervisor is not
scoped in this PR.
This also changes "pending_devices" for clh implementation from an
Option of vector to simply a vector. This simplifies the structure a bit
as we can simple iterate over the pending devices instead of having to
check for a "Some" value as this is not really required.
Fixes: #6333
Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This commit allows us to specify the huge page backend when enabling huge
page. Currently, we support two backends: thp and hugetlbfs, the default
is hugetlbfs.
To ensure backward compatibility, we introduce another configuration item
"hugepage_type" to select the memory backend, which is available only when
"enable_hugepages" is true. Besides, we add an annotation
"io.katacontainers.config.hypervisor.hugepage_type" to configure huge page
type per pod.
Fixes: #6703
Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
Removed the following kernel command line options:
- `earlyprintk=ttyS0`
- `initcall_debug`
Both these options are only useful when debugging a guest kernel failure
which is not a common occurrence.
Further, the `earlyprintk=` option can have a large negative performance
impact (it can increase the VM boot time significantly).
If the user wishes to use either of these options, they can add them to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.
Fixes: #7886.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Currently, virtio_vsock are still outside of the device
manager. This causes some management issues,such as the
inability to unify PCI address management.
Just do some work for hybrid vsock.
Fixes: #7655
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Kata containers as VM-based containers are allowed to run in the host
netns. That is, the network is able to isolate in the L2. The network
performance will benefit from this architecture, which eliminates as many
hops as possible. We called it a Directly Attachable Network (DAN for
short).
The network devices are placed at the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in the format of JSON,
including device name, type, and network info. At the very beginning stage,
the DAN only supports host tap devices. More devices, like the DPDK, will
be supported in later versions.
The format of file looks like as below:
```json
{
"netns": "/path/to/netns",
"devices": [{
"name": "eth0",
"guest_mac": "xx:xx:xx:xx:xx",
"device": {
"type": "vhost-user",
"path": "/tmp/test",
"queue_num": 1,
"queue_size": 1
},
"network_info": {
"interface": {
"ip_addresses": ["192.168.0.1/24"],
"mtu": 1500,
"ntype": "tuntap",
"flags": 0
},
"routes": [{
"dest": "172.18.0.0/16",
"source": "172.18.0.1",
"gateway": "172.18.31.1",
"scope": 0,
"flags": 0
}],
"neighbors": [{
"ip_address": "192.168.0.3/16",
"device": "",
"state": 0,
"flags": 0,
"hardware_addr": "xx:xx:xx:xx:xx"
}]
}
}]
}
```
Fixes: #1922
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
1. Implemented metrics collection for runtime-rs shim and dragonball hypervisor.
2. Described the current supported metrics in runtime-rs.(docs/design/kata-metrics-in-runtime-rs.md)
Fixes: #5017
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
In order to make it easier for developers to contribute to Dragonball,
we decide to migrate all dragonball-sandbox crates to Kata.
fixes: #7262
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Add an extra parameter in `bind_mount_unchecked` to specify
the propagation type: "shared" or "slave".
Fixes: #7017
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Currently, network endpoints are separate from the device manager
and need to be included for proper management. In order to do so,
we need to refactor the implementation of the network endpoints.
The first step is to restructure the NetworkConfig and NetworkDevice
structures.
Next, we will implement the virtio-net driver and add the Network
device to the Device Manager.
Finally, we'll unify entries with do_handle_device for each endpoint.
Fixes: #7215
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Let's take the same approach of the go runtime, instead, and allocate
the maximum allowed number of vcpus instead.
Fixes: #7270
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Vfio support introduce build error on AArch64. Remove arch related
annotation can avoid this error.
Fixes: #7187
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Unlike the previous usage which requires creating
/dev/xxx by mknod on the host, the new approach will
fully utilize the DirectVolume-related usage method,
and pass the spdk controller to vmm.
And a user guide about using the spdk volume when run
a kata-containers. it can be found in docs/how-to.
Fixes: #6526
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Limitations:
As no ready rust vmm's vfio manager is ready, it only supports
part of vfio in runtime-rs. And the left part is to call vmm
interfaces related to vfio add/remove.
So when vmm/vfio manager ready, a new PR will be pushed to
narrow the gap.
Fixes: #6525
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Support vcpu resizing on runtime side:
1. Calculate vcpu numbers in resource_manager using all the containers'
linux_resources in the spec.
2. Call the hypervisor(vmm) to do the vcpu resize.
3. Call the agent to online vcpus.
Fixes: #5030
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
The key aspects of the DM implementation refactoring as below:
1. reduce duplicated code
Many scenarios have similar steps when adding devices. so to reduce
duplicated code, we should create a common method abstracted and use
it in various scenarios.
do_handle_device:
(1) new_device with DeviceConfig and return device_id;
(2) try_add_device with device_id and do really add device;
(3) return device info of device's info;
2. return full info of Device Trait get_device_info
replace the original type DeviceConfig with full info DeviceType.
3. refactor find_device method.
Fixes: #5656
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>