Commit Graph

1714 Commits

Author SHA1 Message Date
Fabiano Fidêncio
5e9cf75937 vc: utils: Rename CalculateMilliCPUs() to CalculateCPUsF()
With the change done in the last commit, instead of calculating milli
cpus, we're actually converting the CPUs to a fraction number, a float.

Let's update the function name (and associated vars) to represent that
change.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-10 18:26:01 +01:00
Fabiano Fidêncio
e477ed0e86 runtime: Improve vCPU allocation for the VMMs
First of all, this is a controversial piece, and I know that.

In this commit we're trying to make a less greedy approach regards the
amount of vCPUs we allocate for the VMM, which will be advantageous
mainly when using the `static_sandbox_resource_mgmt` feature, which is
used by the confidential guests.

The current approach we have basically does:
* Gets the amount of vCPUs set in the config (an integer)
* Gets the amount of vCPUs set as limit (an integer)
* Sum those up
* Starts / Updates the VMM to use that total amount of vCPUs

The fact we're dealing with integers is logical, as we cannot request
500m vCPUs to the VMMs.  However, it leads us to, in several cases, be
wasting one vCPU.

Let's take the example that we know the VMM requires 500m vCPUs to be
running, and the workload sets 250m vCPUs as a resource limit.

In that case, we'd do:
* Gets the amount of vCPUs set in the config: 1
* Gets the amount of vCPUs set as limit: ceil(0.25)
* 1 + ceil(0.25) = 1 + 1 = 2 vCPUs
* Starts / Updates the VMM to use 2 vCPUs

With the logic changed here, what we're doing is considering everything
as float till just before we start / update the VMM. So, the flow
describe above would be:
* Gets the amount of vCPUs set in the config: 0.5
* Gets the amount of vCPUs set as limit: 0.25
* ceil(0.5 + 0.25) = 1 vCPUs
* Starts / Updates the VMM to use 1 vCPUs

In the way I've written this patch we introduce zero regressions, as
the default values set are still the same, and those will only be
changed for the TEE use cases (although I can see firecracker, or any
other user of `static_sandbox_resource_mgmt=true` taking advantage of
this).

There's, though, an implicit assumption in this patch that we'd need to
make explicit, and that's that the default_vcpus / default_memory is the
amount of vcpus / memory required by the VMM, and absolutely nothing
else.  Also, the amount set there should be reflected in the
podOverhead for the specific runtime class.

One other possible approach, which I am not that much in favour of
taking as I think it's **less clear**, is that we could actually get the
podOverhead amount, subtract it from the default_vcpus (treating the
result as a float), then sum up what the user set as limit (as a float),
and finally ceil the result.  It could work, but IMHO this is **less
clear**, and **less explicit** on what we're actually doing, and how the
default_vcpus / default_memory should be used.

Fixes: #6909

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2023-11-10 18:25:57 +01:00
Fabiano Fidêncio
b0157ad73a runtime: confidential: Do not set the max_vcpu to cpu
We don't have to do this since we're relying on the
`static_sandbox_resource_mgmt` feature, which gives us the correct
amount of memory and CPUs to be allocated.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-10 12:58:20 +01:00
Archana Shinde
1611723465 Merge pull request #8379 from likebreath/1103/clh_v36.0
Upgrade to Cloud Hypervisor v36.0
2023-11-08 21:10:41 -08:00
Archana Shinde
268d4d622f Merge pull request #8389 from justxuewei/vm-capable-test
runtime: Fix TestCheckHostIsVMContainerCapable unstablity issue
2023-11-08 12:14:04 -08:00
Archana Shinde
92a517156c Merge pull request #8367 from amshinde/add-nerdctl-ipvlan-test
network: Fix network hotplug for ipvlan and macvlan endpoints for qemu and add tests
2023-11-08 11:45:13 -08:00
Xuewei Niu
acd9057c7b runtime: Fix TestCheckHostIsVMContainerCapable unstablity issue
TestCheckHostIsVMContainerCapable removes sysModuleDir to simulate a
case that the kernel modules are not loaded. However,
checkKernelModules() executes modprobe <module> if a module not
found in that directory. Loading those modules is required to be denied
temporarily.

Fixes: #8390

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 22:40:08 +08:00
Archana Shinde
a6272733e7 network: Fix network hotplug for ipvlan and macvlan endpoints.
Since moving from network coldplug to hotplug, the only case verified
was veth endpoints. Support for network hotplug for ipvlan and macvlan was
broken/not added. Fix it.

Fixes: #8391

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-07 10:13:51 -08:00
Beraldo Leal
dd530ba8ee tests: fixes AMD errors
TestCheckHostIsVMContainerCapable is failing on AMD machines.
kata-check_amd64_test.go:96 has no AMD modules, also getCPUType is
missing.

Fixes #8384.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
7641c19f74 runtime: bump containerd for gogo deprecation
This update includes necessary changes due to the version bump of
containerd and its dependencies. It's part of a broader initiative to
phase out gogo protobuf, which has been deprecated, and to align with
the current supported libraries.

Fixes #7420.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
16fa2c39e6 protocols: replace gogo/types.Empty and Any
by Google versions.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
5d88c78a6e protocols: generating agent.pb.go
a3b003c345 modified agent but agent.pb.go
was not updated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Bo Chen
071667f1ca runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8378

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-11-03 10:47:06 -07:00
Fabiano Fidêncio
40cc397218 Merge pull request #8255 from cmaf/migrate-checks-fixes-links
docs: Fix broken links
2023-11-01 14:46:30 +01:00
David Esparza
2a17d3889e Merge pull request #8334 from amshinde/ipvlan-nerdctl-fix
network: Fix network attach for ipvlan and macvlan
2023-10-30 16:00:32 -06:00
Archana Shinde
f53f86884f network: Fix network attach for ipvlan and macvlan
We used the approach of cold-plugging network interface for pre-shimv2
support for docker.Since the hotplug approach was not required,
we never really got to implementing hotplug support for certain network
endpoints, ipvlan and macvlan being among them.

Since moving to shimv2 interface as the default for
runtime, we switched to hotplugging the network interface for supporting
docker and nerdctl. This was done for veth endpoints only.

Implement the hot-attach apis for ipvlan and macvlan as well to support
ipvlan and macvlan networks with docker and nerdctl.

Fixes: #8333

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-27 21:42:37 -07:00
Chelsea Mafrica
0608e20a01 docs: Fix broken links
Update broken links so that static checks pass.

Fixes #8254

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-10-26 10:17:01 -07:00
James O. D. Hunt
d707fa2c0d kata-runtime/kata-ctl: Add security details to output
Add the hypervisor security details to the output of the `kata-runtime
env` and `kata-ctl env` commands so the user can see, amongst other
things, the value of `confidential_guest`.

Fixes: #8313.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-25 16:34:42 +01:00
Fabiano Fidêncio
328ba0da99 Merge pull request #7647 from jongwu/use_pcie_virt
AArch64: runtime: use pcie root port to do pci/pcie device hotplug
2023-10-25 09:17:13 +02:00
James O. D. Hunt
048cc70654 Merge pull request #8213 from jodh-intel/validate-hypervisor-cfg-name
runtime: Validate hypervisor section name in config file
2023-10-19 07:40:58 +01:00
Jianyong Wu
f9c9d8f645 runtime: QemuVirt: hotadd virtio-mem dev to pcie root port
Hotplug virtio-mem device to pcie root port for Qemu Virt.

Fixes: #7646
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Jianyong Wu
ef18c9550c runtime:qemuvirt: hotadd net dev to pcie root port
Hotplug network device to pcie root port as this is the only way on
QemuVirt.

Fixes: #7646
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Jianyong Wu
f1aec98f9d qemu/virt: use pcie_root_port to do device hotplug for virt
ACPI PCI device hotplug on qemu virt is not supported. The only way to
hotplug pci device is pcie native way. Thus we need create pcie root
port as default.

Pcie root port number depends on following:
1. reserved one for network device as default;
2. virtio-mem dev;
3. add enough port for vhost user blk dev;

Fixes: #7646
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Jianyong Wu
28a41e1d16 runtime: add a new API for Network interface
Add GetEndpointsNum API for Network Interface to get the number of
network endpoints. This is used for caculate the number of pcie root
port for QemuVirt.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
James O. D. Hunt
3e8cf6959c runtime: Validate hypervisor section name in config file
Previously, if you accidentally modified the name of the hypervisor
section in the config file, the default golang runtime gives a cryptic
error message ("`VM memory cannot be zero`"). This can be demonstrated
using the `kata-runtime` utility program which uses the same golang
config package as the actual runtime (`containerd-shim-kata-v2`):

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ kata-runtime env >/dev/null; echo $?
VM memory cannot be zero
1
```

The hypervisor name is now validated so that the behaviour becomes:

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ ./kata-runtime env >/dev/null; echo $?
/etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo"
1
```

Fixes: #8212.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-12 13:53:37 +01:00
Peng Tao
d7660d82a0 runtime: unify gopkg.in/yaml.v3 to v3.0.1
The older versions have Denial of Service issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
fc9a107e8e runtime: unify swag and testify dependency
So that we don't need to depend on that many versions of them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
79ebb959c5 runtime: update runc dependency to v1.1.9
To pick up security fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
7f3e8bd65e runtime: unify golang.org/x/text to v0.7.0
The older versions contain security issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
df325ae371 runtime: update golang.org/x/net to v0.7.0
To pick up fix for the following issue:

A maliciously crafted HTTP/2 stream could cause excessive CPU
consumption in the HPACK decoder, sufficient to cause a denial of
service from a small number of small requests.

Fixes: #8190
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:39 +00:00
Zvonko Kaiser
7c934dc7da gpu: Fix cold-plug of VFIO devices
We need to do proper sandbox sizing when we're doing cold-plug introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass-through annotations for accumulated CPU,Memory and now CDI
devices. With that information sandbox sizing can be derived correctly.

Fixes: #7331

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-09-28 09:49:13 +00:00
Greg Kurz
defbb64ac8 Merge pull request #8036 from rye-stripe/bugfix/overhead-metrics
runtime: fix reading cgroup stats of sandboxes
2023-09-27 19:39:55 +02:00
Bo Chen
dfd0c9fa9a runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8057

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-25 12:22:37 -07:00
Peteris Rudzusiks
94e2ccc2d5 runtime: fix reading cgroup stats of sandboxes
The cgroup stats come from resourcecontrol package in the form of pointers
to structs. The sandbox Stat() method incorrectly was expecting structs.
This caused the cpu and memory stats to always be 0, which in turn caused
incorrect pod overhead metrics.

Fixes #8035

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-09-21 17:00:53 +02:00
Alexandru Matei
d507d189bb fc: Add support for noflush cache option
Firecracker supports noflush semantic via Unsafe cache type.
There is no support for direct i/o, remove it from config file

Fixes: #7823

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Alexandru Matei
2ca781518a clh: Direct IO support for block devices
Clh suports direct i/o for disks. It doesn't
offer any support for noflush, removed passing
of option to cloud-hypervisor internal config

Fixes: #7798

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Wainer Moschetta
87e64a07ed Merge pull request #7979 from beraldoleal/gogo-removal
protocol: remove gogoprotobuff tests
2023-09-20 22:38:10 -03:00
Beraldo Leal
730ef51693 deps: updating dependencies
Updating dependencies after make check, make test.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 16:54:35 -04:00
Dan Mihai
82ff2db460 runtime: support kernel params including spaces
Support quoted kernel command line parameters that include space
characters. Example:

dm-mod.create="dm-verity,,,ro,0 736328 verity 1
/dev/vda1 /dev/vda2 4096 4096 92041 0 sha256
f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4
065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038"

Fixes: #8003

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-19 20:26:38 +00:00
Beraldo Leal
604a9dd673 protocol: remove gogoprotobuff tests
This is part of a bigger effort to drop gogoprotobuff from our code
base. IIUC, those options are basically used by *pb_test.go, and since
we are dropping gogoprotobuff and those are auto generated tests, let's
just remove it.

Fixes #7978.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 12:55:42 -04:00
Fabiano Fidêncio
84c0d59d23 Merge pull request #7985 from fidencio/topic/clh-use-static_sandbox_resource_mgmt-as-default-on-arm
clh: arm: Use static_sandbox_resource_mgmt=true
2023-09-19 09:25:34 +02:00
Fabiano Fidêncio
c3ee913bf6 Merge pull request #7953 from gkurz/extra-monitor-socket
runtime/qemu: Rework QMP/HMP support
2023-09-18 19:04:14 +02:00
Fabiano Fidêncio
72599f1911 clh: arm: Use static_sandbox_resource_mgmt=true
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarh64.

With this patch, when building for x86_64, I can see the this is the
resulting config:
```
$ ARCH=amd64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false

```

And when building for aarch64:
```
$ ARCH=arm64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```

Fixes: #7941

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 14:14:10 +02:00
Jeremi Piotrowski
dfa6af54df Merge pull request #7806 from jongwu/clh_serial
clh:arm64: use arm AMBA UART for hypervisor debug
2023-09-18 12:29:07 +02:00
Greg Kurz
1f16b6627b runtime/qemu: Rework QMP/HMP support
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.

The HMP monitor allows to temper with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.

The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.

A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine control on whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".

An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.

While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.

This feature is only made visible in the base and GPU configurations
of QEMU for now.

Fixes #7952

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-18 12:13:01 +02:00
Peng Tao
6eedd9b0b9 Merge pull request #7738 from Xuanqing-Shi/7732/handle-non-empty-endpoints-in-RemoveEndpoints
runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
2023-09-18 10:58:28 +08:00
Jianyong Wu
241c355e07 clh:arm64: use arm AMBA uart for hypervisor debug
cloud hypervisor on arm64 only support arm AMBA UART(pl011) as
tty. So, the console should be set to "ttyAMA0" instead of "ttyS0"
when enable hypervisor debug mode.

Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-09-15 01:44:23 +00:00
Jeremi Piotrowski
3a1db7a86b runtime: clh: Support enabling iommu
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.

Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.

CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
bfc93927fb runtime: Remove redundant check in checkPCIeConfig
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
7c4e73b609 runtime: Add test cases for checkPCIeConfig
These test cases shows which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00