VhostUserDeviceAttrs::PCIAddr didn't actually store a PCI address
(DDDD:BB:DD.F), but rather a PCI path. Use the PciPath type and
rename things to make that clearer.
TestHandleBlockVolume previously used the bizarre value "0001:01"
which is neither a PCI address nor a PCI path for this value. Change
it to a valid PCI path - it appears the actual value didn't matter for
that test, as long as it was consistent.
Forward port of
3596058c67fixes#1040
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BlockDrive::PCIAddr doesn't actually store a PCI address
(DDDD:BB:DD.F) but a PCI path. Use the PciPath type and rename things
to make that clearer.
TestHandleBlockVolume() previously used a bizarre value "0002:01" for
the "PCI address" which was neither an actual PCI address, nor a PCI
path. Update it to use a PCI path - the actual value appears not to
matter in this test, as long as its consistent throughout.
Forward port of
64751f377bfixes#1040
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The "PCI address" returned by Endpoint::PciPath() isn't actually a PCI
address (DDDD:BB:DD.F), but rather a PCI path. Rename and use the
PciPath type to clean this up and the various parts of the network
code connected to it.
Forward port of
3e589713cffixes#1040
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that we have types to represent PCI paths on both the agent and
runtime sides, we can update the protocol definitionto use clearer
terminology.
Note that this doesn't actually change the agent protocol, because it just
renames a field without changing its field ID or type.
While we're there fix a trivial rustfmt error in
src/agent/protocols/build.rs
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is a dedicated data type for representing PCI paths, that is, PCI
devices described by the slot numbers of the bridges we need to reach
them.
There are a number of places that uses strings with that structure for
things. The plan is to use this data type to consolidate their
handling. These are essentially Go equivalents of the pci::Slot and
pci::Path types introduced in the Rust agent.
Forward port of
185b3ab044
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today we only clear out the cpuset details when doing an update call on
existing container/pods. This works in the case of Kubernetes, but not
in the case where we are explicitly setting the cpuset details at boot
time. For example, if you are running a single container via docker ala:
docker run --cpuset-cpus 0-3 -it alpine sh
What would happen is the cpuset info would be passed in with the
container spec for create container request to the agent. At that point
in time, there'd only be the defualt number of CPUs available in the
guest (1), so you'd be left with cpusets set to 0. Next, we'd hotplug
the vCPUs, providing 0-4 CPUs in the guest, but the cpuset would never
be updated, leaving the application tied to CPU 0.
Ouch.
Until the day we support cpusets in the guest, let's make sure that we
start off clearing the cpuset fields.
Fixes: #1405
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
This commit includes two changes:
- migrate from opentracing to opentelemetry
- add jaeger configuration items
Fixes: #1351
Signed-off-by: bin <bin@hyper.sh>
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;
Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Highlights for cloud-hypervisor version v0.12.0 include: removal of
`vhost-user-net` and `vhost-user-block` self spawning, migration of
`vhost-user-fs` backend, ARM64 enhancements with full support of
`--watchdog` for rebooting, and enhanced `info` HTTP API to include the
details of devices used by the VM including VFIO devices.
Fixes: #1315
Signed-off-by: Bo Chen <chen.bo@intel.com>
On pod delete, we were looking to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (we cleanup up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.
Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid Error messages being displayed when we are
stopping and deleting a sandbox:
```
level=error msg="Could not read qemu pid file"
```
I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.
Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
https://github.com/kata-containers/kata-containers/issues/1181Fixes: #1179
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
If the upcast from resultingRoutes to *grpc.IRoutes fails, we return
(nil, err), but previous code ensures that err is nil at that point, so we
return no error.
fixes#1206
Forward port of
0ffaeeb5d8
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If the upcast from resultingInterfaces to *grpc.Interfaces fails, we
return (nil, err), but previous code ensures that err is nil at that
point, so we return no error.
Forward port of
b86e904c2dfixes#1206
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
oci.proto imports "google/protobuf/wrappers.proto", but doesn't appear to
use it, which causes a warning from protoc when we compile it. Remove the
import to fix the warning.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We should always cleanup the vm directory when doing `stopSandbox`,
while we are skipping the cleanup process on some error code paths when
using cloud-hypervisor driver.
Fixes: #1098
Signed-off-by: Bo Chen <chen.bo@intel.com>
In PR 1079, CleanupContainer's parameter of sandboxID is changed to VCSandbox, but at cleanup,
there is no VCSandbox is constructed, we should load it from disk by loadSandboxConfig() in
persist.go. This commit reverts parts of #1079Fixes: #1119
Signed-off-by: bin liu <bin@hyper.sh>
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.
Signed-off-by: bin liu <bin@hyper.sh>
Update cloud-hypervisor to commit 2706319.
Fixes a limitation in OpenAPITools/openapi-generator tool,
it's impossible to send go zero types, like false and 0 to
cloud-hypervisor because `omitempty` is added if a field is not
required.
See cloud-hypervisor/cloud-hypervisor#1961 for more information
Signed-off-by: Julio Montes <julio.montes@intel.com>
Guest consumes 120Mb more of memory when DAX is enabled and the default
FS cache size (8G) is used. Disable dax when it is not required
reducing guest's memory footprint.
Without this patch:
```
7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted)
Size: 1048576 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 187876 kB
```
With this patch:
```
7fa970000000-7fa9b0000000 rw-s 612001 /memfd:ch_ram (deleted)
Size: 1048576 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 57308 kB
Pss: 56722 kB
```
fixes#1100
Signed-off-by: Julio Montes <julio.montes@intel.com>
Use `if err := q.qmpSetup(); err != nil` to reduce code and make it easy
to read. And remove checking err if last function call also return an error,
return the function call directly.
Fixes: #1081
Signed-off-by: bin liu <bin@hyper.sh>
The release v0.11.0 of cloud-hypervisor features the following changes:
1) Improved Linux Boot Time, 2) `SIGTERM/SIGINT` Interrupt Signal,
Handling 3) Default Log Level Changed, 4) `io_uring` support by default
for `virtio-block` (on host kernel version 5.8+), 5) Windows Guest
Support, 6) New `--balloon` Parameter Added, 7) Experimental
`virtio-watchdog` Support, 8) Bug fixes.
Fixes: #1089
Signed-off-by: Bo Chen <chen.bo@intel.com>
Make `asset.go` the arbiter of asset annotations by removing all asset
annotations lists from other parts of the codebase.
This makes the code simpler, easier to maintain, and more robust.
Specifically, the previous behaviour was inconsistent as the following
ways:
- `createAssets()` in `sandbox.go` was not handling the following asset
annotations:
- firmware:
- `io.katacontainers.config.hypervisor.firmware`
- `io.katacontainers.config.hypervisor.firmware_hash`
- hypervisor:
- `io.katacontainers.config.hypervisor.path`
- `io.katacontainers.config.hypervisor.hypervisor_hash`
- hypervisor control binary:
- `io.katacontainers.config.hypervisor.ctlpath`
- `io.katacontainers.config.hypervisor.hypervisorctl_hash`
- jailer:
- `io.katacontainers.config.hypervisor.jailer_path`
- `io.katacontainers.config.hypervisor.jailer_hash`
- `addAssetAnnotations()` in the `oci` package was not handling the
following asset annotations:
- hypervisor:
- `io.katacontainers.config.hypervisor.path`
- `io.katacontainers.config.hypervisor.hypervisor_hash`
- hypervisor control binary:
- `io.katacontainers.config.hypervisor.ctlpath`
- `io.katacontainers.config.hypervisor.hypervisorctl_hash`
- jailer:
- `io.katacontainers.config.hypervisor.jailer_path`
- `io.katacontainers.config.hypervisor.jailer_hash`
This change fixes the bug where specifying a custom hypervisor path via an
asset annotation was having no effect.
Fixes: #1085.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add missing annotation definitions for a hypervisor control binary:
- `io.katacontainers.config.hypervisor.ctlpath`
- `io.katacontainers.config.hypervisor.hypervisorctl_hash`
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox
Fixes: #1012
Signed-off-by: bin liu <bin@hyper.sh>
bindmount remount events are not propagated through mount subtrees,
so we have to remount the shared dir mountpoint directly.
E.g.,
```
mkdir -p source dest foo source/foo
mount -o bind --make-shared source dest
mount -o bind foo source/foo
echo bind mount rw
mount | grep foo
echo remount ro
mount -o remount,bind,ro source/foo
mount | grep foo
```
would result in:
```
bind mount rw
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
remount ro
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
```
The reason is that bind mount creats new mount structs and attaches them to different mount subtrees.
However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the
change to mount structs in other subtrees.
Fixes: #1061
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
In cloud-hypervisor, it provides a single unified way of unplugging
devices, e.g. the `/vm.RemoveDevice` HTTP API. Taking advantage of this
API, we can simplify our implementation of `hotplugRemoveDevice` in
`clh.go`, where we can consolidate similar code paths for different
device unplug (e.g. no need to implement `hotplugRemoveBlockDevice` and
`hotplugRemoveVfioDevice` separately). We will only need to retrieve the
right `deviceID` based on the type of devices, and use the single
unified HTTP API for device unplug.
Fixes: #1076
Signed-off-by: Bo Chen <chen.bo@intel.com>
Cloud-hypervisor supports DAX, let's enable it to reduce its memory
footprint.
Before this patch:
**19.96M**
```
20448kB -- [/usr/share/kata-containers/kata.img]
```
With this patch:
**10.83M**
```
11100kB -- [/usr/share/kata-containers/kata.img]
```
fixes#1056
Signed-off-by: Julio Montes <julio.montes@intel.com>
Allow API consumers to change the maximum number of ports in the
virtio-serial devices, setting a lower number of ports can improve the
boot time and reduce the attack surface.
Before this patch on arm64:
[ 0.028664] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.055031] printk: console [hvc0] enabled
After this patch on arm64:
[ 0.028484] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.031370] printk: console [hvc0] enabled
Fixes: #2676
Signed-off-by: Jia He <justin.he@arm.com>