Commit Graph

266 Commits

Author SHA1 Message Date
David Gibson
72cb9287a0 vhost-user-blk: Use PciPath type for vhost user devices
VhostUserDeviceAttrs::PCIAddr didn't actually store a PCI address
(DDDD:BB:DD.F), but rather a PCI path.  Use the PciPath type and
rename things to make that clearer.

TestHandleBlockVolume previously used the bizarre value "0001:01"
which is neither a PCI address nor a PCI path for this value.  Change
it to a valid PCI path - it appears the actual value didn't matter for
that test, as long as it was consistent.

Forward port of
3596058c67

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
74f5b5febe runtime/block: Use PciPath type through block code
BlockDrive::PCIAddr doesn't actually store a PCI address
(DDDD:BB:DD.F) but a PCI path.  Use the PciPath type and rename things
to make that clearer.

TestHandleBlockVolume() previously used a bizarre value "0002:01" for
the "PCI address" which was neither an actual PCI address, nor a PCI
path.  Update it to use a PCI path - the actual value appears not to
matter in this test, as long as its consistent throughout.

Forward port of
64751f377b

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
32b40f5fe4 runtime/network: Use PciPath type through network handling
The "PCI address" returned by Endpoint::PciPath() isn't actually a PCI
address (DDDD:BB:DD.F), but rather a PCI path.  Rename and use the
PciPath type to clean this up and the various parts of the network
code connected to it.

Forward port of
3e589713cf

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
7e92831c7a protocols: Update PCI path names / terminology in agent protocol def
Now that we have types to represent PCI paths on both the agent and
runtime sides, we can update the protocol definitionto use clearer
terminology.

Note that this doesn't actually change the agent protocol, because it just
renames a field without changing its field ID or type.

While we're there fix a trivial rustfmt error in
src/agent/protocols/build.rs

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
8e5fd8ee84 runtime: Introduce PciSlot and PciPath types
This is a dedicated data type for representing PCI paths, that is, PCI
devices described by the slot numbers of the bridges we need to reach
them.

There are a number of places that uses strings with that structure for
things.  The plan is to use this data type to consolidate their
handling.  These are essentially Go equivalents of the pci::Slot and
pci::Path types introduced in the Rust agent.

Forward port of
185b3ab044

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:05 +11:00
Eric Ernst
3721351324 runtime: cpuset: when creating container, don't pass cpuset details
Today we only clear out the cpuset details when doing an update call on
existing container/pods. This works in the case of Kubernetes, but not
in the case where we are explicitly setting the cpuset details at boot
time. For example, if you are running a single container via docker ala:

docker run --cpuset-cpus 0-3 -it alpine sh

What would happen is the cpuset info would be passed in with the
container spec for create container request to the agent. At that point
in time, there'd only be the defualt number of CPUs available in the
guest (1), so you'd be left with cpusets set to 0. Next, we'd hotplug
the vCPUs, providing 0-4 CPUs in the guest, but the cpuset would never
be updated, leaving the application tied to CPU 0.

Ouch.

Until the day we support cpusets in the guest, let's make sure that we
start off clearing the cpuset fields.

Fixes: #1405

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-02-11 17:38:15 -08:00
Fupan Li
5d1432210c Merge pull request #1352 from liubin/fix/migrate-opentracing-to-opentelemetry
runtime: migrate from opentracing to opentelemetry
2021-02-09 10:18:10 +08:00
Chelsea Mafrica
38b5a43267 Merge pull request #1318 from jongwu/acpi
arm64: enable acpi for qemu/virt.
2021-02-03 16:37:49 -08:00
bin
17df9b119d runtime: migrate from opentracing to opentelemetry
This commit includes two changes:
- migrate from opentracing to opentelemetry
- add jaeger configuration items

Fixes: #1351

Signed-off-by: bin <bin@hyper.sh>
2021-02-03 17:30:49 +08:00
Jianyong Wu
b7a1f752c0 arm64: enable acpi for qemu/virt.
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;

Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-01-29 22:12:43 +08:00
Bo Chen
c2d14cdeea versions: Update cloud-hypervisor to release v0.12.0
Highlights for cloud-hypervisor version v0.12.0 include: removal of
`vhost-user-net` and `vhost-user-block` self spawning, migration of
`vhost-user-fs` backend, ARM64 enhancements with full support of
`--watchdog` for rebooting, and enhanced `info` HTTP API to include the
details of devices used by the VM including VFIO devices.

Fixes: #1315

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-01-25 10:58:19 -08:00
Eric Ernst
789fd7c1c6 blk-dev: hotplug readonly if applicable
If a block based volume is read only, let's make sure we add as a RO
device

Fixes: #1246

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-12 14:50:54 -08:00
Eric Ernst
12777b26e4 volumes: cleanup / minor refactoring
Update some headers, very minor refactoring

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-12 14:50:47 -08:00
Eric Ernst
542e93d987 Merge pull request #1180 from egernst/qemu-cleanup-check
qemu: no state to save if QEMU isn't running
2021-01-06 11:17:54 -08:00
Eric Ernst
9a7bcccc8e qemu: no state to save if QEMU isn't running
On pod delete, we were looking to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (we cleanup up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.

Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid Error messages being displayed when we are
stopping and deleting a sandbox:

```
level=error msg="Could not read qemu pid file"
```

I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.

Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
  https://github.com/kata-containers/kata-containers/issues/1181

Fixes: #1179

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-12-21 11:29:44 -08:00
David Gibson
e004616b02 runtime/network: Fix error reporting in listRoutes()
If the upcast from resultingRoutes to *grpc.IRoutes fails, we return
(nil, err), but previous code ensures that err is nil at that point, so we
return no error.

fixes #1206

Forward port of
0ffaeeb5d8

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-18 14:36:09 +11:00
David Gibson
1ae8e81abb runtime/network: Correct error reporting in listInterfaces()
If the upcast from resultingInterfaces to *grpc.Interfaces fails, we
return (nil, err), but previous code ensures that err is nil at that
point, so we return no error.

Forward port of
b86e904c2d

fixes #1206

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-18 14:35:50 +11:00
David Gibson
a19263e58d agent/protocols: Remove unneeded import from oci.proto
oci.proto imports "google/protobuf/wrappers.proto", but doesn't appear to
use it, which causes a warning from protoc when we compile it.  Remove the
import to fix the warning.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-17 13:06:41 +11:00
Bo Chen
647331ace6 runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
We should always cleanup the vm directory when doing `stopSandbox`,
while we are skipping the cleanup process on some error code paths when
using cloud-hypervisor driver.

Fixes: #1098

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-12-01 17:27:44 -08:00
bin liu
fdbf7d3222 virtcontainers: revert CleanupContainer from PR 1079
In PR 1079, CleanupContainer's parameter of sandboxID is changed to VCSandbox, but at cleanup,
there is no VCSandbox is constructed, we should load it from disk by loadSandboxConfig() in
persist.go. This commit reverts parts of #1079

Fixes: #1119

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-17 10:31:33 +08:00
bin liu
4e3a8c0124 runtime: remove global sandbox variable
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
bin liu
290203943c runtime: delete sandboxlist.go and sandboxlist_test.go
Delete sandboxlist.go and sandboxlist_test.go under virtcontainers package.

Fixes: #1078

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
Julio Montes
36f65ce182 runtime: clh: update cloud-hypervisor
Update cloud-hypervisor to commit 2706319.
Fixes a limitation in OpenAPITools/openapi-generator tool,
it's impossible to send go zero types, like false and 0 to
cloud-hypervisor because `omitempty` is added if a field is not
required.
See cloud-hypervisor/cloud-hypervisor#1961 for more information

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-12 09:33:56 -06:00
Julio Montes
e1396f0402 runtime: clh: disable virtiofs DAX when FS cache size is 0
Guest consumes 120Mb more of memory when DAX is enabled and the default
FS cache size (8G) is used. Disable dax when it is not required
reducing guest's memory footprint.

Without this patch:

```
7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:              187876 kB
```

With this patch:

```
7fa970000000-7fa9b0000000 rw-s 612001  /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:               57308 kB
Pss:               56722 kB
```

fixes #1100

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-12 09:33:56 -06:00
Peng Tao
3c88106f65 Merge pull request #1084 from liubin/fix/1081-clean-codes
runtime: clean/refactor code
2020-11-11 10:09:10 +08:00
Bo Chen
359ab16a8f Merge pull request #1090 from likebreath/1106/clh_upgrade_v0.11.0
versions: Update cloud-hypervisor to release v0.11.0
2020-11-09 15:51:09 -08:00
bin liu
b8414045bf runtime: remove nsenter
remove code for nsenter

Fixes: #1081

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-09 11:42:51 +08:00
bin liu
e3510be867 runtime: use one line if statement to check if err is nil for qemu.go
Use `if err := q.qmpSetup(); err != nil` to reduce code and make it easy
to read. And remove checking err if last function call also return an error,
return the function call directly.

Fixes: #1081

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-09 11:42:45 +08:00
Fupan Li
d22c7cf00b Merge pull request #1013 from liubin/feature/1012-dump-guest-memroy-on-panic
Dump guest memory when kernel panic for QEMU
2020-11-09 09:46:28 +08:00
Bo Chen
92c1c4c690 versions: Update cloud-hypervisor to release v0.11.0
The release v0.11.0 of cloud-hypervisor features the following changes:
1) Improved Linux Boot Time, 2) `SIGTERM/SIGINT` Interrupt Signal,
Handling 3) Default Log Level Changed, 4) `io_uring` support by default
for `virtio-block` (on host kernel version 5.8+), 5) Windows Guest
Support, 6) New `--balloon` Parameter Added, 7) Experimental
`virtio-watchdog` Support, 8) Bug fixes.

Fixes: #1089

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-11-06 16:19:31 -08:00
Archana Shinde
6160043c01 Merge pull request #1077 from likebreath/1103/clh_refactor_device_unplug
clh: Consolidate the code path for device unplug
2020-11-06 16:00:56 -08:00
Peng Tao
c7a2b12fab Merge pull request #1086 from jodh-intel/2.0-dev-fix-annotations
annotations: Improve asset annotation handling
2020-11-06 10:29:22 +08:00
James O. D. Hunt
5ced96e96d hypervisor: Remove unused methods
Deleted `HypervisorConfig`'s unused  `CustomFirmwareAsset()` and
`JailerAssetPath()` methods.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:15:47 +00:00
James O. D. Hunt
e82c9daec3 annotations: Improve asset annotation handling
Make `asset.go` the arbiter of asset annotations by removing all asset
annotations lists from other parts of the codebase.

This makes the code simpler, easier to maintain, and more robust.

Specifically, the previous behaviour was inconsistent as the following
ways:

- `createAssets()` in `sandbox.go` was not handling the following asset
  annotations:

    - firmware:
      - `io.katacontainers.config.hypervisor.firmware`
      - `io.katacontainers.config.hypervisor.firmware_hash`

    - hypervisor:
      - `io.katacontainers.config.hypervisor.path`
      - `io.katacontainers.config.hypervisor.hypervisor_hash`

    - hypervisor control binary:
      - `io.katacontainers.config.hypervisor.ctlpath`
      - `io.katacontainers.config.hypervisor.hypervisorctl_hash`

    - jailer:
      - `io.katacontainers.config.hypervisor.jailer_path`
      - `io.katacontainers.config.hypervisor.jailer_hash`

- `addAssetAnnotations()` in the `oci` package was not handling the
  following asset annotations:

    - hypervisor:
      - `io.katacontainers.config.hypervisor.path`
      - `io.katacontainers.config.hypervisor.hypervisor_hash`

    - hypervisor control binary:
      - `io.katacontainers.config.hypervisor.ctlpath`
      - `io.katacontainers.config.hypervisor.hypervisorctl_hash`

    - jailer:
      - `io.katacontainers.config.hypervisor.jailer_path`
      - `io.katacontainers.config.hypervisor.jailer_hash`

This change fixes the bug where specifying a custom hypervisor path via an
asset annotation was having no effect.

Fixes: #1085.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:15:42 +00:00
James O. D. Hunt
0f26f1cd6f annotations: Add missing hypervisor control annotation
Add missing annotation definitions for a hypervisor control binary:

- `io.katacontainers.config.hypervisor.ctlpath`
- `io.katacontainers.config.hypervisor.hypervisorctl_hash`

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:12:58 +00:00
James O. D. Hunt
76064e3e2d asset: Formatting, grammar and whitespace
Improve formatting, grammar and whitespace.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:12:51 +00:00
bin liu
40418f6d88 runtime: add geust memory dump
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox

Fixes: #1012

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-05 16:04:21 +08:00
Peng Tao
a958eaa8d3 runtime: mount shared mountpoint readonly
bindmount remount events are not propagated through mount subtrees,
so we have to remount the shared dir mountpoint directly.

E.g.,
```
mkdir -p source dest foo source/foo

mount -o bind --make-shared source dest

mount -o bind foo source/foo
echo bind mount rw
mount | grep foo
echo remount ro
mount -o remount,bind,ro source/foo
mount | grep foo
```
would result in:
```
bind mount rw
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
remount ro
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
```

The reason is that bind mount creats new mount structs and attaches them to different mount subtrees.
However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the
change to mount structs in other subtrees.

Fixes: #1061
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-04 17:51:49 +08:00
Peng Tao
125e21cea3 runtime: readonly mounts should be readonly bindmount on the host
So that we get protected at the VM boundary not just the guest kernel.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-04 17:51:49 +08:00
Bo Chen
93d7962510 clh: Consolidate the code path for device unplug
In cloud-hypervisor, it provides a single unified way of unplugging
devices, e.g. the `/vm.RemoveDevice` HTTP API. Taking advantage of this
API, we can simplify our implementation of `hotplugRemoveDevice` in
`clh.go`, where we can consolidate similar code paths for different
device unplug (e.g. no need to implement `hotplugRemoveBlockDevice` and
`hotplugRemoveVfioDevice` separately). We will only need to retrieve the
right `deviceID` based on the type of devices, and use the single
unified HTTP API for device unplug.

Fixes: #1076

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-11-03 15:46:38 -08:00
Julio Montes
77b50969ea runtime: cloud-hypervisor: reduce memory footprint
Cloud-hypervisor supports DAX, let's enable it to reduce its memory
footprint.

Before this patch:

**19.96M**

```
20448kB -- [/usr/share/kata-containers/kata.img]
```

With this patch:

**10.83M**

```
11100kB -- [/usr/share/kata-containers/kata.img]
```

fixes #1056

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-29 14:21:57 -06:00
Peng Tao
b316661818 runtime: remove the unused proto files
These are moved to the agent and no longer needed.

Fixes: #1028
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-25 10:57:38 +08:00
Peng Tao
54e23c8302 agent: move gogo.proto out of the github.com namespance
To follow the same namespace scope as other proto files.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-25 10:44:53 +08:00
Peng Tao
583e6ed3e5 agent: types.pb.go is not regenerated
When types.proto was relocated, types.pb.go is not regenerated and still
references the old location.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-25 10:35:35 +08:00
Archana Shinde
5f0b83cc54 Merge pull request #1000 from jongwu/pci
arm64: correct bridge type for QEMUVIRT
2020-10-22 13:53:27 -07:00
bin liu
5b065eb599 runtime: change govmm package
Change govmm package name from github.com/intel/govmm
to github.com/kata-containers/govmm

Fixes: #859

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-22 21:27:49 +08:00
Jianyong Wu
9eab301526 arm64: correct bridge type for QEMUVIRT
port forward PR https://github.com/kata-containers/runtime/pull/3017

Fixes: #3016
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2020-10-20 14:09:03 +08:00
Julio Montes
f162e7e960 Merge pull request #948 from justin-he/max_ports
virtcontainers: Append max_ports to virtio-serial device
2020-10-19 08:55:06 -05:00
Jia He
da79b4be67 virtcontainers: Append max_ports to virtio-serial device
Allow API consumers to change the maximum number of ports in the
virtio-serial devices, setting a lower number of ports can improve the
boot time and reduce the attack surface.

Before this patch on arm64:
[    0.028664] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.055031] printk: console [hvc0] enabled

After this patch on arm64:
[    0.028484] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.031370] printk: console [hvc0] enabled

Fixes: #2676
Signed-off-by: Jia He <justin.he@arm.com>
2020-10-16 23:40:54 +08:00
Peng Tao
0d5d69e8cd Merge pull request #902 from c3d/bug-v2/launchpad-1878234-access
Validate runtime annotations
2020-10-16 15:47:45 +08:00