3879 Commits

Author SHA1 Message Date
James O. D. Hunt
91a390f072 docs: Create hypervisor summary document
Split some of the core hypervisor details out of the virtualisation
document and present in a simpler fashion for new users.

Fixes: #1063.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-16 11:52:40 +00:00
Christophe de Dinechin
53b5d063e9 agent: Adjust OOM Score to avoid agent being killed.
Under stress, the agent can be OOM-killed, which exists the sandbox.
One possible hard-to-diagnose manifestation is a virtiofsd crash.

Fixes: #1111

Reported-by: Qian Cai <caiqian@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-11-13 11:10:19 +01:00
Tim Zhang
06b9294c7d Merge pull request #1110 from liubin/fix/1109-add-enable_pprof
runtime: change configuration key name from EnablePprof to enable_pprof
2020-11-13 17:44:34 +08:00
bin liu
14a21c3ab1 runtime: change configuration key name from EnablePprof to enable_pprof
Key name in configuration file is in snake case but not camel case.
And the key is processed as `enable_pprof` in code, the configuration
template file should replace `EnablePprof` it by `enable_pprof`

Fixes: #1109

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 14:52:56 +08:00
bin liu
4e3a8c0124 runtime: remove global sandbox variable
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
bin liu
290203943c runtime: delete sandboxlist.go and sandboxlist_test.go
Delete sandboxlist.go and sandboxlist_test.go under virtcontainers package.

Fixes: #1078

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
Julio Montes
36f65ce182 runtime: clh: update cloud-hypervisor
Update cloud-hypervisor to commit 2706319.
Fixes a limitation in OpenAPITools/openapi-generator tool,
it's impossible to send go zero types, like false and 0 to
cloud-hypervisor because `omitempty` is added if a field is not
required.
See cloud-hypervisor/cloud-hypervisor#1961 for more information

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-12 09:33:56 -06:00
Julio Montes
e1396f0402 runtime: clh: disable virtiofs DAX when FS cache size is 0
Guest consumes 120Mb more of memory when DAX is enabled and the default
FS cache size (8G) is used. Disable dax when it is not required
reducing guest's memory footprint.

Without this patch:

```
7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:              187876 kB
```

With this patch:

```
7fa970000000-7fa9b0000000 rw-s 612001  /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:               57308 kB
Pss:               56722 kB
```

fixes #1100

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-12 09:33:56 -06:00
James O. D. Hunt
8f38265be4 release: Fix release candidate to major version upgrade check
Fix `kata-runtime kata-check`'s network version check which was failing
when the user was running a release candidate build and the latest
release was a major one, two examples of the error being:

- `BUG: unhandled scenario: current version: 1.12.0-rc0, latest version: 1.12.0`
- `BUG: unhandled scenario: current version: 2.0.0-rc0, latest version: 2.0.0`

Fixes: #1104.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-12 10:07:18 +00:00
James O. D. Hunt
2e0bf40adb tests: Ensure semver build metadata is ignored
According to the Semantic Versioning specification, build metadata must
be ignored for version comparisions, so add some explicit tests for this
scenario to `TestGetNewReleaseType()`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-12 10:06:15 +00:00
James O. D. Hunt
4024a8274b release: Make error format string consistent
Use `%s` for both semver parameters.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-12 10:04:00 +00:00
Fupan Li
671a2be313 Merge pull request #1094 from liubin/fix/991
runtime: sleep 1 second after GetOOMEvent failed
2020-11-11 14:33:57 +08:00
Peng Tao
9dbd1007d7 Merge pull request #1070 from jing-wang4/readme
Agent: README updates for build on ppc64le
2020-11-11 10:15:22 +08:00
Peng Tao
3c88106f65 Merge pull request #1084 from liubin/fix/1081-clean-codes
runtime: clean/refactor code
2020-11-11 10:09:10 +08:00
bin liu
cb0e6094ff runtime: sleep 1 second after GetOOMEvent failed
In some cases, for example agent crashed and not marked dead yet, the GetOOMEvent
will return errors like `connection reset by peer` or `ttrpc: closed`. Do a sleep
with 1 second (agent check interval) and let agent health check to do the check.

Fixes: #991

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-10 12:02:31 +08:00
Bo Chen
359ab16a8f Merge pull request #1090 from likebreath/1106/clh_upgrade_v0.11.0
versions: Update cloud-hypervisor to release v0.11.0
2020-11-09 15:51:09 -08:00
bin liu
b8414045bf runtime: remove nsenter
remove code for nsenter

Fixes: #1081

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-09 11:42:51 +08:00
bin liu
e3510be867 runtime: use one line if statement to check if err is nil for qemu.go
Use `if err := q.qmpSetup(); err != nil` to reduce code and make it easy
to read. And remove checking err if last function call also return an error,
return the function call directly.

Fixes: #1081

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-09 11:42:45 +08:00
Fupan Li
d22c7cf00b Merge pull request #1013 from liubin/feature/1012-dump-guest-memroy-on-panic
Dump guest memory when kernel panic for QEMU
2020-11-09 09:46:28 +08:00
Bo Chen
92c1c4c690 versions: Update cloud-hypervisor to release v0.11.0
The release v0.11.0 of cloud-hypervisor features the following changes:
1) Improved Linux Boot Time, 2) `SIGTERM/SIGINT` Interrupt Signal,
Handling 3) Default Log Level Changed, 4) `io_uring` support by default
for `virtio-block` (on host kernel version 5.8+), 5) Windows Guest
Support, 6) New `--balloon` Parameter Added, 7) Experimental
`virtio-watchdog` Support, 8) Bug fixes.

Fixes: #1089

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-11-06 16:19:31 -08:00
Archana Shinde
6160043c01 Merge pull request #1077 from likebreath/1103/clh_refactor_device_unplug
clh: Consolidate the code path for device unplug
2020-11-06 16:00:56 -08:00
James O. D. Hunt
b85914c960 Merge pull request #979 from jodh-intel/2.0-dev-show-ttrpc-logs
agent: Log ttrpc messages
2020-11-06 13:45:48 +00:00
James O. D. Hunt
8907a33907 agent: Only show ttrpc logs for trace log level
Only display the `ttrpc` crate log output when full logging
(trace level) is enabled.

This is a slight abuse of log levels but provides developers and testers
what they need whilst also keeping the logs relatively quiet for the
default info log level (the `ttrpc` crate logging is a bit "chatty").

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:45:05 +00:00
James O. D. Hunt
21cd7ad172 agent: Log ttrpc messages
The `ttrpc` crate uses the `log` crate for logging. But the agent uses
the `slog` crate. This means that currently, all `ttrpc` log messages
are being discarded.

Use the `slog-stdlog` create to redirect `log` crate logging calls into
`slog` so they are visible in the agents log output.

Fixes: #978.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:05:02 +00:00
James O. D. Hunt
286eebf087 agent: Add env var to set log level
Add support for a `KATA_AGENT_LOG_LEVEL` environment variable for testing.
This is the equivalent to the `agent.log=` kernel command line option.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:05:02 +00:00
James O. D. Hunt
b9c6db4bb8 agent: Add env var tests
Add some tests for the existing `KATA_AGENT_SERVER_ADDR` environment
variable feature.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:05:02 +00:00
James O. D. Hunt
705e995589 agent: Add env var comment
Add a comment stating what the server address environment variable is
for.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-06 10:05:02 +00:00
Peng Tao
c7a2b12fab Merge pull request #1086 from jodh-intel/2.0-dev-fix-annotations
annotations: Improve asset annotation handling
2020-11-06 10:29:22 +08:00
James O. D. Hunt
5ced96e96d hypervisor: Remove unused methods
Deleted `HypervisorConfig`'s unused  `CustomFirmwareAsset()` and
`JailerAssetPath()` methods.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:15:47 +00:00
James O. D. Hunt
e82c9daec3 annotations: Improve asset annotation handling
Make `asset.go` the arbiter of asset annotations by removing all asset
annotations lists from other parts of the codebase.

This makes the code simpler, easier to maintain, and more robust.

Specifically, the previous behaviour was inconsistent as the following
ways:

- `createAssets()` in `sandbox.go` was not handling the following asset
  annotations:

    - firmware:
      - `io.katacontainers.config.hypervisor.firmware`
      - `io.katacontainers.config.hypervisor.firmware_hash`

    - hypervisor:
      - `io.katacontainers.config.hypervisor.path`
      - `io.katacontainers.config.hypervisor.hypervisor_hash`

    - hypervisor control binary:
      - `io.katacontainers.config.hypervisor.ctlpath`
      - `io.katacontainers.config.hypervisor.hypervisorctl_hash`

    - jailer:
      - `io.katacontainers.config.hypervisor.jailer_path`
      - `io.katacontainers.config.hypervisor.jailer_hash`

- `addAssetAnnotations()` in the `oci` package was not handling the
  following asset annotations:

    - hypervisor:
      - `io.katacontainers.config.hypervisor.path`
      - `io.katacontainers.config.hypervisor.hypervisor_hash`

    - hypervisor control binary:
      - `io.katacontainers.config.hypervisor.ctlpath`
      - `io.katacontainers.config.hypervisor.hypervisorctl_hash`

    - jailer:
      - `io.katacontainers.config.hypervisor.jailer_path`
      - `io.katacontainers.config.hypervisor.jailer_hash`

This change fixes the bug where specifying a custom hypervisor path via an
asset annotation was having no effect.

Fixes: #1085.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:15:42 +00:00
James O. D. Hunt
0f26f1cd6f annotations: Add missing hypervisor control annotation
Add missing annotation definitions for a hypervisor control binary:

- `io.katacontainers.config.hypervisor.ctlpath`
- `io.katacontainers.config.hypervisor.hypervisorctl_hash`

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:12:58 +00:00
James O. D. Hunt
76064e3e2d asset: Formatting, grammar and whitespace
Improve formatting, grammar and whitespace.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-05 12:12:51 +00:00
bin liu
40418f6d88 runtime: add geust memory dump
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox

Fixes: #1012

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-05 16:04:21 +08:00
Jianyong Wu
6c2fc233e2 agent: create pci root Bus Path for arm64
port https://github.com/kata-containers/agent/pull/860 here.

Fixes: #1059
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2020-11-05 12:18:09 +08:00
Peng Tao
a958eaa8d3 runtime: mount shared mountpoint readonly
bindmount remount events are not propagated through mount subtrees,
so we have to remount the shared dir mountpoint directly.

E.g.,
```
mkdir -p source dest foo source/foo

mount -o bind --make-shared source dest

mount -o bind foo source/foo
echo bind mount rw
mount | grep foo
echo remount ro
mount -o remount,bind,ro source/foo
mount | grep foo
```
would result in:
```
bind mount rw
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
remount ro
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
```

The reason is that bind mount creats new mount structs and attaches them to different mount subtrees.
However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the
change to mount structs in other subtrees.

Fixes: #1061
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-04 17:51:49 +08:00
Peng Tao
125e21cea3 runtime: readonly mounts should be readonly bindmount on the host
So that we get protected at the VM boundary not just the guest kernel.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-04 17:51:49 +08:00
AIsland
b6f8a1d5af docs: Fix incorrect docs in config file
Correct the default configuration of [hypervisor.qemu] shared_fs in configuration-qemu.toml to virtio-fs in kata 2.0.

Fixes: #1054

Signed-off-by: AIsland <yuchunyu01@inspur.com>
2020-11-04 09:58:02 +08:00
Bo Chen
93d7962510 clh: Consolidate the code path for device unplug
In cloud-hypervisor, it provides a single unified way of unplugging
devices, e.g. the `/vm.RemoveDevice` HTTP API. Taking advantage of this
API, we can simplify our implementation of `hotplugRemoveDevice` in
`clh.go`, where we can consolidate similar code paths for different
device unplug (e.g. no need to implement `hotplugRemoveBlockDevice` and
`hotplugRemoveVfioDevice` separately). We will only need to retrieve the
right `deviceID` based on the type of devices, and use the single
unified HTTP API for device unplug.

Fixes: #1076

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-11-03 15:46:38 -08:00
Jing Wang
18a2245986 Agent: README updates for build on ppc64le
README updates for agent build on ppc64le

  Fixes: #1069

Signed-off-by: Jing Wang <jing.wang4@ibm.com>
2020-11-03 15:29:43 +00:00
Jing Wang
655f2649b3 Agent: README updates for build on ppc64le
README updates for agent build on ppc64le

  Fixes: #1069

Signed-off-by: Jing Wang <jing.wang4@ibm.com>
2020-11-03 15:24:08 +00:00
Jing Wang
dfe364f885 Agent: README updates for build on ppc64le
README updates for agent build on ppc64le

Fixes: #1069

Signed-off-by: Jing Wang <jing.wang4@ibm.com>
2020-11-02 20:26:36 +00:00
Peng Tao
bf57cd844e Merge pull request #1057 from devimc/29-10-2020/clh/improveMemFoot
runtime: cloud-hypervisor: reduce memory footprint
2020-11-02 15:13:06 +08:00
Bin Liu
8823ca31ad Merge pull request #1042 from devimc/2020-10-21/unitests/sandbox.rs
agent: Improve unit test coverage for src/sandbox.rs
2020-10-30 11:26:29 +08:00
Bin Liu
7b9013f047 Merge pull request #1035 from lifupan/fix_thread_panic
rustjail: fix the issue of create thread failed causing current thread panic
2020-10-30 11:25:32 +08:00
Julio Montes
77b50969ea runtime: cloud-hypervisor: reduce memory footprint
Cloud-hypervisor supports DAX, let's enable it to reduce its memory
footprint.

Before this patch:

**19.96M**

```
20448kB -- [/usr/share/kata-containers/kata.img]
```

With this patch:

**10.83M**

```
11100kB -- [/usr/share/kata-containers/kata.img]
```

fixes #1056

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-29 14:21:57 -06:00
Julio Montes
196e8d81cf Merge pull request #1032 from devimc/2020-10-21/unitests/container.rs
Improve unit test coverage for rustjail/container.rs
2020-10-28 16:08:23 -06:00
Julio Montes
2e1a8f0ae9 agent: Improve unit test coverage for src/sandbox.rs
Improve unit test coverage for src/sandbox.rs

fixes #293

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-28 08:10:46 -06:00
fupan.lfp
172d015e1b rustjail: fix the issue of create thread failed causing thread panic
It's should catch the failed error of spawning a new thread, otherwise,
it would cause the current thread panic.

Fixes: #1034

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-27 23:31:34 +08:00
Julio Montes
9e93463bb6 agent/rustjail: improve unit test coverage for rustjail/container.rs
Improve unit test coverage for rustjail/container.rs

fixes #282

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-27 09:28:38 -06:00
Julio Montes
ad4f7b86f2 agent/rustjail: make mount and umount2 public
make mount and umount2 public, this way they can be
used in other files

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-27 09:28:38 -06:00