Commit Graph

400 Commits

Author SHA1 Message Date
Eric Ernst
3721351324 runtime: cpuset: when creating container, don't pass cpuset details
Today we only clear out the cpuset details when doing an update call on
existing container/pods. This works in the case of Kubernetes, but not
in the case where we are explicitly setting the cpuset details at boot
time. For example, if you are running a single container via docker ala:

docker run --cpuset-cpus 0-3 -it alpine sh

What would happen is the cpuset info would be passed in with the
container spec for create container request to the agent. At that point
in time, there'd only be the defualt number of CPUs available in the
guest (1), so you'd be left with cpusets set to 0. Next, we'd hotplug
the vCPUs, providing 0-4 CPUs in the guest, but the cpuset would never
be updated, leaving the application tied to CPU 0.

Ouch.

Until the day we support cpusets in the guest, let's make sure that we
start off clearing the cpuset fields.

Fixes: #1405

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-02-11 17:38:15 -08:00
Fabiano Fidêncio
e2c8c7e603 Merge pull request #1031 from knittl/feature/kata-option-aliases
cli: Add aliases for `kata-` options
2021-02-10 16:22:13 +01:00
Daniel Knittl-Frank
056d742c17 docs: Update documentation with new prefixless config options
Remove the old config options from the documentation and replace them
with the new form (without the redundant `kata-` prefix).

Signed-off-by: Daniel Knittl-Frank <knittl89+git@googlemail.com>
2021-02-10 07:55:18 +01:00
Daniel Knittl-Frank
02ee8b0b8a cli: Add aliases for kata- options
Remove `kata-` prefix from options `kata-config` and
`kata-show-default-config-paths`.

Fixes #1011

Signed-off-by: Daniel Knittl-Frank <knittl89+git@googlemail.com>
2021-02-10 07:55:18 +01:00
Fupan Li
5d1432210c Merge pull request #1352 from liubin/fix/migrate-opentracing-to-opentelemetry
runtime: migrate from opentracing to opentelemetry
2021-02-09 10:18:10 +08:00
bin
3406502706 runtime: add jaeger configuration items
add configuration items in Kata Containers
configuration file to let users specify jaeger
collector address, and user/password.

Signed-off-by: bin <bin@hyper.sh>
2021-02-09 08:02:05 +08:00
Chelsea Mafrica
38b5a43267 Merge pull request #1318 from jongwu/acpi
arm64: enable acpi for qemu/virt.
2021-02-03 16:37:49 -08:00
bin
17df9b119d runtime: migrate from opentracing to opentelemetry
This commit includes two changes:
- migrate from opentracing to opentelemetry
- add jaeger configuration items

Fixes: #1351

Signed-off-by: bin <bin@hyper.sh>
2021-02-03 17:30:49 +08:00
bin
b6c2a60509 kata-monitor: set buildmode to exe to avoid build failing
CGO_ENABLED=0 and -buildmode=pie are not compatible and may lead build failing in some OS.
Specify buildmode=exe to overwrite the value set in BUILDFLAGS

Fixes: #1343

Signed-off-by: bin <bin@hyper.sh>
2021-02-02 14:47:21 +08:00
Chelsea Mafrica
6be910bdc1 Merge pull request #1134 from egernst/kata-monitor-cleanup
kata-monitor: allow for building for alpine
2021-02-01 16:19:36 -08:00
James O. D. Hunt
de9487744f Merge pull request #1253 from snir911/fix-poststop
shimv2: log a warning and continue on post-stop hook failure
2021-02-01 14:44:39 +00:00
Jianyong Wu
b7a1f752c0 arm64: enable acpi for qemu/virt.
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;

Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-01-29 22:12:43 +08:00
Carlos Venegas
3e3bfb9a42 Merge pull request #1321 from likebreath/clh_v0.12.0
versions: Update cloud-hypervisor to release v0.12.0
2021-01-26 17:07:02 -06:00
Bo Chen
c2d14cdeea versions: Update cloud-hypervisor to release v0.12.0
Highlights for cloud-hypervisor version v0.12.0 include: removal of
`vhost-user-net` and `vhost-user-block` self spawning, migration of
`vhost-user-fs` backend, ARM64 enhancements with full support of
`--watchdog` for rebooting, and enhanced `info` HTTP API to include the
details of devices used by the VM including VFIO devices.

Fixes: #1315

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-01-25 10:58:19 -08:00
Snir Sheriber
0e57393fcc shimv2: log a warning and continue on post-start hook failure
According to runtime-spec:
The poststart hooks MUST be invoked by the runtime. If any poststart
hook fails, the runtime MUST log a warning, but the remaining hooks
and lifecycle continue as if the hook had succeeded

Fixes: #1252

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-01-25 16:29:55 +02:00
Snir Sheriber
e7043fe284 shimv2: log a warning and continue on post-stop hook failure
According to runtime-spec:
The poststop hooks MUST be invoked by the runtime. If any
poststop hook fails, the runtime MUST log a warning, but
the remaining hooks and lifecycle continue as if the hook
had succeeded.

Fixes: #1252

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-01-25 16:29:47 +02:00
Carlos Venegas
6f3d591763 clh: Use vanilla kernel.
Qemu config alredy use vanilla kernel build for virtiofs.

Lets make cosisntent the usage of kernel.

Depends-on: github.com/kata-containers/tests#3172

Fixes: #1302

Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
2021-01-22 20:00:20 +00:00
Chelsea Mafrica
b24a2d2e48 Merge pull request #904 from cmaf/tracing-shimv2
shimv2: Add tracing to shimv2
2021-01-14 16:38:28 -08:00
Eric Ernst
789fd7c1c6 blk-dev: hotplug readonly if applicable
If a block based volume is read only, let's make sure we add as a RO
device

Fixes: #1246

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-12 14:50:54 -08:00
Eric Ernst
12777b26e4 volumes: cleanup / minor refactoring
Update some headers, very minor refactoring

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-12 14:50:47 -08:00
Eric Ernst
fbc1d123e8 vendor: revendor govmm
Update govmm to add RO blk hotplug support.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-11 18:11:50 -08:00
GabyCT
a6d52d3da1 Merge pull request #1208 from GabyCT/topic/addgithu
github: Add github actions
2021-01-08 14:27:19 -06:00
Archana Shinde
ebd9fcc2c3 actions: Run static checks before make agent
Run static checks prior to building the agent.Checks
fail if run after since the compilation process
produces new rust code.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2021-01-08 11:04:54 -06:00
Fabiano Fidêncio
ce27c00ee2 Merge pull request #1217 from snir911/fix_hanging_pods
shimv2: Avoid double removing of container from sandbox
2021-01-08 15:00:54 +01:00
Eric Ernst
542e93d987 Merge pull request #1180 from egernst/qemu-cleanup-check
qemu: no state to save if QEMU isn't running
2021-01-06 11:17:54 -08:00
Snir Sheriber
5c464018ed shimv2: Avoid double removing of container from sandbox
RemoveContainerRequest results in calling to deleteContainer, according
to spec calling to RemoveContainer is idempotent and "must not return
an error if the container has already been removed", hence, don't
return error if the error reports that the container is not found.

Fixes: #836

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2020-12-27 18:04:06 +02:00
Eric Ernst
9a7bcccc8e qemu: no state to save if QEMU isn't running
On pod delete, we were looking to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (we cleanup up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.

Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid Error messages being displayed when we are
stopping and deleting a sandbox:

```
level=error msg="Could not read qemu pid file"
```

I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.

Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
  https://github.com/kata-containers/kata-containers/issues/1181

Fixes: #1179

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-12-21 11:29:44 -08:00
David Gibson
e004616b02 runtime/network: Fix error reporting in listRoutes()
If the upcast from resultingRoutes to *grpc.IRoutes fails, we return
(nil, err), but previous code ensures that err is nil at that point, so we
return no error.

fixes #1206

Forward port of
0ffaeeb5d8

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-18 14:36:09 +11:00
David Gibson
1ae8e81abb runtime/network: Correct error reporting in listInterfaces()
If the upcast from resultingInterfaces to *grpc.Interfaces fails, we
return (nil, err), but previous code ensures that err is nil at that
point, so we return no error.

Forward port of
b86e904c2d

fixes #1206

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-18 14:35:50 +11:00
Bin Liu
caa6965c17 Merge pull request #1183 from wainersm/runtime_destdir
runtime: Allow to overwrite DESTDIR
2020-12-17 14:10:56 +08:00
David Gibson
a19263e58d agent/protocols: Remove unneeded import from oci.proto
oci.proto imports "google/protobuf/wrappers.proto", but doesn't appear to
use it, which causes a warning from protoc when we compile it.  Remove the
import to fix the warning.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-17 13:06:41 +11:00
Wainer dos Santos Moschetta
10e9bfc6f7 runtime: Allow to overwrite DESTDIR
On runtime/Makefile the value of DESTDIR is set to "/", unless one
pass that variable as an argument to `make`. This change will
allow its overwrite if DESTDIR is exported in the environment as
well.

Fixes #1182

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-12-09 09:04:04 -05:00
Chelsea Mafrica
49e7151d3d shimv2: Add tracing
Add trace calls to shimv2 that create spans for functions in service.go.
Tracing starts in New(), which is forked twice and is followed by either
StartShim() or Create().

Tracing cannot start without the value for Trace enabled from the
runtime config so load the config in New(), which results in it being
loaded every time New() is called in addition to where it is originally
loaded after Create().

Fixes #903

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-12-04 19:38:44 -08:00
Fabiano Fidêncio
f7383ef835 Merge pull request #1166 from cmaf/fix-ctx-port
shimv2: handle ctx passed by containerd
2020-12-03 19:45:52 +01:00
Bin Liu
4e0a7e31f9 Merge pull request #1103 from likebreath/1111/clh_fix_cleanupVM
runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
2020-12-03 17:34:26 +08:00
Chelsea Mafrica
0155fe1260 shimv2: handle ctx passed by containerd
Sometimes shim process cannot be shutdown because of container list
is not empty. This container list is written in shim service, while
creating container. We find that if containerd cancel its Create
Container Request due to timeout, but runtime didn't handle it properly
and continue creating action, then this container cannot be deleted at
all. So we should make sure the ctx passed to Create Service rpc call
is effective.

Fixes #1088

Signed-off-by: Yves Chan <shanks.cyp@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-12-02 14:28:31 -08:00
Bo Chen
647331ace6 runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
We should always cleanup the vm directory when doing `stopSandbox`,
while we are skipping the cleanup process on some error code paths when
using cloud-hypervisor driver.

Fixes: #1098

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-12-01 17:27:44 -08:00
Eric Ernst
2f1cb7995f kata-monitor: allow for building for alpine
- add a reference Dockerfile to tools
- update kata-monitor build to:
  1) utilize the kata buildflags, which were dropped before
  2) disable CGO, so we have option for building in alpine

From root of the repository, example build:
 $ docker build -f tools/packaging/kata-monitor/Dockerfile -t kata-monitor .

Fixes: #1135

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-12-01 10:28:59 -08:00
Julio Montes
70f198d78e cli: check modules and permissions before loading a module
Before loading a module, the check subcommand should check if the
current user can load it.

fixes #3085

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-26 11:55:42 -06:00
Julio Montes
cb684cf8ea cli: don't fail if rate limit is exceeded
Don't fail if rate limit is exceeded since this is a
limitation/restriction of Github not a problem in the host.
Print a warning when the rate limit is exceeded.

For more information about Github's rate limit, see
https://developer.github.com/v3/#rate-limiting

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-26 11:50:14 -06:00
bin liu
c388ec5bef runtime: don't wait the second shim process in shim start
In first shim v2 startup(with `start` command-line option), it will start
the second shim v2 process running as ttrpc server, there is no needs to
wait the second process, because the current shim v2 process will exit immediately.

Fixes: #1127

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-18 17:18:59 +08:00
Julio Montes
1dd77e204f Merge pull request #1120 from liubin/fix/1119-revert-cleanupcontainer-api
virtcontainers: revert CleanupContainer from PR 1079
2020-11-17 09:11:29 -06:00
bin liu
fdbf7d3222 virtcontainers: revert CleanupContainer from PR 1079
In PR 1079, CleanupContainer's parameter of sandboxID is changed to VCSandbox, but at cleanup,
there is no VCSandbox is constructed, we should load it from disk by loadSandboxConfig() in
persist.go. This commit reverts parts of #1079

Fixes: #1119

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-17 10:31:33 +08:00
James O. D. Hunt
91a390f072 docs: Create hypervisor summary document
Split some of the core hypervisor details out of the virtualisation
document and present in a simpler fashion for new users.

Fixes: #1063.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-11-16 11:52:40 +00:00
Tim Zhang
06b9294c7d Merge pull request #1110 from liubin/fix/1109-add-enable_pprof
runtime: change configuration key name from EnablePprof to enable_pprof
2020-11-13 17:44:34 +08:00
bin liu
14a21c3ab1 runtime: change configuration key name from EnablePprof to enable_pprof
Key name in configuration file is in snake case but not camel case.
And the key is processed as `enable_pprof` in code, the configuration
template file should replace `EnablePprof` it by `enable_pprof`

Fixes: #1109

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 14:52:56 +08:00
bin liu
4e3a8c0124 runtime: remove global sandbox variable
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
bin liu
290203943c runtime: delete sandboxlist.go and sandboxlist_test.go
Delete sandboxlist.go and sandboxlist_test.go under virtcontainers package.

Fixes: #1078

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
Julio Montes
36f65ce182 runtime: clh: update cloud-hypervisor
Update cloud-hypervisor to commit 2706319.
Fixes a limitation in OpenAPITools/openapi-generator tool,
it's impossible to send go zero types, like false and 0 to
cloud-hypervisor because `omitempty` is added if a field is not
required.
See cloud-hypervisor/cloud-hypervisor#1961 for more information

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-12 09:33:56 -06:00
Julio Montes
e1396f0402 runtime: clh: disable virtiofs DAX when FS cache size is 0
Guest consumes 120Mb more of memory when DAX is enabled and the default
FS cache size (8G) is used. Disable dax when it is not required
reducing guest's memory footprint.

Without this patch:

```
7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:              187876 kB
```

With this patch:

```
7fa970000000-7fa9b0000000 rw-s 612001  /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:               57308 kB
Pss:               56722 kB
```

fixes #1100

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-12 09:33:56 -06:00