3879 Commits

Author SHA1 Message Date
Wainer dos Santos Moschetta
14e7042cf6 agent: Clean up commented use declarations
There are some commented use declarations, removed them all.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-12-29 09:16:29 -05:00
Wainer dos Santos Moschetta
5fe5b3212f agent: Fix temp prefix on Namespace::test_setup_persistent_ns
Wrong prefix on the created temp directory on the test_setup_persistent_ns
for uts namesmpace type test.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-12-29 09:16:29 -05:00
Wainer dos Santos Moschetta
3a891d4e8f agent: Return error on trying to persist a pid namespace
An pid namespace cannot be persisted, so add a check-and-error on
Namespace::setup() for handling that case.

Fixes #1220

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-12-29 09:16:26 -05:00
Snir Sheriber
5c464018ed shimv2: Avoid double removing of container from sandbox
RemoveContainerRequest results in calling to deleteContainer, according
to spec calling to RemoveContainer is idempotent and "must not return
an error if the container has already been removed", hence, don't
return error if the error reports that the container is not found.

Fixes: #836

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2020-12-27 18:04:06 +02:00
Liu Jiang
b366af9358 jail: add more test cases for validator
Fixes: #1214

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-24 20:17:06 +08:00
Liu Jiang
d38a5d3fcf jail/validator: introduce helpers to reduce duplicated code
Fixes: #1214

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-24 19:02:31 +08:00
Liu Jiang
76ad32136f jail/validator: avoid unwrap() for safety
Explicitly return error codes instead of unwrap().

Fixes: #1214

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-24 19:02:13 +08:00
Liu Jiang
51fd624f3e rustjail: add more context info for errors
Fixes: #1214

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-24 17:47:58 +08:00
Peng Tao
f1b3f2e178 Merge pull request #1150 from fidencio/wip/make-install-breaks
Add void "install" targets for both "trace-forwarder" and "agent-ctl"
2020-12-23 18:41:42 +08:00
Peng Tao
109ab54d63 Merge pull request #1212 from jiangliu/typo
oci: fix a typo in "addtionalGids"
2020-12-23 18:03:26 +08:00
Bin Liu
8d6096210e Merge pull request #1186 from maruthgoyal/2.0-dev
Don't update cpusets if no CPUs changed closes #1172
2020-12-23 10:05:59 +08:00
Liu Jiang
9321e1b21b oci: fix two incompatible issues with OCI spec
The first incompatible issue is caused by a typo, "swapiness" should
be "swappiness". The second incompatible issue is caused by a serde
format. The struct LinuxBlockIODevice is introduced for convenience,
but it also changes serialized data, so "#[serde(flatten)]" should
be used for compatibility with OCI spec.

Fixes: #1211

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-22 11:16:15 +08:00
Liu Jiang
406a91ffdd agent: consume ttrpc crate from crates.io
The ttrpc v0.3.0 has been published to crates.io, so consume from
crates.io.

Fixes: #1213

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-22 09:46:41 +08:00
Eric Ernst
9a7bcccc8e qemu: no state to save if QEMU isn't running
On pod delete, we were looking to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (we cleanup up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.

Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid Error messages being displayed when we are
stopping and deleting a sandbox:

```
level=error msg="Could not read qemu pid file"
```

I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.

Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
  https://github.com/kata-containers/kata-containers/issues/1181

Fixes: #1179

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-12-21 11:29:44 -08:00
Liu Jiang
6181570ccc oci: fix a typo in "addtionalGids"
There's a typo in "addtionalGids", which should be "additionalGids".

Fixes: #1211

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2020-12-22 00:03:27 +08:00
Maruth Goyal
4af5beda35 agent/sandbox: Don't update cpuset when ncpus = 0
When receiving an OnlineCpuMemory RPC, if the number of CPUs to be
made available is 0, then updating the cpusets is a redundant operation.

Fixes: #1172

Signed-off-by: Maruth Goyal <maruthgoyal@gmail.com>
2020-12-18 18:11:16 +05:30
David Gibson
e004616b02 runtime/network: Fix error reporting in listRoutes()
If the upcast from resultingRoutes to *grpc.IRoutes fails, we return
(nil, err), but previous code ensures that err is nil at that point, so we
return no error.

fixes #1206

Forward port of
0ffaeeb5d8

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-18 14:36:09 +11:00
David Gibson
1ae8e81abb runtime/network: Correct error reporting in listInterfaces()
If the upcast from resultingInterfaces to *grpc.Interfaces fails, we
return (nil, err), but previous code ensures that err is nil at that
point, so we return no error.

Forward port of
b86e904c2d

fixes #1206

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-18 14:35:50 +11:00
Bin Liu
caa6965c17 Merge pull request #1183 from wainersm/runtime_destdir
runtime: Allow to overwrite DESTDIR
2020-12-17 14:10:56 +08:00
Bin Liu
3b87d10d79 Merge pull request #1191 from mxpv/fd
Don't leak fd when reseeding rng
2020-12-17 12:50:55 +08:00
David Gibson
a19263e58d agent/protocols: Remove unneeded import from oci.proto
oci.proto imports "google/protobuf/wrappers.proto", but doesn't appear to
use it, which causes a warning from protoc when we compile it.  Remove the
import to fix the warning.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-17 13:06:41 +11:00
David Gibson
a19cf28c26 agent/protocols: Remove some unnecessary include directives from protoc
The generate_go_sources() function in update-generate-proto.sh adds a
number of include directives to the protoc command line.  Some of these
don't appear to be necessary to correctly compile the agent's protocol
files, so remove them.

Amongst other things were directives pointing at the old Kata1 runtime and
agent repositories.  Those ones could be actively harmful by causing odd
dependencies of the Kata2 build on the Kata1 repositories.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-16 12:10:27 +11:00
David Gibson
2b4520904c agent/protocols: Remove some unneeded dependencies for protocol generation
src/agent/protocols/hack/update-generated-proto.sh checks for the presence
of protoc-gen-rust and ttrpc_rust_plugin, but it doesn't actually need
them.  Those tools are needed to generate Rust code from the gRPC proto
files, but that's already handled in src/agent/protocols/build.rs using
Cargo for dependency management.

This script is only needed for the Go code, for which the other tools are
sufficient.

fixes #1198

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-16 12:10:27 +11:00
Maksym Pavlenko
3db1c8059d agent: Don't leak fd when reseeding rng
This PR wraps fd raw descriptor with File, so it'll be properly closed once exited.

Fixes: #1192

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-12-11 16:18:41 -08:00
Wainer dos Santos Moschetta
10e9bfc6f7 runtime: Allow to overwrite DESTDIR
On runtime/Makefile the value of DESTDIR is set to "/", unless one
pass that variable as an argument to `make`. This change will
allow its overwrite if DESTDIR is exported in the environment as
well.

Fixes #1182

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-12-09 09:04:04 -05:00
Peng Tao
e167bf30e3 Merge pull request #1165 from liubin/fix/exec-hang-when-bg-process-running
agent: exit from exec hangs if background process is present
2020-12-08 20:32:23 +08:00
bin liu
1ca415d87e agent: exit from exec hangs if background process is present
This is the Rust porting of https://github.com/kata-containers/agent/pull/371

`read_stdout`/`read_stderr` is blocking rpc calls, if exec process
exited, these calls is on blocking state for reading on process's
term master fd, and can't get a chance to break the wait.

In this PR, `read_stdout`/`read_stderr` will not read directly from
a term master of a process, instead, it will first have to get
an fd to read from newly added `epoller.poll()`. `epoller.poll()` may returns:

- the term master fd of exec process, if the process is running.
- a fd(piped fd) will return EOF when reading to indicate that th process is exited.

Fixes: #1160

Signed-off-by: bin liu <bin@hyper.sh>
2020-12-07 10:52:44 +08:00
Chelsea Mafrica
49e7151d3d shimv2: Add tracing
Add trace calls to shimv2 that create spans for functions in service.go.
Tracing starts in New(), which is forked twice and is followed by either
StartShim() or Create().

Tracing cannot start without the value for Trace enabled from the
runtime config so load the config in New(), which results in it being
loaded every time New() is called in addition to where it is originally
loaded after Create().

Fixes #903

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-12-04 19:38:44 -08:00
Peng Tao
4bca7312c7 Merge pull request #1158 from liubin/fix/1156-fix-cpuset
handle vcpus properly utilized in the guest
2020-12-04 22:32:15 +08:00
Fabiano Fidêncio
f7383ef835 Merge pull request #1166 from cmaf/fix-ctx-port
shimv2: handle ctx passed by containerd
2020-12-03 19:45:52 +01:00
Bin Liu
4e0a7e31f9 Merge pull request #1103 from likebreath/1111/clh_fix_cleanupVM
runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
2020-12-03 17:34:26 +08:00
Chelsea Mafrica
0155fe1260 shimv2: handle ctx passed by containerd
Sometimes shim process cannot be shutdown because of container list
is not empty. This container list is written in shim service, while
creating container. We find that if containerd cancel its Create
Container Request due to timeout, but runtime didn't handle it properly
and continue creating action, then this container cannot be deleted at
all. So we should make sure the ctx passed to Create Service rpc call
is effective.

Fixes #1088

Signed-off-by: Yves Chan <shanks.cyp@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-12-02 14:28:31 -08:00
Archana Shinde
f96cdc1a67 Merge pull request #1114 from c3d/bug/1111-agent-oom-killer
agent: Adjust OOM Score to avoid agent being killed.
2020-12-02 11:40:35 -08:00
bin liu
a793b8d90d agent: update cpuset of container path
After cpu hot-plugged is available, cpuset for containers will be written into
cgroup files recursively, the paths should include container's cgroup path, and up
to root path of cgroup filesystem.

Fixes: #1156, #1159

Signed-off-by: bin liu <bin@hyper.sh>
2020-12-02 10:38:26 +08:00
bin liu
705182d04e agent: ignore updating cpuset error when update cgroups
The result of `cpuset_controller.set_cpus(&cpu.cpus)` is unwrapped,
this will lead creating container to fail if cpuset is set.

The sandbox's `CreateContainer` sequence is:

c, err := newContainer(s, &contConfig)
err = c.create()
  c.sandbox.agent.createContainer(c.sandbox, c) (1)
err = s.updateResources()
  oldCPUs, newCPUs, err := s.hypervisor.resizeVCPUs(sandboxVCPUs) (2)

cpuset only avaiable after `s.hypervisor.resizeVCPUs` has been called at (2),
and then cpuset is written to cgourps file.

Fixes: #1159

Signed-off-by: bin liu <bin@hyper.sh>
2020-12-02 10:38:16 +08:00
Bo Chen
647331ace6 runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
We should always cleanup the vm directory when doing `stopSandbox`,
while we are skipping the cleanup process on some error code paths when
using cloud-hypervisor driver.

Fixes: #1098

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-12-01 17:27:44 -08:00
Eric Ernst
2f1cb7995f kata-monitor: allow for building for alpine
- add a reference Dockerfile to tools
- update kata-monitor build to:
  1) utilize the kata buildflags, which were dropped before
  2) disable CGO, so we have option for building in alpine

From root of the repository, example build:
 $ docker build -f tools/packaging/kata-monitor/Dockerfile -t kata-monitor .

Fixes: #1135

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-12-01 10:28:59 -08:00
Fabiano Fidêncio
5e407758f6 trace-forwarder: Add void "install" target
Otherwise `make install` run from the top directory would just fail as
the target is not defined.

Fixes: #1149

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-11-27 15:26:23 +01:00
Julio Montes
70f198d78e cli: check modules and permissions before loading a module
Before loading a module, the check subcommand should check if the
current user can load it.

fixes #3085

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-26 11:55:42 -06:00
Julio Montes
cb684cf8ea cli: don't fail if rate limit is exceeded
Don't fail if rate limit is exceeded since this is a
limitation/restriction of Github not a problem in the host.
Print a warning when the rate limit is exceeded.

For more information about Github's rate limit, see
https://developer.github.com/v3/#rate-limiting

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-26 11:50:14 -06:00
Bin Liu
b8716d8eec Merge pull request #1141 from lifupan/fix_thread_spwan
rustjail: fork a new child process to change the pid ns
2020-11-25 15:20:36 +08:00
fupan.lfp
9216f2ad63 rustjail: fork a new child process to change the pid ns
The main process do unshare pid namespace, the process
couldn't spawn new thread, in order to avoid this issue,
fork a new child process and do the pid namespace unshare
in the new temporary process.

Fixes: #1140

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-11-23 17:57:33 +08:00
fupan.lfp
3b08376c4e rustjail: remove the network ns validation against container
Since kata containers shared the network ns with
the guest system, thus there's no need to do the
network ns check.

Fixes: #1047

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-11-23 14:41:22 +08:00
James O. D. Hunt
7c12c5481e Merge pull request #1128 from liubin/fix/1127-delete-wait
runtime: don't wait the second shim process in shim start
2020-11-18 14:19:11 +00:00
Julio Montes
f00655a40f Merge pull request #1060 from jongwu/rootbus
agent: create pci root Bus Path for arm64
2020-11-18 08:13:30 -06:00
Julio Montes
e411ebc779 Merge pull request #1126 from liubin/fix/1125-enable-lto
agent: enable lto flag for Cargo to get better optimized code
2020-11-18 08:07:58 -06:00
bin liu
c388ec5bef runtime: don't wait the second shim process in shim start
In first shim v2 startup(with `start` command-line option), it will start
the second shim v2 process running as ttrpc server, there is no needs to
wait the second process, because the current shim v2 process will exit immediately.

Fixes: #1127

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-18 17:18:59 +08:00
bin liu
d6acc4c09c agent: enable lto flag for Cargo to get better optimized code
The lto setting controls the -C lto flag which controls LLVM's link time optimizations.
LTO can produce better optimized code, using whole-program analysis,
at the cost of longer linking time.

https://doc.rust-lang.org/cargo/reference/profiles.html#lto

Fixes: #1125

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-18 15:50:27 +08:00
Julio Montes
1dd77e204f Merge pull request #1120 from liubin/fix/1119-revert-cleanupcontainer-api
virtcontainers: revert CleanupContainer from PR 1079
2020-11-17 09:11:29 -06:00
bin liu
fdbf7d3222 virtcontainers: revert CleanupContainer from PR 1079
In PR 1079, CleanupContainer's parameter of sandboxID is changed to VCSandbox, but at cleanup,
there is no VCSandbox is constructed, we should load it from disk by loadSandboxConfig() in
persist.go. This commit reverts parts of #1079

Fixes: #1119

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-17 10:31:33 +08:00