Commit Graph

79 Commits

Author SHA1 Message Date
Eric Ernst
3caed6f88d runtime: shim: dedup client, socket addr code
(1) Add an accessor function, SocketAddress, to the shim-v2 code for
determining the shim's abstract domain socket address, given the sandbox
ID.

(2) In kata monitor, create a function, BuildShimClient, for obtaining the appropriate
http.Client for communicating with the shim's monitoring endpoint.

(3) Update the kata CLI and kata-monitor code to make use of these.

(4) Migrate some kata monitor methods to be functions, in order to ease
future reuse.

(5) drop unused namespace from functions where it is no longer needed.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-05-07 15:20:37 -07:00
Fabiano Fidêncio
4bc006c8a4 runtime: Short the shim-monitor path
Instead of having something like
"/containerd-shim/$namespace/$sandboxID/shim-monitor.sock", let's change
the approach to:
* create the file in a more neutral location "/run/vc", instead of
  "/containerd-shim";
* drop the namespace, as the sandboxID should be unique;
* remove ".sock" from the socket name.

This will result on a name that looks like:
"/run/vc/$sandboxID/shim-monitor"

Fixes: #497

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-05-07 14:20:35 -07:00
Tim Zhang
75648b0770 Merge pull request #1745 from liubin/fix/1744-add-doc-for-enable_pprof
docs: add per-Pod Kata configurations for `enable_pprof`
2021-05-07 13:45:34 +08:00
Chelsea Mafrica
3e8137399c Merge pull request #1805 from liubin/fix/1804-select-sandbox-ctx
runtime: use s.ctx instead ctx for checking cancellation
2021-05-06 09:51:47 -07:00
Chelsea Mafrica
917665ab6d Merge pull request #1751 from liubin/fix/1750-fix-comments
runtime: fix some comments
2021-05-06 08:42:15 -07:00
bin
79831fafaf runtime: use s.ctx instead ctx for checking cancellation
s.ctx should be used for checking cancellation, and the
local ctx is used for tracing.

Fixes: #1804

Signed-off-by: bin <bin@hyper.sh>
2021-05-06 17:22:53 +08:00
bin
f6d5fbf9ba runtime: fix some comments
This commint include two types of fixes for comments
in src/runtime/containerd-shim-v2/start.go.

- Update comment for calling of watchOOMEvents.
- Comments without heading spaces.

Fixes: #1750

Signed-off-by: bin <bin@hyper.sh>
2021-05-06 17:12:52 +08:00
bin
95e54e3f48 docs: add per-Pod Kata configurations for enable_pprof
Now enabling enable_pprof for individual pods is supported,
but not documented.

This commit will add per-Pod Kata configurations for `enable_pprof`
in file `docs/how-to/how-to-set-sandbox-config-kata.md`

Fixes: #1744

Signed-off-by: bin <bin@hyper.sh>
2021-04-26 22:20:49 +08:00
Chelsea Mafrica
8587e3a00b Merge pull request #1732 from liubin/fix/1731-delete-builtin-parameter
runtime: delete not used function parameter builtIn
2021-04-23 18:30:55 -07:00
bin
677f0d9904 runtime: delete not used function parameter builtIn
Parametr builtIn is not used in function updateRuntimeConfigAgent,
delete it from updateRuntimeConfigAgent and LoadConfiguration
function signature.

Fixes: #1731

Signed-off-by: bin <bin@hyper.sh>
2021-04-23 17:42:42 +08:00
Julien Ropé
d4a5413774 runtime: Fix stdout/stderr output from container being truncated
Do not close the tty as part of the stdout redirection routine.
The close is already happening a couple lines below, after all routines
have finished.

Fixes #1713

Signed-off-by: Julien Ropé <jrope@redhat.com>
2021-04-21 17:09:09 +02:00
bin
09d454ac74 runtime: import runtime/v2/runc/options to decode request from Docker
Shimv2 protocol CreateTaskRequest.Options has a type of *google_protobuf.Any.
If the call is from Docker, to decode the request,
the proto types(github.com/containerd/containerd/runtime/v2/runc/options)
should be imported.

Fixes: #1576

Signed-off-by: bin <bin@hyper.sh>
2021-03-30 19:44:00 +08:00
Chelsea Mafrica
6fcfea8dcf runtime: Fix static check errors
Fix comment formatting and unused variable to make static checks pass.

Fixes #1550

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-25 12:59:01 -07:00
Chelsea Mafrica
f3ebbb1f1a runtime: Fix trace span ordering
Return ctx in trace() functions to correct span ordering.

Fixes #1550

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-25 11:43:04 -07:00
Bin Liu
5b5b5cc611 Merge pull request #1539 from bergwolf/ut
fix runtime UTs and enable static check
2021-03-25 16:29:45 +08:00
James O. D. Hunt
2fc7f75724 Merge pull request #1521 from jodh-intel/verify-cid
Verify container ID
2021-03-24 13:27:58 +00:00
Peng Tao
74192d179d runtime: fix static check errors
It turns out we have managed to break the static checker in many
difference places with the absence of static checker in github action.
Let's fix them while enabling static checker in github actions...

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 20:10:19 +08:00
Peng Tao
8e71c4fc7a runtime: fix missing context argument in mocked sandbox APIs
Missing context.Context in several APIs.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-23 16:19:46 +08:00
James O. D. Hunt
b265870997 runtime: Validate CID
Validate the container ID as we cannot rely on the container manager
doing this.

Fixes: #1520.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-03-18 11:19:32 +00:00
Chelsea Mafrica
9e4932a6e2 runtime: use root span for shimv2 tracing
Add rootCtx to service struct in shimv2 to use as parent of spans
created in shimv2 for a more organized trace ouput.

Fixes #1355

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-16 17:39:28 -07:00
Chelsea Mafrica
6b0dc60dda runtime: Fix ordering of trace spans
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.

Fixes #1355

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-16 17:39:28 -07:00
Fupan Li
62d30ca2b6 Merge pull request #1498 from liubin/fix/1497-task-exit-pid
runtime: return hypervisor Pid in TaskExit event
2021-03-11 12:58:28 +08:00
Eric Ernst
f0d49851db exec: ensure sup groups are added to agent request
Extra groups were not being handled when exec'ing. Ensure
that these are handled.

Before this, running a pod with:
```
 ...snippet...
 securityContext:
   fsGroup: 266
   runAsGroup: 51020
   runAsUser: 264
```

And then exec'ing would not supply the fsGroup:
```
$ kubectl exec -it kata-bb  -- sh -c id
uid=264 gid=51020
```

Fixes: #1500

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-03-09 17:17:55 -08:00
bin
b034458960 runtime: return hypervisor Pid in TaskExit event
Other RPC calls return Pid of hypervisor, the TaskExit should
return the same Pid.

Fixes: #1497

Signed-off-by: bin <bin@hyper.sh>
2021-03-09 17:41:44 +08:00
Wainer dos Santos Moschetta
21bdaaf84f runtime: Fix missing 'name' field on containerd-shim-v2 logs
Each Kata Containers application should generate log records with a specified
structure. Currently on containerd-shim-v2's logs, the required 'name' field
is missing. This changed its logger to append the application name on each
and every emitted entries.

Fixes #1479
Related-to: github.com/kata-containers/tests/issues/3260
Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-03-01 16:52:24 -05:00
Bin Liu
61f0291d63 Merge pull request #1452 from lifupan/main
shimv2: return the hypervisor's pid as the container pid
2021-03-01 15:48:01 +08:00
Eric Ernst
0f7098339b runtime: check if error loading runtime config
Looks like we inadvertantly removed the check on the loadRuntimeConfig
error return value. Adding back...

Fixes: #1474

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-02-26 10:51:38 -08:00
fupan.lfp
bc0ac526a2 shimv2: return the hypervisor's pid as the container pid
Since the kata's hypervisor process is in the network namespace,
which is close to container's process, and some host metrics
such as cadvisor can use this pid to access the network namespace
to get some network metrics. Thus this commit replace the shim's
pid with the hypervisor's pid.

Fixes: #1451

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-02-24 13:26:05 +08:00
Chelsea Mafrica
a44b27291c runtime: Create tracer later in shimv2
Remove loading of configuration from New() because we do not know the
correct configuration file for the runtime until Create() and so that it
is not loaded more than once. Start tracer in create() so that it is
created after the runtime config is loaded in its original location.

Fixes #1411

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-02-17 19:45:48 -08:00
bin
17df9b119d runtime: migrate from opentracing to opentelemetry
This commit includes two changes:
- migrate from opentracing to opentelemetry
- add jaeger configuration items

Fixes: #1351

Signed-off-by: bin <bin@hyper.sh>
2021-02-03 17:30:49 +08:00
Snir Sheriber
0e57393fcc shimv2: log a warning and continue on post-start hook failure
According to runtime-spec:
The poststart hooks MUST be invoked by the runtime. If any poststart
hook fails, the runtime MUST log a warning, but the remaining hooks
and lifecycle continue as if the hook had succeeded

Fixes: #1252

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-01-25 16:29:55 +02:00
Snir Sheriber
e7043fe284 shimv2: log a warning and continue on post-stop hook failure
According to runtime-spec:
The poststop hooks MUST be invoked by the runtime. If any
poststop hook fails, the runtime MUST log a warning, but
the remaining hooks and lifecycle continue as if the hook
had succeeded.

Fixes: #1252

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-01-25 16:29:47 +02:00
Chelsea Mafrica
b24a2d2e48 Merge pull request #904 from cmaf/tracing-shimv2
shimv2: Add tracing to shimv2
2021-01-14 16:38:28 -08:00
Snir Sheriber
5c464018ed shimv2: Avoid double removing of container from sandbox
RemoveContainerRequest results in calling to deleteContainer, according
to spec calling to RemoveContainer is idempotent and "must not return
an error if the container has already been removed", hence, don't
return error if the error reports that the container is not found.

Fixes: #836

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2020-12-27 18:04:06 +02:00
Chelsea Mafrica
49e7151d3d shimv2: Add tracing
Add trace calls to shimv2 that create spans for functions in service.go.
Tracing starts in New(), which is forked twice and is followed by either
StartShim() or Create().

Tracing cannot start without the value for Trace enabled from the
runtime config so load the config in New(), which results in it being
loaded every time New() is called in addition to where it is originally
loaded after Create().

Fixes #903

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-12-04 19:38:44 -08:00
Chelsea Mafrica
0155fe1260 shimv2: handle ctx passed by containerd
Sometimes shim process cannot be shutdown because of container list
is not empty. This container list is written in shim service, while
creating container. We find that if containerd cancel its Create
Container Request due to timeout, but runtime didn't handle it properly
and continue creating action, then this container cannot be deleted at
all. So we should make sure the ctx passed to Create Service rpc call
is effective.

Fixes #1088

Signed-off-by: Yves Chan <shanks.cyp@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-12-02 14:28:31 -08:00
bin liu
c388ec5bef runtime: don't wait the second shim process in shim start
In first shim v2 startup(with `start` command-line option), it will start
the second shim v2 process running as ttrpc server, there is no needs to
wait the second process, because the current shim v2 process will exit immediately.

Fixes: #1127

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-18 17:18:59 +08:00
bin liu
fdbf7d3222 virtcontainers: revert CleanupContainer from PR 1079
In PR 1079, CleanupContainer's parameter of sandboxID is changed to VCSandbox, but at cleanup,
there is no VCSandbox is constructed, we should load it from disk by loadSandboxConfig() in
persist.go. This commit reverts parts of #1079

Fixes: #1119

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-17 10:31:33 +08:00
bin liu
4e3a8c0124 runtime: remove global sandbox variable
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
bin liu
cb0e6094ff runtime: sleep 1 second after GetOOMEvent failed
In some cases, for example agent crashed and not marked dead yet, the GetOOMEvent
will return errors like `connection reset by peer` or `ttrpc: closed`. Do a sleep
with 1 second (agent check interval) and let agent health check to do the check.

Fixes: #991

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-10 12:02:31 +08:00
bin liu
40418f6d88 runtime: add geust memory dump
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox

Fixes: #1012

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-05 16:04:21 +08:00
Bin Liu
43da14e7b3 Merge pull request #752 from YchauWang/clear-moke-code01
runtime: Clear the VCMock 1.x API Methods from 2.0
2020-10-09 17:41:21 +08:00
Peng Tao
5611283ec5 runtime: fix golint errors
Need to run gofmt -s on them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-09-28 11:01:07 +08:00
Peng Tao
5596eaa31d Merge pull request #441 from liubin/feature/245-add-debug-console
kata 2.0: add debug console service
2020-09-28 10:06:13 +08:00
Peng Tao
b20ca6334b Merge pull request #733 from cailca/732
shimv2: add a comment in checkAndMount()
2020-09-27 16:29:51 +08:00
bin liu
febdf8f68c runtime: add debug console service
Add `kata-runtime exec` to enter guest OS
through shell started by agent

Fixes: #245

Signed-off-by: bin liu <bin@hyper.sh>
2020-09-27 10:57:17 +08:00
Ychau Wang
1839dfd95a runtime: Clear the VCMock 1.x API Methods from 2.0
Clear the 1.x branch api methods in the 2.0. Keep the same methods to
the VC interface, like the VCImpl struct.

Fixes: #751

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-09-18 16:30:12 +08:00
Peng Tao
7e33e36f4a Merge pull request #698 from liubin/feature/146-add-cgroup-v2-for-agent
agent: add cgroup v2 support
2020-09-18 14:45:38 +08:00
Peng Tao
6487044fa1 shimv2: trust cached status when deleting containers
vc status might not be accurate because it does not watch container
status change.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-09-16 16:47:56 +08:00
Peng Tao
325a4f868d shimv2: do not kill a stopped exec process
Same as containers, it is possible for an exec process to stop so
quickly that containerd may send a parallel Kill request. We should
just return success in such case.

Fixes: #716
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-09-16 16:47:46 +08:00