Commit Graph

281 Commits

Author SHA1 Message Date
Georgina Kinge
4f80ea1962 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4507
Signed-off-by: Georgina Kinge <georgina.kinge@ibm.com>
2022-06-22 10:06:27 +01:00
Eric Ernst
72049350ae Merge pull request #4288 from fengwang666/enable-qemu-sandbox
runtime: enable sandbox feature on qemu
2022-06-21 09:22:26 -07:00
Liang Zhou
ef925d40ce runtime: enable sandbox feature on qemu
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.

The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.

Fixes: #2266

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-17 15:30:46 -07:00
Chelsea Mafrica
28995301b3 tracing: Remove whitespace from root span
Remove space from root span name to follow camel casing of other tracing
span names in the runtime and to make parsing easier in testing.

Fixes #4483

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-06-17 12:07:37 -07:00
Megan Wright
eeff63375f CCv0: Merge main into CCv0 branch
Merge in snap fix

Signed-off-by: Megan Wright <megan.wright@ibm.com>
2022-06-16 10:55:42 +01:00
Bin Liu
553ec46115 Merge pull request #4436 from alex-matei/fix/sandbox-mem-overflow
runtime: fix error when trying to parse sandbox sizing annotations
2022-06-16 11:18:24 +08:00
Megan Wright
94695869b0 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4460
Signed-off-by: Megan-Wright <megan.wright@ibm.com>
2022-06-15 11:05:51 +01:00
Bin Liu
81acfc1286 Merge pull request #4425 from liubin/fix/4376-change-log-level-of-getoomevent
shim: change the log level for GetOOMEvent call failures
2022-06-13 17:53:11 +08:00
Alexandru Matei
721ca72a64 runtime: fix error when trying to parse sandbox sizing annotations
Changed bitsize for parsing functions to 64-bit in order to avoid
parsing errors.

Fixes #4435

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-06-11 18:51:10 +03:00
Eric Ernst
4ebf9d38b9 Merge pull request #4310 from egernst/core-sched
shim: add support for core scheduling
2022-06-08 17:42:45 +02:00
Megan Wright
aa9d875a8d CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4424
Signed-off-by: Megan Wright <megan.wright@ibm.com>
2022-06-08 15:51:18 +01:00
Bin Liu
eff4e1017d shim: change the log level for GetOOMEvent call failures
GetOOMEvent is a blocking call that will fail if
the container exit, in this case, it's not an error or warning.

Changing the log level for logs in case of GetOOMEvent call fails
will reduce log noise in a large cluster that has pods
creating/deleting frequently.

Fixes: #4376

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-06-08 22:17:24 +08:00
Eric Ernst
430da47215 Merge pull request #4360 from fengwang666/shim-leak
runtime: ignore ESRCH error from stop container
2022-06-02 12:42:19 -07:00
Feng Wang
9726f56fdc runtime: force stop container after the container process exits
Set thestop container force flag to true so that the container state is always set to
“StateStopped” after the container wait goroutine is finished. This is necessary for
the following delete container step to succeed.

Fixes: #4359

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-02 08:17:08 -07:00
Eric Ernst
d2df1209a5 docs: describe kata handling for core-scheduling
Add initial documentation for core-scheduling.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 16:17:00 -07:00
Michael Crosby
22b6a94a84 shim: add support for core scheduling
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.

Containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.

kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html

For Kata specifically, we will look for SCHED_CORE environment variable
to be set to indicate we shuold create a new schedule core domain.

This is equivalent to the containerd shim's PR: e48bbe8394

Fixes: #4309

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
2022-05-31 10:10:40 -07:00
Eric Ernst
3201ad0830 shim-client: ensure we check resp status for Put/Post
Without this, potential errors are silently dropped. Let's ensure we
return the error code as well as potenial data from the response.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
0706fb28ac kata-runtime: shmgmt: make url usage consistent
Before, we had a mix of slash, etc. Unfortunately, when cleaning URL
paths, serve mux seems to mangle the request method, resulting in each
request being a GET (instead of PUT or POST).

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
2a09378dd9 shim-client: add support for DoPut
While at it, make sure we check for nil in DoPost

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
640173cfc2 shim-mgmt: Add endpoint handler for interacting with iptables
Add two endpoints: ip6tables, iptables.

Each url handler supports GET and PUT operations. PUT expects
the requests' data to be []bytes, and to contain iptable information in
format to be consumed by iptables-restore.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Yibo Zhuang
44c6d5bcea runtime: direct-volume stats update to use GET parameter
The go default http mux AFAIK doesn’t support pattern
routing so right now client is padding the url
for direct-volume stats with a subpath of the volume
path and this will always result in 404 not found returned
by the shim.

This change will update the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, then return
400 BadRequest to the client.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-30 16:02:29 +02:00
Snir Sheriber
a48d13f68d runtime: allow annotation configuration to use_legacy_serial
and update the docs and test

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-30 16:02:29 +02:00
Snir Sheriber
060fed814c qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-30 16:02:29 +02:00
Yibo Zhuang
ffdc065b4c runtime: direct-volume stats update to use GET parameter
The go default http mux AFAIK doesn’t support pattern
routing so right now client is padding the url
for direct-volume stats with a subpath of the volume
path and this will always result in 404 not found returned
by the shim.

This change will update the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, then return
400 BadRequest to the client.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-20 18:41:51 -07:00
Snir Sheriber
f4994e486b runtime: allow annotation configuration to use_legacy_serial
and update the docs and test

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-18 18:58:21 +03:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Megan Wright
ef1ae5bc93 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4200
Signed-off-by: Megan Wright <megan.wright@.ibm.com>
2022-05-04 11:26:50 +01:00
Fabiano Fidêncio
511f7f822d config: Add DiskRateLimiter* to Cloud Hypervisor
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:15 +02:00
Fabiano Fidêncio
5b18575dfe hypervisor: Add disk bandwidth and operations rate limiters
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.

The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
  DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the DiskRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:11 +02:00
Fabiano Fidêncio
c9f6496d6d config: Add NetRateLimiter* to Cloud Hypervisor
Let's add the newly added network rate limiter configurations to the
Cloud Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
2d35e6066d hypervisor: Add network bandwidth and operations rate limiters
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.

The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.

The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.

The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
  NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the NetRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Georgina Kinge
67015ac1d7 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4157
Signed-off-by: Georgina Kinge <georgina.kinge@ibm.com>
2022-04-27 10:39:08 +01:00
David Gibson
f7ba21c86f runtime: Clean up mock hook logs in tests
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log.  Currently we don't clean up those logs
when the tests are done.  Use a test cleanup function to do this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
David Gibson
90b2f5b776 runtime: Make SetupOCIConfigFile clean up after itself
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp().  This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.

Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
Megan Wright
738ae8c60e CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4115
Signed-off-by: Megan-Wright <megan.wright.ibm.com>
2022-04-20 11:32:31 +01:00
Bin Liu
b19bfac7cd Merge pull request #4042 from yibozhuang/direct-assign-fsgroup
fsGroup support for direct-assigned volume
2022-04-16 10:23:15 +08:00
Bin Liu
362201605e Merge pull request #4055 from fgiudici/kata-monitor_pprof
kata-monitor: update the hrefs in the debug/pprof index page
2022-04-16 08:12:18 +08:00
Chelsea Mafrica
32f92e75cc Merge pull request #4021 from fengwang666/direct-volume-bug
runtime: Base64 encode the direct volume mountInfo path
2022-04-13 13:15:38 -07:00
Megan Wright
a36e9ba87f CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4090
Signed-off-by: Megan Wright <megan.wright@ibm.com>
2022-04-13 09:54:32 +01:00
Francesco Giudici
86977ff780 kata-monitor: update the hrefs in the debug/pprof index page
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.

The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.

Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.

Fixes: #4054

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-04-12 15:53:59 +02:00
Yibo Zhuang
532d53977e runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.

Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
bin
f8cc5d1ad8 kata-monitor: add some links when generating pages for browsers
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.

Fixes: #4061

Signed-off-by: bin <bin@hyper.sh>
2022-04-11 09:29:56 +08:00
Georgina Kinge
8add48d759 CCv0: Merge main into CCv0 branch
Merge remote-tracking branch 'upstream/main' into CCv0

Fixes: #4047
Signed-off-by: Georgina Kinge <georgina.kinge@ibm.com>
2022-04-07 10:58:17 +01:00
Fabiano Fidêncio
b39caf43f1 Merge pull request #3923 from Jakob-Naucke/no-initrd-se
runtime: Allow and require no initrd for SE
2022-04-05 09:26:07 +02:00
Feng Wang
354cd3b9b6 runtime: Base64 encode the direct volume mountInfo path
This is to avoid accidentally deleting multiple volumes.

Fixes #4020

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-04-04 19:56:46 -07:00
Eng Zer Jun
59c7165ee1 test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

This commit also updates the unit test advice to use `T.TempDir` to
create temporary directory in tests.

Fixes: #3924

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-31 09:31:36 +08:00
bin
5e1c30d484 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:59:12 +08:00
bin
fb8be96194 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.

When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:39:01 +08:00
Feng Wang
0928eb9f4e agent: Kill the all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-27 10:06:58 -07:00
Jakob Naucke
ff17c756d2 runtime: Allow and require no initrd for SE
Previously, it was not permitted to have neither an initrd nor an image.
However, this is the exact config to use for Secure Execution, where the
initrd is part of the image to be specified as `-kernel`. Require the
configuration of no initrd for Secure Execution.

Also
- remove redundant code for image/initrd checking -- no need to check in
  `newQemuHypervisorConfig` (calling) when it is also checked in
  `getInitrdAndImage` (called)
- use `QemuCCWVirtio` constant when possible

Fixes: #3922
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-03-25 18:36:12 +01:00