Commit Graph

92 Commits

Author SHA1 Message Date
Snir Sheriber
b821511992 cgroups: Fix systemd cgroup support
As github.com/containerd/cgroups doesn't support scope
units which are essential in some cases lets create
the cgroups manually and load it trough the cgroups
api
This is currently done only when there's single sandbox
cgroup (sandbox_cgroup_only=true), otherwise we set it
as static cgroup path as it used to be (until a proper
soultion for overhead cgroup under systemd will be
suggested)

Backport-from: #2959
Fixes: #2868
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-11-14 09:41:35 +02:00
Snir Sheriber
ea83ff1fc3 runtime: remove prefix when cgroups are managed by systemd
as done previously in 9949daf4dc

Backport-from: #2959
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-11-14 09:37:24 +02:00
Chelsea Mafrica
09d5d8836b runtime: tracing: Change method for adding tags
In later versions of OpenTelemetry label.Any() is deprecated. Create
addTag() to handle type assertions of values. Change AddTag() to
variadic function that accepts multiple keys and values.

Fixes #2547

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-11-04 10:19:05 -07:00
David Gibson
a0825badf6 Merge pull request #2795 from dgibson/vfio-as-vfio
Allow VFIO devices to be used as VFIO devices in the container
2021-10-25 14:25:26 +11:00
David Gibson
57ab408576 runtime: Introduce "vfio_mode" config variable and annotation
In order to support DPDK workloads, we need to change the way VFIO devices
will be handled in Kata containers.  However, the current method, although
it is not remotely OCI compliant has real uses.  Therefore, introduce a new
runtime configuration field "vfio_mode" to control how VFIO devices will be
presented to the container.

We also add a new sandbox annotation -
io.katacontainers.config.runtime.vfio_mode - to override this on a
per-sandbox basis.

For now, the only allowed value is "guest-kernel" which refers to the
current behaviour where VFIO devices added to the container will be bound
to whatever driver in the VM kernel claims them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:29 +11:00
Manohar Castelino
52268d0ece hypervisor: Expose the hypervisor itself
Export the top level hypervisor type

s/hypervisor/Hypervisor

Fixes: #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:46:02 -07:00
Manohar Castelino
f434bcbf6c hypervisor: createSandbox is CreateVM
Last of a series of commits to export the top level
hypervisor generic methods.

s/createSandbox/CreateVM

Fixes #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
76f1ce9e30 hypervisor: startSandbox is StartVM
s/startSandbox/StartVM

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
a6385c8fde hypervisor: stopSandbox is StopVM
Renaming. There is no Sandbox specific logic except tracing.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
8f78e1cc19 hypervisor: The SandboxConsole is the VM's console
update naming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
4d47aeef2e hypervisor: Export generic interface methods
This is in preparation for creating a seperate hypervisor package.
Non functional change.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
6baf2586ee hypervisor: Minimal exports of generic hypervisor internal fields
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.

Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
James O. D. Hunt
e61f5e2931 runtime: Show socket path in kata-env output
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.

The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.

This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.

Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 11:45:29 +01:00
Fabiano Fidêncio
47170e302a Merge pull request #2616 from Bevisy/main-2615
sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
2021-09-23 13:04:18 +02:00
Samuel Ortiz
9bed2ade0f virtcontainers: Convert to the new cgroups package API
The new API is based on containerd's cgroups package.
With that conversion we can simpligy the virtcontainers sandbox code and
also uniformize our cgroups external API dependency. We now only depend
on containerd/cgroups for everything cgroups related.

Depends-on: github.com/kata-containers/tests#3805
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-14 07:09:34 +02:00
Samuel Ortiz
dc7e9bce73 virtcontainers: sandbox: Host cgroups partitioning
This is a simplification of the host cgroup handling by partitioning the
host cgroups into 2: A sandbox cgroup and an overhead cgroup.

The sandbox cgroup is always created and initialized. The overhead
cgroup is only available when sandbox_cgroup_only is unset, and is
unconstrained on all controllers. The goal of having an overhead cgroup
is to be more flexible on how we manage a pod overhead. Having such
cgroup will allow for setting a fixed overhead per pod, for a subset of
controllers, while at the same time not having the pod being accounted
for those resources.

When sandbox_cgroup_only is not set, we move all non vCPU threads
to the overhead cgroup and let them run unconstrained. When it is set,
all pod related processes and threads will run in the sandbox cgroup.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:09:29 +02:00
Samuel Ortiz
f811026c77 virtcontainers: Unconditionally create the sandbox cgroup manager
Regardless of the sandbox_cgroup_only setting, we create the sandbox
cgroup manager and set the sandbox cgroup path at the same time.

Without doing this, the hypervisor constraint routine is mostly a NOP as
the sandbox state cgroup path is not initialized.

Fixes #2184

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:05:57 +02:00
Binbin Zhang
58e77a3c13 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.

Fixes: #2615

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-13 20:47:16 +08:00
Binbin Zhang
71f915c63f sandbox: Add device permissions such as /dev/null to cgroup
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec

Fixes: #2539

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-09 15:33:24 +08:00
Peng Tao
c0daa4ebff Merge pull request #2513 from cmaf/tracing-tracingtags-consistency
tracing: Change runtime tracing tags to vars
2021-08-31 10:25:10 +08:00
Snir Sheriber
0c7789fad6 runtime: Add container field to logs
and unified field naming

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-08-30 10:09:05 +03:00
Chelsea Mafrica
8f0f949abf tracing: Move dynamically added attributes to Trace()
Where possible, move attributes added with AddTag() to Trace() call to
reduce the amount of code used for tracing.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-27 08:26:40 -07:00
Chelsea Mafrica
8058e97212 tracing: Change runtime tracing tags to vars
Tracing tags are stored inconsistently throughout the runtime. Change
all instances of tracing tags to variables.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-26 15:55:32 -07:00
Hui Zhu
767a41ce56 updateResources: Log result after calculateSandboxMemory
Log result after calculateSandboxMemory in updateResources.

Fixes: #2367

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-01 09:57:44 +08:00
Julio Montes
47d95dc1c6 runtime: virtcontainers: fix govet fieldalignment
Fix structures alignment

fixes #2271

Depends-on: github.com/kata-containers/tests#3727

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Hui Zhu
cb6b7667cd runtime: Add option "enable_guest_swap" to config hypervisor.qemu
This commit add option "enable_guest_swap" to config hypervisor.qemu.
It will enable swap in the guest. Default false.
When enable_guest_swap is enabled, insert a raw file to the guest as the
swap device if the swappiness of a container (set by annotation
"io.katacontainers.container.resource.swappiness") is bigger than 0.
The size of the swap device should be
swap_in_bytes (set by annotation
"io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
If swap_in_bytes and memory_limit_in_bytes is not set, the size should be
default_memory.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:22:06 +08:00
Hui Zhu
243d4b8689 runtime: Sandbox: Add addSwap and removeSwap
addSwap will create a swap file, hotplug it to hypervisor as a special
block device and let agent to setup it in the guest kernel.
removeSwap will remove the swap file.

Just QEMU support addSwap.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:40 +08:00
Fabiano Fidêncio
3a9ecbcca5 Merge pull request #2231 from liubin/fix/2230-register-defer-callback-at-early-stage
runtime: Register defer function at early stage
2021-07-14 17:50:48 +02:00
bin
39546a1070 runtime: delete not used functions
Delete some not used functions in sandbox.go

Fixes: #2230

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 19:42:50 +08:00
bin
d0bc148fe0 runtime: Register defer function at early stage
Register defer function at early stage ensure that
it can be called if the startSandbox fails.

Fixes: #2230

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 17:20:53 +08:00
bin
350acb2d6e virtcontainers: refactoring code for error handling in sandbox
Use a defined error variable replade inplace error, and shortcut
for handling errors returned from function calls.

Fixes: #2187

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 14:28:58 +08:00
Benjamin Porter
b10e3e22b5 tracing: Consolidate tracing into a new katatrace package
Removes custom trace functions defined across the repo and creates
a single trace function in a new katatrace package. Also moves
span tag management into this package and provides a function to
dynamically add a tag at runtime, such as a container id, etc.

Fixes #1162

Signed-off-by: Benjamin Porter <bporter816@gmail.com>
2021-07-11 14:19:51 -05:00
Jakob Naucke
8310a3d70a virtcontainers: Don't fail memory hotplug
Architectures that do not support memory hotplugging will fail when
memory limits are set because that amount is hotplugged. Issue a warning
instead. The long-term solution is virtio-mem.

Fixes: #1412
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-06-24 10:58:06 +02:00
Julio Montes
ecdd137c6f runtime: do not hot-remove PMEM devices
PMEM devices cannot be hot-removed from a running VM.

fixes #2018

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-18 09:02:03 -05:00
bin
784025bb08 runtime: add more traces for network
Add traces for all the endpoinnt types
and the main interface functions.
Record errors for some traces.

Fixes: #1956

Signed-off-by: bin <bin@hyper.sh>
2021-06-07 11:38:40 +08:00
Chelsea Mafrica
8ca0207281 tracing: Add sandbox and container ID to trace spans
Add sandbox, container, and hypervisor IDs to trace spans. Note that
some spans in sandbox.go are created with a trace() call from api.go.
These spans have additional attributes set after span creation to
overwrite the api attributes.

Fixes #1878

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-06-02 21:53:54 -07:00
Tim Zhang
476ec9bd86 Merge pull request #1948 from liubin/fix/1947-fix-comments
runtime: fix some comments and logs
2021-06-02 10:52:01 +08:00
Peng Tao
41e04495f4 Merge pull request #1943 from bergwolf/cleanup2
cleanup TODOs in runtime
2021-06-01 14:16:46 +08:00
bin
b68334a1a8 runtime: fix some comments and logs
This commit fix some conments/logs.
And add some logs for debug.

Fixes: #1947

Signed-off-by: bin <bin@hyper.sh>
2021-06-01 09:04:18 +08:00
Peng Tao
2e29ef9cab runtime: remove TODO comment from StatusContainer
It is no longer valid as containerd already doesn't treat container pid
as host process pid.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-05-29 11:10:32 +08:00
Chelsea Mafrica
05a46fede0 tracing: Make runtime span attributes more consistent
Span attributes (tags) are not consistent in runtime tracing, so
designate and use core attributes such source, package, subsystem, and
type as span metadata for more understandable output.

Use WithAttributes() during span creation to reduce calls to
SetAttributes().

Modify Trace() in katautils to accept slice of attributes so multiple
functions using different attributes can use it.

Fixes #1852

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-05-27 10:07:11 -07:00
Peng Tao
35151f1786 runtime: sandbox delete should succeed after verifying sandbox state
Otherwise we might block delete and create orphan containers.

Fixes: #1039

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-05-13 14:05:49 -07:00
Hui Zhu
831224aa22 Sandbox: Fix ContainerConfig ptr in CreateContainer and createContainers
The pointer that send to newContainer in CreateContainer and
createContainers is not the pointer that point to the address in
s.config.Containers.

This commit fix this issue.

Fixes: #1758

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-04-27 16:44:22 +08:00
James O. D. Hunt
9256e590dc shutdown: Don't sever console watcher too early
Fixed logic used to handle static agent tracing.

For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.

Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.

The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.

Fixes: #1696.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:22:00 +01:00
bin
421439c633 API: remove ProcessListContainer/ListProcesses
This commit will remove ProcessListContainer API from VCSandbox
and ListProcesses from agent.proto.

Fixes: #1668

Signed-off-by: bin <bin@hyper.sh>
2021-04-09 17:34:25 +08:00
bin
d75fe95685 virtcontainers: replace newStore by store in Sandbox struct
The property name make newcomers confused when reading code.
Since in Kata Containers 2.0 there will only be one type of store,
so it's safe to replace it by `store` simply.

Fixes: #1660

Signed-off-by: bin <bin@hyper.sh>
2021-04-08 23:59:16 +08:00
Chelsea Mafrica
f3ebbb1f1a runtime: Fix trace span ordering
Return ctx in trace() functions to correct span ordering.

Fixes #1550

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-25 11:43:04 -07:00
Peng Tao
74192d179d runtime: fix static check errors
It turns out we have managed to break the static checker in many
difference places with the absence of static checker in github action.
Let's fix them while enabling static checker in github actions...

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 20:10:19 +08:00
Chelsea Mafrica
6b0dc60dda runtime: Fix ordering of trace spans
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.

Fixes #1355

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-16 17:39:28 -07:00
Eric Ernst
48ed8f3c4a runtime: add support for readonly sandbox bindmounts
If specified, sandbox_bind_mounts identifies host paths to be
mounted (ro) into the sandboxes shared path. This is only valid
if filesystem sharing is utilized.

The provided path(s) will be bindmounted (ro) into the shared fs directory on
the host, and thus mapped into the guest. If defaults are utilized,
these mounts should be available in the guest at
`/var/run/kata-containers/shared/containers/sandbox-mounts`

These will not be exposed to the container workloads, and are only
added for potential guest-services to consume (example: expose certs
into the guest that are available on the host).

Fixes: #1464

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-03-04 10:04:25 -08:00