Some use cases may just require passing extra arguments to virtiofsd,
and having this disabled by default makes it impossible to set when
using kata-deploy, as changes in the configuration file would be
overwritten by the daemon-set.
With this in mind, let's allow users to pass whatever thet need (and
here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg.
Fixes: #7853
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b1dd09a4d3)
Signed-off-by: Greg Kurz <groug@kaod.org>
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.
Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.
Fixes#7111
Signed-off-by: Greg Kurz <groug@kaod.org>
After we have a guest kernel with builtin initramfs which
provide the rootfs measurement capability and Kata rootfs
image with hash device, we need set related root hash value
and measure config to the kernel params in kata configuration file.
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.
As TEEs do not support memory and CPU hotplug, this approach must be
used.
Fixes: #6818
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The AmdSev firmware package should be used with
measured direct boot. If the expected hashes are not
injected into the firmware binary by the VMM, the
guest will not boot. This is required for security.
Currently the main branch does not have the extended
shim support for SEV, which tells the VMM to inject
the expected hashes.
We ship the standard OVMF package to use with SNP,
so let's switch SEV to that for now. This will need
to be changed back when shim support for SEV(-ES)
is added to main.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
We have been using the C version of virtiofsd on ppc64le. Now that the issue with
rust virtiofsd have been fixed, let's switch to it.
Fixes: #4259
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Let's specifically name the `gpu` runtime class as `nvidia-gpu`. By
doing this we keep the door open and ease the life of the next vendor
adding GPU support for Kata Containers.
Fixes: #6553
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
SNP requires many specific configurations, so let's make
a new SNP configuration file that we can use with the
kata-qemu-snp runtime class.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
Adding config file that can be used with qemu-sev runtime class.
Since SEV has limited hotplug support, increase
the pod overhead to account for fixed resource usage.
Fixes: #6572
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
SEV requires special OVMF to work with kernel hashes.
Thus, adding changes that builds this custom OVMF for SEV.
Fixes: #6572
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
We need to set hotplug on pci root port and enable at least one
root port. Also set the guest-hooks-dir to the correct path
Fixes: #6675
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
$ make install
make: *** No rule to make target 'containerd-shim-kata-v2', needed by 'install-containerd-shim-v2'. Stop.
Spotted when building kata-runtime with a different name for
SHIMV2_OUTPUT. For instance, trying to keep different runtime binaries
installed at the same time, one from master and another from lets say,
the CCv0 branch, with the following small change applied.
diff --git a/src/runtime/Makefile b/src/runtime/Makefile
index 95efaff78..2bab9eb75 100644
--- a/src/runtime/Makefile
+++ b/src/runtime/Makefile
@@ -231,7 +231,7 @@ SED = sed
CLI_DIR = cmd
SHIMV2 = containerd-shim-kata-v2
-SHIMV2_OUTPUT = $(bCURDIR)/$(SHIMV2)
+SHIMV2_OUTPUT = $(CURDIR)/$(SHIMV2)-ccv0
SHIMV2_DIR = $(CLI_DIR)/$(SHIMV2)
MONITOR = kata-monitor
Fixes: #6398
Signed-off-by: Eduardo Lima (Etrunko) <etrunko@redhat.com>
Currently the symbolic link for virtiofsd which is used as
a valid path is not updated on every CI run. Fix it by
using the actual path of installation.
Fixes: #6311
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.
On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.
Fixes: #6063
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
The .git-commit can be a multiple line file, potentially confusing
the Darwin linker for example.
Fixes: #6046
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Substitution in the yq install script doesn't like zsh, and additionally
the version of yq we're using doesn't have a darwin/arm64 build so grab
the amd64 version and let rosetta work its magic.
Additionally swap to abspath from readlink -m for the printing of what binaries
to install, as the -m flag doesn't exist on the BSD variant, and this
should be the same behavior.
Fixes: #5970
Signed-off-by: Danny Canter <danny@dcantah.dev>
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.
Fixes: #5694
Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
Currently ACRN hypervisor support in Kata2.x releases is broken.
This commit re-enables ACRN hypervisor support and also refactors
the code so as to remove dependency on Sandbox.
Fixes#3027
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Adds initrd configuration option to the configuration.toml that is
generated for the setup using QEMU.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.
The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.
Fixes: #2266
Signed-off-by: Feng Wang <feng.wang@databricks.com>
So far this has been done for x86_64. Now that the support for building
and testing has been added for all arches, let's do the second part of
the switch.
We're still not done yet for powerpc, as some a virtifosd crash on the
rust version has been found by the maintainer.
Fixes: #4258, #4260
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since #902 the `io.katacontainers.config.hypervisor` pod annotations
have only been permitted if explicitly allowed in the global
configuration. The default global configuration allows no such
annotations. That's important because several of those annotations
would cause Kata to execute arbitrary binaries, and so were wildly
unsafe.
However, this is inconvenient for the
`io.katacontainers.config.hypervisor.enable_iommu` annotation
specifically, which controls whether the sandbox VM includes a vIOMMU.
A guest side vIOMMU is necessary to implement VFIO passthrough devices
with `vfio_mode = vfio`, so enabling that mode of operation currently
requires a global configuration change, and can't just be enabled
per-pod.
Unlike some of the other hypervisor annotations, the `enable_iommu`
annotation is quite safe. By default the vIOMMU is not present, so
allowing a user to override it for a pod only improves their
facilities for isolation. Even if the global default were changed to
enable the vIOMMU, that doesn't compel the guest kernel to use it, so
allowing a user to disable the vIOMMU doesn't materially affect
isolation either.
Therefore, allow the io.katacontainers.config.hypervisor.enable_iommu
annotation to work in the default configurations.
fixes#4330
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As now we build and ship the rust version of virtiofsd, which is not
tied to QEMU, we need to update its default location to match with where
we're installing this binary.
Fixes: #4249
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The go unit tests for the runtime are invoked by the helper script
ci/go-test.sh. Which calls the run_go_test() function in ci/lib.sh. Which
calls into .ci/go-test.sh from the tests repository.
But.. the runtime is the only user of this script, and generally stuff for
unit tests (rather than functional or integration tests) lives in the main
repository, not the tests repository.
So, just move the actual script into src/runtime. A change to remove it
from the tests repo will follow.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
src/runtime/virtcontainers/hook/mock contains a simple example hook in Go.
The only thing this is used for is for some tests in
src/runtime/pkg/katautils/hook_test.go. It doesn't really have anything
to do with the rest of the virtcontainers package.
So, move it next to the test code that uses it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Running unit tests should generally have minimal dependencies on
things outside the build tree. It *definitely* shouldn't modify
system wide things outside the build tree. Currently the runtime
"make test" target does so, though.
Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary. They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.
Go tests automatically run within the package's directory though, so
there's no need to use a system wide path. We can use a relative path to
the binary build within the tree just as easily.
fixes#3941
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This change introduces the `disable_guest_empty_dir` config option,
which allows the user to change whether a Kubernetes emptyDir volume is
created on the guest (the default, for performance reasons), or the host
(necessary if you want to pass data from the host to a guest via an
emptyDir).
Fixes#2053
Signed-off-by: Evan Foster <efoster@adobe.com>
Removed redundant and duplicated build options to build
`kata-monitor` the same way as the other components:
- `CGO_ENABLED=0` is not necessary.
- `-buildmode=exe` is not necessary since `BUILDFLAGS` already sets the
build mode.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml
Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.
Fixes#3654
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
German Maglione, one of the current virtio-fs developers, has brought to
our attention that using "announce-submounts" could help us to prevent
inode number collisions.
This feature was introduced a year ago or so by Hanna Reitz as part of
the 08dce386e77eb9ab044cb118e5391dc9ae11c5a8, and as we already mandate
QEMU >= 6.1.0, let's take advantage of that.
Fixes: #3507
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When building with `ARCH=x86_64`, the previous `Makefile` will use it
without checking and cause:
Makefile:319: *** "ERROR: No hypervisors known for architecture x86_64 (looked for: acrn firecracker qemu cloud-hypervisor)". Stop.
This commit fix the above issue by checking `ARCH` no matter where it
is assigned.
Fixes: #3444
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
firmware can be split into FIRMWARE_VARS.fd (UEFI variables as
configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI
variables can be customized per each user while UEFI code is kept same.
fixes#3583
Signed-off-by: Julio Montes <julio.montes@intel.com>
There are software and hardware architectures which do not support
dynamically adjusting the CPU and memory resources associated with a
sandbox. For these, today, they rely on "default CPU" and "default
memory" configuration options for the runtime, either set by annotation
or by the configuration toml on disk.
In the case of a single container (launched by ctr, or something like
"docker run"), we could allow for sizing the VM correctly, since all of
the information is already available to us at creation time.
In the sandbox / pod container case, it is possible for the upper layer
container runtime (ie, containerd or crio) could send a specific
annotation indicating the total workload resource requirements
associated with the sandbox creation request.
In the case of sizing information not being provided, we will follow
same behavior as today: start the VM with (just) the default CPU/memory.
If this information is provided, we'll track this as Workload specific
resources, and track default sizing information as Base resources. We
will update the hypervisor configuration to utilize Base+Workload
resources, thus starting the VM with the appropriate amount of CPU and
memory.
In this scenario (we start the VM with the "right" amount of
CPU/Memory), we do not want to update the VM resources when containers
are added, or adjusted in size.
This functionality is introduced behind a configuration flag,
`static_sandbox_resource_mgmt`. This is defaulted to false for all
configurations except Firecracker, which is set to true.
This'll greatly improve UX for folks who are utilizing
Kata with a VMM or hardware architecture that doesn't support hotplug.
Note, users will still be unable to do in place vertical pod autoscaling
or other dynamic container/pod sizing with this enabled.
Fixes: #3264
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
When no options are passed to -ldflags, it passes
incorrect values(in this case, $BUILDFLAGS) to it.
Fix passing empty values by passing $KATA_LDFLAGS
in quotes.
Fixes: #3521
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
`enable_swap` option was added long time ago to add
`-realtime mlock=off` to the QEMU's command line.
Kata now supports QEMU 6, `-realtime` option has been deprecated and
`mlock=on` is causing unexpected behaviors in kata.
This patch removes support for `enable_swap`, `-realtime` and `mlock=`
since they are causing bugs in kata.
Signed-off-by: Julio Montes <julio.montes@intel.com>