Fixes: #7573
To enable this feature, build your rootfs using AGENT_POLICY=yes. The
default is AGENT_POLICY=no.
Building rootfs using AGENT_POLICY=yes has the following effects:
1. The kata-opa service gets included in the Guest image.
2. The agent gets built using AGENT_POLICY=yes.
After this patch, the shim calls SetPolicy if and only if a Policy
annotation is attached to the sandbox/pod. When creating a sandbox/pod
that doesn't have an attached Policy annotation:
1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses
the default agent settings, that might include a default Policy too.
2. If the agent was built using AGENT_POLICY=no, the new sandbox is
executed the same way as before this patch.
Any SetPolicy calls from the shim to the agent fail if the agent was
built using AGENT_POLICY=no.
If the agent was built using AGENT_POLICY=yes:
1. The agent reads the contents of a default policy file during sandbox
start-up.
2. The agent then connects to the OPA service on localhost and sends
the default policy to OPA.
3. If the shim calls SetPolicy:
a. The agent checks if SetPolicy is allowed by the current
policy (the current policy is typically the default policy
mentioned above).
b. If SetPolicy is allowed, the agent deletes the current policy
from OPA and replaces it with the new policy it received from
the shim.
A typical new policy from the shim doesn't allow any future SetPolicy
calls.
4. For every agent rpc API call, the agent asks OPA if that call
should be allowed. OPA allows or not a call based on the current
policy, the name of the agent API, and the API call's inputs. The
agent rejects any calls that are rejected by OPA.
When building using AGENT_POLICY_DEBUG=yes, additional Policy logging
gets enabled in the agent. In particular, information about the inputs
for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM.
These inputs can be useful for investigating API calls that might have
been rejected by the Policy. Examples:
1. Load a failing policy file test1.rego on a different machine:
opa run --server --addr 127.0.0.1:8181 test1.rego
2. Collect the API inputs from Guest's /tmp/policy.txt and test on the
machine where the failing policy has been loaded:
curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \
--data-binary @test1-inputs.json
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Now that we have propper AP device support add a
unit test for testing the correct Attach/Detach of AP devices.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removing HotplugVFIOonRootBus which is obsolete with the latest PCI
topology changes, users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.
This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.
However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.
For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.
Fixes: #7207
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now it is possible to configure the PCIe topology via annotations
and addded a simple test, checking for Invalid and RootPort
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removed the configuration of PCIeRootPort and PCIeSwitchPort, those
values can be deduced in createPCIeTopology
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there
is no firmware offered, it should be disabled.
Fixes: #6468
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This change enables to run cloud-hypervisor VMM using a non-root user
when rootless flag is set true in the configuration
Fixes: #2567
Signed-off-by: Feng Wang <fwang@confluent.io>
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.
On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.
Fixes: #6063
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Change cache mod from literal to const and place them in one place.
Also set default cache mode from `none` to `never` in
`pkg/katautils/config-settings.go.in`.
Fixes: #6151
Signed-off-by: Bin Liu <bin@hyper.sh>
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.
Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.
Fixes: #5694
Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
It seems that bumping the version of golang and golangci-lint new format
changes are required.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.
Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The tests ensure that interactions between drop-ins and the base
configuration.toml and among drop-ins themselves work as intended,
basically that files are evaluated in the correct order (base file
first, then drop-ins in alphabetical order) and the last one to set
a specific key wins.
Signed-off-by: Pavel Mores <pmores@redhat.com>
updateFromDropIn() uses the infrastructure built by previous commits to
ensure no contents of 'tomlConfig' are lost during decoding. To do
this, we preserve the current contents of our tomlConfig in a clone and
decode a drop-in into the original. At this point, the original
instance is updated but its Agent and/or Hypervisor fields are
potentially damaged.
To merge, we update the clone's Agent/Hypervisor from the original
instance. Now the clone has the desired Agent/Hypervisor and the
original instance has the rest, so to finish, we just need to move the
clone's Agent/Hypervisor to the original.
Signed-off-by: Pavel Mores <pmores@redhat.com>
These functions take a TOML key - an array of individual components,
e.g. ["agent" "kata" "enable_tracing"], as returned by BurntSushi - and
two 'tomlConfig' instances. They copy the value of the struct field
identified by the key from the source instance to the target one if
necessary.
This is only done if the TOML key points to structures stored in
maps by 'tomlConfig', i.e. 'hypervisor' and 'agent'. Nothing needs to
be done in other cases.
Signed-off-by: Pavel Mores <pmores@redhat.com>
For 'tomlConfig' substructures stored in Golang maps - 'hypervisor' and
'agent' - BurntSushi doesn't preserve their previous contents as it does
for substructures stored directly (e.g. 'runtime'). We use reflection
to work around this.
This commit adds three primitive operations to work with struct fields
identified by their `toml:"..."` tags - one to get a field value, one to
set a field value and one to assign a source struct field value to the
corresponding field of a target.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.
By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Policy for whats valid/invalid within the config varies by VMM, host,
and by silicon architecture. Let's keep katautils simple for just
translating a toml to the hypervisor config structure, and leave
validation to virtcontainers.
Without this change, we're doing duplicate validation.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>