Let's bring in the latest release of Containerd, 1.6.1, released on
March 2nd, 2022.
With this, we take the opportunity to remove containerd/api reference as
we shouldn't need a separate module only for the API.
Here's the list of changes needed in the code due to the bump:
* stop using `grpc.WithInsecure()` as it's been deprecated
- use `grpc.WithTransportCredentials(insecure.NewCredentials())`
instead
Fixes: #3820
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As this is just a initial vcpu hotplug support, thread and socket has
not been supported. So, don't set socket and thread when hotadd cpu for
arm/virt.
Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
If, for some reason, we're able to launch cloud hypervisor but not able
to boot the VM up, the virtiofsd process would be left behind.
Let's ensure, via defer, that we stop virtiofsd in case of errors.
Fixes: #3819
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.
1. virtio-fs, as of now, cannot be part of the trust boundary, so the
Confidential Guest will not be using it.
2. virtio-block hotplug should be enabled in order to use virtio-block
for the rootfs (used with the devmapper plugin).
When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```
After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.
@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".
Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.
Fixes: #3810
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As mounting volumes into the guest requires SharedFS setup, let's ensure
we error out if trying to do so in a situation where SharedFS is not
supported.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
supportsSharedFS() is a new method to be used to ensure that no SharedFS
specifics are called when, for a reason or another, Cloud Hypervisor is
in a mode where SharedFSs are not supported.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methos,
let's introduce and use loadVirtiofsDaemon, at it'll also be handy later
in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similary to the `createVirtiofsDaemon` method, let's introduce and use
its counterpart, as it'll also be handy later in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what's been done with the `createVirtiofsDaemon`, let's
create a `setupVirtiofsDaemon` one.
It will also become handy later in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's introduce and use a new `createVirtiofsDaemon` method. Its name
says it all, and it'll be handy later in this series when, spoiler
alert, SharedFS cannot be used (in such cases as in Confidential
Guests).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- Mostly blank lines after `+build` -- see
https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
`gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go
Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...
Fixes:# 3777
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Nydusd uses a bufio.Scanner to check if nydusd process has
existed, but stderr/stdout passed to Cmd is self-created pipe,
this pipe will not be closed if the process start failing.
Use standard Cmd.StdoutPipe can close the stdout and kata shim
will detect the existence of the nydusd process, then call cmd.Wait to
reap the process' resources.
Fixes: #3783
Signed-off-by: bin <bin@hyper.sh>
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml
Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
Switching to the generic FilesystemSharer brings 2 majors improvements:
1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
container files and root filesystems with the Kata Linux guest.
Fixes#3622
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.
In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.
This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesysstem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.
Fixes: #3632
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".
The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.
One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.
Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is a small code refactor removing a deadcode based the checks
already done in the generic hypervisor abstraction.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block
As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:
* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
- This is needed for the TDX support on the cloud hypervisor driver,
which is part of this very same series.
* openapi: Update the PciBdf types
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
- This is needed due to a change in a DeviceNode field, which would
cause a marshalling / demarshalling error when running with a
version of cloud-hypervisor that includes the TDX fixes mentioned
above.
* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
- This is needed due to changes in the scripts used to build Cloud
Hypervisor, which are used as part of Kata Containers CIs and
github actions.
Due to this change, we're also adapting the build scripts as part
of this very same commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so
test with any that are present (default is 1M).
Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
To resourcecontrol, and make it consistent with the fact that cgroups
are a Linux implementation of the ResourceController interface.
Fixes: #3601
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>