If kata containers is using vfio and vhost net,the unbinding
of vfio would be hang. In the scenario, vhost net kernel thread
takes a reference to the qemu's mm, and the reference also includes
the mmap regions on the vfio device file. so vhost kernel thread
would be not released when qemu is killed as the vhost file
descriptor still is opened by shim v2 process, and the vfio device
is not released because there's still a reference to the mmap.
Fixes: #1669
Signed-off-by: Yang, Wei <w90p710@gmail.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
When using experimental feature "newstore", we save and load devices
information from `persist.json` instead of `devices.json`, in such case,
file `devices.json` isn't needed anymore, so remove it.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Add configuration options to support the various Kata agent tracing
modes and types. See the comments in the built configuration files for
details:
- `cli/config/configuration-fc.toml`
- `cli/config/configuration-qemu.toml`
Fixes#1369.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Make `newAgentConfig()` return an explicit error rather than handling
the error scenario by simply returning the `error` object in the
`interface{}` return type. The old behaviour was confusing and
inconsistent with the other functions creating a new config type (shim,
proxy, etc).
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This reverts commit 196661bc0d.
Reverting because cri-o with devicemapper started
to fail after this commit was merged.
Fixes: #1574.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
We can use the same data structure to describe both of them.
So that we can handle them similarly.
Fixes: #1566
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Set new persist storage driver "virtcontainers/persist/" as "experimental"
feature.
One day when this can fully work and we're ready to move to 2.0, we'll move
it from "experimental" feature to formal feature.
At that time, the "virtcontainers/filesystem_resource_storage.go" can be removed
completely.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Address some comments:
* fix persist driver func names for better understanding
* modify some logic, add some returned error etc
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
we need to notify guest kernel about memory hot-added event via probe interface.
hot-added memory deivce should be sliced into the size of memory section.
Fixes: #1149
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
If kata-runtime supports memory hotplug via probe interface, we need to reconstruct
memoryDevice to store relevant status, which are addr and probe. addr specifies the
physical address of the memory device, and probe determines it is hotplugged via
acpi-driven or probe interface.
Fixes: #1149
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
In order to support memory hotplug via probe interface in kata-runtime,
firstly, we need to verify whether guest kernel is capable of that.
Fixes: #1149
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
RootFs struct {
// Source specify the BlockDevice path
Source string
// Target specify where the rootfs is mounted if it has been mounted
Target string
// Type specifies the type of filesystem to mount.
Type string
// Options specifies zero or more fstab style mount options.
Options []string
// Mounted specifies whether the rootfs has be mounted or not
Mounted bool
}
If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.
Fixes:#1158
Signed-off-by: lifupan <lifupan@gmail.com>
gometalinter is deprecated and will be archived April '19. The
suggestion is to switch to golangci-lint which is apparently 5x faster
than gometalinter.
Partially Fixes: #1377
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Make cpu and memory calculation in a different function
this help to reduce the function complexity and easy unit test.
Fixes: #1296
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Fixes#1226
Add new flag "experimental" for supporting underworking features.
Some features are under developing which are not ready for release,
there're also some features which will break compatibility which is not
suitable to be merged into a kata minor release(x version in x.y.z)
For getting these features above merged earlier for more testing, we can
mark them as "experimental" features, and move them to formal features
when they are ready.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.
Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.
Fixes#1277.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
We were grabbing a running total of quota and period for each container
and then calculating the number of resulting vCPUs. Summing period
doesn't make sense. To simplify, let's just calculate mCPU per
container, keep a running total of mCPUs requested, and then translate
to sandbox vCPUs after.
Fixes: #1292
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
All containers run in different cgroups even the sandbox, with this new
implementation the sandbox cpu cgroup wil be equal to the sum of all its
containers and the hypervisor process will be placed there impacting to the
containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.
**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html
```
+=============================================+
| | 6 threads 6cpus | 1 thread 1 cpu |
+=============================================+
| current | 40 seconds | 122 seconds |
+==============================================
| new | 37 seconds | 124 seconds |
+==============================================
```
current = current cgroups implementation
new = new cgroups implementation
**workload**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: c-ray
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
restartPolicy: Never
containers:
- name: c-ray-1
image: docker.io/devimc/c-ray:latest
imagePullPolicy: IfNotPresent
args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
"/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
resources:
limits:
cpu: 6
- name: c-ray-2
image: docker.io/devimc/c-ray:latest
imagePullPolicy: IfNotPresent
args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
"/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
resources:
limits:
cpu: 1
```
fixes#1153
Signed-off-by: Julio Montes <julio.montes@intel.com>
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.
This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.
Fixes: #1099
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to fix#1059, we want to create a hypervisor package. Some of
the hypervisor implementations (qemu) depend on the network and endpoint
interfaces. We can not have a virtcontainers -> hypervisor -> network,
endpoint -> virtcontainers cyclic dependency.
So before creating the hypervisor package, we need to decouple the
network API from the virtcontainers one.
Fixes: #1180
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
There's only one real implementer of the network interface and no real
need to implement anything else. We can just go ahead and remove this
abstraction.
Fixes: #1179
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The agent code creates a directory at
`/run/kata-containers/shared/sandboxes/sbid/` to hold shared data
between host and guest. We need to clean it up when removing a sandbox.
Fixes: #1138
Signed-off-by: Peng Tao <bergwolf@gmail.com>
In order to move the hypervisor implementations into their own package,
we need to put the asset type into the types package and break the
hypervisor->asset->virtcontainers->hypervisor cyclic dependency.
Fixes: #1119
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We always call waitSandbox after we start the VM (startSandbox), so
let's simplify the hypervisor interface and integrate waiting for the VM
into startSandbox.
This makes startSandbox a blocking call, but that is practically the
case today.
Fixes: #1009
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We always combine the hypervisor init and createSandbox, because what
we're trying to do is simply that: Set the hypervisor and have it create
a sandbox.
Instead of keeping a method with vague semantics, remove init and
integrate the actual hypervisor setup phase into the createSandbox one.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.
This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.
Fixes: #1095
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Since we're going to have both external and internal types packages, we
alias the external one as vcTypes. And the internal one will be usable
through the types namespace.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Pass Seccomp profile to the agent only if
the configuration.toml allows it to be passed
and the agent/image is seccomp capable.
Fixes: #688
Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
cri containerd calls kill on stopped sandbox and if we
fail the call, it can cause `cri stopp` command to fail
too.
Fixes: #1084
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Now that stopVM() also calls agent.stopSandbox(), we can have the
sandbox Stop() call using stopVM() directly and avoid code duplication.
Fixes: #1011
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>