VMCache is a new function that creates VMs as caches before using it.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through Unix socket. The protocol is gRPC in protocols/cache/cache.proto.
The VMCache server will create some VMs and cache them by factory cache.
It will convert the VM to gRPC format and transport it when gets
requestion from clients.
Factory grpccache is the VMCache client. It will request gRPC format
VM and convert it back to a VM. If VMCache function is enabled,
kata-runtime will request VM from factory grpccache when it creates
a new sandbox.
VMCache has two options.
vm_cache_number specifies the number of caches of VMCache:
unspecified or == 0 --> VMCache is disabled
> 0 --> will be set to the specified number
vm_cache_endpoint specifies the address of the Unix socket.
This commit just includes the core and the client of VMCache.
Currently, VM cache still cannot work with VM templating and vsock.
And just support qemu.
Fixes: #52
Signed-off-by: Hui Zhu <teawater@hyper.sh>
This patch fixes the issue where various version of snapshotters,
overlay, block based graphdriver, containerd-shim-v2 overlay, block
based snapshotters mount & create rootfs differently and kata should be
able to handle them all.
The current version of the code always assumes that a folder named
'rootfs' exists within the mount device and that is the path the
container should start at. This patch checks the existing mount point
and if it is the same as the rootFs passed to the container, we no
longer add a suffix to the container's rootfs path.
Fixes: #1325
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Co-Authored-by: Manohar Castelino <manohar.r.castelino@intel.com>
If enter to vircontainers directory and do make check-go-test, the makefile
does not found the kata .ci directory use relative path to makefile.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Simplify empty string proxy type handling and cast invalid proxy type to
ProxyType.
Fixes: #1312
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
kata builtin proxy has always watched the qemu's console
whether proxy's debug is set or not, this is not aligned
with kata cli. This patch will change it and watch the
qemu's console only when proxy's debug is set in kata config.
Fixes: #1318
Signed-off-by: fupan <lifupan@gmail.com>
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.
Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.
Fixes#1277.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
reason: When excutes ExecuteNetCCWDeviceAdd, the DevID is always "virtio-".
If add-iface multy times, qemu may report "dumplicated id:virtio-".
Fixes: #1305
Signed-off-by: xueshaojia <xueshaojia@huawei.com>
We were grabbing a running total of quota and period for each container
and then calculating the number of resulting vCPUs. Summing period
doesn't make sense. To simplify, let's just calculate mCPU per
container, keep a running total of mCPUs requested, and then translate
to sandbox vCPUs after.
Fixes: #1292
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
We have 7 types of endpoints, but forget ipvlan in unmarshal funciton.
So add it and refactor for cyclomatic complexity reason.
Fixes#1254
Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
sandbox cgroup use V1NoConstraints, this only create memory subsystem,
but when delete, load parent cgroup always use `cgroups.V1`, so other
subsystem path can not be find, sandbox cgroup can not be deleted.
Fixes: #1263
Signed-off-by: Ace-Tang <aceapril@126.com>
Commit affd6e3216 ("devices: add reference
count for devices.") introduced an attach count for devices. The
vhost-user-blk device increments the counter instead of decrementing it
when detaching.
Fixes: #1259
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Sometimes qemu/qmp commands error out and VM files
get left behind on the host filesystem. Clen them up
irrespective of `stopSandbox` succeeds or fails.
Fixes: #1246
Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
container is killed by force, container's state MUST change its state to stop
immediately to avoid leaving it in a bad state.
fixes#1088
Signed-off-by: Julio Montes <julio.montes@intel.com>
All containers run in different cgroups even the sandbox, with this new
implementation the sandbox cpu cgroup wil be equal to the sum of all its
containers and the hypervisor process will be placed there impacting to the
containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.
**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html
```
+=============================================+
| | 6 threads 6cpus | 1 thread 1 cpu |
+=============================================+
| current | 40 seconds | 122 seconds |
+==============================================
| new | 37 seconds | 124 seconds |
+==============================================
```
current = current cgroups implementation
new = new cgroups implementation
**workload**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: c-ray
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
restartPolicy: Never
containers:
- name: c-ray-1
image: docker.io/devimc/c-ray:latest
imagePullPolicy: IfNotPresent
args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
"/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
resources:
limits:
cpu: 6
- name: c-ray-2
image: docker.io/devimc/c-ray:latest
imagePullPolicy: IfNotPresent
args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
"/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
resources:
limits:
cpu: 1
```
fixes#1153
Signed-off-by: Julio Montes <julio.montes@intel.com>
cpu cgroups are container's specific hence all containers even the sandbox
should be able o create, delete and update their cgroups. The cgroup crated
matches with the cgroup path passed by the containers manager.
fixes#1117fixes#1118fixes#1021
Signed-off-by: Julio Montes <julio.montes@intel.com>
since all generic* could bring unused linter warnings, which lead to
CI crash, we add nolint comment to avoid them.
Fixes: #1200
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
since generic func genericAppendBridges and genericBridges
is also applied for machine type QemuVirt, we use it as implementation
for appendBridges and bridges on aarch64.
since const defaultPCBridgeBus is used in generic func
genericAppendBridges for pc machine, we should define it once
in generic file, instead of redefining it in different
arch-specific files.
Fixes: #1200
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
original tests for func RunningOnVMM are sort of amd64-specific,
since all other archs don't support nested VMM for now.
Fixes: #1200
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
refine a set of test functions under qemu_arm64_test.go. e.g. test
func for memoryTopology shouldn't be the same one on amd64, since
for now, we don't support nvdimm on arm64.
Fixes: #1200
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Adding debug messages which state which files
are being created/deleted could be helpful in
analyzing situations like leaky pod issues.
Fixes: #1234
Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
We are creating Store directories but never removing them.
Calling into a VM factory created vm Stop() will now clean the VM Store
artifacts up.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the Stores conversion, the newContainer() cyclomatic complexity
went over 15. We fix that by extracting the block devices creation
routine out of newContainer.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that we converted the virtcontainers code to the store package, we
can remove all the resource storage old code.
Fixes: #1099
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.
This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.
Fixes: #1099
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The ItemLock API allows for taking shared and exclusive locks on all
items.
For virtcontainers, this is specialized into taking locks on the Lock
item, and will be used for sandbox locking.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Raw API creates a raw item, i.e. an item that must be handled
directly by the caller. A raw item is one that's not defined by the
store.Item enum, i.e. it is a custom, caller defined one.
The caller gets a URL back and is responsible for handling the item
directly from this URL.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is basically a Store dispatcher, for storing items into their right
Store (either configuration or state).
There's very little logic here, except for finding out which store an
item belongs to in the virtcontainers context.
vc.go also provides virtcontainers specific utilities.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>