When using experimental feature "newstore", we save and load devices
information from `persist.json` instead of `devices.json`, in such case,
file `devices.json` isn't needed anymore, so remove it.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
This change updates the isSystemMount check for mountSharedDirMounts
when setting up shared directory mounts for the container and uses
the source of the mount instead of the destination for the check.
We want to exclude system mounts from the host side as they
shouldn't be mounted into the container.
We do however want to allow system mounts within the
container as denying them can prevent some containers from
running properly.
Fixes#1591
Signed-off-by: Alex Price <aprice@atlassian.com>
This reverts commit 196661bc0d.
Reverting because cri-o with devicemapper started
to fail after this commit was merged.
Fixes: #1574.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
We can use the same data structure to describe both of them.
So that we can handle them similarly.
Fixes: #1566
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Set new persist storage driver "virtcontainers/persist/" as "experimental"
feature.
One day when this can fully work and we're ready to move to 2.0, we'll move
it from "experimental" feature to formal feature.
At that time, the "virtcontainers/filesystem_resource_storage.go" can be removed
completely.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Address some comments:
* fix persist driver func names for better understanding
* modify some logic, add some returned error etc
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
In privileged mode, all host devices are supposed to be passed
to the container in config.json. Skip floppy drives.
Fixes#1551
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
prepend a kata specific string to oci cgroup path to
form a different cgroup path, thus cAdvisor couldn't
find kata containers cgroup path on host to prevent it
from grabbing the stats data.
Fixes:#1488
Signed-off-by: lifupan <lifupan@gmail.com>
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
RootFs struct {
// Source specify the BlockDevice path
Source string
// Target specify where the rootfs is mounted if it has been mounted
Target string
// Type specifies the type of filesystem to mount.
Type string
// Options specifies zero or more fstab style mount options.
Options []string
// Mounted specifies whether the rootfs has be mounted or not
Mounted bool
}
If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.
Fixes:#1158
Signed-off-by: lifupan <lifupan@gmail.com>
Fixes: #1415
Container resources have been saved to ContainerConfig so there's no
need to save it again in state.json.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
k8s host empty-dir is equivalent to docker volumes.
For this case, we should just use the host directory even
for system directories.
Move the isEphemeral function to virtcontainers to not
introduce cyclic dependency.
Fixes#1417
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
We handle system directories differently, if its a bind mount
we mount the guest system directory to the container mount and
skip the 9p share mount.
However, we should not do this for docker volumes which are directories
created by Docker.
This introduces a Docker specific check, but that is the only
information available to us at the OCI layer.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
gometalinter is deprecated and will be archived April '19. The
suggestion is to switch to golangci-lint which is apparently 5x faster
than gometalinter.
Partially Fixes: #1377
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
We were considering all empty-dir k8s volumes as backed by tmpfs.
However they can be backed by a host directory as well.
Pass those as 9p volumes, while tmpfs volumes are handled as before,
namely creating a tmpfs directory inside the guest.
The only way to detect "Memory" empty-dirs is to actually check if the
volume is mounted as a tmpfs mount, since any information of k8s
"medium" is lost at the OCI layer.
Fixes#1341
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Create cgroup path relative the cgroups mount point if it's absolute,
or create it relative to a runtime-determined location if the path
is relative.
fixes#1365fixes#1357
Signed-off-by: Julio Montes <julio.montes@intel.com>
This patch fixes the issue where various version of snapshotters,
overlay, block based graphdriver, containerd-shim-v2 overlay, block
based snapshotters mount & create rootfs differently and kata should be
able to handle them all.
The current version of the code always assumes that a folder named
'rootfs' exists within the mount device and that is the path the
container should start at. This patch checks the existing mount point
and if it is the same as the rootFs passed to the container, we no
longer add a suffix to the container's rootfs path.
Fixes: #1325
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Co-Authored-by: Manohar Castelino <manohar.r.castelino@intel.com>
sandbox cgroup use V1NoConstraints, this only create memory subsystem,
but when delete, load parent cgroup always use `cgroups.V1`, so other
subsystem path can not be find, sandbox cgroup can not be deleted.
Fixes: #1263
Signed-off-by: Ace-Tang <aceapril@126.com>
container is killed by force, container's state MUST change its state to stop
immediately to avoid leaving it in a bad state.
fixes#1088
Signed-off-by: Julio Montes <julio.montes@intel.com>
cpu cgroups are container's specific hence all containers even the sandbox
should be able o create, delete and update their cgroups. The cgroup crated
matches with the cgroup path passed by the containers manager.
fixes#1117fixes#1118fixes#1021
Signed-off-by: Julio Montes <julio.montes@intel.com>
With the Stores conversion, the newContainer() cyclomatic complexity
went over 15. We fix that by extracting the block devices creation
routine out of newContainer.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.
This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.
Fixes: #1099
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to move the hypervisor implementations into their own package,
we need to put the capabilities type into the types package.
Fixes: #1119
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.
This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.
Fixes: #1095
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
cri containerd calls kill on stopped sandbox and if we
fail the call, it can cause `cri stopp` command to fail
too.
Fixes: #1084
Signed-off-by: Peng Tao <bergwolf@gmail.com>
In case we use an hypervisor that cannot support filesystem sharing,
we copy files over to the VM rootfs through the gRPC protocol. This
is a nice workaround, but it only works with regular files, which
means no device file, no socket file, no directory, etc... can be
sent this way.
This is a limitation that we accept here, by simply ignoring those
non-regular files.
Fixes#1068
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Copy files to contaier's rootfs if hypervisor doesn't supports filesystem
sharing, otherwise bind mount them in the shared directory.
see #1031
Signed-off-by: Julio Montes <julio.montes@intel.com>
- Container only is responsable of namespaces and cgroups
inside the VM.
- Sandbox will manage VM resources.
The resouces has to be re-calculated and updated:
- Create new Container: If a new container is created the cpus and memory
may be updated.
- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.
To manage the resources from sandbox the hypervisor interaface adds two methods.
- resizeMemory().
This function will be used by the sandbox to request
increase or decrease the VM memory.
- resizeCPUs()
vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.
The CPUs calculations use the container cgroup information all the time.
This should allow do better calculations.
For example.
2 containers in a pod.
container 1 cpus = .5
container 2 cpus = .5
Now:
Sandbox requested vcpus 1
Before:
Sandbox requested vcpus 2
When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.
If we would updated them the sandbox calculations would remove already
removed vcpus.
This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.
Fixes: #833
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Got "docker: Error response from daemon: OCI runtime create failed:
QMP command failed: unknown." when "docker run --privileged" with kata.
In qemu part, it got:
"Could not open '/dev/sr0': Read-only file system"
or
"No medium found"
The cause is qemu need open block device to get its status.
But /dev/sr0 is a CDROM that cannot be opened.
This patch let newContainer doesn't attach device if it is a CDROM
to handle the issue.
Fixes#829
Signed-off-by: Hui Zhu <teawater@hyper.sh>