Commit Graph

105 Commits

Author SHA1 Message Date
Wei Zhang
a6b3368469 persist: merge more files with persist.json
Fixes #803

Merge more container storage files with `persist.json` including:
* devices.json
* mounts.json
* process.json

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-13 17:00:33 +08:00
Hui Zhu
5ba09817d8 Merge pull request #1575 from WeiZhang555/simplify-persist-api
newstore:  removing deprecated files when use new store driver
2019-05-10 15:33:22 +08:00
James O. D. Hunt
bb44f65a68 Merge pull request #1623 from awprice/system-mount-skip
mounts: fix isSystemMount check for mountSharedDirMounts
2019-05-09 09:38:11 +01:00
Wei Zhang
4c192139cf newstore: remove file "devices.json"
When using experimental feature "newstore", we save and load devices
information from `persist.json` instead of `devices.json`, in such case,
file `devices.json` isn't needed anymore, so remove it.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-06 14:40:08 +08:00
Graham Whaley
ea71133d1a Merge pull request #1558 from amshinde/ignore-floppy-drives
devices: Skip floppy drives while passing devices to guest
2019-05-03 17:34:11 +01:00
Alex Price
709feac057 mounts: fix isSystemMount check for mountSharedDirMounts
This change updates the isSystemMount check for mountSharedDirMounts
when setting up shared directory mounts for the container and uses
the source of the mount instead of the destination for the check.

We want to exclude system mounts from the host side as they
shouldn't be mounted into the container.

We do however want to allow system mounts within the
container as denying them can prevent some containers from
running properly.

Fixes #1591

Signed-off-by: Alex Price <aprice@atlassian.com>
2019-05-03 12:17:36 +10:00
Wei Zhang
341a988e06 persist: simplify persist api
Fixes #803

Simplify new store API to make the code easier to understand and use.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-30 11:54:42 +08:00
Salvador Fuentes
bc9b9e2af6 vc: Revert "vc: change container rootfs to be a mount"
This reverts commit 196661bc0d.

Reverting because cri-o with devicemapper started
to fail after this commit was merged.

Fixes: #1574.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2019-04-23 08:56:36 -05:00
Peng Tao
196661bc0d vc: change container rootfs to be a mount
We can use the same data structure to describe both of them.
So that we can handle them similarly.

Fixes: #1566

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-20 00:42:25 -07:00
Wei Zhang
9bd4e5008c store: address comments
Address review comments

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:38:10 +08:00
Wei Zhang
e40dcb9376 storage: set new storage driver as "experimental"
Set new persist storage driver "virtcontainers/persist/" as "experimental"
feature.
One day when this can fully work and we're ready to move to 2.0, we'll move
it from "experimental" feature to formal feature.
At that time, the "virtcontainers/filesystem_resource_storage.go" can be removed
completely.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:35:33 +08:00
Wei Zhang
504c706bea storage: address comments
Address some comments:
* fix persist driver func names for better understanding
* modify some logic, add some returned error etc

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Wei Zhang
6e4149d86c persist: save and restore state from persist.json
Save and restore state from persist.json instead of state.json

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Wei Zhang
039ed4eeb8 persist: persist device data
Persist device information to relative file

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Wei Zhang
b42fde69c0 persist: demo code for persist api
Demonstrate how to make use of `virtcontainer/persist/api` data structure
package.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Archana Shinde
f6b8387814 devices: Skip floppy drives while passing devices to guest
In privileged mode, all host devices are supposed to be passed
to the container in config.json. Skip floppy drives.

Fixes #1551

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2019-04-18 11:26:07 -07:00
Peng Tao
203728676a vc: remove BlockIndex from container state
No longer used.

Fixes: #1562

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-17 22:39:42 -07:00
Fupan Li
2b45f0b2fd Merge pull request #1528 from bergwolf/grpc
shimv2 should return grpc error codes
2019-04-15 09:50:10 +08:00
Archana Shinde
9b622b7e77 Merge pull request #1485 from awprice/k8s-empty-dir-local
storage: create k8s emptyDir inside VM
2019-04-12 08:29:18 -07:00
Julio Montes
d99693a564 Merge pull request #1518 from lifupan/fixtop
virtcontainers: prepend a kata specific string to host cgroups path
2019-04-12 08:58:38 -05:00
Peng Tao
cf90751638 vc: export vc error types
So that shimv2 can convert it into grpc errors.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-12 02:01:02 -07:00
lifupan
1a1f93bc78 virtcontainers: add a kata specific prefix to host cgroups path
prepend a kata specific string to oci cgroup path to
form a different cgroup path, thus cAdvisor couldn't
find kata containers cgroup path on host to prevent it
from grabbing the stats data.

Fixes:#1488

Signed-off-by: lifupan <lifupan@gmail.com>
2019-04-12 10:30:19 +08:00
Peng Tao
616f26cfe5 types: split sandbox and container state
Since they do not really share many of the fields.

Fixes: #1434

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-09 18:59:56 -07:00
Peng Tao
d76eddf41e Merge pull request #1416 from WeiZhang555/dont-save-cgroups-to-state-file
cgroups: remove duplicate fields from state
2019-04-02 16:09:33 +08:00
Peng Tao
25d21060e3 Merge pull request #1412 from lifupan/shimv2mount
shimv2: optionally plug rootfs block storage instead of mounting it
2019-04-02 15:30:40 +08:00
lifupan
628ea46c58 virtcontainers: change container's rootfs from string to mount alike struct
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
    RootFs struct {
            // Source specify the BlockDevice path
            Source string
            // Target specify where the rootfs is mounted if it has been mounted
            Target string
            // Type specifies the type of filesystem to mount.
            Type string
            // Options specifies zero or more fstab style mount options.
            Options []string
            // Mounted specifies whether the rootfs has be mounted or not
            Mounted bool
     }

If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.

Fixes:#1158

Signed-off-by: lifupan <lifupan@gmail.com>
2019-04-02 10:54:05 +08:00
Wei Zhang
ad7d9b7bab cgroups: remove duplicate fields from state
Fixes: #1415

Container resources have been saved to ContainerConfig so there's no
need to save it again in state.json.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-03-26 10:34:03 +08:00
Archana Shinde
228d1512d9 mount: Add check for k8s host empty directory
k8s host empty-dir is equivalent to docker volumes.
For this case, we should just use the host directory even
for system directories.

Move the isEphemeral function to virtcontainers to not
introduce cyclic dependency.

Fixes #1417

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2019-03-25 14:06:23 -07:00
Archana Shinde
70c193132d mounts: Add check for system volumes
We handle system directories differently, if its a bind mount
we mount the guest system directory to the container mount and
skip the 9p share mount.
However, we should not do this for docker volumes which are directories
created by Docker.

This introduces a Docker specific check, but that is the only
information available to us at the OCI layer.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2019-03-25 10:49:27 -07:00
Ganesh Maharaj Mahalingam
f4428761cb lint: Update go linter from gometalinter to golangci-lint.
gometalinter is deprecated and will be archived April '19. The
suggestion is to switch to golangci-lint which is apparently 5x faster
than gometalinter.

Partially Fixes: #1377

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2019-03-25 08:48:13 -07:00
Julio Montes
dbc5a32b74 Merge pull request #1366 from devimc/topic/fixRelativeCgroupPath
virtcontainers: honor OCI cgroupsPath
2019-03-19 10:32:41 -06:00
Archana Shinde
47a6023382 volumes: Handle k8s empty-dirs of "default" medium type
We were considering all empty-dir k8s volumes as backed by tmpfs.
However they can be backed by a host directory as well.
Pass those as 9p volumes, while tmpfs volumes are handled as before,
namely creating a tmpfs directory inside the guest.
The only way to detect "Memory" empty-dirs is to actually check if the
volume is mounted as a tmpfs mount, since any information of k8s
"medium" is lost at the OCI layer.

Fixes #1341

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2019-03-15 09:44:10 -07:00
Julio Montes
3aaa77db22 virtcontainers: honor OCI cgroupsPath
Create cgroup path relative the cgroups mount point if it's absolute,
or create it relative to a runtime-determined location if the path
is relative.

fixes #1365
fixes #1357

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-03-14 08:13:22 -06:00
Ganesh Maharaj Mahalingam
27a92f94c8 runtime: Fix rootfs mount assumptions
This patch fixes the issue where various version of snapshotters,
overlay, block based graphdriver, containerd-shim-v2 overlay, block
based snapshotters mount & create rootfs differently and kata should be
able to handle them all.

The current version of the code always assumes that a folder named
'rootfs' exists within the mount device and that is the path the
container should start at. This patch checks the existing mount point
and if it is the same as the rootFs passed to the container, we no
longer add a suffix to the container's rootfs path.

Fixes: #1325

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Co-Authored-by: Manohar Castelino <manohar.r.castelino@intel.com>
2019-03-05 13:41:37 -08:00
Ace-Tang
454775fb97 cgroups: fix failed to remove sandbox cgroup
sandbox cgroup use V1NoConstraints, this only create memory subsystem,
but when delete, load parent cgroup always use `cgroups.V1`, so other
subsystem path can not be find, sandbox cgroup can not be deleted.

Fixes: #1263

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-02-21 17:34:34 +08:00
Julio Montes
62c393c119 virtcontainers: change container's state to stop asap
container is killed by force, container's state MUST change its state to stop
immediately to avoid leaving it in a bad state.

fixes #1088

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Julio Montes
9758cdba7c virtcontainers: move cpu cgroup implementation
cpu cgroups are container's specific hence all containers even the sandbox
should be able o create, delete and update their cgroups. The cgroup crated
matches with the cgroup path passed by the containers manager.

fixes #1117
fixes #1118
fixes #1021

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Samuel Ortiz
7b0376f3d3 virtcontainers: Fix container.go cyclomatic complexity
With the Stores conversion, the newContainer() cyclomatic complexity
went over 15. We fix that by extracting the block devices creation
routine out of newContainer.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:33 +01:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Frank Cao
6c3277e013 Merge pull request #1126 from jcvenegas/allow-update-on-ready
update: allow do update on ready.
2019-01-18 11:03:12 +08:00
Jose Carlos Venegas Munoz
7228bab79b container: update: Allow updates once container is created
Before, we would only allow for a container-update command
to proceed if the container was in the running state. So
long as the container is created, this should be allowed.

This was found using the `static` policy for Kubernetes CPU
manager[1]. Where the `update` command is called after the
`create` runtime command (when the container state is `ready`).

[1] https://github.com/kubernetes/community/blob/95a4a1/contributors/design-proposals/node/cpu-manager.md#example-scenarios-and-interactions

Fixes: #1083

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2019-01-16 17:15:00 -05:00
Samuel Ortiz
b25f43e865 virtcontainers: Add Capabilities to the types package
In order to move the hypervisor implementations into their own package,
we need to put the capabilities type into the types package.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
b05dbe3886 runtime: Convert to the new internal types package
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.

This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.

Fixes: #1095

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:43:33 +01:00
Peng Tao
bf2813fee8 cli: allow to kill a stopped container and sandbox
cri containerd calls kill on stopped sandbox and if we
fail the call, it can cause `cri stopp` command to fail
too.

Fixes: #1084

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2019-01-08 11:19:25 +08:00
Frank Cao
174e0c98bc Merge pull request #963 from running99/master
container: Use lazy unmount
2018-12-26 09:50:44 +08:00
Sebastien Boeuf
83e38c959a mounts: Ignore existing mounts if they cannot be honored
In case we use an hypervisor that cannot support filesystem sharing,
we copy files over to the VM rootfs through the gRPC protocol. This
is a nice workaround, but it only works with regular files, which
means no device file, no socket file, no directory, etc... can be
sent this way.

This is a limitation that we accept here, by simply ignoring those
non-regular files.

Fixes #1068

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2018-12-21 15:38:06 +00:00
running
c099be56da container: Use lazy unmount
Unmount recursively to unmount bind-mounted volumes.
Fixes: #965
Signed-off-by: Ning Lu <crossrunning@outlook.com>
2018-12-21 11:11:58 +08:00
Julio Montes
378d8157a6 virtcontainers: copy or bind mount shared file
Copy files to contaier's rootfs if hypervisor doesn't supports filesystem
sharing, otherwise bind mount them in the shared directory.

see #1031

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-12-19 09:58:44 -06:00
Jose Carlos Venegas Munoz
618cfbf1db vc: sandbox: Let sandbox manage VM resources.
- Container only is responsable of namespaces and cgroups
inside the VM.

- Sandbox will manage VM resources.

The resouces has to be re-calculated and updated:

- Create new Container: If a new container is created the cpus and memory
may be updated.

- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.

To manage the resources from sandbox the hypervisor interaface adds two methods.

- resizeMemory().

This function will be used by the sandbox to request
increase or decrease the VM memory.

- resizeCPUs()

vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.

The CPUs calculations use the container cgroup information all the time.

This should allow do better calculations.

For example.

2 containers in a pod.

container 1 cpus = .5
container 2 cpus = .5

Now:
Sandbox requested vcpus 1

Before:
Sandbox requested vcpus 2

When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.

If we would updated them the sandbox calculations would remove already
removed vcpus.

This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.

Fixes: #833

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-12-13 16:33:14 -06:00
Hui Zhu
193b324242 newContainer: Not attach device if it is a CDROM
Got "docker: Error response from daemon: OCI runtime create failed:
QMP command failed: unknown." when "docker run --privileged" with kata.
In qemu part, it got:
"Could not open '/dev/sr0': Read-only file system"
or
"No medium found"
The cause is qemu need open block device to get its status.
But /dev/sr0 is a CDROM that cannot be opened.

This patch let newContainer doesn't attach device if it is a CDROM
to handle the issue.

Fixes #829

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2018-11-07 17:28:06 +08:00