Commit Graph

88 Commits

Author SHA1 Message Date
Wei Zhang
8e88859ee4 persist: remove all usage of VCStore
Remove VCStore usage from all modules

Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
2019-12-30 18:42:15 +08:00
Wei Zhang
b63e517f6d persist: replace sandbox lock with newstore.Lock
Replace rLockSandbox and rwLockSandbox with new store lock functions.

Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
2019-12-30 18:41:02 +08:00
James O. D. Hunt
67f203f1b8 compatoci: Add a SetLogger call
Add a standard `SetLogger()` call to allow the `compatoci` package to be
provided a base logger which it can then customise.

Fixes: #2305.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-12-02 10:10:37 +00:00
Ted Yu
544730b4b1 vc: Drop Sandbox#Pause and Sandbox#Resume
Fixes #2276

Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2019-11-29 05:58:06 -08:00
Julio Montes
613fd0fb60 virtcontainers: rename GetOCISpec to GetPatchedOCISpec
GetOCISpec returns a patched version of the original OCI spec, it was modified
to support:
* capabilities
* Ephemeral storage
* k8s empty dir

In order to avoid consusions and make api clear, rename GetOCISpec
to GetPatchedOCISpec and ContainerConfig.Spec to ContainerConfig.CustomSpec

fixes #2252

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-11-25 17:22:23 +00:00
Julio Montes
7f67b9f084 virtcontainers: improve algorithm to find containers
Do not iterate over a map to find a container, use map built-in
method instead.

fixes #2254

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-11-22 20:08:37 +00:00
Wei Zhang
7943dd95b4 persistence: store configuration in newstore
Fixes #803

Store the configuration data in persist.json.

Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
2019-11-19 18:40:19 +08:00
Peng Tao
254b85aec1 Merge pull request #2092 from lifupan/fixmissingwatchconsole
virtcontainers: fix the issue of missing watchConsole
2019-11-01 09:47:11 +08:00
Jose Carlos Venegas Munoz
1bbc1d58bd virtcontainers: add StatsSandbox to vc API
StatsSandbox is used to gather metrics for the sandbox (host cgroup) as
well as from the individual containers (from the guest cgroups). This is
intended to be used for easily calculating Kata sandbox overheads.

Fixes: #2096

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-10-30 19:07:23 -07:00
lifupan
c51d49277e virtcontainers: fix the issue of missing watchConsole
When do the reloading sandbox in shimv2, it's needed to
rewatch the hypervisor's console when debug enabled.

Fixes:#2091

Signed-off-by: lifupan <lifupan@gmail.com>
2019-10-19 00:37:15 +08:00
Gabi Beyer
5f0799f1b7 vc: add rootless dir to path variables
Modify some path variables to be functions that return the path
with the rootless directory prefix if running rootlessly.

Fixes: #1827

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-09-26 16:17:16 +02:00
Wei Zhang
2ed94cbd9d Config: Remove ConfigJSONKey from annotations
Fixes: #2023

We can get OCI spec config from bundle instead of annotations, so this
field isn't necessary.

Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
2019-09-17 11:47:06 +08:00
Eric Ernst
b62814a6f0 sandbox: combine sandbox cgroup functions
Simplify the tests and the code by combining the create and join
functions into a single function.

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-09-05 13:49:13 -07:00
Jose Carlos Venegas Munoz
074418f56b sandbox: Join cgroup sandbox on create.
When a new sandbox is created, join to its cgroup path
this will create all proxy, shim, etc in the sandbox cgroup.

Fixes: #1879

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2019-08-29 14:08:04 -05:00
lifupan
c91556aa41 api: add a CleanupContainer api for VC
When shimv2 was killed by accident, containerd would try to
launch a new shimv2 binarry to cleanup the container. In order
to avoid race condition, the cleanup should be done serialized
in a sandbox. Thus adding a new api to do this by locking the
sandbox.

Fixes:#1832

Signed-off-by: lifupan <lifupan@gmail.com>
2019-08-24 08:16:02 +08:00
Julio Montes
14474a49a2 Merge pull request #1921 from Ace-Tang/fix-remove-network
network: fix failed to remove network
2019-08-07 14:06:52 -05:00
Ace-Tang
50c3e56aeb network: fix failed to remove network
in create sandbox, if process error, should remove network without judge
NetNsCreated is true, since network is created by kata and should be
removed by kata, and network.Remove has judged if need to delete netns
depend on NetNsCreated

Fixes: #1920

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-07-30 20:17:09 +08:00
Peng Tao
bc4460e12f sandbox: support force stop
When force is true, ignore any guest related errors. This can
be used to stop a sandbox when hypervisor process is dead.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-07-22 19:29:32 -07:00
Yang, Wei
071030b784 shimv2: Close vhostfd after vm get vhostfd
If kata containers is using vfio and vhost net,the unbinding
of vfio would be hang. In the scenario, vhost net kernel thread
takes a reference to the qemu's mm, and the reference also includes
the mmap regions on the vfio device file. so vhost kernel thread
would be not released when qemu is killed as the vhost file
descriptor still is opened by shim v2 process, and the vfio device
is not released because there's still a reference to the mmap.

Fixes: #1669

Signed-off-by: Yang, Wei <w90p710@gmail.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-05-16 13:31:11 +08:00
Wei Zhang
297097779e persist: save/load GuestMemoryHotplugProbe
Support saving/loading `GuestMemoryHotplugProbe` from sandbox state.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-09 14:39:04 +08:00
Wei Zhang
4c192139cf newstore: remove file "devices.json"
When using experimental feature "newstore", we save and load devices
information from `persist.json` instead of `devices.json`, in such case,
file `devices.json` isn't needed anymore, so remove it.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-06 14:40:08 +08:00
Salvador Fuentes
bc9b9e2af6 vc: Revert "vc: change container rootfs to be a mount"
This reverts commit 196661bc0d.

Reverting because cri-o with devicemapper started
to fail after this commit was merged.

Fixes: #1574.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2019-04-23 08:56:36 -05:00
Peng Tao
196661bc0d vc: change container rootfs to be a mount
We can use the same data structure to describe both of them.
So that we can handle them similarly.

Fixes: #1566

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-20 00:42:25 -07:00
Wei Zhang
b42fde69c0 persist: demo code for persist api
Demonstrate how to make use of `virtcontainer/persist/api` data structure
package.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Peng Tao
cf90751638 vc: export vc error types
So that shimv2 can convert it into grpc errors.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-12 02:01:02 -07:00
Fupan Li
c9a3b933f8 Merge pull request #1427 from Ace-Tang/fix-qemu-leak
qemu: fix qemu leak when failed to start container
2019-04-02 23:32:11 +08:00
lifupan
628ea46c58 virtcontainers: change container's rootfs from string to mount alike struct
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
    RootFs struct {
            // Source specify the BlockDevice path
            Source string
            // Target specify where the rootfs is mounted if it has been mounted
            Target string
            // Type specifies the type of filesystem to mount.
            Type string
            // Options specifies zero or more fstab style mount options.
            Options []string
            // Mounted specifies whether the rootfs has be mounted or not
            Mounted bool
     }

If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.

Fixes:#1158

Signed-off-by: lifupan <lifupan@gmail.com>
2019-04-02 10:54:05 +08:00
Ace-Tang
096fa046f8 qemu: fix qemu leak when failed to start container
do cleanup inside startVM() if start vm get error

Fixes: #1426

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-03-28 19:38:56 +08:00
James O. D. Hunt
c759cf5f37 tracing: Fix tracing
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.

Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.

Fixes #1277.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-03-04 11:02:31 +00:00
James O. D. Hunt
f540a80354 store: Add SetLogger API
Add a `store.SetLogger()` API to allow the store package to log with the
standard set of fields (as expected by the log parser [1].

Fixes #1297.

---

[1] - https://github.com/kata-containers/tests/tree/master/cmd/log-parser#logfile-requirements

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-02-28 10:56:26 +00:00
Julio Montes
5201860bb0 virtcontainers: reimplement sandbox cgroup
All containers run in different cgroups even the sandbox, with this new
implementation the sandbox cpu cgroup wil be equal to the sum of all its
containers and the hypervisor process will be placed there impacting to the
containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.

**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html

```
+=============================================+
|         | 6 threads 6cpus | 1 thread 1 cpu  |
+=============================================+
| current |   40 seconds    |   122 seconds   |
+==============================================
|   new   |   37 seconds    |   124 seconds   |
+==============================================
```

current = current cgroups implementation
new = new cgroups implementation

**workload**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: c-ray
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  restartPolicy: Never
  containers:
  - name: c-ray-1
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 6
  - name: c-ray-2
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 1
```

fixes #1153

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Samuel Ortiz
b05dbe3886 runtime: Convert to the new internal types package
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.

This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.

Fixes: #1095

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:43:33 +01:00
Samuel Ortiz
3ab7d077d1 virtcontainers: Alias for pkg/types
Since we're going to have both external and internal types packages, we
alias the external one as vcTypes. And the internal one will be usable
through the types namespace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:24:06 +01:00
Samuel Ortiz
acf833cb4a virtcontainers: Call agent startSandbox from startVM
We always ask the agent to start the sandbox when we start the VM, so we
should simply call agent.startSandbox from startVM instead of open
coding those.
This slightly simplifies the complex createSandboxFromConfig routine.

Fixes: #1011

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-01-07 09:56:04 -08:00
Samuel Ortiz
ebf8547c38 virtcontainers: Remove useless startSandbox wrapper
startSandbox() wraps a single operation (sandbox.Start()), so we can
remove it and make the code easier to read/follow.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-07 09:48:22 -08:00
Peng Tao
bf1a5ce000 sandbox: cleanup sandbox if creation failed
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.

Fixes: #1057

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-12-21 13:46:16 +08:00
Sebastien Boeuf
fa9b15dafe virtcontainers: Return the appropriate container status
When our runtime is asked for the container status, we also handle
the scenario where the container is stopped if the shim process for
that container on the host has terminated.

In the current implementation, we retrieve the container status
before stopping the container, causing a wrong status to be returned.
The wait for the original go-routine's completion was done in a defer
within the caller of statusContainers(), resulting in the
statusContainer()'s values to return the pre-stopped value.

This bug is first observed when updating to docker v18.09/containerd
v1.2.0. With the current implementation, containerd-shim receives the
TaskExit when it detects kata-shim is terminating. When checking the
container state, however, it does not get the expected "stopped" value.

The following commit resolves the described issue by simplifying the
locking used around the status container calls. Originally
StatusContainer would request a read lock. If we needed to update the
container status in statusContainer, we'd start a go-routine which
would request a read-write lock, waiting for the original read lock to
be released.  Can't imagine a bug could linger in this logic. We now
just request a read-write lock in the caller (StatusContainer),
skipping the need for a separate go-routine and defer. This greatly
simplifies the logic, and removes the original bug.

Fixes #926

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-28 20:10:34 -08:00
Sebastien Boeuf
982381bff0 api: Cleanup StartContainer()
Simple patch reducing the complexity of StartContainer().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:56 -08:00
Sebastien Boeuf
57773816b3 sandbox: Create and export Pause/ResumeContainer() to the API level
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless functions
PauseContainer() and ResumeContainer(), which would recreate a
new sandbox pointer and the corresponding ones for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:50 -08:00
Sebastien Boeuf
b298ec4228 sandbox: Create and export ProcessListContainer() to the API level
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless function
ProcessListContainer(), which would recreate a new sandbox pointer
and the corresponding ones for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:44 -08:00
Sebastien Boeuf
3add296f78 sandbox: Create and export KillContainer() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function KillContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:37 -08:00
Sebastien Boeuf
76537265cb sandbox: Create and export StopContainer() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:31 -08:00
Sebastien Boeuf
109e12aa56 sandbox: Export Stop() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:24 -08:00
Sebastien Boeuf
6c3e266eb9 sandbox: Export Start() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StartSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:04 -08:00
Sebastien Boeuf
7bf84d05ad types: Replace agent/pkg/types with virtcontainers/pkg/types
This commit replaces every place where the "types" package from the
Kata agent was used, with the new "types" package from virtcontainers.

In order to do so, it introduces a few translation functions between
the agent and virtcontainers types, since this is needed by the kata
agent implementation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-02 08:46:11 -07:00
Wei Zhang
34fe3b9d6d cgroups: add host cgroup support
Fixes #344

Add host cgroup support for kata.

This commits only adds cpu.cfs_period and cpu.cfs_quota support.

It will create 3-level hierarchy, take "cpu" cgroup as an example:

```
/sys/fs/cgroup
|---cpu
   |---kata
      |---<sandbox-id>
         |--vcpu
      |---<sandbox-id>
```

* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
2018-10-27 09:41:35 +08:00
Sebastien Boeuf
309dcf9977 vendor: Update the agent vendoring based on pkg/types
Some agent types definition that were generic enough to be reused
everywhere, have been split from the initial grpc package.

This prevents from importing the entire protobuf package through
the grpc one, and prevents binaries such as kata-netmon to stay
in sync with the types definitions.

Fixes #856

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-10-26 09:35:59 -07:00
Frank Cao
633f4567f3 Merge pull request #825 from jodh-intel/add-trace-to-remaining-api-funcs
virtcontainers: Add missing API trace calls
2018-10-18 16:53:30 +08:00
Graham Whaley
0a652a1ab8 Merge pull request #786 from linzichang/master
sandbox/virtcontainers: memory resource hotplug when create container.
2018-10-18 09:43:24 +01:00