Commit Graph

156 Commits

Author SHA1 Message Date
Julio Montes
890a3d5960 Merge pull request #1637 from marcov/kill-hyp
virtcontainers: kill hypervisor if startSandbox fails
2019-05-23 15:11:54 -05:00
Fupan Li
100db8abdc Merge pull request #1670 from xs3c/fix-vfio-hang
shim v2: Close vhostfd after vm get vhostfd
2019-05-21 14:53:26 +08:00
Marco Vedovati
f89834a276 virtcontainers: avoid unnecessary error checking in startVM
Remove redundant error checking in startVM.

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
2019-05-16 12:31:51 +02:00
Yang, Wei
071030b784 shimv2: Close vhostfd after vm get vhostfd
If Kata Containers is using vfio and vhost-net, unbinding the vfio
device hangs. In this scenario, the vhost-net kernel thread
takes a reference to qemu's mm, and that reference also covers
the mmap regions on the vfio device file. The vhost kernel thread
is therefore not released when qemu is killed, because the vhost file
descriptor is still held open by the shim v2 process, and the vfio device
is not released because there is still a reference to the mmap.

Fixes: #1669

Signed-off-by: Yang, Wei <w90p710@gmail.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-05-16 13:31:11 +08:00
Manohar Castelino
66b93c7ca0 Networking: Ensure that network namespace is propagated
Network namespace needs to be propagated if available at
createSandbox()

Fixes: #1664

Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
2019-05-10 18:00:30 -07:00
Hui Zhu
5ba09817d8 Merge pull request #1575 from WeiZhang555/simplify-persist-api
newstore: removing deprecated files when use new store driver
2019-05-10 15:33:22 +08:00
Wei Zhang
4c192139cf newstore: remove file "devices.json"
When using the experimental feature "newstore", we save and load device
information from `persist.json` instead of `devices.json`. In that case,
the file `devices.json` is no longer needed, so remove it.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-06 14:40:08 +08:00
Stefan Hajnoczi
9480978364 qemu: add vhost-user-fs-pci device instead of 9p
When enable_virtio_fs is true, add a vhost-user-fs-pci for the
kataShared volume instead of 9p.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-05 11:32:34 -06:00
Wei Zhang
341a988e06 persist: simplify persist api
Fixes #803

Simplify new store API to make the code easier to understand and use.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-30 11:54:42 +08:00
Archana Shinde
b5aa8d4f67 Merge pull request #1577 from chavafg/topic/revert-mount-pr
Revert "vc: change container rootfs to be a mount"
2019-04-25 09:41:15 -07:00
James O. D. Hunt
ed64240df2 agent: Support Kata agent tracing
Add configuration options to support the various Kata agent tracing
modes and types. See the comments in the built configuration files for
details:

- `cli/config/configuration-fc.toml`
- `cli/config/configuration-qemu.toml`

Fixes #1369.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-04-25 09:41:13 +01:00
James O. D. Hunt
e803a7f870 agent: Return an error, not just an interface
Make `newAgentConfig()` return an explicit error rather than handling
the error scenario by simply returning the `error` object in the
`interface{}` return type. The old behaviour was confusing and
inconsistent with the other functions creating a new config type (shim,
proxy, etc).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-04-24 17:14:01 +01:00
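
The change described in e803a7f870 can be illustrated with a minimal sketch. The type `agentConfig` and the two function names below are hypothetical stand-ins, not the actual virtcontainers code; the sketch only contrasts returning an error inside an `interface{}` with returning it explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// agentConfig is a hypothetical stand-in for a real agent configuration type.
type agentConfig struct {
	Debug bool
}

// newAgentConfigOld mimics the old behaviour: on failure the error is
// handed back through the interface{} return value.
func newAgentConfigOld(agentType string) interface{} {
	if agentType != "kata" {
		return errors.New("unknown agent type: " + agentType)
	}
	return agentConfig{}
}

// newAgentConfigNew mimics the new behaviour: the error is explicit.
func newAgentConfigNew(agentType string) (interface{}, error) {
	if agentType != "kata" {
		return nil, fmt.Errorf("unknown agent type: %s", agentType)
	}
	return agentConfig{}, nil
}

func main() {
	// Old style: the caller must type-assert to discover the failure.
	if err, isErr := newAgentConfigOld("bogus").(error); isErr {
		fmt.Println("old:", err)
	}

	// New style: the usual Go error check.
	if _, err := newAgentConfigNew("bogus"); err != nil {
		fmt.Println("new:", err)
	}
}
```

With the explicit form, a caller that forgets to inspect the result can no longer mistake an error value for a valid configuration.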
Salvador Fuentes
bc9b9e2af6 vc: Revert "vc: change container rootfs to be a mount"
This reverts commit 196661bc0d.

Reverting because cri-o with devicemapper started
to fail after this commit was merged.

Fixes: #1574.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2019-04-23 08:56:36 -05:00
Peng Tao
196661bc0d vc: change container rootfs to be a mount
We can use the same data structure to describe both of them,
so that we can handle them similarly.

Fixes: #1566

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-20 00:42:25 -07:00
Wei Zhang
e40dcb9376 storage: set new storage driver as "experimental"
Mark the new persist storage driver "virtcontainers/persist/" as an
"experimental" feature.
Once it fully works and we're ready to move to 2.0, we'll promote
it from an "experimental" feature to a formal feature.
At that time, "virtcontainers/filesystem_resource_storage.go" can be removed
completely.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:35:33 +08:00
Wei Zhang
504c706bea storage: address comments
Address some comments:
* fix persist driver func names for better understanding
* modify some logic, add some returned error etc

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Wei Zhang
039ed4eeb8 persist: persist device data
Persist device information to relative file

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Wei Zhang
b42fde69c0 persist: demo code for persist api
Demonstrate how to make use of `virtcontainer/persist/api` data structure
package.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Peng Tao
f5125421d0 sandbox: return ErrNoSuchContainer when failing to find a container
So that caller can determine that it is ENOENT-alike error.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-12 03:57:07 -07:00
Peng Tao
cf90751638 vc: export vc error types
So that shimv2 can convert it into grpc errors.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-12 02:01:02 -07:00
Peng Tao
c414599635 types: remove pid from sandbox state
No longer needed.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-09 18:59:56 -07:00
Peng Tao
616f26cfe5 types: split sandbox and container state
Since they do not really share many of the fields.

Fixes: #1434

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-09 18:59:56 -07:00
Penny Zheng
3bfcdf755a agent: add interface memHotplugByProbe
We need to notify the guest kernel about memory hot-add events via the probe interface.
A hot-added memory device should be sliced into chunks the size of a memory section.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:20 +08:00
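
The slicing mentioned in 3bfcdf755a can be sketched as follows. The function name, the 128 MiB section size, and the addresses are assumptions chosen for illustration; in practice the section size would come from the guest, and this sketch does not reproduce the agent interface itself.

```go
package main

import (
	"errors"
	"fmt"
)

// sectionAddrs splits a hot-added region [addr, addr+size) into
// section-sized chunks and returns the start address of each chunk.
// It is a hypothetical helper, not part of the kata-runtime API.
func sectionAddrs(addr, size, section uint64) ([]uint64, error) {
	if section == 0 {
		return nil, errors.New("memory section size must be non-zero")
	}
	if size%section != 0 {
		return nil, fmt.Errorf("size %d is not a multiple of section size %d", size, section)
	}
	addrs := make([]uint64, 0, size/section)
	for off := uint64(0); off < size; off += section {
		addrs = append(addrs, addr+off)
	}
	return addrs, nil
}

func main() {
	const MiB = 1 << 20

	// Assumed values: a 256 MiB device at 4 GiB, with 128 MiB sections.
	addrs, err := sectionAddrs(0x100000000, 256*MiB, 128*MiB)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range addrs {
		// Each address would be reported to the guest separately.
		fmt.Printf("probe 0x%x\n", a)
	}
}
```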
Penny Zheng
47670fcf73 memoryDevice: reconstruct memoryDevice
If kata-runtime supports memory hotplug via the probe interface, we need to restructure
memoryDevice to store the relevant state, namely addr and probe. addr specifies the
physical address of the memory device, and probe indicates whether it is hotplugged via
the ACPI-driven interface or the probe interface.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:20 +08:00
Penny Zheng
30a6a7de39 agent: acquire memory hotplug probe info via GetGuestDetails
In order to support memory hotplug via the probe interface in kata-runtime,
we first need to verify that the guest kernel is capable of it.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:19 +08:00
Fupan Li
c9a3b933f8 Merge pull request #1427 from Ace-Tang/fix-qemu-leak
qemu: fix qemu leak when failed to start container
2019-04-02 23:32:11 +08:00
Peng Tao
25d21060e3 Merge pull request #1412 from lifupan/shimv2mount
shimv2: optionally plug rootfs block storage instead of mounting it
2019-04-02 15:30:40 +08:00
lifupan
628ea46c58 virtcontainers: change container's rootfs from string to mount alike struct
The container's rootfs is a string, which cannot represent a
block-storage-backed rootfs that hasn't been mounted.
Change it to a mount-like struct as below:
    RootFs struct {
            // Source specifies the block device path
            Source string
            // Target specifies where the rootfs is mounted, if it has been mounted
            Target string
            // Type specifies the type of filesystem to mount.
            Type string
            // Options specifies zero or more fstab style mount options.
            Options []string
            // Mounted specifies whether the rootfs has been mounted or not
            Mounted bool
    }

If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.

Fixes: #1158

Signed-off-by: lifupan <lifupan@gmail.com>
2019-04-02 10:54:05 +08:00
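
The two cases described in 628ea46c58 can be sketched like this. The struct fields are taken from the commit message; the `describe` helper and all paths are hypothetical and only serve to show how the `Mounted` flag separates an already-mounted rootfs from a block device that still has to be mounted.

```go
package main

import "fmt"

// RootFs mirrors the struct quoted in the commit message.
type RootFs struct {
	Source  string   // block device path
	Target  string   // mount point, if already mounted
	Type    string   // filesystem type
	Options []string // fstab-style mount options
	Mounted bool     // whether the rootfs has been mounted
}

// describe is a hypothetical helper that reports which case applies.
func describe(r RootFs) string {
	if r.Mounted {
		return "mounted at " + r.Target
	}
	return "unmounted " + r.Type + " block device " + r.Source
}

func main() {
	// Backward-compatible case: the rootfs is already mounted.
	mounted := RootFs{Target: "/run/example/rootfs", Mounted: true}

	// New case: a block device that has not been mounted yet
	// (device path and options are made up for this example).
	block := RootFs{Source: "/dev/example0", Type: "ext4", Options: []string{"ro"}}

	fmt.Println(describe(mounted))
	fmt.Println(describe(block))
}
```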
Ace-Tang
096fa046f8 qemu: fix qemu leak when failed to start container
Clean up inside startVM() if starting the VM returns an error.

Fixes: #1426

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-03-28 19:38:56 +08:00
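
One common way to get the cleanup-on-error behaviour described in 096fa046f8 is a deferred function that inspects a named return value. The sketch below uses a hypothetical `fakeVM` type and is not the actual qemu code; it only shows that a failure after launch no longer leaves the process running.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeVM is a hypothetical stand-in for a hypervisor process.
type fakeVM struct {
	running   bool
	failReady bool // simulate a failure after the process was launched
}

func (v *fakeVM) launch() error {
	v.running = true
	return nil
}

func (v *fakeVM) waitReady() error {
	if v.failReady {
		return errors.New("VM did not become ready")
	}
	return nil
}

func (v *fakeVM) kill() {
	v.running = false
}

// startVM launches the VM and kills it again if any later step fails,
// so that no hypervisor process is leaked.
func startVM(v *fakeVM) (err error) {
	defer func() {
		if err != nil && v.running {
			v.kill()
		}
	}()

	if err = v.launch(); err != nil {
		return err
	}
	return v.waitReady()
}

func main() {
	v := &fakeVM{failReady: true}
	err := startVM(v)
	fmt.Println("error:", err, "still running:", v.running)
}
```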
Ganesh Maharaj Mahalingam
f4428761cb lint: Update go linter from gometalinter to golangci-lint.
gometalinter is deprecated and will be archived April '19. The
suggestion is to switch to golangci-lint which is apparently 5x faster
than gometalinter.

Partially Fixes: #1377

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2019-03-25 08:48:13 -07:00
Jose Carlos Venegas Munoz
0061e166d4 virtcontainers: move resource calculation to its own function
Move the CPU and memory calculation into a separate function.
This reduces the function's complexity and makes unit testing easier.

Fixes: #1296

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2019-03-11 12:17:01 -06:00
Wei Zhang
da80c70c0c config: enhance Feature structure
Fixes #1226

Add more fields to better describe an experimental feature.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-03-10 22:44:41 +08:00
Wei Zhang
050f03bb36 config: Add config flag "experimental"
Fixes #1226

Add a new flag "experimental" to support features that are still in progress.
Some features are under development and not ready for release;
others would break compatibility and so are not
suitable for merging into a kata minor release (x version in x.y.z).

To get such features merged earlier for more testing, we can
mark them as "experimental" features, and promote them to formal features
when they are ready.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-03-12 11:03:28 +08:00
James O. D. Hunt
c759cf5f37 tracing: Fix tracing
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.

Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.

Fixes #1277.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-03-04 11:02:31 +00:00
Eric Ernst
dc2650889c virtcontainers: fix vCPU calculation errors
We were grabbing a running total of quota and period for each container
and then calculating the number of resulting vCPUs. Summing period
doesn't make sense.  To simplify, let's just calculate mCPU per
container, keep a running total of mCPUs requested, and then translate
to sandbox vCPUs after.

Fixes: #1292

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-02-28 08:13:04 -08:00
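
The approach described in dc2650889c can be sketched as follows: convert each container's quota and period to mCPUs, sum the mCPUs, and only then convert to whole vCPUs. The type and function names are hypothetical, and rounding up to the next whole vCPU is an assumption made for this example.

```go
package main

import "fmt"

// containerCPU holds one container's CFS quota and period in microseconds.
// The type is illustrative, not the one used in virtcontainers.
type containerCPU struct {
	Quota  int64
	Period uint64
}

// milliCPUs converts one container's quota/period into mCPUs.
// A non-positive quota or a zero period means "no limit" and adds nothing.
func milliCPUs(c containerCPU) uint64 {
	if c.Quota <= 0 || c.Period == 0 {
		return 0
	}
	return uint64(c.Quota) * 1000 / c.Period
}

// sandboxVCPUs sums the mCPUs of all containers and rounds up to whole
// vCPUs (the rounding rule is an assumption of this sketch).
func sandboxVCPUs(containers []containerCPU) uint64 {
	var total uint64
	for _, c := range containers {
		total += milliCPUs(c)
	}
	return (total + 999) / 1000
}

func main() {
	containers := []containerCPU{
		{Quota: 150000, Period: 100000}, // 1500 mCPUs
		{Quota: 50000, Period: 50000},   // 1000 mCPUs, different period
	}
	fmt.Println(sandboxVCPUs(containers)) // 2500 mCPUs -> 3 vCPUs
}
```

Because each container is normalised to mCPUs first, containers with different periods can be added safely, which is exactly what summing raw periods could not do.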
Julio Montes
5201860bb0 virtcontainers: reimplement sandbox cgroup
All containers run in different cgroups, even the sandbox. With this new
implementation, the sandbox cpu cgroup will be equal to the sum of all its
containers' cgroups, and the hypervisor process will be placed there, which
affects the containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.

**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html

```
+=============================================+
|         | 6 threads 6cpus | 1 thread 1 cpu  |
+=============================================+
| current |   40 seconds    |   122 seconds   |
+=============================================+
|   new   |   37 seconds    |   124 seconds   |
+=============================================+
```

current = current cgroups implementation
new = new cgroups implementation

**workload**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: c-ray
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  restartPolicy: Never
  containers:
  - name: c-ray-1
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 6
  - name: c-ray-2
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 1
```

fixes #1153

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
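
The default_vcpus example in 5201860bb0 is simple arithmetic: with a CFS period of 100000 microseconds, an unconstrained sandbox gets a quota of default_vcpus times the period. The function and constant names below are hypothetical; only the numbers come from the commit message.

```go
package main

import "fmt"

// cfsPeriod is the CFS period in microseconds used in the commit's example.
const cfsPeriod uint64 = 100000

// unconstrainedQuota returns the CPU quota for a sandbox without
// constraints, based on the configured default number of vCPUs.
func unconstrainedQuota(defaultVCPUs uint32) int64 {
	return int64(defaultVCPUs) * int64(cfsPeriod)
}

func main() {
	// default_vcpus = 2 gives quota 200000 with period 100000.
	fmt.Printf("quota=%d period=%d\n", unconstrainedQuota(2), cfsPeriod)
}
```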
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explicitly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Samuel Ortiz
18dcd2c2f7 virtcontainers: Decouple the network API from the sandbox one
In order to fix #1059, we want to create a hypervisor package. Some of
the hypervisor implementations (qemu) depend on the network and endpoint
interfaces. We can not have a virtcontainers -> hypervisor -> network,
endpoint -> virtcontainers cyclic dependency.
So before creating the hypervisor package, we need to decouple the
network API from the virtcontainers one.

Fixes: #1180

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-25 15:25:49 +01:00
Samuel Ortiz
b39cb1d13a virtcontainers: Remove the network interface
There's only one real implementer of the network interface and no real
need to implement anything else. We can just go ahead and remove this
abstraction.

Fixes: #1179

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-25 15:25:46 +01:00
Peng Tao
d314e2d0b7 agent: clean up share path created by the agent
The agent code creates a directory at
`/run/kata-containers/shared/sandboxes/sbid/` to hold shared data
between host and guest. We need to clean it up when removing a sandbox.

Fixes: #1138

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2019-01-21 14:10:59 +08:00
Samuel Ortiz
67e696bf62 virtcontainers: Add Asset to the types package
In order to move the hypervisor implementations into their own package,
we need to put the asset type into the types package and break the
hypervisor->asset->virtcontainers->hypervisor cyclic dependency.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
cf22f402d8 virtcontainers: Remove the hypervisor waitSandbox method
We always call waitSandbox after we start the VM (startSandbox), so
let's simplify the hypervisor interface and integrate waiting for the VM
into startSandbox.
This makes startSandbox a blocking call, but that is practically the
case today.

Fixes: #1009

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:38:33 +01:00
Samuel Ortiz
763bf18daa virtcontainers: Remove the hypervisor init method
We always combine the hypervisor init and createSandbox, because what
we're trying to do is simply that: Set the hypervisor and have it create
a sandbox.

Instead of keeping a method with vague semantics, remove init and
integrate the actual hypervisor setup phase into the createSandbox one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:37:20 +01:00
Samuel Ortiz
b05dbe3886 runtime: Convert to the new internal types package
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.

This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisectability.

Fixes: #1095

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:43:33 +01:00
Samuel Ortiz
3ab7d077d1 virtcontainers: Alias for pkg/types
Since we're going to have both external and internal types packages, we
alias the external one as vcTypes. And the internal one will be usable
through the types namespace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:24:06 +01:00
James O. D. Hunt
36c267a1d2 Merge pull request #1085 from bergwolf/containerd
cli: allow to kill a stopped container and sandbox
2019-01-08 08:44:10 +00:00
James O. D. Hunt
38c9cd2b85 Merge pull request #689 from nitkon/seccomp
virtcontainers: Pass seccomp profile inside VM
2019-01-08 08:42:07 +00:00
Nitesh Konkar
c2c9c844e2 virtcontainers: Conditionally pass seccomp profile
Pass Seccomp profile to the agent only if
the configuration.toml allows it to be passed
and the agent/image is seccomp capable.

Fixes: #688

Signed-off-by: Nitesh Konkar <niteshkonkar@in.ibm.com>
2019-01-08 10:22:23 +05:30
Peng Tao
bf2813fee8 cli: allow to kill a stopped container and sandbox
cri containerd calls kill on a stopped sandbox, and if we
fail the call, it can cause the `cri stopp` command to fail
too.

Fixes: #1084

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2019-01-08 11:19:25 +08:00
Samuel Ortiz
09168ccda7 virtcontainers: Call stopVM() from sandbox.Stop()
Now that stopVM() also calls agent.stopSandbox(), we can have the
sandbox Stop() call using stopVM() directly and avoid code duplication.

Fixes: #1011

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-07 09:56:58 -08:00