Commit Graph

537 Commits

Author SHA1 Message Date
Nitesh Konkar
3b20aebd5b ppc64le: Restrict maxmem to avoid HTAB allocation failure
Fixes: #363

Signed-off-by: Nitesh Konkar <niteshkonkar@in.ibm.com>
2018-06-01 20:41:27 +05:30
Nitesh Konkar
2796b19668 virtcontainers: Remove unnecessary kernel parameters for ppc64le
Fixes: #360

Signed-off-by: Nitesh Konkar <niteshkonkar@in.ibm.com>
2018-06-01 19:07:34 +05:30
James O. D. Hunt
6e161a248e arch/arm64: Fix ARM64 build
Fix ARM64 build which silently broken (as we still don't have an ARM CI).

Fixes #349.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-06-01 14:04:26 +01:00
James O. D. Hunt
2400978f6a Merge pull request #286 from nitkon/master
Enable Kata container on ppc64le arch
2018-06-01 09:58:37 +01:00
y00316549
9a0434d6bf virtcontainers: make kataAgent/createContainer can decode old specs.Spec
in old specs.Spec, Capabilities is [] string, but we don't use CompatOCISpec
for compatibility in kataAgent/createContainer.

fixes #333

Signed-off-by: y00316549 <yangshukui@huawei.com>
2018-06-01 14:48:43 +08:00
Julio Montes
b99cadb553 virtcontainers: add pause and resume container to the API
Pause and resume container functions allow us to just pause/resume a
specific container not all the sanbox, in that way different containers
can be paused or running in the same sanbox.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-31 09:38:13 -05:00
Nitesh Konkar
e14eab084e runtime: Add testcases for ppc64le and arm64
Fixes #302

Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
2018-05-31 18:53:37 +05:30
Nitesh Konkar
baa553da07 virtcontainers: Get qemu suppport for ppc64le
Fixes #302

Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
2018-05-31 18:40:43 +05:30
Nitesh Konkar
4276c0c38e virtcontainers/cli: refactor code
Fixes #302

Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
2018-05-31 17:58:35 +05:30
Archana Shinde
44b65e1d52 Merge pull request #353 from devimc/virtcontainers/updateUseRWLock
virtcontainers/api: use RW lock to update containers
2018-05-30 15:37:13 -07:00
Julio Montes
7d435b84f0 virtcontainers/api: use RW lock to update containers
When a container is updated, those modifications are stored, to
avoid race conditions with other operations, a RW lock should be used.

fixes #346

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-30 16:04:11 -05:00
Archana Shinde
704d713571 test: Fix tests to include pause/resume api changes
Since the vendoring included changes introducing PauseContainer
and ResumeContainer changes, fix the tests to satisfy the grpc api.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-30 13:34:24 -07:00
Archana Shinde
d885782df1 namespace: Check if pid namespaces need to be shared
k8s provides a configuration for sharing PID namespace
among containers. In case of crio and cri plugin, an infra
container is started first. All following containers are
supposed to share the pid namespace of this container.

In case a non-empty pid namespace path is provided for a container,
we check for the above condition while creating a container
and pass this out to the kata agent in the CreatContainer
request as SandboxPidNs flag. We clear out the PID namespaces
in the configuration passed to the kata agent.

Fixes #343

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-30 13:34:24 -07:00
c00416947
7abb8fe326 virtcontainers: fix codes misunderstanding in virtcontainers
Still there are some codes left which
will cause some misunderstanding

Change `p` in short of `pod` into `s` or `sandbox`

Fixes: #325

Signed-off-by: Haomin <caihaomin@huawei.com>
2018-05-21 11:11:27 +08:00
Peng Tao
be82c7fc6f Merge pull request #299 from jshachm/implement-events-command
cli :Implement events command
2018-05-18 15:35:52 +08:00
c00416947
1205e347f2 cli: implement events command
Events cli display container events such as cpu,
memory, and IO usage statistics.

By now OOM notifications and intel RDT are not fully supproted.

Fixes: #186

Signed-off-by: Haomin <caihaomin@huawei.com>
2018-05-18 09:17:49 +08:00
Julio Montes
4527a8066a virtcontainers/qemu: honour CPU constrains
Don't fail if a new container with a CPU constraint was added to
a POD and no more vCPUs are available, instead apply the constraint
and let kernel balance the resources.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-14 17:33:31 -05:00
Julio Montes
07db945b09 virtcontainers/qemu: reduce memory footprint
There is a relation between the maximum number of vCPUs and the
memory footprint, if QEMU maxcpus option and kernel nr_cpus
cmdline argument are big, then memory footprint is big, this
issue only occurs if CPU hotplug support is enabled in the kernel,
might be because of kernel needs to allocate resources to watch all
sockets waiting for a CPU to be connected (ACPI event).

For example

```
+---------------+-------------------------+
|               | Memory Footprint (KB)   |
+---------------+-------------------------+
| NR_CPUS=240   | 186501                  |
+---------------+-------------------------+
| NR_CPUS=8     | 110684                  |
+---------------+-------------------------+
```

In order to do not affect CPU hotplug and allow to users to have containers
with the same number of physical CPUs, this patch tries to mitigate the
big memory footprint by using the actual number of physical CPUs as the
maximum number of vCPUs for each container if `default_maxvcpus` is <= 0 in
the runtime configuration file,  otherwise `default_maxvcpus` is used as the
maximum number of vCPUs.

Before this patch a container with 256MB of RAM

```
              total        used        free      shared  buff/cache   available
Mem:           195M         40M        113M         26M         41M        112M
Swap:            0B          0B          0B
```

With this patch

```
              total        used        free      shared  buff/cache   available
Mem:           236M         11M        188M         26M         36M        186M
Swap:            0B          0B          0B
```

fixes #295

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-14 17:33:31 -05:00
Eric Ernst
91e9ed0898 Merge pull request #294 from jodh-intel/vc-reduce-path-lengths
virtcontainers: Reduce path lengths
2018-05-09 20:40:59 -07:00
Eric Ernst
0c489d322c Merge pull request #289 from amshinde/accept-empty-env-val
oci: Allow environment values to be empty
2018-05-09 11:45:10 -07:00
James O. D. Hunt
6a47808580 virtcontainers: Reduce path lengths
Reduce the virtcontainers prefix path to avoid hitting the 107 byte
Unix domain socket path limit.

Related #268.

Fixes #290.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-05-09 11:37:15 +01:00
James O. D. Hunt
bce9edd277 socket: Enforce socket length
A Unix domain socket is limited to 107 usable bytes on Linux. However,
not all code creating socket paths was checking for this limits.

Created a new `utils.BuildSocketPath()` function (with tests) to
encapsulate the logic and updated all code creating sockets to use it.

Fixes #268.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-05-09 11:36:24 +01:00
Archana Shinde
b7674de3cf oci: Allow environment values to be empty
An empty string for an environment variable simply means that the
variable is unset. Do not error out if the env value is empty.

Fixes #288

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-08 09:27:35 -07:00
Julio Montes
81f376920e cli: implement update command
Update command is used to update container's resources at run time.
All constraints are applied inside the VM to each container cgroup.
By now only CPU constraints are fully supported, vCPU are hot added
or removed depending of the new constraint.

fixes #189

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-08 07:26:38 -05:00
Zhang Wei
f4a453b86c virtcontainers: address some comments
* Move makeNameID() func to virtcontainers/utils file as it's a generic
function for making name and ID.
* Move bindDevicetoVFIO() and bindDevicetoHost() to vfio driver package.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Zhang Wei
28de16a450 virtcontainers: fix typo
Fix typo.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Zhang Wei
9acbcba967 virtcontainers: make CreateDevice func private
CreateDevice() is only used by `NewDevices()` so we can make it private and
there's no need to export it.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Zhang Wei
366558ad5b virtcontainers: refactor device.go to device manager
Fixes #50

This is done for decoupling device management part from other parts.
It seperate device.go to several dirs and files:

```
virtcontainers/device
├── api
│   └── interface.go
├── config
│   └── config.go
├── drivers
│   ├── block.go
│   ├── generic.go
│   ├── utils.go
│   ├── vfio.go
│   ├── vhost_user_blk.go
│   ├── vhost_user.go
│   ├── vhost_user_net.go
│   └── vhost_user_scsi.go
└── manager
    ├── manager.go
    └── utils.go
```

* `api` contains interface definition of device management, so upper level caller
should import and use the interface, and lower level should implement the interface.
it's bridge to device drivers and callers.
* `config` contains structed exported data.
* `drivers` contains specific device drivers including block, vfio and vhost user
devices.
* `manager` exposes an external management package with a `DeviceManager`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Peng Tao
410e5e6abb hyperstart_agent: fix comments
As @egernst pointed out, it should be hyperstart_agent instead of
cc-agent.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-05 09:23:46 +08:00
Peng Tao
1bb6ab9e22 api: add sandbox iostream API
It returns stdin, stdout and stderr stream of the specified process in
the container.

Fixes: #258

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-04 15:38:32 +08:00
Peng Tao
bf4ef4324e API: add sandbox winsizeprocess api
It sends tty resize request to the agent to resize a process's tty
window.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-04 15:38:32 +08:00
Peng Tao
55dc0b2995 API: add sandbox signalprocess api
It sends the signal to a process of a container, or all processes
inside a container.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-04 15:38:32 +08:00
Peng Tao
45970ba796 API: add sandbox waitprocess api
It waits a process inside the container of a sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-04 15:38:32 +08:00
Archana Shinde
717bc4cd26 virtcontainers: Pass the PCI address for block based rootfs
Store the PCI address of rootfs in case the rootfs is block
based and passed using virtio-block.
This helps up get rid of prdicting the device name inside the
container for the block device. The agent will determine the device
node name using the PCI address.

Fixes #266

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:09 -07:00
Archana Shinde
da08a65de3 device: Assign pci address to block device for kata_agent
Store PCI address for a block device on hotplugging it via
virtio-blk. This address will be passed by kata agent in the
device "Id" field. The agent within the guest can then use this
to identify the PCI slot in the guest and create the device node
based on it.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:09 -07:00
Archana Shinde
85865f1a2c bridge: Store the bridge address to state
We need to store the bridge address to state to use it
for assigning addresses to devices attached to teh bridge.
So we need to make sure that the bridge pointer is assigned
the address.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:09 -07:00
Archana Shinde
718dbd2a71 device: Assign pci address for block devices
Introduce a new field in Drive to store the PCI address if the drive is
attached using virtio-blk.
Assign PCI address in the format bridge-addr/device-addr.
Since we need to assign the address while hotplugging, pass Drive
by address.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:09 -07:00
Archana Shinde
dd927921c1 qemu: Return bridge itself with addDeviceToBridge instead of bridge bus
Change the function to return the bridge itself that the
device is attached to. This will allow bridge address to be used
for determining the PCI slot of the device within the guest.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:08 -07:00
Peng Tao
9d1311d0ee kata_agent: refactor sendReq
CI complains about cyclomatic complexity in sendReq.

warning: cyclomatic complexity 16 of function (*kataAgent).sendReq() is
high (> 15) (gocyclo)

Refactor it a bit to avoid such error. I'm not a big fan of the new code
but it is done so because golang does not support generics.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-01 22:42:39 +08:00
Peng Tao
35ebadcedc api: add sandbox Monitor API
It monitors the sandbox status and returns an error channel to let
caller watch it.

Fixes: #251

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-01 22:42:33 +08:00
Peng Tao
5fb4768f83 virtcontainers: always pass sandbox as a pointer
Currently we sometimes pass it as a pointer and other times not. As
a result, the view of sandbox across virtcontainers may not be the same
and it costs extra memory copy each time we pass it by value. Fix it
by ensuring sandbox is always passed by pointers.

Fixes: #262

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-01 20:50:07 +08:00
Graham Whaley
f92d7dd1c1 Merge pull request #275 from sboeuf/fix_k8s_shim_killed
virtcontainers: Properly remove the container when shim gets killed
2018-04-30 16:51:34 +01:00
Sebastien Boeuf
e78941e3e5 Merge pull request #272 from amshinde/pass-bundle-in-hooks
hooks: Send the bundle path in the state that is sent with hooks
2018-04-30 07:28:27 -07:00
Sebastien Boeuf
789dbca6d6 virtcontainers: Properly remove the container when shim gets killed
Here is an interesting case I have been debugging. I was trying to
understand why a "kubeadm reset" was not working for kata-runtime
compared to runc. In this case, the only pod started with Kata is
the kube-dns pod. For some reasons, when this pod is stopped and
removed, its containers receive some signals, 2 of them being SIGTERM
signals, which seems the way to properly stop them, but the third
container receives a SIGCONT. Obviously, nothing happens in this
case, but apparently CRI-O considers this should be the end of the
container and after a few seconds, it kills the container process
(being the shim in Kata case). Because it is using a SIGKILL, the
signal does not get forwarded to the agent because the shim itself
is killed right away. After this happened, CRI-O calls into
"kata-runtime state", we detect the shim is not running anymore
and we try to stop the container. The code will eventually call
into agent.RemoveContainer(), but this will fail and return an
error because inside the agent, the container is still running.

The approach to solve this issue here is to send a SIGKILL signal
to the container after the shim has been waited for. This call does
not check for the error returned because most of the cases, regular
use cases, will end up returning an error because the shim itself
not being there actually represents the container inside the VM has
already terminated.
And in case the shim has been killed without the possibility to
forward the signal (like described in first paragraph), the SIGKILL
will work and will allow the following call to agent.stopContainer()
to proceed to the removal of the container inside the agent.

Fixes #274

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-27 18:36:27 -07:00
Archana Shinde
a301a9e641 hooks: Send the bundle path in the state that is sent with hooks
We currently just send the pid in the state. While OCI specifies
a few other fields as well, this commit just adds the bundle path
and the container id to the state. This should fix the errors seen
with hooks that rely on the bundle path.

Other fields like running "state" string have been left out. As this
would need sending the strings that OCI recognises. Hooks have been
implemented in virtcontainers and sending the state string would
require calling into OCI specific code in virtcontainers.

The bundle path again is OCI specific, but this can be accessed
using annotations. Hooks really need to be moved to the cli as they
are OCI specific. This however needs network hotplug to be implemented
first so that the hooks can be called from the cli after the
VM has been created.

Fixes #271

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-04-27 15:48:58 -07:00
Sebastien Boeuf
644489b6e7 virtcontainers: Fix gofmt issues for Go 1.10
Now that our CI has moved to Go 1.10, we need to update one file
that is not formatted as the new gofmt (1.10) expects it to be
formatted.

Fixes #249

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-26 11:38:15 -05:00
Sebastien Boeuf
d931d2902d Merge pull request #218 from bergwolf/sandbox_api
api: add sandbox operation APIs
2018-04-24 07:21:36 -07:00
Peng Tao
29ce01fd11 api: add sandbox EnterContainer API
And make VC EnterContainer a wrapper of it.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:33:51 +08:00
Peng Tao
488c3ee353 api: add sandbox Status API
It returns the status of current sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:33:47 +08:00
Peng Tao
b3d9683743 api: add sandbox StatusContainer API
It retrieves container status from sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:32:54 +08:00