Commit Graph

60 Commits

Author SHA1 Message Date
Jose Carlos Venegas Munoz
bf7fd2bcd7 vc: hypervisor: qemu: Add rng device.
Kata Containers does not have provide a good entropy level,
make use of a paravirtual rng device to solve this problem.

Fixes: #445

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-09-10 17:11:48 -05:00
Ruidong
26f912ef86 virtcontainers: Make qdisc of hotplug nics mq
In order to avoid performance drop caused by qdisc. And align with
cold plug codes.

Fixes #650

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-08-31 22:07:12 +08:00
Julio Montes
cc29b8d4b6 Merge pull request #607 from amshinde/pass-disk-as-shared
Pass qemu --share-rw option for hotplugging disks
2018-08-24 13:09:16 -05:00
Sebastien Boeuf
a1787da97c virtcontainers: qemu: Don't shutdown QMP from hotplug
The QMP shutdown is taken care of by the sandbox release, through a
call to hypervisor.disconnect(). By shutting down the QMP at the qemu
level directly, we are creating some unrecoverable errors by trying to
close an already closed channel.

This patch simply removes the faulty code, following the same design
other hotplug functions are designed.

Fixes #627

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-08-23 15:54:02 -07:00
James O. D. Hunt
d0679a6fd1 tracing: Add tracing support to virtcontainers
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.

Note that not every function is traced; we can add more traces as
desired.

Fixes #566.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
Julio Montes
d6a773c90c Merge pull request #595 from amshinde/use-main-bus-for-hotplug
vfio: Add configuration to support VFIO hotplug on root bus
2018-08-21 11:09:49 -05:00
Archana Shinde
31e2925a9a vfio: Add configuration to support VFIO hotplug on root bus
We need this configuration due to a limitation in seabios
firmware in handling hotplug for PCI devices with large BARS.
Long term, this needs to be fixed in the firmware.

Fixes #594

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-08-20 11:36:21 -07:00
Archana Shinde
70edc56fc1 disk: Pass the --share-rw option for hotplugging disks
With qemu 2.10, a write lock was added for qcow images that
prevents the same image to be passed more than once.
This can be over-ridden using the --share-rw option which is
desired for raw images.

This solves an issue with running Kata with devicemapper
using the privileged mode as in this case all devices on the host
are passed to the container including the block device associated
with the rootfs, causing it to be passed twice to qemu.

Fixes #606

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-08-17 15:08:33 -07:00
Ruidong Cao
1a17200cc8 virtcontainers: add sandbox hotplug network API
Add sandbox hotplug network API to meet design

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-08-16 16:10:10 +08:00
Peng Tao
bd5076101c qemu: create vm directory before launching qemu
Right now we create it in `createsandbox` and it would
create the vm dir unnecessarily for fetchsandbox() and
it ends up leaving an empty vm dir behind even after
DeleteSandbox.

Fixes: #547

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-08-03 16:40:02 +08:00
Peng Tao
568b65c275 qemu: remove redundant code
It looks to be left over due to merge conflicts.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-08-03 16:28:56 +08:00
Sebastien Boeuf
16600efc1d Merge pull request #531 from WeiZhang555/bugfix
re-add: refactor device manager
2018-08-02 07:32:02 -07:00
Fupan Li
15860185d9 virtcontainers: fix the issue of cleanup the vm's path
To use the filepath.Join() instead of the simple
string append method to form the file path, otherwise
it will lose the "/" between the two parts.

Fixes #543.

Signed-off-by: Fupan Li <lifupan@gmail.com>
2018-08-02 16:21:55 +08:00
James O. D. Hunt
fc0142ec8e Merge pull request #527 from jodh-intel/remove-initcall-debug-kernel-option
kernel: Remove initcall_debug boot option
2018-08-01 12:50:52 +01:00
James O. D. Hunt
a8f5e2becf kernel: Remove initcall_debug boot option
Remove the `initcall_debug` boot option from the kernel command-line as
we don't need it any more and it generates a ton of boot messages that
may well be impacting performance.

Fixes #526.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-01 09:52:13 +01:00
James O. D. Hunt
487f9efa57 Merge pull request #536 from bergwolf/qmp_clear
qemu: clear qmp state before wait for qemu process
2018-08-01 09:51:43 +01:00
Peng Tao
b200163de9 kata_agent: send sandbox id in CreateSandbox request
And do not append sandbox id to kernel arguments since that
would fail qemu args comparison in vm factory.

Fixes: #523

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-08-01 11:18:44 +08:00
Julio Montes
052769196d virtcontainers: implement function to cold plug vsocks
`appendVSockPCI` function can be used to cold plug vocks, vhost file descriptor
holds the context ID and it's inherit by QEMU process, ID must be unique and
disable-modern prevents qemu from relying on fast MMIO.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-07-31 13:52:44 -05:00
Peng Tao
44a3a441aa qemu: wait on disconnected channel in qmp shutdown
That is how govmm ensures us that the qmp channel has been cleaned
up entirely.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-31 18:34:37 +08:00
Peng Tao
c8b4fabc37 qemu: clear qmp state before wait for qemu process
So that if there is any remaining state, we do not let it interfere
with the new one. This should fix the occasional vm factory hang.

Fixes: #535

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-31 11:48:40 +08:00
Zhang Wei
44c37bf774 devices: rename VFIODrive to VFIODev
Rename VFIODrive to VFIODev, also rename device interface "GetDeviceDrive()" to
"GetDeviceInfo()".

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-07-31 10:05:56 +08:00
Wei Zhang
5db5f42b71 devices: remove interface VhostUserDevice
The interface "VhostUserDevice" has duplicate functions and fields with
Device, so we can merge them into one interface and manage them with one
group of interfaces.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-31 09:59:29 +08:00
Wei Zhang
1194154309 devices: use device manager to manage all devices
Fixes #50

Previously the devices are created with device manager and laterly
attached to hypervisor with "device.Attach()", this could work, but
there's no way to remember the reference count for every device, which
means if we plug one device to hypervisor twice, it's truly inserted
twice, but actually we only need to insert once but use it in many
places.

Use device manager as a consolidated entrypoint of device management can
give us a way to handle many "references" to single device, because it
can save all devices and remember it's use count.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-31 09:59:29 +08:00
Sebastien Boeuf
927487c142 revert: "virtcontainers: support pre-add storage for frakti"
This PR got merged while it had some issues with some shim processes
being left behind after k8s testing. And because those issues were
real issues introduced by this PR (not some random failures), now
the master branch is broken and new pull requests cannot get the
CI passing. That's the reason why this commit revert the changes
introduced by this PR so that we can fix the master branch.

Fixes #529

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-07-27 09:39:56 -07:00
Zhang Wei
04f4f528f7 devices: rename VFIODrive to VFIODev
Rename VFIODrive to VFIODev, also rename device interface "GetDeviceDrive()" to
"GetDeviceInfo()".

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-07-26 14:15:52 +08:00
Wei Zhang
b54df7e127 devices: remove interface VhostUserDevice
The interface "VhostUserDevice" has duplicate functions and fields with
Device, so we can merge them into one interface and manage them with one
group of interfaces.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-26 11:33:28 +08:00
Wei Zhang
2885eb0532 devices: use device manager to manage all devices
Fixes #50

Previously the devices are created with device manager and laterly
attached to hypervisor with "device.Attach()", this could work, but
there's no way to remember the reference count for every device, which
means if we plug one device to hypervisor twice, it's truly inserted
twice, but actually we only need to insert once but use it in many
places.

Use device manager as a consolidated entrypoint of device management can
give us a way to handle many "references" to single device, because it
can save all devices and remember it's use count.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-26 11:33:28 +08:00
Peng Tao
7a6f205970 virtcontainers: keep qmp connection when possible
For each time a sandbox structure is created, we ensure s.Release()
is called. Then we can keep the qmp connection as long as Sandbox
pointer is alive.

All VC interfaces are still stateless as s.Release() is called before
each API returns.

OTOH, for VCSandbox APIs, FetchSandbox() must be paired with s.Release,
the same as before.

Fixes: #500

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-23 08:37:55 +08:00
Peng Tao
c9bd12aa19 qemu: cleanup qmp channel setup and teardown
Unify qmp channel setup and teardown. This also fixes the issue that
sometimes qmp pointer is not reset after qmp is shutdown.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-23 08:36:58 +08:00
Peng Tao
28b6104710 qemu: prepare for vm templating support
1. support qemu migration save operation
2. setup vm templating parameters per hypervisor config
3. create vm storage path when it does not exist. This can happen when
an empty guest is created without a sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 12:44:58 +08:00
Peng Tao
7f20dd89a3 hypervisor: cleanup valid method
The boolean return value is not necessary.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 10:49:25 +08:00
Peng Tao
18e6a6effc hypervisor: decouple hypervisor from sandbox
A hypervisor implementation does not need to depend on a sandbox
structure. Decouple them in preparation for vm factory.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 10:49:25 +08:00
Peng Tao
4ac675453f qemu: remove append9PVolumes
It is not used and we actully cannot append multiple 9pfs volumes to
a guest.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 10:49:25 +08:00
Sebastien Boeuf
cd842afca4 Merge pull request #417 from nitkon/maxmem
virtcontainers: Set ppc64le maxmem depending on qemu version
2018-07-09 12:07:12 -07:00
Peng Tao
66a3e812f2 hypervisor/qemu: add memory hotplug support
So that we can add more memory to an existing guest.

Fixes: #469

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-09 15:29:50 +08:00
James O. D. Hunt
793a22083c qemu: Pass sandboxID to agent for logging purposes
Add a kernel command-line option that the agent can read to determine
the sandbox ID of the VM. It can use this to create a `sandbox=` log
field for improved log analysis.

Fixes #465.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-07-04 13:50:06 +01:00
Peng Tao
8f329dbf48 qemu: clean up qmp channel
We only need one qmp channel and it is qemu internal detail thus
sandbox.go does not need to be aware of it.

Fixes: #428

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-06-20 17:58:54 +08:00
Nitesh Konkar
d0bccabbe1 virtcontainers: Set ppc64le maxmem depending on qemu version
The "Failed to allocate HTAB of requested size,
try with smaller maxmem" error in ppc64le occurs
when maxmem allocated is very high. This got fixed
in qemu 2.10 and kernel 4.11. Hence put a maxmem
restriction of 32GB per kata-container if qemu
version less than 2.10

Fixes: #415

Signed-off-by: Nitesh Konkar <niteshkonkar@in.ibm.com>
2018-06-19 19:48:18 +05:30
Nitesh Konkar
baa553da07 virtcontainers: Get qemu suppport for ppc64le
Fixes #302

Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
2018-05-31 18:40:43 +05:30
Nitesh Konkar
4276c0c38e virtcontainers/cli: refactor code
Fixes #302

Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
2018-05-31 17:58:35 +05:30
Julio Montes
4527a8066a virtcontainers/qemu: honour CPU constrains
Don't fail if a new container with a CPU constraint was added to
a POD and no more vCPUs are available, instead apply the constraint
and let kernel balance the resources.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-14 17:33:31 -05:00
Julio Montes
07db945b09 virtcontainers/qemu: reduce memory footprint
There is a relation between the maximum number of vCPUs and the
memory footprint, if QEMU maxcpus option and kernel nr_cpus
cmdline argument are big, then memory footprint is big, this
issue only occurs if CPU hotplug support is enabled in the kernel,
might be because of kernel needs to allocate resources to watch all
sockets waiting for a CPU to be connected (ACPI event).

For example

```
+---------------+-------------------------+
|               | Memory Footprint (KB)   |
+---------------+-------------------------+
| NR_CPUS=240   | 186501                  |
+---------------+-------------------------+
| NR_CPUS=8     | 110684                  |
+---------------+-------------------------+
```

In order to do not affect CPU hotplug and allow to users to have containers
with the same number of physical CPUs, this patch tries to mitigate the
big memory footprint by using the actual number of physical CPUs as the
maximum number of vCPUs for each container if `default_maxvcpus` is <= 0 in
the runtime configuration file,  otherwise `default_maxvcpus` is used as the
maximum number of vCPUs.

Before this patch a container with 256MB of RAM

```
              total        used        free      shared  buff/cache   available
Mem:           195M         40M        113M         26M         41M        112M
Swap:            0B          0B          0B
```

With this patch

```
              total        used        free      shared  buff/cache   available
Mem:           236M         11M        188M         26M         36M        186M
Swap:            0B          0B          0B
```

fixes #295

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-14 17:33:31 -05:00
James O. D. Hunt
bce9edd277 socket: Enforce socket length
A Unix domain socket is limited to 107 usable bytes on Linux. However,
not all code creating socket paths was checking for this limits.

Created a new `utils.BuildSocketPath()` function (with tests) to
encapsulate the logic and updated all code creating sockets to use it.

Fixes #268.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-05-09 11:36:24 +01:00
Zhang Wei
366558ad5b virtcontainers: refactor device.go to device manager
Fixes #50

This is done for decoupling device management part from other parts.
It seperate device.go to several dirs and files:

```
virtcontainers/device
├── api
│   └── interface.go
├── config
│   └── config.go
├── drivers
│   ├── block.go
│   ├── generic.go
│   ├── utils.go
│   ├── vfio.go
│   ├── vhost_user_blk.go
│   ├── vhost_user.go
│   ├── vhost_user_net.go
│   └── vhost_user_scsi.go
└── manager
    ├── manager.go
    └── utils.go
```

* `api` contains interface definition of device management, so upper level caller
should import and use the interface, and lower level should implement the interface.
it's bridge to device drivers and callers.
* `config` contains structed exported data.
* `drivers` contains specific device drivers including block, vfio and vhost user
devices.
* `manager` exposes an external management package with a `DeviceManager`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Archana Shinde
718dbd2a71 device: Assign pci address for block devices
Introduce a new field in Drive to store the PCI address if the drive is
attached using virtio-blk.
Assign PCI address in the format bridge-addr/device-addr.
Since we need to assign the address while hotplugging, pass Drive
by address.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:09 -07:00
Archana Shinde
dd927921c1 qemu: Return bridge itself with addDeviceToBridge instead of bridge bus
Change the function to return the bridge itself that the
device is attached to. This will allow bridge address to be used
for determining the PCI slot of the device within the guest.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-03 10:59:08 -07:00
Archana Shinde
05c4ea39d0 qemu: Pass the pci/e address for qemu bridge
Pass the slot address while attaching bridges. This is needed
to determine the pci/e address of devices that are attached
to the bridge.

Fixes #210

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-04-19 10:42:19 -07:00
Graham whaley
d6c3ec864b license: SPDX: update all vc files to use SPDX style
When imported, the vc files carried in the 'full style' apache
license text, but the standard for kata is to use SPDX style.
Update the relevant files to SPDX.

Fixes: #227

Signed-off-by: Graham whaley <graham.whaley@intel.com>
2018-04-18 13:43:15 +01:00
Peng Tao
6107694930 runtime: rename pod to sandbox
As agreed in [the kata containers API
design](https://github.com/kata-containers/documentation/blob/master/design/kata-api-design.md),
we need to rename pod notion to sandbox. The patch is a bit big but the
actual change is done through the script:
```
sed -i -e 's/pod/sandbox/g' -e 's/Pod/Sandbox/g' -e 's/POD/SB/g'
```

The only expections are `pod_sandbox` and `pod_container` annotations,
since we already pushed them to cri shims, we have to use them unchanged.

Fixes: #199

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-13 09:32:51 +08:00
Archana Shinde
82e42b5dc5 qemu: iothreads: Add iothread support for scsi
Add a hypervisor configuration to specify if IO should
be handled in a separate thread. Add support for iothreads for
virtio-scsi for now. Since we attach all scsi drives to the
same scsi controller, all the drives will be handled in a separate
IO thread which would still give better performance.

Going forward we need to assess if adding more controllers and
attaching iothreasds to each of them with distributing drives
among teh scsi controllers should be done, based on more performance
analysis.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-03-30 17:52:20 -07:00