VM templates creates a symlink from `/run/vc/vm/sbid` to
`/run/vc/vm/vmid`. We need to clean up both of them.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Bridge is representing a PCI/E bridge, so we're moving the bridge*.go
to types/pci*.go.
Fixes: #1119
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to move the hypervisor implementations into their own package,
we need to put the capabilities type into the types package.
Fixes: #1119
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The error message does not provide the max memory that is exceeded.
Fix it for better error information.
Fixes: #1120
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
We always call waitSandbox after we start the VM (startSandbox), so
let's simplify the hypervisor interface and integrate waiting for the VM
into startSandbox.
This makes startSandbox a blocking call, but that is practically the
case today.
Fixes: #1009
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We always combine the hypervisor init and createSandbox, because what
we're trying to do is simply that: Set the hypervisor and have it create
a sandbox.
Instead of keeping a method with vague semantics, remove init and
integrate the actual hypervisor setup phase into the createSandbox one.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.
This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.
Fixes: #1095
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This value will be plused to max memory of hypervisor.
It is the memory address space for the NVDIMM devie.
If set block storage driver (block_device_driver) to "nvdimm",
should set memory_offset to the size of block device.
Signed-off-by: Hui Zhu <teawater@hyper.sh>
Set block_device_driver to "nvdimm" will make the hypervisor use
the block device as NVDIMM disk.
Fixes: #1032
Signed-off-by: Hui Zhu <teawater@hyper.sh>
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.
Fixes: #1057
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Start adding support for virtio-mmio devices starting with block.
The devices show within the vm as vda, vdb,... based on order of
insertion and such within the VM resemble virtio-blk devices.
They need to be explicitly differentiated to ensure that the
agent logic within the VM can discover and mount them appropropriately.
The agent uses PCI location to discover them for virtio-blk.
For virtio-mmio we need to use the predicted device name for now.
Note: Kata used a disk for the VM rootfs in the case of Firecracker.
(Instead of initrd or virtual-nvdimm). The Kata code today does not
handle this case properly.
For now as Firecracker is the only Hypervisor in Kata that
uses virtio-mmio directly offset the drive index to comprehend
this.
Longer term we should track if the rootfs is setup as a block
device explicitly.
Fixes: #1046
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Add firecracker as a supported hypervisor. This connects the
newly defined firecracker implementation as a supported
hypervisor.
Move operation definition to the common hypervisor code.
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
- Container only is responsable of namespaces and cgroups
inside the VM.
- Sandbox will manage VM resources.
The resouces has to be re-calculated and updated:
- Create new Container: If a new container is created the cpus and memory
may be updated.
- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.
To manage the resources from sandbox the hypervisor interaface adds two methods.
- resizeMemory().
This function will be used by the sandbox to request
increase or decrease the VM memory.
- resizeCPUs()
vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.
The CPUs calculations use the container cgroup information all the time.
This should allow do better calculations.
For example.
2 containers in a pod.
container 1 cpus = .5
container 2 cpus = .5
Now:
Sandbox requested vcpus 1
Before:
Sandbox requested vcpus 2
When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.
If we would updated them the sandbox calculations would remove already
removed vcpus.
This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.
Fixes: #833
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
The PR adds the support for s390x.
In the case of CCW devices, the vhost-user devices are not supported.
See #659. An error message is thrown if they tried to be used.
Memory hotplug is not supported on s390 yet and an error message is thrown.
The VirtioNetPCI has been changed to VirtioNet. The generalization
allows to set the VirtioNet to the correct CCW device for s390x.
Fixes: #666
Co-authored-by: Yash D Jain ydjainopensource@gmail.com
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Add block_device_cache_set, block_device_cache_direct and
block_device_cache_noflush.
They are cache-related options for block devices that are described in
https://github.com/qemu/qemu/blob/master/qapi/block-core.json.
block_device_cache_direct denotes whether use of O_DIRECT (bypass the host
page cache) is enabled. block_device_cache_noflush denotes whether flush
requests for the device are ignored.
The json said they are supported since 2.9.
So add block_device_cache_set to control the cache options set to block
devices or not. It will help to support the old version qemu.
Fixes: #956
Signed-off-by: Hui Zhu <teawater@hyper.sh>
Shortlog:
f9b31c0 qemu: Allow disable-modern option from QMP
d617307 Run tests for the s390x build
b36b5a8 Contributors: Add Clare Chen to CONTRIBUTORS.md
b41939c Contributors: Add my name
dab4cf1 qmp: Add tests
5ea6da1 Verify govmm builds on s390x
ee75813 contributors: add my name
c80fc3b qemu: Add s390x support
ca477a1 Update source file headers
e68e005 Update the CONTRIBUTING.md
2b7db54 Add the CONTRIBUTORS.md file
b3b765c qemu: test Valid for Vsock for Context ID
3becff5 qemu: change of ContextID from uint32 to uint64
f30fd13 qmp: Output error detail when execute QMP command failed
7da6a4c qmp: fix mem-path properties for hotplug memory.
e4892e3 qemu/qmp: preparation for s390x support
110d2fa qemu/qmp: add new function ExecuteBlockdevAddWithCache
a0b0c86 qmp_test: Change QMP version from 2.6 to 2.9
10c36a1 qemu: add support for pidfile option
Fixes#983
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This PR defines a new function supportGuestMemoryHotplug that
clearly defines if the architecture supports memory hotplug. The function
can be reimplemented in virtcontainers/qemu_$arch.go file for each
architecture.
Fixes: #910
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Fixes#344
Add host cgroup support for kata.
This commits only adds cpu.cfs_period and cpu.cfs_quota support.
It will create 3-level hierarchy, take "cpu" cgroup as an example:
```
/sys/fs/cgroup
|---cpu
|---kata
|---<sandbox-id>
|--vcpu
|---<sandbox-id>
```
* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
Refactor these functions so differernt types of endpoints can use a unified
function to hotplug nics.
Fixes#731
Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
As we try to make sure we don't pull unneeded dependency when using
QEMU or NEMU as the hypervisor, and because SeaBIOS and OVMF firmware
already handle what's done by the default efi-virtio.rom binary, this
commit gets rid of this dependency by providing a default empty one.
Fixes#812
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As this really represents a veth pair rather than a generic
virtual interface, rename VirtualEndpoint to VethEndpoint.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Then we can remove the arbitrary sleep waiting for migration
completion when creating a tempalte vm.
Fixes: #728
Signed-off-by: Peng Tao <bergwolf@gmail.com>
The virt machine type provided by the NEMU project needs to be
supported the same way we support pc and q35 machine types.
First, this patch takes care of adding the hotpluggable block device
capability to this machine type, this way when using devicemapper, we
prevent the code from falling back on using 9pfs instead of SCSI.
It also add one or several bridges to this machine type, as the code
is tightly coupled to the fact that a bridge is required for PCI
hotplug.
At last, it changes the name of the PCI host bridge (main bus), to
use "pcie.0". The default set up from pc machine type "pci.0" is not
suitable for this machine type.
Fixes#804
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the memory is reduced , its cgroup in the VM was updated properly. But the
runtime assumed that the memory was also removed from the VM.
Then when it is added more memory again, more is added (but not needed).
Fixes: #801
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Add configuration to decide the amount of slots that will be used in a VM
- This will limit the amount of times that memory can be hotplugged.
- Use memory slots provided by user.
- tests: aling struct
cli: kata-env: Add memory slots info.
- Show the slots to be added to the VM.
```diff
[Hypervisor]
MachineType = "pc"
Version = "QEMU ..."
Path = "/opt/kata/bin/qemu-system-x86_64"
BlockDeviceDriver = "virtio-scsi"
Msize9p = 8192
+ MemorySlots = 10
Debug = false
UseVSock = false
```
Fixes: #751
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Enable GPU device support in kata runtime, including GVT-g and GVT-d.
GVT-g: graphic virtualization technology with mediated pass through
GVT-d: graphic virtualization technology with direct pass through
BDF of device eg "0000:00:1c.0" is used to distinguish GPU device in GVT-d,
while sysfsdev of device eg "f79944e4-5a3d-11e8-99ce-479cbab002e4" is used
in GVT-g.
Fixes#542
Signed-off-by: Zhao Xinda <xinda.zhao@intel.com>
Add support for using update command to hotplug memory to vm.
Connect kata-runtime update interface with hypervisor memory hotplug
feature.
Fixes#625
Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Because the network monitor will be listening to every event received
through the netlink socket, it will be notified everytime a new link
will be added/updated/modified in the network namespace it's running
into. The goal being to detect new interface added by Docker such as
a veth pair.
The problem is that kata-runtime will add other internal interfaces
when the network monitor will ask for the addition of the new veth
pair. And we need a way to ignore those new interfaces being created
as they relate to the veth pair that is being added. That's why, in
order to prevent from running into an infinite loop, virtcontainers
needs to tag the internal interfaces with the "kata" suffix so that
the network monitor will be able to ignore them.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add `disable_vhost_net` option to enable or disable the use of
vhost_net. Vhost_net can improve network performance.
Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
Instead of using a default queue size of 8 for macvtap fds,
use the number of CPUs on the guest as the queue size.
This is the recommended approach. This also shown better
performance results.
Fixes#680
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Kata Containers does not have provide a good entropy level,
make use of a paravirtual rng device to solve this problem.
Fixes: #445
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Now that we only use hypervisor config to set them, they
are not overridden by other configs. So drop the default prefix.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
We can just use hyprvisor config to specify the memory size
of a guest. There is no need to maintain the extra place just
for memory size.
Fixes: #692
Signed-off-by: Peng Tao <bergwolf@gmail.com>
The QMP shutdown is taken care of by the sandbox release, through a
call to hypervisor.disconnect(). By shutting down the QMP at the qemu
level directly, we are creating some unrecoverable errors by trying to
close an already closed channel.
This patch simply removes the faulty code, following the same design
other hotplug functions are designed.
Fixes#627
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.
Note that not every function is traced; we can add more traces as
desired.
Fixes#566.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>