kata-containers

mirror of https://github.com/aljazceru/kata-containers.git synced 2025-12-26 10:34:24 +01:00

Author	SHA1	Message	Date
Penny Zheng	7e6fcddefa	kernelRootParams: define agnostic commonkernelRootParams Let's define agnostic commonkernelRootParams for all hypervisors, including qemu, firecracker, etc. for now, it has two scenarios, one for NVDIMM, one for virtio-blk. Fixes: #1642 Signed-off-by: Penny Zheng <penny.zheng@arm.com>	2019-05-29 15:12:56 +08:00
Ganesh Maharaj Mahalingam	a41894da18	runtime: Enable file based backend A file based memory backend mapped to the host, for example '/dev/shm', will be used by virtio-fs for performance reasons. This change is a generic implementation of that for kata. This will be enabled by default for virtio-fs, negating the need to enable hugepages in that scenario. This option can be used without virtio-fs by setting 'file_mem_backend' to the location in the configuration file. Default value is an empty string. Fixes: #1656 Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>	2019-05-23 20:47:42 -07:00
Dr. David Alan Gilbert	75f75862c2	virtiofs: Add cache option Several cache modes are supported by virtio-fs. They affect the performance and consistency characteristics of the file system. For the time being cache="none" is recommended, but the other modes can be experimented with. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-05 11:32:34 -06:00
Dr. David Alan Gilbert	6767c1a358	virtiofs: Add cache size option Add VirtioFSCacheSize aka virtio_fs_cache_size option to set the size (in MiB) of the DAX cache. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-05 11:32:34 -06:00
Stefan Hajnoczi	d690dff164	config: add virtio_fs_daemon string Add a config option for the virtio-fs vhost-user daemon path. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-05-01 10:55:31 -04:00
Stefan Hajnoczi	9e87fa21cf	config: add shared_fs option Add a config option to select between virtio-9p and virtiofs. This option currently has no effect and will be used in a later patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-05-01 10:55:31 -04:00
Hui Zhu	16fe8553af	qemu: Remove the storage directories if qemu get from the factory Store related in directory /var/lib/vc/sbs and /run/vc/sbs if vm template is enabled. The cause is NewVM and NewVMFromGrpc will create vcStore with VM's ID and set it as store of hypervisor if the factory is enabled. This commit record the VM's ID to HypervisorConfig.VMid and remove directories in qemu.cleanupVM to handle the issue. Fixes: #1452 Signed-off-by: Hui Zhu <teawater@hyper.sh>	2019-04-10 11:11:45 +08:00
Penny Zheng	47670fcf73	memoryDevice: reconstruct memoryDevice If kata-runtime supports memory hotplug via probe interface, we need to reconstruct memoryDevice to store relevant status, which are addr and probe. addr specifies the physical address of the memory device, and probe determines it is hotplugged via acpi-driven or probe interface. Fixes: #1149 Signed-off-by: Penny Zheng <penny.zheng@arm.com>	2019-04-04 17:03:20 +08:00
Peng Tao	6fda03ec92	hypervisor: make getThreadIDs return vcpu to threadid mapping We need such mapping information to put vcpus in container cpuset properly. Fixes: #1435 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2019-04-02 15:51:27 +08:00
Ganesh Maharaj Mahalingam	f4428761cb	lint: Update go linter from gometalinter to golangci-lint. gometalinter is deprecated and will be archived April '19. The suggestion is to switch to golangci-lint which is apparently 5x faster than gometalinter. Partially Fixes: #1377 Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>	2019-03-25 08:48:13 -07:00
Hui Zhu	90704c8bb6	VMCache: the core and the client VMCache is a new function that creates VMs as caches before using them. It helps speed up new container creation. The function consists of a server and some clients communicating through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. The VMCache server will create some VMs and cache them by factory cache. It will convert the VM to gRPC format and transport it when it gets a request from clients. Factory grpccache is the VMCache client. It will request gRPC format VM and convert it back to a VM. If VMCache function is enabled, kata-runtime will request VM from factory grpccache when it creates a new sandbox. VMCache has two options. vm_cache_number specifies the number of caches of VMCache: unspecified or == 0 --> VMCache is disabled > 0 --> will be set to the specified number vm_cache_endpoint specifies the address of the Unix socket. This commit just includes the core and the client of VMCache. Currently, VM cache still cannot work with VM templating and vsock, and it only supports qemu. Fixes: #52 Signed-off-by: Hui Zhu <teawater@hyper.sh>	2019-03-08 10:05:59 +08:00
Julio Montes	a1c85902f6	virtcontainers: add method to get hypervisor PID hypervisor PID can be used to move the whole process and its threads into a new cgroup. Signed-off-by: Julio Montes <julio.montes@intel.com>	2019-02-13 18:01:14 -06:00
Samuel Ortiz	fad23ea54e	virtcontainers: Conversion to Stores We convert the whole virtcontainers code to use the store package instead of the resource_storage one. The resource_storage removal will happen in a separate change for a more logical split. This change is fairly big but mostly does not change the code logic. What really changes is when we create a store for a container or a sandbox. We now need to explicitly do so instead of just assigning a filesystem{} instance. Other than that, the logic is kept intact. Fixes: #1099 Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2019-02-07 00:59:29 +01:00
Samuel Ortiz	b25f43e865	virtcontainers: Add Capabilities to the types package In order to move the hypervisor implementations into their own package, we need to put the capabilities type into the types package. Fixes: #1119 Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2019-01-14 20:30:06 +01:00
Samuel Ortiz	67e696bf62	virtcontainers: Add Asset to the types package In order to move the hypervisor implementations into their own package, we need to put the asset type into the types package and break the hypervisor->asset->virtcontainers->hypervisor cyclic dependency. Fixes: #1119 Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2019-01-14 20:30:06 +01:00
Samuel Ortiz	cf22f402d8	virtcontainers: Remove the hypervisor waitSandbox method We always call waitSandbox after we start the VM (startSandbox), so let's simplify the hypervisor interface and integrate waiting for the VM into startSandbox. This makes startSandbox a blocking call, but that is practically the case today. Fixes: #1009 Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2019-01-08 19:38:33 +01:00
Samuel Ortiz	763bf18daa	virtcontainers: Remove the hypervisor init method We always combine the hypervisor init and createSandbox, because what we're trying to do is simply that: Set the hypervisor and have it create a sandbox. Instead of keeping a method with vague semantics, remove init and integrate the actual hypervisor setup phase into the createSandbox one. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2019-01-08 19:37:20 +01:00
Hui Zhu	dd28ff5986	memory: Add new option memory_offset This value will be added to the max memory of the hypervisor. It is the memory address space for the NVDIMM device. If the block storage driver (block_device_driver) is set to "nvdimm", memory_offset should be set to the size of the block device. Signed-off-by: Hui Zhu <teawater@hyper.sh>	2018-12-24 15:36:25 +08:00
Peng Tao	bf1a5ce000	sandbox: cleanup sandbox if creation failed This includes cleaning up the sandbox on disk resources, and closing open fds when preparing the hypervisor. Fixes: #1057 Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-12-21 13:46:16 +08:00
Sebastien Boeuf	e14071f2bd	Merge pull request #1045 from mcastelino/topic/firecracker-virtio-mmio Firecracker: virtio mmio support	2018-12-20 19:47:01 -08:00
Manohar Castelino	0d84d799ea	virtio-mmio: Add support for virtio-mmio Start adding support for virtio-mmio devices starting with block. The devices show within the vm as vda, vdb,... based on order of insertion and such within the VM resemble virtio-blk devices. They need to be explicitly differentiated to ensure that the agent logic within the VM can discover and mount them appropriately. The agent uses PCI location to discover them for virtio-blk. For virtio-mmio we need to use the predicted device name for now. Note: Kata used a disk for the VM rootfs in the case of Firecracker. (Instead of initrd or virtual-nvdimm). The Kata code today does not handle this case properly. For now as Firecracker is the only Hypervisor in Kata that uses virtio-mmio directly offset the drive index to comprehend this. Longer term we should track if the rootfs is setup as a block device explicitly. Fixes: #1046 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>	2018-12-20 15:08:51 -08:00
Manohar Castelino	e65bafa793	virtcontainers: Add firecracker as a supported hypervisor Add firecracker as a supported hypervisor. This connects the newly defined firecracker implementation as a supported hypervisor. Move operation definition to the common hypervisor code. Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>	2018-12-20 11:54:59 -08:00
Jose Carlos Venegas Munoz	618cfbf1db	vc: sandbox: Let sandbox manage VM resources. - Container is only responsible for namespaces and cgroups inside the VM. - Sandbox will manage VM resources. The resources have to be re-calculated and updated: - Create new Container: If a new container is created the cpus and memory may be updated. - Container update: The update call will change the cgroups of a container. The sandbox would need to resize the cpus and VM depending on the update. To manage the resources from sandbox the hypervisor interface adds two methods. - resizeMemory(). This function will be used by the sandbox to request an increase or decrease of the VM memory. - resizeCPUs() vcpus are requested from the hypervisor based on the sum of all the containers in the sandbox. The CPUs calculations use the container cgroup information all the time. This should allow better calculations. For example. 2 containers in a pod. container 1 cpus = .5 container 2 cpus = .5 Now: Sandbox requested vcpus 1 Before: Sandbox requested vcpus 2 When an update request is done only some attributes have information. If cpu and quota are nil or 0 we don't update them. If we updated them the sandbox calculations would remove already removed vcpus. This commit also moves the sandbox resource update call at container.update() just before the container cgroups information is updated. Fixes: #833 Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>	2018-12-13 16:33:14 -06:00
Julio Montes	976f5b2a6e	Merge pull request #990 from alicefr/s390x s390x: add support for s390x	2018-12-11 10:57:27 -06:00
Alice Frosi	6f83061139	s390x: add support for s390x The PR adds the support for s390x. In the case of CCW devices, the vhost-user devices are not supported. See #659. An error message is thrown if they tried to be used. Memory hotplug is not supported on s390 yet and an error message is thrown. The VirtioNetPCI has been changed to VirtioNet. The generalization allows to set the VirtioNet to the correct CCW device for s390x. Fixes: #666 Co-authored-by: Yash D Jain ydjainopensource@gmail.com Signed-off-by: Alice Frosi <afrosi@de.ibm.com>	2018-12-11 12:32:17 +01:00
Hui Zhu	f6511471d4	block: Add cache-related options for block devices Add block_device_cache_set, block_device_cache_direct and block_device_cache_noflush. They are cache-related options for block devices that are described in https://github.com/qemu/qemu/blob/master/qapi/block-core.json. block_device_cache_direct denotes whether use of O_DIRECT (bypass the host page cache) is enabled. block_device_cache_noflush denotes whether flush requests for the device are ignored. The json said they are supported since 2.9. So add block_device_cache_set to control the cache options set to block devices or not. It will help to support the old version qemu. Fixes: #956 Signed-off-by: Hui Zhu <teawater@hyper.sh>	2018-12-06 18:07:44 +08:00
Felix Abecassis	33abb3ecf8	cli: add guest hook path option in the configuration file Add support for specifying an optional drop-in path for guest OCI hooks. This is the runtime side for leveraging the agent change introduced in kata-containers/agent@980023ec62 Fixes: #720 Co-authored-by: Edward Guzman <eguzman@nvidia.com> Co-authored-by: Felix Abecassis <fabecassis@nvidia.com> Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>	2018-10-29 13:06:22 -07:00
Wei Zhang	34fe3b9d6d	cgroups: add host cgroup support Fixes #344 Add host cgroup support for kata. This commits only adds cpu.cfs_period and cpu.cfs_quota support. It will create 3-level hierarchy, take "cpu" cgroup as an example: ``` /sys/fs/cgroup \|---cpu \|---kata \|---<sandbox-id> \|--vcpu \|---<sandbox-id> ``` * `vc` cgroup is common parent for all kata-container sandbox, it won't be removed after sandbox removed. This cgroup has no limitation. * `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu threads except for vcpu threads. In future, we can consider putting all shim processes and proxy process here. This cgroup has no limitation yet. * `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period constraint applies to this cgroup. Signed-off-by: Wei Zhang <zhangwei555@huawei.com> Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>	2018-10-27 09:41:35 +08:00
Jose Carlos Venegas Munoz	41619e4f83	vc: qemu: Add option to change entropy source This adds a config option to choose the VM entropy source. Fixes: #702 Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>	2018-09-25 17:54:32 -05:00
Jose Carlos Venegas Munoz	19801bf784	config: Add Memory slots configuration. Add configuration to decide the amount of slots that will be used in a VM - This will limit the amount of times that memory can be hotplugged. - Use memory slots provided by user. - tests: aling struct cli: kata-env: Add memory slots info. - Show the slots to be added to the VM. ```diff [Hypervisor] MachineType = "pc" Version = "QEMU ..." Path = "/opt/kata/bin/qemu-system-x86_64" BlockDeviceDriver = "virtio-scsi" Msize9p = 8192 + MemorySlots = 10 Debug = false UseVSock = false ``` Fixes: #751 Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>	2018-09-21 10:57:00 -05:00
Ruidong	225e10cfc4	cli: add configuration option to enable/disable vhost_net Add `disable_vhost_net` option to enable or disable the use of vhost_net. Vhost_net can improve network performance. Signed-off-by: Ruidong Cao <caoruidong@huawei.com>	2018-09-14 00:14:03 +08:00
Archana Shinde	2f552fbf43	hypervisor: Add hypervisor interface to return config This api will allow the config to be accessed by other subsystems such as network. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2018-09-12 12:02:15 -07:00
Peng Tao	a1537a5271	hypervisor: rename DefaultVCPUs and DefaultMemSz Now that we only use hypervisor config to set them, they are not overridden by other configs. So drop the default prefix. Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-09-06 21:04:56 +08:00
Peng Tao	ce288652d5	virtcontainers: remove sandboxConfig.VMConfig We can just use hypervisor config to specify the memory size of a guest. There is no need to maintain the extra place just for memory size. Fixes: #692 Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-09-06 14:15:56 +08:00
James O. D. Hunt	d0679a6fd1	tracing: Add tracing support to virtcontainers Add additional `context.Context` parameters and `struct` fields to allow trace spans to be created by the `virtcontainers` internal functions, objects and sub-packages. Note that not every function is traced; we can add more traces as desired. Fixes #566. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2018-08-22 08:24:58 +01:00
Archana Shinde	31e2925a9a	vfio: Add configuration to support VFIO hotplug on root bus We need this configuration due to a limitation in seabios firmware in handling hotplug for PCI devices with large BARS. Long term, this needs to be fixed in the firmware. Fixes #594 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2018-08-20 11:36:21 -07:00
Julio Montes	33643797ad	virtcontainers: Use vsock if host support it When the hypervisor option `use_vsock` is true the runtime will check for vsock support. If vsock is supported, no proxy will be used and the shims will connect to the VM using VSOCKS. This flag is true by default, so will use VSOCK when possible and no proxy will be started. fixes #383 Signed-off-by: Jose Carlos Venegas Munoz jose.carlos.venegas.munoz@intel.com Signed-off-by: Julio Montes <julio.montes@intel.com>	2018-07-31 15:38:45 -05:00
Julio Montes	4680e58e08	cli: add configuration option to enable/disable vsocks Add `use_vsock` option to enable or disable the use of vsocks for communication between host and guest. Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com> Signed-off-by: Julio Montes <julio.montes@intel.com>	2018-07-31 13:52:43 -05:00
Peng Tao	7a6f205970	virtcontainers: keep qmp connection when possible For each time a sandbox structure is created, we ensure s.Release() is called. Then we can keep the qmp connection as long as Sandbox pointer is alive. All VC interfaces are still stateless as s.Release() is called before each API returns. OTOH, for VCSandbox APIs, FetchSandbox() must be paired with s.Release, the same as before. Fixes: #500 Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-07-23 08:37:55 +08:00
Peng Tao	28b6104710	qemu: prepare for vm templating support 1. support qemu migration save operation 2. setup vm templating parameters per hypervisor config 3. create vm storage path when it does not exist. This can happen when an empty guest is created without a sandbox. Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-07-19 12:44:58 +08:00
Peng Tao	7f20dd89a3	hypervisor: cleanup valid method The boolean return value is not necessary. Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-07-19 10:49:25 +08:00
Peng Tao	18e6a6effc	hypervisor: decouple hypervisor from sandbox A hypervisor implementation does not need to depend on a sandbox structure. Decouple them in preparation for vm factory. Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-07-19 10:49:25 +08:00
Peng Tao	66a3e812f2	hypervisor/qemu: add memory hotplug support So that we can add more memory to an existing guest. Fixes: #469 Signed-off-by: Peng Tao <bergwolf@gmail.com>	2018-07-09 15:29:50 +08:00
Nitesh Konkar	baa553da07	virtcontainers: Get qemu support for ppc64le Fixes #302 Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com	2018-05-31 18:40:43 +05:30
Julio Montes	4527a8066a	virtcontainers/qemu: honour CPU constrains Don't fail if a new container with a CPU constraint was added to a POD and no more vCPUs are available, instead apply the constraint and let kernel balance the resources. Signed-off-by: Julio Montes <julio.montes@intel.com>	2018-05-14 17:33:31 -05:00
Julio Montes	07db945b09	virtcontainers/qemu: reduce memory footprint There is a relation between the maximum number of vCPUs and the memory footprint, if QEMU maxcpus option and kernel nr_cpus cmdline argument are big, then memory footprint is big, this issue only occurs if CPU hotplug support is enabled in the kernel, might be because of kernel needs to allocate resources to watch all sockets waiting for a CPU to be connected (ACPI event). For example ``` +---------------+-------------------------+ \| \| Memory Footprint (KB) \| +---------------+-------------------------+ \| NR_CPUS=240 \| 186501 \| +---------------+-------------------------+ \| NR_CPUS=8 \| 110684 \| +---------------+-------------------------+ ``` In order to do not affect CPU hotplug and allow to users to have containers with the same number of physical CPUs, this patch tries to mitigate the big memory footprint by using the actual number of physical CPUs as the maximum number of vCPUs for each container if `default_maxvcpus` is <= 0 in the runtime configuration file, otherwise `default_maxvcpus` is used as the maximum number of vCPUs. Before this patch a container with 256MB of RAM ``` total used free shared buff/cache available Mem: 195M 40M 113M 26M 41M 112M Swap: 0B 0B 0B ``` With this patch ``` total used free shared buff/cache available Mem: 236M 11M 188M 26M 36M 186M Swap: 0B 0B 0B ``` fixes #295 Signed-off-by: Julio Montes <julio.montes@intel.com>	2018-05-14 17:33:31 -05:00
James O. D. Hunt	bce9edd277	socket: Enforce socket length A Unix domain socket is limited to 107 usable bytes on Linux. However, not all code creating socket paths was checking for this limits. Created a new `utils.BuildSocketPath()` function (with tests) to encapsulate the logic and updated all code creating sockets to use it. Fixes #268. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2018-05-09 11:36:24 +01:00
Zhang Wei	f4a453b86c	virtcontainers: address some comments * Move makeNameID() func to virtcontainers/utils file as it's a generic function for making name and ID. * Move bindDevicetoVFIO() and bindDevicetoHost() to vfio driver package. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2018-05-08 10:24:26 +08:00
Sebastien Boeuf	ea789dbab9	Merge pull request #207 from amshinde/msize-9p Add configuration for 9p msize	2018-04-18 11:20:44 -07:00
Graham whaley	d6c3ec864b	license: SPDX: update all vc files to use SPDX style When imported, the vc files carried in the 'full style' apache license text, but the standard for kata is to use SPDX style. Update the relevant files to SPDX. Fixes: #227 Signed-off-by: Graham whaley <graham.whaley@intel.com>	2018-04-18 13:43:15 +01:00


58 Commits