packaging: merge packaging repository

git-subtree-dir: tools/packaging
git-subtree-mainline: f818b46a41
git-subtree-split: 1f22d72d5d
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2026-02-15 11:34:22 +01:00 · 2020-06-23 22:49:04 -07:00
parent f818b46a41 1f22d72d5d
commit 782cd2ed10
645 changed files with 292694 additions and 0 deletions
--- a/tools/packaging/kernel/README.md
+++ b/tools/packaging/kernel/README.md
@@ -0,0 +1,179 @@
+# Build Kata Containers Kernel
+
+* [Requirements](#requirements)
+* [Usage](#usage)
+* [Setup kernel source code](#setup-kernel-source-code)
+* [Build the kernel](#build-the-kernel)
+* [Install the Kernel in the default path for Kata](#install-the-kernel-in-the-default-path-for-kata)
+* [Submit Kernel Changes](#submit-kernel-changes)
+* [How is it tested](#how-is-it-tested)
+* [Contribute](#contribute)
+
+This document explains the steps to build a kernel recommended for use with
+Kata Containers. To do this, use `build-kernel.sh`, a script that
+automates the process of building a kernel for Kata Containers.
+
+## Requirements
+
+The `build-kernel.sh` script requires an installed Golang version matching the
+[component build requirements](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#requirements-to-build-individual-components).
+
+## Usage
+
+```
+$ ./build-kernel.sh -h
+Overview:
+
+	Build a kernel for Kata Containers
+
+Description: This script is the *ONLY* way to build a kernel for development.
+
+
+Usage:
+
+	build-kernel.sh [options] <command> <argument>
+
+Commands:
+
+- setup
+
+- build
+
+- install
+
+Options:
+
+	-c <path>   : Path to config file to build the kernel.
+	-d          : Enable bash debug.
+	-e          : Enable experimental kernel.
+	-f          : Enable force generate config when setup.
+	-g <vendor> : GPU vendor, intel or nvidia.
+	-h          : Display this help.
+	-k <path>   : Path to kernel to build.
+	-p <path>   : Path to a directory with patches to apply to kernel.
+	-t <target> : Hypervisor target.
+	-v <ver>    : Kernel version to use if kernel path not provided.
+```
+
+Example:
+```
+$ ./build-kernel.sh -v 4.19.86 -g nvidia -f -d setup
+```
+> **Note**
+> - `-v 4.19.86`: Specify the guest kernel version.
+> - `-g nvidia`: To build a guest kernel supporting Nvidia GPU.
+> - `-f`: The .config file is forced to be generated even if the kernel directory already exists.
+> - `-d`: Enable bash debug mode.
+
+
+## Setup kernel source code
+
+```bash
+$ go get -d -u github.com/kata-containers/packaging
+$ cd $GOPATH/src/github.com/kata-containers/packaging/kernel
+$ ./build-kernel.sh setup
+```
+
+The script `./build-kernel.sh` tries to apply the patches from
+`${GOPATH}/src/github.com/kata-containers/packaging/kernel/patches/` when it
+sets up a kernel. If you want to add a source modification, add a patch to this
+directory.
+
+The script also adds a kernel config file from
+`${GOPATH}/src/github.com/kata-containers/packaging/kernel/configs/` to `.config`
+in the kernel source code. You can modify it as needed.
+
+## Build the kernel
+
+After the kernel source code is ready, it is possible to build the kernel.
+
+```bash
+$ ./build-kernel.sh build
+```
+
+## Install the Kernel in the default path for Kata
+
+Kata Containers searches a default path for a kernel to boot. The following
+command installs the kernel into that default Kata Containers path
+(`/usr/share/kata-containers/`).
+
+```bash
+$ ./build-kernel.sh install
+```
+
+## Submit Kernel Changes
+
+The Kata Containers packaging repository holds the kernel configs and patches. The
+config and patches can work for many versions, but we only test the
+kernel version defined in the [runtime versions file][runtime-versions-file].
+
+For further details, see [the kernel configuration documentation](configs).
+
+## How is it tested
+
+The Kata Containers CI scripts install the kernel from the [CI cache
+job][cache-job] or build it from source.
+
+If the kernel defined in the [runtime versions file][runtime-versions-file] is
+already built and cached with the latest kernel config and patches, the cached
+kernel is installed. Otherwise, the kernel is built from source.
+
+The Kata kernel version is a combination of the kernel version defined in the [runtime
+versions file][runtime-versions-file] and the file `kata_config_version`. This
+helps to identify whether a kernel build has the latest recommended
+configuration.
+
+Example:
+
+```bash
+# From https://github.com/kata-containers/runtime/blob/master/versions.yaml
+$ kernel_version_in_versions_file=4.10.1
+# From https://github.com/kata-containers/packaging/blob/master/kernel/kata_config_version
+$ kata_config_version=25
+$ latest_kernel_version=${kernel_version_in_versions_file}-${kata_config_version}
+```
+
+The resulting version is 4.10.1-25. This helps identify whether or not the kernel
+configs used by a CI build are up to date.
+
+## Contribute
+
+Kata kernel changes can be contributed in two places:
+
+1. [Kata runtime versions file][runtime-versions-file]: This file points to the
+   recommended versions to be used by Kata. To update the kernel version send a
+   pull request to update that version. The Kata CI will run all the use cases
+   and verify it works.
+
+1. Kata packaging repository. This repository contains all the kernel configs
+   and patches recommended for Kata Containers kernel:
+
+- If you want to add a new configuration (for a new kernel version or a
+  specific architecture), make sure the config file name has the following format:
+
+  ```bash
+  # Format:
+  $ ${arch}_kata_${hypervisor_target}_${major_kernel_version}.x
+
+  # example:
+  $ arch=x86_64
+  $ hypervisor_target=kvm
+  $ major_kernel_version=4.19
+
+  # Resulting file
+  $ name: x86_64_kata_kvm_4.19.x
+  ```
+
+- Kernel patches: the CI and packaging scripts apply all patches in the
+  [patches directory][patches-dir].
+
+Note: The kernel version and configuration file live in different locations,
+which could result in a circular dependency on your (runtime or packaging) PR.
+In this case, the PR you submit needs to be tested together with a patch from
+another Kata Containers repository. To do this you have to specify which
+repository and which pull request [it depends on][depends-on-docs].
+
+[runtime-versions-file]: https://github.com/kata-containers/runtime/blob/master/versions.yaml
+[patches-dir]: https://github.com/kata-containers/packaging/tree/master/kernel/patches
+[depends-on-docs]: https://github.com/kata-containers/tests/blob/master/README.md#breaking-compatibility
+[cache-job]: http://jenkins.katacontainers.io/job/image-nightly-x86_64/
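
The naming rules described in the README above can be sketched in shell. The helper names `major_kernel_version`, `kata_config_name`, and `kata_kernel_version` are hypothetical and are not part of `build-kernel.sh`; they only illustrate, under that assumption, how the documented config file name and the CI kernel version string are composed.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of build-kernel.sh.

# Reduce a full kernel version (e.g. 4.19.86 or v4.19.86) to "major.minor".
major_kernel_version() {
	local version="${1#v}"
	echo "${version}" | cut -d. -f1-2
}

# Compose the documented config file name:
#   ${arch}_kata_${hypervisor_target}_${major_kernel_version}.x
kata_config_name() {
	local arch="$1" hypervisor="$2" version="$3"
	echo "${arch}_kata_${hypervisor}_$(major_kernel_version "${version}").x"
}

# Compose the version string that combines the kernel version from the
# versions file with the content of kata_config_version.
kata_kernel_version() {
	local kernel_version="$1" config_version="$2"
	echo "${kernel_version}-${config_version}"
}

kata_config_name x86_64 kvm 4.19.86   # prints x86_64_kata_kvm_4.19.x
kata_kernel_version 4.10.1 25         # prints 4.10.1-25
```

Keeping the config version as a separate suffix means a cached kernel can be detected as stale when only the config or patches change and the upstream kernel version stays the same.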
--- a/tools/packaging/kernel/build-kernel.sh
+++ b/tools/packaging/kernel/build-kernel.sh
@@ -0,0 +1,539 @@
+#!/bin/bash
+#
+# Copyright (c) 2018 Intel Corporation
+#
+# SPDX-License-Identifier: Apache-2.0
+
+description="
+Description: This script is the *ONLY* way to build a kernel for development.
+"
+
+set -o errexit
+set -o nounset
+set -o pipefail
+
+readonly script_name="$(basename "${BASH_SOURCE[0]}")"
+readonly script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+kata_version="${kata_version:-}"
+
+#project_name
+readonly project_name="kata-containers"
+[ -n "${GOPATH:-}" ] || GOPATH="${HOME}/go"
+# Fetch the first element from GOPATH as working directory
+# as go get only works against the first item in the GOPATH
+GOPATH="${GOPATH%%:*}"
+# Kernel version to be used
+kernel_version=""
+# Flag to know whether the kernel source needs to be downloaded
+download_kernel=false
+# The repository where the kernel version file (versions.yaml) lives
+runtime_repository="github.com/${project_name}/runtime"
+# The repository where kernel configuration lives
+readonly kernel_config_repo="github.com/${project_name}/packaging"
+readonly patches_repo="github.com/${project_name}/packaging"
+readonly patches_repo_dir="${GOPATH}/src/${patches_repo}"
+# Default path to search patches to apply to kernel
+readonly default_patches_dir="${patches_repo_dir}/kernel/patches/"
+# Default path to search config for kata
+readonly default_kernel_config_dir="${GOPATH}/src/${kernel_config_repo}/kernel/configs"
+# Default path to search for kernel config fragments
+readonly default_config_frags_dir="${GOPATH}/src/${kernel_config_repo}/kernel/configs/fragments"
+readonly default_config_whitelist="${GOPATH}/src/${kernel_config_repo}/kernel/configs/fragments/whitelist.conf"
+# GPU vendor
+readonly GV_INTEL="intel"
+readonly GV_NVIDIA="nvidia"
+
+#Path to kernel directory
+kernel_path=""
+#Experimental kernel support. Pull from virtio-fs GitLab instead of kernel.org
+experimental_kernel="false"
+#Force generate config when setup
+force_setup_generate_config="false"
+#GPU kernel support
+gpu_vendor=""
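+# Path to a directory with patches to apply to the kernel (-p)
+patches_path=""
+# Hypervisor target (-t)
+hypervisor_target=""
+# Architecture target (-a)
+arch_target=""
+# Path to the kernel config file (-c)
+kernel_config_path=""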
+# destdir
+DESTDIR="${DESTDIR:-/}"
+#PREFIX=
+PREFIX="${PREFIX:-/usr}"
+
+source "${script_dir}/../scripts/lib.sh"
+
+usage() {
+	exit_code="$1"
+	cat <<EOT
+Overview:
+
+	Build a kernel for Kata Containers
+	${description}
+
+Usage:
+
+	$script_name [options] <command> <argument>
+
+Commands:
+
+- setup
+
+- build
+
+- install
+
+Options:
+
+	-c <path>   : Path to config file to build the kernel.
+	-d          : Enable bash debug.
+	-e          : Enable experimental kernel.
+	-f          : Enable force generate config when setup.
+	-g <vendor> : GPU vendor, intel or nvidia.
+	-h          : Display this help.
+	-k <path>   : Path to kernel to build.
+	-p <path>   : Path to a directory with patches to apply to kernel.
+	-t <target> : Hypervisor target.
+	-v <ver>    : Kernel version to use if kernel path not provided.
+EOT
+	exit "$exit_code"
+}
+
+# Convert architecture to the name used by the Linux kernel build system
+arch_to_kernel() {
+	local -r arch="$1"
+
+	case "$arch" in
+		aarch64) echo "arm64" ;;
+		ppc64le) echo "powerpc" ;;
+		s390x) echo "s390" ;;
+		x86_64) echo "$arch" ;;
+		*) die "unsupported architecture: $arch" ;;
+	esac
+}
+
+get_kernel() {
+	local version="${1:-}"
+
+	local kernel_path=${2:-}
+	[ -n "${kernel_path}" ] || die "kernel_path not provided"
+	[ ! -d "${kernel_path}" ] || die "kernel_path already exists"
+
+
+	if [[ ${experimental_kernel} == "true" ]]; then
+		kernel_tarball="linux-${version}.tar.gz"
+		curl --fail -OL "https://gitlab.com/virtio-fs/linux/-/archive/${version}/${kernel_tarball}"
+		tar xf "${kernel_tarball}"
+		mv "linux-${version}" "${kernel_path}"
+	else
+
+		#Remove extra 'v'
+		version=${version#v}
+
+		major_version=$(echo "${version}" | cut -d. -f1)
+		kernel_tarball="linux-${version}.tar.xz"
+
+		if [ ! -f sha256sums.asc ] || ! grep -q "${kernel_tarball}" sha256sums.asc; then
+			info "Download kernel checksum file: sha256sums.asc"
+			curl --fail -OL "https://cdn.kernel.org/pub/linux/kernel/v${major_version}.x/sha256sums.asc"
+		fi
+		grep "${kernel_tarball}" sha256sums.asc >"${kernel_tarball}.sha256"
+
+		if [ -f "${kernel_tarball}" ] && ! sha256sum -c "${kernel_tarball}.sha256"; then
+			info "invalid kernel tarball ${kernel_tarball}, removing"
+			rm -f "${kernel_tarball}"
+		fi
+		if [ ! -f "${kernel_tarball}" ]; then
+			info "Download kernel version ${version}"
+			info "Download kernel"
+			curl --fail -OL "https://www.kernel.org/pub/linux/kernel/v${major_version}.x/${kernel_tarball}"
+		else
+			info "kernel tarball already downloaded"
+		fi
+
+		sha256sum -c "${kernel_tarball}.sha256"
+
+		tar xf "${kernel_tarball}"
+
+		mv "linux-${version}" "${kernel_path}"
+	fi
+}
+
+get_major_kernel_version() {
+	local version="${1}"
+	[ -n "${version}" ] || die "kernel version not provided"
+	major_version=$(echo "${version}" | cut -d. -f1)
+	minor_version=$(echo "${version}" | cut -d. -f2)
+	echo "${major_version}.${minor_version}"
+}
+
+# Make a kernel config file from generic and arch specific
+# fragments
+# - arg1 - path to arch specific fragments
+# - arg2 - path to kernel sources
+# - arg3 - kernel architecture name, used to skip fragments tagged !$arch
+get_kernel_frag_path() {
+	local arch_path="$1"
+	local common_path="${arch_path}/../common"
+	local gpu_path="${arch_path}/../gpu"
+
+	local kernel_path="$2"
+	local arch="$3"
+	local cmdpath="${kernel_path}/scripts/kconfig/merge_config.sh"
+	local config_path="${arch_path}/.config"
+
+	local arch_configs="$(ls ${arch_path}/*.conf)"
+	# Exclude configs if they have !$arch tag in the header
+	local common_configs="$(grep "\!${arch}" ${common_path}/*.conf -L)"
+	local experimental_configs="$(ls ${common_path}/experimental/*.conf)"
+
+	# These are the strings that the kernel merge_config.sh script kicks out
+	# when it reports an error or warning condition. We search for them in the
+	# output to try and fail when we think something has been misconfigured.
+	local not_in_string="not in final"
+	local redefined_string="not in final"
+	local redundant_string="not in final"
+
+	# Later, if we need to add kernel version specific subdirs in order to
+	# handle specific cases, then add the path definition and search/list/cat
+	# here.
+	local all_configs="${common_configs} ${arch_configs}"
+	if [[ ${experimental_kernel} == "true" ]]; then
+		all_configs="${all_configs} ${experimental_configs}"
+	fi
+
+	if [[ "${gpu_vendor}" != "" ]];then
+		info "Add kernel config for GPU due to '-g ${gpu_vendor}'"
+		local gpu_configs="$(ls ${gpu_path}/${gpu_vendor}.conf)"
+		all_configs="${all_configs} ${gpu_configs}"
+	fi
+
+	info "Constructing config from fragments: ${config_path}"
+
+
+	export KCONFIG_CONFIG=${config_path}
+	export ARCH=${arch_target}
+	cd ${kernel_path}
+
+	local results
+	results=$( ${cmdpath} -r -n ${all_configs} )
+	# Only consider results highlighting "not in final"
+	results=$(grep "${not_in_string}" <<< "$results")
+	# Do not care about options that are in whitelist
+	results=$(grep -v -f ${default_config_whitelist} <<< "$results")
+
+	# Did we request any entries that did not make it?
+	local missing=$(echo $results | grep -v -q "${not_in_string}"; echo $?)
+	if [ ${missing} -ne 0 ]; then
+		info "Some CONFIG elements failed to make the final .config:"
+		info "${results}"
+		info "Generated config file can be found in ${config_path}"
+		die "Failed to construct requested .config file"
+	fi
+
+	# Did we define something as two different values?
+	local redefined=$(echo ${results} | grep -v -q "${redefined_string}"; echo $?)
+	if [ ${redefined} -ne 0 ]; then
+		info "Some CONFIG elements are redefined in fragments:"
+		info "${results}"
+		info "Generated config file can be found in ${config_path}"
+		die "Failed to construct requested .config file"
+	fi
+
+	# Did we define something twice? Nominally this may not be an error, and it
+	# might be convenient to allow it, but for now, let's pick up on them.
+	local redundant=$(echo ${results} | grep -v -q "${redundant_string}"; echo $?)
+	if [ ${redundant} -ne 0 ]; then
+		info "Some CONFIG elements failed to make the final .config"
+		info "${results}"
+		info "Generated config file can be found in ${config_path}"
+		die "Failed to construct requested .config file"
+	fi
+
+	echo "${config_path}"
+}
+
+# Locate and return the path to the relevant kernel config file
+# - arg1: kernel version
+# - arg2: hypervisor target
+# - arg3: arch target
+# - arg4: kernel source path
+get_default_kernel_config() {
+	local version="${1}"
+
+	local hypervisor="$2"
+	local kernel_arch="$3"
+	local kernel_path="$4"
+
+	[ -n "${version}" ] || die "kernel version not provided"
+	[ -n "${hypervisor}" ] || die "hypervisor not provided"
+	[ -n "${kernel_arch}" ] || die "kernel arch not provided"
+
+	local kernel_ver
+	kernel_ver=$(get_major_kernel_version "${version}")
+
+	archfragdir="${default_config_frags_dir}/${kernel_arch}"
+	if [ -d "${archfragdir}" ]; then
+		config="$(get_kernel_frag_path ${archfragdir} ${kernel_path} ${kernel_arch})"
+	else
+		[ "${hypervisor}" == "firecracker" ] && hypervisor="kvm"
+		config="${default_kernel_config_dir}/${kernel_arch}_kata_${hypervisor}_${kernel_ver}.x"
+	fi
+
+	[ -f "${config}" ] || die "failed to find default config ${config}"
+	echo "${config}"
+}
+
+get_config_and_patches() {
+	if [ -z "${patches_path}" ]; then
+		patches_path="${default_patches_dir}"
+		if [ ! -d "${patches_path}" ]; then
+			tag="${kata_version}"
+			git clone -q "https://${patches_repo}.git" "${patches_repo_dir}"
+			pushd "${patches_repo_dir}" >> /dev/null
+			if [ -n "${tag}" ] ; then
+				info "checking out ${tag}"
+				git checkout -q "${tag}"
+			fi
+			popd >> /dev/null
+		fi
+	fi
+}
+
+get_config_version() {
+	get_config_and_patches
+	config_version_file="${default_patches_dir}/../kata_config_version"
+	if [ -f "${config_version_file}" ]; then
+		cat "${config_version_file}"
+	else
+		die "failed to find ${config_version_file}"
+	fi
+}
+
+setup_kernel() {
+	local kernel_path=${1:-}
+	[ -n "${kernel_path}" ] || die "kernel_path not provided"
+
+	if [ -d "$kernel_path" ]; then
+		info "${kernel_path} already exists"
+		if [[ "${force_setup_generate_config}" != "true" ]];then
+			return
+		else
+			info "Force generate config due to '-f'"
+		fi
+	else
+		info "kernel path does not exist, will download kernel"
+		download_kernel="true"
+		[ -n "$kernel_version" ] || die "failed to get kernel version: Kernel version is empty"
+
+		if [[ ${download_kernel} == "true" ]]; then
+			get_kernel "${kernel_version}" "${kernel_path}"
+		fi
+
+		[ -n "$kernel_path" ] || die "failed to find kernel source path"
+
+		get_config_and_patches
+
+		[ -d "${patches_path}" ] || die "patches path '${patches_path}' does not exist"
+	fi
+
+	local major_kernel
+	major_kernel=$(get_major_kernel_version "${kernel_version}")
+	local patches_dir_for_version="${patches_path}/${major_kernel}.x"
+	local kernel_patches=""
+	if [ -d "${patches_dir_for_version}" ]; then
+		# Patches are expected to be named in the standard
+		# git-format-patch(1) format where the first part of the
+		# filename represents the patch ordering
+		# (lowest numbers apply first):
+		#
+		#   "${number}-${dashed_description}"
+		#
+		# For example,
+		#
+		#   0001-fix-the-bad-thing.patch
+		#   0002-improve-the-fix-the-bad-thing-fix.patch
+		#   0003-correct-compiler-warnings.patch
+		kernel_patches=$(find "${patches_dir_for_version}" -name '*.patch' -type f |\
+			sort -t- -k1,1n)
+	else
+		info "kernel patches directory does not exist"
+	fi
+
+	[ -n "${arch_target}" ] || arch_target="$(uname -m)"
+	arch_target=$(arch_to_kernel "${arch_target}")
+	(
+	cd "${kernel_path}" || exit 1
+	for p in ${kernel_patches}; do
+		info "Applying patch $p"
+		patch -p1 --fuzz 0 <"$p"
+	done
+
+	[ -n "${hypervisor_target}" ] || hypervisor_target="kvm"
+	[ -n "${kernel_config_path}" ] || kernel_config_path=$(get_default_kernel_config "${kernel_version}" "${hypervisor_target}" "${arch_target}" "${kernel_path}")
+
+	info "Copying config file from: ${kernel_config_path}"
+	cp "${kernel_config_path}" ./.config
+	make oldconfig
+	)
+}
+
+build_kernel() {
+	local kernel_path=${1:-}
+	[ -n "${kernel_path}" ] || die "kernel_path not provided"
+	[ -d "${kernel_path}" ] || die "path to kernel does not exist, use ${script_name} setup"
+	[ -n "${arch_target}" ] || arch_target="$(uname -m)"
+	arch_target=$(arch_to_kernel "${arch_target}")
+	pushd "${kernel_path}" >>/dev/null
+	make -j $(nproc) ARCH="${arch_target}"
+	[ "$arch_target" != "powerpc" ] && ([ -e "arch/${arch_target}/boot/bzImage" ] || [ -e "arch/${arch_target}/boot/Image.gz" ])
+	[ -e "vmlinux" ]
+	[ "${hypervisor_target}" == "firecracker" ] && [ "${arch_target}" == "arm64" ] && [ -e "arch/${arch_target}/boot/Image" ]
+	popd >>/dev/null
+}
+
+install_kata() {
+	local kernel_path=${1:-}
+	[ -n "${kernel_path}" ] || die "kernel_path not provided"
+	[ -d "${kernel_path}" ] || die "path to kernel does not exist, use ${script_name} setup"
+	pushd "${kernel_path}" >>/dev/null
+	config_version=$(get_config_version)
+	[ -n "${config_version}" ] || die "failed to get config version"
+	install_path=$(readlink -m "${DESTDIR}/${PREFIX}/share/${project_name}")
+
+	suffix=""
+	if [[ ${experimental_kernel} == "true" ]]; then
+		suffix="-virtiofs"
+	fi
+	if [[ ${gpu_vendor} != "" ]];then
+		suffix="-${gpu_vendor}-gpu${suffix}"
+	fi
+
+	vmlinuz="vmlinuz-${kernel_version}-${config_version}${suffix}"
+	vmlinux="vmlinux-${kernel_version}-${config_version}${suffix}"
+
+	if [ -e "arch/${arch_target}/boot/bzImage" ]; then
+		bzImage="arch/${arch_target}/boot/bzImage"
+	elif [ -e "arch/${arch_target}/boot/Image.gz" ]; then
+		bzImage="arch/${arch_target}/boot/Image.gz"
+	elif [ "${arch_target}" != "powerpc" ]; then
+		die "failed to find image"
+	fi
+
+	# Install compressed kernel
+	if [ "${arch_target}" = "powerpc" ]; then
+		install --mode 0644 -D "vmlinux" "${install_path}/${vmlinuz}"
+	else
+		install --mode 0644 -D "${bzImage}" "${install_path}/${vmlinuz}"
+	fi
+
+	# Install uncompressed kernel
+	if [ "${arch_target}" = "arm64" ]; then
+		install --mode 0644 -D "arch/${arch_target}/boot/Image" "${install_path}/${vmlinux}"
+	else
+		install --mode 0644 -D "vmlinux" "${install_path}/${vmlinux}"
+	fi
+
+	install --mode 0644 -D ./.config "${install_path}/config-${kernel_version}"
+
+	ln -sf "${vmlinuz}" "${install_path}/vmlinuz${suffix}.container"
+	ln -sf "${vmlinux}" "${install_path}/vmlinux${suffix}.container"
+	ls -la "${install_path}/vmlinux${suffix}.container"
+	ls -la "${install_path}/vmlinuz${suffix}.container"
+	popd >>/dev/null
+}
+
+main() {
+	while getopts "a:c:defg:hk:p:t:v:" opt; do
+		case "$opt" in
+			a)
+				arch_target="${OPTARG}"
+				;;
+			c)
+				kernel_config_path="${OPTARG}"
+				;;
+			d)
+				PS4=' Line ${LINENO}: '
+				set -x
+				;;
+			e)
+				experimental_kernel="true"
+				;;
+			f)
+				force_setup_generate_config="true"
+				;;
+			g)
+				gpu_vendor="${OPTARG}"
+				[[ "${gpu_vendor}" == "${GV_INTEL}" || "${gpu_vendor}" == "${GV_NVIDIA}" ]] || die "GPU vendor only support intel and nvidia"
+				;;
+			h)
+				usage 0
+				;;
+			k)
+				kernel_path="${OPTARG}"
+				;;
+			p)
+				patches_path="${OPTARG}"
+				;;
+			t)
+				hypervisor_target="${OPTARG}"
+				;;
+			v)
+				kernel_version="${OPTARG}"
+				;;
+		esac
+	done
+
+	shift $((OPTIND - 1))
+
+	subcmd="${1:-}"
+
+	[ -z "${subcmd}" ] && usage 1
+
+	# If no kernel version is given, take it from versions.yaml
+	if [ -z "$kernel_version" ]; then
+		if [[ ${experimental_kernel} == "true" ]]; then
+			kernel_version=$(get_from_kata_deps "assets.kernel-experimental.tag" "${kata_version}")
+		else
+			kernel_version=$(get_from_kata_deps "assets.kernel.version" "${kata_version}")
+			#Remove extra 'v'
+			kernel_version="${kernel_version#v}"
+		fi
+	fi
+
+	if [ -z "${kernel_path}" ]; then
+		config_version=$(get_config_version)
+		if [[ ${experimental_kernel} == "true" ]]; then
+			kernel_path="${PWD}/kata-linux-experimental-${kernel_version}-${config_version}"
+		else
+			kernel_path="${PWD}/kata-linux-${kernel_version}-${config_version}"
+		fi
+		info "Config version: ${config_version}"
+	fi
+
+	info "Kernel version: ${kernel_version}"
+
+	case "${subcmd}" in
+		build)
+			build_kernel "${kernel_path}"
+			;;
+		install)
+			build_kernel "${kernel_path}"
+			install_kata "${kernel_path}"
+			;;
+		setup)
+			setup_kernel "${kernel_path}"
+			[ -d "${kernel_path}" ] || die "${kernel_path} does not exist"
+			echo "Kernel source ready: ${kernel_path} "
+			;;
+		*)
+			usage 1
+			;;
+
+	esac
+}
+
+main "$@"
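
A note on the patch ordering in `setup_kernel`: `sort -t- -k1,1n` keys on the text before the first `-` of each line. Because `find` emits full paths, that field is the beginning of the path and not the patch number, so as far as can be told the numeric key never takes effect and the order falls back to plain byte comparison of the whole line. That still works for zero-padded `git format-patch` names. The following is a minimal sketch, assuming a hypothetical helper `ordered_patches` that is not part of this script, which sorts on the file name instead. Unlike the `find` call it only looks at the top level of the directory.

```shell
#!/bin/bash
# Hypothetical helper, not part of build-kernel.sh: print the *.patch files
# found directly in a directory, ordered by the number before the first '-'
# of the file name (lowest number first).
ordered_patches() {
	local dir="$1" p
	for p in "${dir}"/*.patch; do
		# Skip the unexpanded glob when the directory has no patches.
		if [ -f "${p}" ]; then
			basename "${p}"
		fi
	done | sort -t- -k1,1n | while IFS= read -r p; do
		# Re-attach the directory so callers get usable paths.
		echo "${dir}/${p}"
	done
}

# Usage sketch:
#   for p in $(ordered_patches "${patches_dir_for_version}"); do
#       patch -p1 --fuzz 0 <"$p"
#   done
```

Sorting on the base name keeps the numeric key meaningful even when the directory path itself contains a `-`, as `kata-containers` does.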
--- a/tools/packaging/kernel/configs/README.md
+++ b/tools/packaging/kernel/configs/README.md
@@ -0,0 +1,71 @@
+* [Kata Containers kernel config files](#kata-containers-kernel-config-files)
+   * [Types of config files](#types-of-config-files)
+   * [How to use config files](#how-to-use-config-files)
+   * [How to modify config files](#how-to-modify-config-files)
+
+# Kata Containers kernel config files
+
+This directory contains Linux Kernel config files used to configure Kata
+Containers VM kernels.
+
+## Types of config files
+
+This directory holds config files for the Kata Linux Kernel in two forms:
+
+- A tree of config file 'fragments' in the `fragments` sub-folder, that are
+  constructed into a complete config file using the kernel
+  `scripts/kconfig/merge_config.sh` script.
+- As complete config files that can be used as-is.
+
+Kernel config fragments are the preferred method of constructing `.config` files
+to build Kata Containers kernels, due to their improved clarity and ease of maintenance
+over single file monolithic `.config`s.
+
+## How to use config files
+
+The recommended way to set up a kernel tree, populate it with a relevant `.config` file,
+and build a kernel, is to use the [`build-kernel.sh`](../build-kernel.sh) script. For
+example:
+
+```bash
+$ ./build-kernel.sh setup
+```
+
+The `build-kernel.sh` script understands both full and fragment based config files.
+
+Run `./build-kernel.sh -h` for more information.
+
+## How to modify config files
+
+Complete config files can be modified either with an editor, or preferably
+using the kernel `Kconfig` configuration tools, for example:
+
+```
+$ cp x86_64_kata_kvm_4.14.x linux-4.14.22/.config
+$ pushd linux-4.14.22
+$ make menuconfig
+$ popd
+$ cp linux-4.14.22/.config x86_64_kata_kvm_4.14.x
+```
+
+Kernel fragments are best constructed using an editor. Tools such as `grep` and
+`diff` can help find the differences between two config files to be placed
+into a fragment.
+
+If adding config entries for a new subsystem or feature, consider making a new
+fragment with an appropriately descriptive name.
+
+If you want to disable an entire fragment for a specific architecture, add the tag `# !${arch}` to the first line of the fragment. Multiple architectures can be excluded on the same line. The leading `#` is required so that the tag is not interpreted as a configuration entry.
+Example of a valid exclusion:
+```
+# !s390x !ppc64le
+```
+
+The fragment gathering tool performs some basic sanity checks, and `build-kernel.sh` will
+fail and report the error in the following cases:
+
+- A duplicate `CONFIG` symbol appears.
+- A `CONFIG` symbol is in a fragment but does not appear in the final `.config`.
+  This indicates that the `CONFIG` variable is not part of the kernel `Kconfig` setup,
+  which can point to a typing mistake in the name of the symbol.
+- A `CONFIG` symbol appears in the fragments with multiple different values.
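
The per-architecture exclusion tag described in this README can be sketched as follows. The helper name `fragments_for_arch` is hypothetical. `build-kernel.sh` itself selects fragments with `grep -L` over the whole file, whereas this sketch only inspects the first line, as the README describes, and it uses a simple substring match on the tag.

```shell
#!/bin/bash
# Hypothetical helper, not part of build-kernel.sh: print the fragments in a
# directory whose first line does not carry an exclusion tag ("# ... !<arch>")
# for the given architecture.
fragments_for_arch() {
	local dir="$1" arch="$2" f first
	for f in "${dir}"/*.conf; do
		# Skip the unexpanded glob when there are no fragments.
		[ -f "${f}" ] || continue
		first=""
		# Read only the first line; tolerate empty files.
		IFS= read -r first <"${f}" || true
		case "${first}" in
			"#"*'!'"${arch}"*) ;;   # tagged as excluded: skip it
			*) echo "${f}" ;;
		esac
	done
}

# Usage sketch:
#   fragments_for_arch configs/fragments/common s390x
```

Because the match is a substring match, a tag such as `!s390x` would also exclude a shorter name like `s390`; stricter word matching is left out to keep the sketch short.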
--- a/tools/packaging/kernel/configs/arm64_kata_kvm_4.14.x
+++ b/tools/packaging/kernel/configs/arm64_kata_kvm_4.14.x
--- a/tools/packaging/kernel/configs/arm64_kata_kvm_4.19.x
+++ b/tools/packaging/kernel/configs/arm64_kata_kvm_4.19.x
--- a/tools/packaging/kernel/configs/arm64_kata_kvm_5.4.x
+++ b/tools/packaging/kernel/configs/arm64_kata_kvm_5.4.x
--- a/tools/packaging/kernel/configs/arm64_kata_kvm_virtio-fs-v0.3.x
+++ b/tools/packaging/kernel/configs/arm64_kata_kvm_virtio-fs-v0.3.x
--- a/tools/packaging/kernel/configs/fragments/arm64/acpi.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/acpi.conf
@@ -0,0 +1,5 @@
+# ACPI on arm64 is dependent on UEFI.
+CONFIG_EFI=y
+CONFIG_EFI_STUB=y
+# ARM64 can run properly in ACPI hardware reduced mode.
+CONFIG_ACPI_REDUCED_HARDWARE_ONLY=y
--- a/tools/packaging/kernel/configs/fragments/arm64/base.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/base.conf
@@ -0,0 +1,42 @@
+CONFIG_ARM64=y
+CONFIG_ARM64_4K_PAGES=y
+
+# ARM servers often have many cores; the following configs improve
+# the CPU scheduler's decision making.
+CONFIG_SCHED_MC=y
+CONFIG_SCHED_SMT=y
+
+# Virtual address space size (48-bit)
+CONFIG_ARM64_VA_BITS_48=y
+CONFIG_ARM64_VA_BITS=48
+# Physical address space size (48-bit)
+CONFIG_ARM64_PA_BITS_48=y
+CONFIG_ARM64_PA_BITS=48
+
+# Use the maximum number of CPUs supported by KVM (255)
+CONFIG_NR_CPUS=255
+
+CONFIG_PERF_EVENTS=y
+
+# No architected NMI
+CONFIG_ARM64_PSEUDO_NMI=y
+CONFIG_ARM64_SVE=y
+
+# Arm64 prefers to use REFCOUNT_FULL by default.
+CONFIG_REFCOUNT_FULL=y
+
+#
+# ARMv8.1 architectural features
+#
+CONFIG_ARM64_HW_AFDBM=y
+CONFIG_ARM64_PAN=y
+# end of ARMv8.1 architectural features
+
+#
+# ARMv8.2 architectural features
+#
+CONFIG_ARM64_CNP=y
+CONFIG_ARM64_PMEM=y
+CONFIG_ARM64_RAS_EXTN=y
+CONFIG_ARM64_UAO=y
+# end of ARMv8.2 architectural features
--- a/tools/packaging/kernel/configs/fragments/arm64/crypto.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/crypto.conf
@@ -0,0 +1,6 @@
+# ARMv8 adds cryptographic instructions that could significantly improve
+# performance on tasks such as AES encryption and SHA1 and SHA256 hashing.
+CONFIG_ARM64_CRYPTO=y
+CONFIG_CRYPTO_AES_ARM64=y
+CONFIG_CRYPTO_AES_ARM64_CE=y
+CONFIG_CRYPTO_SHA256_ARM64=y
--- a/tools/packaging/kernel/configs/fragments/arm64/dt.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/dt.conf
@@ -0,0 +1,4 @@
+# Device Tree and Open Firmware support
+CONFIG_DTC=y
+CONFIG_OF=y
+CONFIG_OF_PMEM=y
--- a/tools/packaging/kernel/configs/fragments/arm64/erratum.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/erratum.conf
@@ -0,0 +1,15 @@
+# ARM errata workarounds via the alternatives framework.
+# Vendor-specific option will be left to users to decide.
+CONFIG_ARM64_ERRATUM_1024718=y
+CONFIG_ARM64_ERRATUM_1165522=y
+CONFIG_ARM64_ERRATUM_1286807=y
+CONFIG_ARM64_ERRATUM_1463225=y
+CONFIG_ARM64_ERRATUM_819472=y
+CONFIG_ARM64_ERRATUM_824069=y
+CONFIG_ARM64_ERRATUM_826319=y
+CONFIG_ARM64_ERRATUM_827319=y
+CONFIG_ARM64_ERRATUM_832075=y
+CONFIG_ARM64_ERRATUM_843419=y
+CONFIG_ARM64_WORKAROUND_CLEAN_CACHE=y
+CONFIG_ARM64_WORKAROUND_REPEAT_TLBI=y
+
--- a/tools/packaging/kernel/configs/fragments/arm64/pci.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/pci.conf
@@ -0,0 +1,3 @@
+# It brings PCI support to mach-virt based upon an idealised host controller.
+CONFIG_PCI_HOST_COMMON=y
+CONFIG_PCI_HOST_GENERIC=y
--- a/tools/packaging/kernel/configs/fragments/arm64/ptp.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/ptp.conf
@@ -0,0 +1,7 @@
+# PTP clock support
+#
+# The implementation of ptp_kvm on arm is an experimental feature;
+# you need to apply private patches to enable it on your host machine.
+# See https://github.com/kata-containers/packaging/pull/998 for detailed info.
+CONFIG_PTP_1588_CLOCK=y
+CONFIG_PTP_1588_CLOCK_KVM=y
--- a/tools/packaging/kernel/configs/fragments/arm64/rtc.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/rtc.conf
@@ -0,0 +1,10 @@
+CONFIG_RTC_LIB=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_SYSTOHC=y
+# RTC interfaces
+CONFIG_RTC_INTF_SYSFS=y
+CONFIG_RTC_INTF_PROC=y
+CONFIG_RTC_INTF_DEV=y
+# QEMU provides an emulated ARM AMBA PrimeCell PL031 RTC.
+CONFIG_RTC_DRV_PL031=y
--- a/tools/packaging/kernel/configs/fragments/arm64/serial.conf
+++ b/tools/packaging/kernel/configs/fragments/arm64/serial.conf
@@ -0,0 +1,3 @@
+# This option is used for all 8250 compatible serial ports
+# that are probed through device tree.
+CONFIG_SERIAL_OF_PLATFORM=y
--- a/tools/packaging/kernel/configs/fragments/common/9p.conf
+++ b/tools/packaging/kernel/configs/fragments/common/9p.conf
@@ -0,0 +1,17 @@
+# Enable 9p(fs) support - required for Kata to mount filesystems into the workload
+
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
+CONFIG_9P_FS=y
+# NOTE - 9p client caching turned off?
+# FIXME: check if that is right?
+# https://github.com/kata-containers/packaging/issues/483
+#CONFIG_9P_FSCACHE=y
+CONFIG_NETWORK_FILESYSTEMS=y
+# Q. Do we use the POSIX_ACL over 9p?
+# FIXME: https://github.com/kata-containers/packaging/issues/483
+CONFIG_9P_FS_POSIX_ACL=y
+# NOTE - this adds security labels, such as used by SELinux - we may be able to
+# disable this, for now.
+# FIXME: https://github.com/kata-containers/packaging/issues/483
+CONFIG_9P_FS_SECURITY=y
--- a/tools/packaging/kernel/configs/fragments/common/acpi.conf
+++ b/tools/packaging/kernel/configs/fragments/common/acpi.conf
@@ -0,0 +1,20 @@
+# enable ACPI support.
+# This could do with REVIEW
+# https://github.com/kata-containers/packaging/issues/483
+CONFIG_ARCH_SUPPORTS_ACPI=y
+CONFIG_ACPI=y
+CONFIG_ACPI_BUTTON=y
+CONFIG_ACPI_PROCESSOR_IDLE=y
+# Having trouble enabling this - disable for now.
+# Would add support for ACPI CPPC power control via firmware - do we need
+# that for the guest??
+#CONFIG_ACPI_CPPC_LIB=y
+CONFIG_ACPI_PROCESSOR=y
+CONFIG_ACPI_HOTPLUG_CPU=y
+CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
+CONFIG_ACPI_TABLE_UPGRADE=y
+CONFIG_ACPI_PCI_SLOT=y
+CONFIG_ACPI_CONTAINER=y
+CONFIG_ACPI_HOTPLUG_MEMORY=y
+CONFIG_ACPI_NFIT=y
+CONFIG_HAVE_ACPI_APEI=y
--- a/tools/packaging/kernel/configs/fragments/common/base.conf
+++ b/tools/packaging/kernel/configs/fragments/common/base.conf
@@ -0,0 +1,52 @@
+# Basic necessary items!
+
+CONFIG_SECTION_MISMATCH_WARN_ONLY=y
+CONFIG_SMP=y
+CONFIG_PARAVIRT=y
+# Note, no nested VM support enabled here
+
+# Turn off embedded mode, as it disabled 'too much', and we
+# no longer pass all the tests. We should refine this, and
+# work out which of the ~66 items it enables are really needed.
+# I believe this is the actual syntax we need for a fragment to
+# disable an item...
+# CONFIG_EMBEDDED is not set
+
+# Note, no virt enabled balloon yet
+CONFIG_INPUT=y
+CONFIG_PRINTK=y
+# We use this for metrics!
+CONFIG_PRINTK_TIME=y
+CONFIG_UNIX98_PTYS=y
+CONFIG_FUTEX=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
+CONFIG_GENERIC_MSI_IRQ=y
+CONFIG_NO_HZ=y
+CONFIG_NO_HZ_FULL=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_POSIX_TIMERS=y
+CONFIG_PROC_SYSCTL=y
+
+CONFIG_SHMEM=y
+
+# For security...
+CONFIG_RELOCATABLE=y
+CONFIG_RANDOMIZE_BASE=y
+# FIXME - check if we should be setting this
+# https://github.com/kata-containers/packaging/issues/483
+# I have a feeling it affects our memory hotplug maybe?
+# PHYSICAL_ALIGN=0x1000000
+
+# This would only affect two drivers, neither of which we have enabled.
+# The recommendation is to have it on, and you will see it in a diff if you
+# look for differences against the frag generated config - so, add it here as
+# a comment to make it clear in the future why we have not set it - as it would
+# only add noise to our frags and config.
+# PREVENT_FIRMWARE_BUILD=y
+
+# Trust the hardware vendor to initialise the RNG - which can speed up boot.
+# This can still be dynamically disabled on the kernel command line/kata config if needed.
+# Disable for now, as it upsets the entropy test, and we need to improve those: FIXME: see:
+# https://github.com/kata-containers/tests/issues/1543
+# RANDOM_TRUST_CPU=y
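The base.conf comments above distinguish between setting an option, explicitly disabling one (`# CONFIG_EMBEDDED is not set`), and leaving a plain comment as a note (`# PHYSICAL_ALIGN=0x1000000`). A minimal sketch of that distinction, assuming the usual Kconfig `.config` line syntax; the function name `parse_fragment_line` is hypothetical and is not part of `build-kernel.sh`:

```python
import re

# "CONFIG_FOO=value" sets an option.
_SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# "# CONFIG_FOO is not set" explicitly disables an option.
_UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_fragment_line(line):
    """Classify one fragment line (hypothetical helper, for illustration only).

    Returns (option, value) for a set line, (option, None) for an explicit
    disable line, and None for blank lines and ordinary comments.
    """
    line = line.strip()
    match = _SET_RE.match(line)
    if match:
        return match.group(1), match.group(2)
    match = _UNSET_RE.match(line)
    if match:
        return match.group(1), None
    return None
```

Under these assumptions, a commented-out line such as `#CONFIG_9P_FSCACHE=y` is only a note: it neither enables nor disables the option.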
--- a/tools/packaging/kernel/configs/fragments/common/cgroup.conf
+++ b/tools/packaging/kernel/configs/fragments/common/cgroup.conf
@@ -0,0 +1,26 @@
+# Add cgroup support. Needed both for the agent to place the workload into, and
+# also used/looked for by systemd rootfs.
+CONFIG_CGROUPS=y
+CONFIG_MEMCG=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_WRITEBACK=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_FAIR_GROUP_SCHED=y
+CONFIG_CFS_BANDWIDTH=y
+CONFIG_CGROUP_PIDS=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CPUSETS=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_PERF=y
+CONFIG_SOCK_CGROUP_DATA=y
+
+# We have to enable SWAP CG, as runc/libcontainer in the agent currently fails
+# to write to it, even though it does some checks to see if swap is enabled.
+CONFIG_SWAP=y
+CONFIG_MEMCG_SWAP=y
+CONFIG_MEMCG_SWAP_ENABLED=y
+
+# Needed for cgroups v2
+CONFIG_BPF_SYSCALL=y
+CONFIG_CGROUP_BPF=y
--- a/tools/packaging/kernel/configs/fragments/common/cpu.conf
+++ b/tools/packaging/kernel/configs/fragments/common/cpu.conf
@@ -0,0 +1,7 @@
+# Items to do with CPU frequency, power etc.
+
+CONFIG_CPU_FREQ=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
+CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
+CONFIG_CPU_IDLE=y
+CONFIG_CPU_IDLE_GOV_MENU=y
--- a/tools/packaging/kernel/configs/fragments/common/crypto.conf
+++ b/tools/packaging/kernel/configs/fragments/common/crypto.conf
@@ -0,0 +1,17 @@
+# Need decompressors for root filesystems and kernels.
+# Do we need all of these?
+CONFIG_CRYPTO=y
+# Deflate used by IPSec and IPCOMP protocols
+# Also selects ZLIB and a couple of other algos
+CONFIG_CRYPTO_DEFLATE=y
+CONFIG_XZ_DEC=y
+CONFIG_ZLIB_DEFLATE=y
+# FIXME - check, do we need gzip?
+# https://github.com/kata-containers/packaging/issues/483
+CONFIG_DECOMPRESS_GZIP=y
+# Some items required by systemd: https://github.com/systemd/systemd/blob/master/README
+CONFIG_CRYPTO_USER_API=y
+CONFIG_CRYPTO_USER_API_HASH=y
+CONFIG_CRYPTO_SHA256=y
+CONFIG_CRYPTO_FIPS=y
+CONFIG_CRYPTO_ANSI_CPRNG=y
--- a/tools/packaging/kernel/configs/fragments/common/dax.conf
+++ b/tools/packaging/kernel/configs/fragments/common/dax.conf
@@ -0,0 +1,32 @@
+# Enable DAX and NVDIMM support so we can map in our rootfs
+
+# Need HOTREMOVE, or ZONE_DEVICE will not get enabled.
+# As far as we know, we never actually remove memory once it has been plugged
+# in, as that is generally too 'expensive' an operation.
+CONFIG_MEMORY_HOTREMOVE=y
+# Also need this
+CONFIG_SPARSEMEM_VMEMMAP=y
+
+# Without these the pmem_should_map_pages() call in the kernel fails with new
+# Related to the ARCH_HAS_HMM set in the arch files.
+CONFIG_ZONE_DEVICE=y
+CONFIG_DEV_PAGEMAP_OPS=y
+
+CONFIG_ND_PFN=y
+CONFIG_NVDIMM_PFN=y
+CONFIG_NVDIMM_DAX=y
+
+CONFIG_BLOCK=y
+CONFIG_BLK_DEV=y
+CONFIG_BLK_DEV_PMEM=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_LIBNVDIMM=y
+CONFIG_ND_BLK=y
+CONFIG_BTT=y
+# FIXME: Should check if this is really needed
+# https://github.com/kata-containers/packaging/issues/483
+CONFIG_NVMEM=y
+# Is auto selected by other options
+#CONFIG_DAX_DRIVER=y
+CONFIG_DAX=y
+CONFIG_FS_DAX=y
--- a/tools/packaging/kernel/configs/fragments/common/elf.conf
+++ b/tools/packaging/kernel/configs/fragments/common/elf.conf
@@ -0,0 +1,5 @@
+# Enable Elf loading, and script loading
+
+CONFIG_BINFMT_ELF=y
+CONFIG_BINFMT_SCRIPT=y
+CONFIG_BINFMT_MISC=y
--- a/tools/packaging/kernel/configs/fragments/common/experimental/virtio-fs.conf
+++ b/tools/packaging/kernel/configs/fragments/common/experimental/virtio-fs.conf
@@ -0,0 +1,3 @@
+# virtio-fs support
+CONFIG_VIRTIO_FS=y
+CONFIG_FUSE_FS=y
--- a/tools/packaging/kernel/configs/fragments/common/fs.conf
+++ b/tools/packaging/kernel/configs/fragments/common/fs.conf
@@ -0,0 +1,51 @@
+# Enable a whole bunch of filesystem related items
+
+CONFIG_BLK_DEV_INITRD=y
+
+# Recommended for Docker
+CONFIG_BLK_DEV_THROTTLING=y
+
+# Required for hotplug block devices into Kata, using SCSI
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_BSG=y
+CONFIG_BLK_DEV_SD=y
+
+# support initial ramdisk
+CONFIG_RD_GZIP=y
+CONFIG_FS_IOMAP=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_USE_FOR_EXT2=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+# FIXME - do we need journalling support in the container?
+# https://github.com/kata-containers/packaging/issues/483
+CONFIG_JBD2=y
+CONFIG_FS_MBCACHE=y
+CONFIG_XFS_FS=y
+CONFIG_FS_POSIX_ACL=y
+CONFIG_EXPORTFS=y
+CONFIG_EXPORTFS_BLOCK_OPS=y
+CONFIG_FILE_LOCKING=y
+CONFIG_MANDATORY_FILE_LOCKING=y
+# A bunch of these are required for systemd at least.
+CONFIG_FSNOTIFY=y
+CONFIG_DNOTIFY=y
+CONFIG_INOTIFY_USER=y
+CONFIG_FANOTIFY=y
+CONFIG_AUTOFS4_FS=y
+CONFIG_AUTOFS_FS=y
+CONFIG_TMPFS=y
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_SIGNALFD=y
+CONFIG_TIMERFD=y
+CONFIG_EPOLL=y
+CONFIG_FHANDLE=y
+
+# We should support Async IO.
+CONFIG_AIO=y
+
+# Docker in Docker support requires overlay
+CONFIG_OVERLAY_FS=y
+CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_REDIRECT_DIR=y
--- a/tools/packaging/kernel/configs/fragments/common/hotplug.conf
+++ b/tools/packaging/kernel/configs/fragments/common/hotplug.conf
@@ -0,0 +1,13 @@
+# Setups to support our hotplug - memory, PCI devices and cpus
+
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_HOTPLUG_CPU=y
+CONFIG_HOTPLUG_PCI=y
+CONFIG_HOTPLUG_PCI_PCIE=y
+CONFIG_PCIEPORTBUS=y
+CONFIG_HOTPLUG_PCI_ACPI=y
+CONFIG_PNPACPI=y
+
+# Define hotplugged memory to be online immediately. Speeds things up, and makes
+# things work more smoothly on some architectures.
+CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
--- a/tools/packaging/kernel/configs/fragments/common/huge.conf
+++ b/tools/packaging/kernel/configs/fragments/common/huge.conf
@@ -0,0 +1,12 @@
+# Items to enable large/huge mmu pages and tlbs etc.
+
+# Compaction is the only memory management component to form high order
+# (larger physically contiguous) memory blocks reliably. The lack of the
+# feature can lead to unexpected OOM killer invocations for high order memory requests.
+CONFIG_COMPACTION=y
+
+CONFIG_HUGETLBFS=y
+
+# Enable memory page physical migration here, as it can come
+# into play when trying to find space to allocate a hugepage.
+CONFIG_MIGRATION=y
--- a/tools/packaging/kernel/configs/fragments/common/mmio.conf
+++ b/tools/packaging/kernel/configs/fragments/common/mmio.conf
@@ -0,0 +1,3 @@
+# mmio devices are required for firecracker
+CONFIG_VIRTIO_MMIO=y
+CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
--- a/tools/packaging/kernel/configs/fragments/common/mmu.conf
+++ b/tools/packaging/kernel/configs/fragments/common/mmu.conf
@@ -0,0 +1,5 @@
+# MMU specific items
+
+# vmap the kernel stacks - detects stack over-runs better and reduces
+# the stack attack window.
+CONFIG_VMAP_STACK=y
--- a/tools/packaging/kernel/configs/fragments/common/namespaces.conf
+++ b/tools/packaging/kernel/configs/fragments/common/namespaces.conf
@@ -0,0 +1,11 @@
+# We need namespaces to isolate the workload
+
+# Cannot have namespaces if not multi user...
+CONFIG_MULTIUSER=y
+CONFIG_NAMESPACES=y
+CONFIG_SYSVIPC=y
+CONFIG_UTS_NS=y
+CONFIG_IPC_NS=y
+CONFIG_USER_NS=y
+CONFIG_PID_NS=y
+CONFIG_NET_NS=y
--- a/tools/packaging/kernel/configs/fragments/common/netfilter.conf
+++ b/tools/packaging/kernel/configs/fragments/common/netfilter.conf
@@ -0,0 +1,203 @@
+# Netfilter (used by sidecars like istio)
+
+# FIXME - this is a big file - it could probably benefit from a
+# good reviewing. https://github.com/kata-containers/packaging/issues/483
+
+CONFIG_NETFILTER=y
+CONFIG_NETFILTER_ADVANCED=y
+CONFIG_NETFILTER_INGRESS=y
+CONFIG_NETFILTER_NETLINK=y
+CONFIG_NETFILTER_FAMILY_ARP=y
+CONFIG_NETFILTER_NETLINK_ACCT=y
+CONFIG_NETFILTER_NETLINK_QUEUE=y
+CONFIG_NETFILTER_NETLINK_LOG=y
+CONFIG_NETFILTER_NETLINK_OSF=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_LOG_COMMON=y
+CONFIG_NETFILTER_CONNCOUNT=y
+CONFIG_NF_CONNTRACK_MARK=y
+CONFIG_NF_CONNTRACK_ZONES=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CONNTRACK_TIMEOUT=y
+CONFIG_NF_CONNTRACK_TIMESTAMP=y
+CONFIG_NF_CONNTRACK_LABELS=y
+CONFIG_NF_CT_PROTO_DCCP=y
+CONFIG_NF_CT_PROTO_GRE=y
+CONFIG_NF_CT_PROTO_SCTP=y
+CONFIG_NF_CT_PROTO_UDPLITE=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_BROADCAST=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_SNMP=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_SIP=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NF_CT_NETLINK_TIMEOUT=y
+CONFIG_NF_CT_NETLINK_HELPER=y
+CONFIG_NETFILTER_NETLINK_GLUE_CT=y
+CONFIG_NF_NAT=y
+# NF_NAT_NEEDED is removed in newer kernels - we should drop once we move to next LTS (5.4).
+#  This is part of whitelist.conf
+CONFIG_NF_NAT_NEEDED=y
+
+# NF_NAT_PROTO_* are removed in newer kernels, but needed currently. They are part of whitelist.conf:
+CONFIG_NF_NAT_PROTO_DCCP=y
+CONFIG_NF_NAT_PROTO_UDPLITE=y
+CONFIG_NF_NAT_PROTO_SCTP=y
+CONFIG_NF_NAT_PROTO_GRE=y
+
+CONFIG_NF_NAT_AMANDA=y
+CONFIG_NF_NAT_FTP=y
+CONFIG_NF_NAT_IRC=y
+CONFIG_NF_NAT_SIP=y
+CONFIG_NF_NAT_TFTP=y
+CONFIG_NF_NAT_REDIRECT=y
+CONFIG_NETFILTER_SYNPROXY=y
+CONFIG_NETFILTER_XTABLES=y
+CONFIG_NETFILTER_XT_MARK=y
+CONFIG_NETFILTER_XT_CONNMARK=y
+CONFIG_NETFILTER_XT_SET=y
+CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CT=y
+CONFIG_NETFILTER_XT_TARGET_DSCP=y
+CONFIG_NETFILTER_XT_TARGET_HL=y
+CONFIG_NETFILTER_XT_TARGET_HMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_LOG=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_NAT=y
+CONFIG_NETFILTER_XT_TARGET_NETMAP=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_RATEEST=y
+CONFIG_NETFILTER_XT_TARGET_REDIRECT=y
+CONFIG_NETFILTER_XT_TARGET_TEE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=y
+CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y
+CONFIG_NETFILTER_XT_MATCH_BPF=y
+CONFIG_NETFILTER_XT_MATCH_CGROUP=y
+CONFIG_NETFILTER_XT_MATCH_CLUSTER=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNBYTES=y
+CONFIG_NETFILTER_XT_MATCH_CONNLABEL=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_CPU=y
+CONFIG_NETFILTER_XT_MATCH_DCCP=y
+CONFIG_NETFILTER_XT_MATCH_DEVGROUP=y
+CONFIG_NETFILTER_XT_MATCH_DSCP=y
+CONFIG_NETFILTER_XT_MATCH_ECN=y
+CONFIG_NETFILTER_XT_MATCH_ESP=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_HL=y
+CONFIG_NETFILTER_XT_MATCH_IPCOMP=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_IPVS=y
+CONFIG_NETFILTER_XT_MATCH_L2TP=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
+CONFIG_NETFILTER_XT_MATCH_NFACCT=y
+CONFIG_NETFILTER_XT_MATCH_OSF=y
+CONFIG_NETFILTER_XT_MATCH_OWNER=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_RATEEST=y
+CONFIG_NETFILTER_XT_MATCH_REALM=y
+CONFIG_NETFILTER_XT_MATCH_RECENT=y
+CONFIG_NETFILTER_XT_MATCH_SCTP=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_IP_SET=y
+CONFIG_IP_SET_BITMAP_IP=y
+CONFIG_IP_SET_BITMAP_IPMAC=y
+CONFIG_IP_SET_BITMAP_PORT=y
+CONFIG_IP_SET_HASH_IP=y
+CONFIG_IP_SET_HASH_IPMARK=y
+CONFIG_IP_SET_HASH_IPPORT=y
+CONFIG_IP_SET_HASH_IPPORTIP=y
+CONFIG_IP_SET_HASH_IPPORTNET=y
+CONFIG_IP_SET_HASH_MAC=y
+CONFIG_IP_SET_HASH_NETPORTNET=y
+CONFIG_IP_SET_HASH_NET=y
+CONFIG_IP_SET_HASH_NETNET=y
+CONFIG_IP_SET_HASH_NETPORT=y
+CONFIG_IP_SET_HASH_NETIFACE=y
+CONFIG_IP_SET_LIST_SET=y
+CONFIG_IP_VS=y
+CONFIG_IP_VS_PROTO_TCP=y
+CONFIG_IP_VS_PROTO_UDP=y
+CONFIG_IP_VS_PROTO_AH_ESP=y
+CONFIG_IP_VS_PROTO_ESP=y
+CONFIG_IP_VS_PROTO_AH=y
+CONFIG_IP_VS_PROTO_SCTP=y
+CONFIG_IP_VS_RR=y
+CONFIG_IP_VS_WRR=y
+CONFIG_IP_VS_LC=y
+CONFIG_IP_VS_WLC=y
+CONFIG_IP_VS_FO=y
+CONFIG_IP_VS_OVF=y
+CONFIG_IP_VS_LBLC=y
+CONFIG_IP_VS_LBLCR=y
+CONFIG_IP_VS_DH=y
+CONFIG_IP_VS_SH=y
+CONFIG_IP_VS_SED=y
+CONFIG_IP_VS_NQ=y
+CONFIG_IP_VS_FTP=y
+CONFIG_IP_VS_NFCT=y
+CONFIG_IP_VS_PE_SIP=y
+CONFIG_NF_DEFRAG_IPV4=y
+CONFIG_NF_TPROXY_IPV4=y
+CONFIG_NF_DUP_IPV4=y
+CONFIG_NF_LOG_IPV4=y
+CONFIG_NF_REJECT_IPV4=y
+
+# NF_NAT_IPV4 is removed in newer kernels, and is part of whitelist.conf:
+CONFIG_NF_NAT_IPV4=y
+
+CONFIG_NF_NAT_SNMP_BASIC=y
+CONFIG_NF_NAT_PPTP=y
+CONFIG_NF_NAT_H323=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_RPFILTER=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_TARGET_SYNPROXY=y
+CONFIG_IP_NF_NAT=y
+CONFIG_IP_NF_TARGET_MASQUERADE=y
+CONFIG_IP_NF_TARGET_NETMAP=y
+CONFIG_IP_NF_TARGET_REDIRECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_TARGET_CLUSTERIP=y
+CONFIG_IP_NF_TARGET_ECN=y
+CONFIG_IP_NF_TARGET_TTL=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_NF_DUP_IPV6=y
+CONFIG_NF_LOG_IPV6=y
+CONFIG_NF_DEFRAG_IPV6=y
--- a/tools/packaging/kernel/configs/fragments/common/network.conf
+++ b/tools/packaging/kernel/configs/fragments/common/network.conf
@@ -0,0 +1,75 @@
+# Our networking requirements
+### FIXME - this probably needs a good review ###
+# https://github.com/kata-containers/packaging/issues/483
+
+# pre-reqs
+CONFIG_NETDEVICES=y
+CONFIG_PROC_FS=y
+CONFIG_SYSFS=y
+CONFIG_SECURITY=y
+
+# The list
+CONFIG_NET=y
+CONFIG_ETHERNET=y
+CONFIG_NET_CORE=y
+CONFIG_NET_INGRESS=y
+CONFIG_PACKET=y
+CONFIG_PACKET_DIAG=y
+CONFIG_UNIX=y
+CONFIG_XFRM=y
+CONFIG_XFRM_ALGO=y
+CONFIG_XFRM_USER=y
+CONFIG_XFRM_SUB_POLICY=y
+# Used for mobile ipv6 type instances, unlikely we need
+#CONFIG_XFRM_MIGRATE=y
+# Developer feature - unlikely we need it
+#CONFIG_XFRM_STATISTICS=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ROUTE_CLASSID=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_SYN_COOKIES=y
+CONFIG_TCP_CONG_ADVANCED=y
+CONFIG_TCP_CONG_BBR=y
+CONFIG_DEFAULT_BBR=y
+CONFIG_TCP_MD5SIG=y
+CONFIG_IPV6=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+
+CONFIG_STP=y
+CONFIG_BRIDGE=y
+CONFIG_BRIDGE_IGMP_SNOOPING=y
+CONFIG_HAVE_NET_DSA=y
+CONFIG_LLC=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_CBQ=y
+CONFIG_NET_SCH_MULTIQ=y
+CONFIG_NET_SCH_FQ_CODEL=y
+CONFIG_NET_SCH_FQ=y
+CONFIG_NET_CLS=y
+CONFIG_NET_CLS_CGROUP=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_SCH_FIFO=y
+CONFIG_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_NET_SWITCHDEV=y
+CONFIG_RPS=y
+CONFIG_RFS_ACCEL=y
+CONFIG_XPS=y
+CONFIG_CGROUP_NET_PRIO=y
+CONFIG_CGROUP_NET_CLASSID=y
+CONFIG_NET_RX_BUSY_POLL=y
+CONFIG_BQL=y
+CONFIG_NET_FLOW_LIMIT=y
+CONFIG_GRO_CELLS=y
+CONFIG_FAILOVER=y
+CONFIG_HAVE_EBPF_JIT=y
+
+# We very likely need some Intel chip support
+CONFIG_NET_VENDOR_INTEL=y
+
+# Add VETH support (necessary for running Docker in the guest)
+CONFIG_VETH=y
+# We quite likely need to add others for passthrough and maybe SRIOV support
--- a/tools/packaging/kernel/configs/fragments/common/seccomp.conf
+++ b/tools/packaging/kernel/configs/fragments/common/seccomp.conf
@@ -0,0 +1,4 @@
+# enable seccomp items
+
+CONFIG_SECCOMP=y
+CONFIG_SECCOMP_FILTER=y
--- a/tools/packaging/kernel/configs/fragments/common/security.conf
+++ b/tools/packaging/kernel/configs/fragments/common/security.conf
@@ -0,0 +1,6 @@
+
+# Let's enable stack protection checks, and strong checks
+# Estimated cost (detailed in the kernel config files)
+# is maybe 2.3% for both
+CONFIG_STACKPROTECTOR=y
+CONFIG_STACKPROTECTOR_STRONG=y
--- a/tools/packaging/kernel/configs/fragments/common/serial.conf
+++ b/tools/packaging/kernel/configs/fragments/common/serial.conf
@@ -0,0 +1,14 @@
+# We need some sort of 'serial' for virtio-serial consoles - at the moment.
+# We might not need all of these though...
+# FIXME - https://github.com/kata-containers/packaging/issues/483
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_PCI=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_CORE_CONSOLE=y
+CONFIG_SERIAL_CORE=y
+CONFIG_SERIAL_EARLYCON=y
+
+# SERIO may be only for keyboards, mice etc., and not UARTS
+# We likely don't need
+#CONFIG_SERIO_RAW=y
+#CONFIG_SERIO=y
--- a/tools/packaging/kernel/configs/fragments/common/virtio.conf
+++ b/tools/packaging/kernel/configs/fragments/common/virtio.conf
@@ -0,0 +1,29 @@
+# We need virtio for 9p and serial and vsock at least
+
+# To get VIRTIO, we need a bus - ours of choice is PCI. We need to enable
+# PCI support to get VIRTIO_PCI support
+CONFIG_PCI=y
+CONFIG_PCI_MSI=y
+CONFIG_PCI_MSI_IRQ_DOMAIN=y
+# To get to the VIRTIO_PCI, we need the VIRTIO_MENU enabled
+CONFIG_VIRTIO_MENU=y
+CONFIG_VIRTIO_PCI=y
+# Without this nested-VM Kata does not work (we have not worked out exactly why)
+CONFIG_VIRTIO_PCI_LEGACY=y
+
+# This is used by the s390 arch at least. Leave it on globally.
+CONFIG_HW_RANDOM=y
+CONFIG_HW_RANDOM_VIRTIO=y
+
+# This is required for booting from pmem
+CONFIG_VIRTIO_PMEM=y
+
+# FIXME - are we moving away from/choosing between SCSI and BLK support?
+# https://github.com/kata-containers/packaging/issues/483
+CONFIG_SCSI=y
+CONFIG_SCSI_LOWLEVEL=y
+CONFIG_SCSI_VIRTIO=y
+CONFIG_VIRTIO_BLK=y
+CONFIG_TTY=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_VIRTIO_NET=y
--- a/tools/packaging/kernel/configs/fragments/gpu/intel.conf
+++ b/tools/packaging/kernel/configs/fragments/gpu/intel.conf
@@ -0,0 +1,7 @@
+# The following i915 kernel config options need to be enabled
+CONFIG_DRM=y
+CONFIG_DRM_I915=y
+CONFIG_DRM_I915_USERPTR=y
+
+# Linux kernel version suffix
+CONFIG_LOCALVERSION="-intel-gpu"
--- a/tools/packaging/kernel/configs/fragments/gpu/nvidia.conf
+++ b/tools/packaging/kernel/configs/fragments/gpu/nvidia.conf
@@ -0,0 +1,14 @@
+# Support mmconfig PCI config space access.
+# It's used to enable the MMIO access method for PCIe devices.
+CONFIG_PCI_MMCONFIG=y
+
+# Support for loading modules.
+# It is used to support loading GPU drivers.
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+
+# CRYPTO_FIPS requires this config when loading modules is enabled.
+CONFIG_MODULE_SIG=y
+
+# Linux kernel version suffix
+CONFIG_LOCALVERSION="-nvidia-gpu"
--- a/tools/packaging/kernel/configs/fragments/whitelist.conf
+++ b/tools/packaging/kernel/configs/fragments/whitelist.conf
@@ -0,0 +1,8 @@
+# configuration options which may be dropped in newer kernels
+# without generating an error in fragment merging
+CONFIG_NF_NAT_IPV4
+CONFIG_NF_NAT_NEEDED
+CONFIG_NF_NAT_PROTO_DCCP
+CONFIG_NF_NAT_PROTO_GRE
+CONFIG_NF_NAT_PROTO_SCTP
+CONFIG_NF_NAT_PROTO_UDPLITE
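How `build-kernel.sh` consumes whitelist.conf is not shown in this part of the diff. As a rough sketch of the idea stated in the file's header comment, the following assumes the check compares the options requested by the fragments against those present in the final config; the names `load_whitelist` and `missing_options` are hypothetical:

```python
def load_whitelist(text):
    """Parse whitelist text: one bare option name per line, '#' lines are comments."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def missing_options(requested, final_config, whitelist):
    """Return requested options that are absent from the final config.

    Options named in the whitelist are tolerated, since newer kernels may
    have dropped them; anything else that went missing is reported.
    """
    return sorted(
        opt for opt in requested
        if opt not in final_config and opt not in whitelist
    )
```

With this shape, an option such as `CONFIG_NF_NAT_IPV4` can vanish from a newer kernel without failing the merge, while any other missing option is still flagged.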
--- a/tools/packaging/kernel/configs/fragments/x86_64/acpi.conf
+++ b/tools/packaging/kernel/configs/fragments/x86_64/acpi.conf
@@ -0,0 +1,14 @@
+CONFIG_X86_INTEL_PSTATE=y
+
+# For old smp systems that do not have proper acpi support.
+# Firecracker needs this to support `vcpu_count`
+CONFIG_X86_MPPARSE=y
+
+CONFIG_ACPI_CPU_FREQ_PSS=y
+CONFIG_ACPI_HOTPLUG_IOAPIC=y
+CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
+CONFIG_ACPI_LPIT=y
+CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
+CONFIG_ACPI_PROCESSOR_CSTATE=y
+CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
+CONFIG_HAVE_ACPI_APEI_NMI=y
--- a/tools/packaging/kernel/configs/fragments/x86_64/base.conf
+++ b/tools/packaging/kernel/configs/fragments/x86_64/base.conf
@@ -0,0 +1,20 @@
+CONFIG_X86=y
+CONFIG_X86_CPUID=y
+CONFIG_X86_MSR=y
+CONFIG_X86_X2APIC=y
+CONFIG_X86_VERBOSE_BOOTUP=y
+
+# Configs around linux guest support and optimizations.
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_KVM_GUEST=y
+
+# Use the maximum number of CPUs supported by KVM (240)
+CONFIG_NR_CPUS=240
+
+# For security
+CONFIG_LEGACY_VSYSCALL_NONE=y
+CONFIG_RETPOLINE=y
+
+# Boot directly into the uncompressed kernel
+# Reduce memory footprint
+CONFIG_PVH=y
--- a/tools/packaging/kernel/configs/fragments/x86_64/fs.conf
+++ b/tools/packaging/kernel/configs/fragments/x86_64/fs.conf
@@ -0,0 +1,5 @@
+# x86 specific filesystem items
+
+# Yes, we do support unaligned word accesses
+CONFIG_DCACHE_WORD_ACCESS=y
+
--- a/tools/packaging/kernel/configs/fragments/x86_64/hotplug.conf
+++ b/tools/packaging/kernel/configs/fragments/x86_64/hotplug.conf
@@ -0,0 +1,5 @@
+# PCI SHPC hotplug is disabled for arm64, so this option lives in the
+# x86_64-specific fragments rather than in the common ones.
+# See https://github.com/kata-containers/packaging/pull/498
+# for the detailed reasons.
+CONFIG_HOTPLUG_PCI_SHPC=y
--- a/tools/packaging/kernel/configs/fragments/x86_64/mmu.conf
+++ b/tools/packaging/kernel/configs/fragments/x86_64/mmu.conf
@@ -0,0 +1,4 @@
+# x86 specific mmu/memory related items
+
+# Remove the kernel mapping from the user space - security improvement.
+CONFIG_PAGE_TABLE_ISOLATION=y
--- a/tools/packaging/kernel/configs/fragments/x86_64/nemu.conf
+++ b/tools/packaging/kernel/configs/fragments/x86_64/nemu.conf
@@ -0,0 +1,7 @@
+# Items needed to run the NEMU cut of QEMU
+# NEMU uses an EFI bios/boot, so requires a few extra bits
+
+CONFIG_MSDOS_PARTITION=y
+CONFIG_EFI=y
+CONFIG_EFI_ESRT=y
+CONFIG_EFI_RUNTIME_WRAPPERS=y
--- a/tools/packaging/kernel/configs/powerpc_kata_kvm_4.14.x
+++ b/tools/packaging/kernel/configs/powerpc_kata_kvm_4.14.x
--- a/tools/packaging/kernel/configs/powerpc_kata_kvm_4.19.x
+++ b/tools/packaging/kernel/configs/powerpc_kata_kvm_4.19.x
--- a/tools/packaging/kernel/configs/powerpc_kata_kvm_5.4.x
+++ b/tools/packaging/kernel/configs/powerpc_kata_kvm_5.4.x
--- a/tools/packaging/kernel/configs/s390_kata_kvm_4.19.x
+++ b/tools/packaging/kernel/configs/s390_kata_kvm_4.19.x
--- a/tools/packaging/kernel/configs/s390_kata_kvm_5.4.x
+++ b/tools/packaging/kernel/configs/s390_kata_kvm_5.4.x
--- a/tools/packaging/kernel/configs/x86_64_kata_kvm_4.14.x
+++ b/tools/packaging/kernel/configs/x86_64_kata_kvm_4.14.x
--- a/tools/packaging/kernel/configs/x86_64_kata_old_kernel_compat
+++ b/tools/packaging/kernel/configs/x86_64_kata_old_kernel_compat
@@ -0,0 +1,7 @@
+#
+# This file contains config options that were removed or modified in kernel 4.14
+# but are necessary for older kernels. If you are using an old kernel and Kata
+# Containers fails to start, try adding these options.
+#
+CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
+
--- a/tools/packaging/kernel/kata_config_version
+++ b/tools/packaging/kernel/kata_config_version
@@ -0,0 +1 @@
+80
--- a/tools/packaging/kernel/patches/4.19.x/0001-4.19-enable-ptp_kvm-for-arm64-in-kata.patch
+++ b/tools/packaging/kernel/patches/4.19.x/0001-4.19-enable-ptp_kvm-for-arm64-in-kata.patch
@@ -0,0 +1,457 @@
+From bee1ae5587a7427dbb9e9e313f6d0a43a9e0ec2e Mon Sep 17 00:00:00 2001
+From: Jianyong Wu <jianyong.wu@arm.com>
+Date: Mon, 30 Sep 2019 09:26:22 +0800
+Subject: [PATCH] 4.19: enable ptp_kvm for arm64 in kata
+
+---
+ drivers/clocksource/arm_arch_timer.c        | 25 ++++++
+ drivers/ptp/Kconfig                         |  2 +-
+ drivers/ptp/Makefile                        |  1 +
+ drivers/ptp/ptp_kvm_arm64.c                 | 59 ++++++++++++++
+ drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 89 +++++----------------
+ drivers/ptp/ptp_kvm_x86.c                   | 87 ++++++++++++++++++++
+ include/asm-generic/ptp_kvm.h               | 12 +++
+ include/linux/arm-smccc.h                   |  5 ++
+ virt/kvm/arm/psci.c                         | 12 +++
+ 9 files changed, 221 insertions(+), 71 deletions(-)
+ create mode 100644 drivers/ptp/ptp_kvm_arm64.c
+ rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (56%)
+ create mode 100644 drivers/ptp/ptp_kvm_x86.c
+ create mode 100644 include/asm-generic/ptp_kvm.h
+
+diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
+index d8c7f5750cdb..84ba8f9e57be 100644
+--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
+@@ -1571,3 +1571,28 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
+ }
+ TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
+ #endif
+
+#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
+#include <linux/arm-smccc.h>
+int kvm_arch_ptp_get_clock_fn(long *cycle, struct timespec64 *ts,
+			      struct clocksource **cs)
+{
+	struct arm_smccc_res hvc_res;
+	ktime_t ktime_overall;
+	struct arm_smccc_quirk hvc_quirk;
+
+	__arm_smccc_hvc(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, 0, 0, 0, 0, 0, 0, 0, &hvc_res, &hvc_quirk);
+
+	if ((long)(hvc_res.a0) < 0)
+		return -EOPNOTSUPP;
+
+	ts->tv_sec = hvc_res.a0;
+	ts->tv_nsec = hvc_res.a1;
+	*cycle = hvc_res.a2 << 32 | hvc_res.a3;
+	*cs = &clocksource_counter;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_clock_fn);
+#endif
+
+diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
+index d137c480db46..318b3f5df1ea 100644
+--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
+@@ -109,7 +109,7 @@ config PTP_1588_CLOCK_PCH
+ config PTP_1588_CLOCK_KVM
+ 	tristate "KVM virtual PTP clock"
+ 	depends on PTP_1588_CLOCK
+-	depends on KVM_GUEST && X86
+	depends on KVM_GUEST && X86 || ARM64
+ 	default y
+ 	help
+ 	  This driver adds support for using kvm infrastructure as a PTP
+diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
+index 19efa9cfa950..1bf4940a88a6 100644
+--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
+@@ -4,6 +4,7 @@
+ #
+ 
+ ptp-y					:= ptp_clock.o ptp_chardev.o ptp_sysfs.o
+ptp_kvm-y				:= ptp_kvm_common.o ptp_kvm_$(ARCH).o
+ obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp.o
+ obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
+ obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)	+= ptp_ixp46x.o
+diff --git a/drivers/ptp/ptp_kvm_arm64.c b/drivers/ptp/ptp_kvm_arm64.c
+new file mode 100644
+index 000000000000..fcd83324c7e1
+--- /dev/null
+++ b/drivers/ptp/ptp_kvm_arm64.c
+@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Virtual PTP 1588 clock for use with KVM guests
+ *  Copyright (C) 2019 ARM Ltd.
+ *  All Rights Reserved
+ */
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <asm/hypervisor.h>
+#include <linux/module.h>
+#include <linux/psci.h>
+#include <linux/arm-smccc.h>
+#include <linux/timecounter.h>
+#include <linux/sched/clock.h>
+#include <asm/arch_timer.h>
+
+
+void arm_smccc_1_1_invoke(u32 id, struct arm_smccc_res *res)
+{
+	struct arm_smccc_quirk hvc_quirk;
+
+	__arm_smccc_hvc(id, 0, 0, 0, 0, 0, 0, 0, res, &hvc_quirk);
+}
+
+int kvm_arch_ptp_init(void)
+{
+	struct arm_smccc_res hvc_res;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+					&hvc_res);
+	if ((long)(hvc_res.a0) < 0)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock_generic(struct timespec64 *ts,
+				   struct arm_smccc_res *hvc_res)
+{
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+					hvc_res);
+	if ((long)(hvc_res->a0) < 0)
+		return -EOPNOTSUPP;
+
+	ts->tv_sec = hvc_res->a0;
+	ts->tv_nsec = hvc_res->a1;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+	struct arm_smccc_res hvc_res;
+
+	/* Return the hypercall status; the original always returned 0. */
+	return kvm_arch_ptp_get_clock_generic(ts,
+					      &hvc_res);
+}
+diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
+similarity index 56%
+rename from drivers/ptp/ptp_kvm.c
+rename to drivers/ptp/ptp_kvm_common.c
+index c67dd11e08b1..c0b445fa6144 100644
+--- a/drivers/ptp/ptp_kvm.c
+++ b/drivers/ptp/ptp_kvm_common.c
+@@ -1,29 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+ /*
+  * Virtual PTP 1588 clock for use with KVM guests
+  *
+  * Copyright (C) 2017 Red Hat Inc.
+- *
+- *  This program is free software; you can redistribute it and/or modify
+- *  it under the terms of the GNU General Public License as published by
+- *  the Free Software Foundation; either version 2 of the License, or
+- *  (at your option) any later version.
+- *
+- *  This program is distributed in the hope that it will be useful,
+- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+- *  GNU General Public License for more details.
+- *
+  */
+ #include <linux/device.h>
+ #include <linux/err.h>
+ #include <linux/init.h>
+ #include <linux/kernel.h>
+#include <linux/slab.h>
+ #include <linux/module.h>
+ #include <uapi/linux/kvm_para.h>
+ #include <asm/kvm_para.h>
+-#include <asm/pvclock.h>
+-#include <asm/kvmclock.h>
+ #include <uapi/asm/kvm_para.h>
+#include <asm-generic/ptp_kvm.h>
+ 
+ #include <linux/ptp_clock_kernel.h>
+ 
+@@ -34,56 +24,29 @@ struct kvm_ptp_clock {
+ 
+ DEFINE_SPINLOCK(kvm_ptp_lock);
+ 
+-static struct pvclock_vsyscall_time_info *hv_clock;
+-
+-static struct kvm_clock_pairing clock_pair;
+-static phys_addr_t clock_pair_gpa;
+-
+ static int ptp_kvm_get_time_fn(ktime_t *device_time,
+ 			       struct system_counterval_t *system_counter,
+ 			       void *ctx)
+ {
+-	unsigned long ret;
+	unsigned long ret, cycle;
+ 	struct timespec64 tspec;
+-	unsigned version;
+-	int cpu;
+-	struct pvclock_vcpu_time_info *src;
+	struct clocksource *cs;
+ 
+ 	spin_lock(&kvm_ptp_lock);
+ 
+ 	preempt_disable_notrace();
+-	cpu = smp_processor_id();
+-	src = &hv_clock[cpu].pvti;
+-
+-	do {
+-		/*
+-		 * We are using a TSC value read in the hosts
+-		 * kvm_hc_clock_pairing handling.
+-		 * So any changes to tsc_to_system_mul
+-		 * and tsc_shift or any other pvclock
+-		 * data invalidate that measurement.
+-		 */
+-		version = pvclock_read_begin(src);
+-
+-		ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+-				     clock_pair_gpa,
+-				     KVM_CLOCK_PAIRING_WALLCLOCK);
+-		if (ret != 0) {
+-			pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+-			spin_unlock(&kvm_ptp_lock);
+-			preempt_enable_notrace();
+-			return -EOPNOTSUPP;
+-		}
+-
+-		tspec.tv_sec = clock_pair.sec;
+-		tspec.tv_nsec = clock_pair.nsec;
+-		ret = __pvclock_read_cycles(src, clock_pair.tsc);
+-	} while (pvclock_read_retry(src, version));
+	ret = kvm_arch_ptp_get_clock_fn(&cycle, &tspec, &cs);
+	if (ret != 0) {
+		pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+		spin_unlock(&kvm_ptp_lock);
+		preempt_enable_notrace();
+		return -EOPNOTSUPP;
+	}
+ 
+ 	preempt_enable_notrace();
+ 
+-	system_counter->cycles = ret;
+-	system_counter->cs = &kvm_clock;
+	system_counter->cycles = cycle;
+	system_counter->cs = cs;
+ 
+ 	*device_time = timespec64_to_ktime(tspec);
+ 
+@@ -126,17 +89,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
+ 
+ 	spin_lock(&kvm_ptp_lock);
+ 
+-	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+-			     clock_pair_gpa,
+-			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	ret = kvm_arch_ptp_get_clock(&tspec);
+ 	if (ret != 0) {
+ 		pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
+ 		spin_unlock(&kvm_ptp_lock);
+ 		return -EOPNOTSUPP;
+ 	}
+ 
+-	tspec.tv_sec = clock_pair.sec;
+-	tspec.tv_nsec = clock_pair.nsec;
+ 	spin_unlock(&kvm_ptp_lock);
+ 
+ 	memcpy(ts, &tspec, sizeof(struct timespec64));
+@@ -176,21 +135,11 @@ static void __exit ptp_kvm_exit(void)
+ 
+ static int __init ptp_kvm_init(void)
+ {
+-	long ret;
+-
+-	if (!kvm_para_available())
+-		return -ENODEV;
+	int ret;
+ 
+-	clock_pair_gpa = slow_virt_to_phys(&clock_pair);
+-	hv_clock = pvclock_get_pvti_cpu0_va();
+-
+-	if (!hv_clock)
+-		return -ENODEV;
+-
+-	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
+-			KVM_CLOCK_PAIRING_WALLCLOCK);
+-	if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
+-		return -ENODEV;
+	ret = kvm_arch_ptp_init();
+	if (ret)
+		return -EOPNOTSUPP;
+ 
+ 	kvm_ptp_clock.caps = ptp_kvm_caps;
+ 
+diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
+new file mode 100644
+index 000000000000..a52cf1c2990c
+--- /dev/null
+++ b/drivers/ptp/ptp_kvm_x86.c
+@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+#include <asm/pvclock.h>
+#include <asm/kvmclock.h>
+#include <linux/module.h>
+#include <uapi/asm/kvm_para.h>
+#include <uapi/linux/kvm_para.h>
+#include <linux/ptp_clock_kernel.h>
+
+phys_addr_t clock_pair_gpa;
+struct kvm_clock_pairing clock_pair;
+struct pvclock_vsyscall_time_info *hv_clock;
+
+int kvm_arch_ptp_init(void)
+{
+	int ret;
+
+	if (!kvm_para_available())
+		return -ENODEV;
+
+	clock_pair_gpa = slow_virt_to_phys(&clock_pair);
+	hv_clock = pvclock_get_pvti_cpu0_va();
+	if (!hv_clock)
+		return -ENODEV;
+
+	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
+			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
+		return -ENODEV;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+	long ret;
+
+	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+			     clock_pair_gpa,
+			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	if (ret != 0)
+		return -EOPNOTSUPP;
+
+	ts->tv_sec = clock_pair.sec;
+	ts->tv_nsec = clock_pair.nsec;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock_fn(unsigned long *cycle, struct timespec64 *tspec,
+			      struct clocksource **cs)
+{
+	unsigned long ret;
+	unsigned int version;
+	int cpu;
+	struct pvclock_vcpu_time_info *src;
+
+	cpu = smp_processor_id();
+	src = &hv_clock[cpu].pvti;
+
+	do {
+		/*
+		 * We are using a TSC value read in the hosts
+		 * kvm_hc_clock_pairing handling.
+		 * So any changes to tsc_to_system_mul
+		 * and tsc_shift or any other pvclock
+		 * data invalidate that measurement.
+		 */
+		version = pvclock_read_begin(src);
+
+		ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+				     clock_pair_gpa,
+				     KVM_CLOCK_PAIRING_WALLCLOCK);
+		tspec->tv_sec = clock_pair.sec;
+		tspec->tv_nsec = clock_pair.nsec;
+		*cycle = __pvclock_read_cycles(src, clock_pair.tsc);
+	} while (pvclock_read_retry(src, version));
+
+	*cs = &kvm_clock;
+
+	return 0;
+}
+diff --git a/include/asm-generic/ptp_kvm.h b/include/asm-generic/ptp_kvm.h
+new file mode 100644
+index 000000000000..883eea494a80
+--- /dev/null
+++ b/include/asm-generic/ptp_kvm.h
+@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  linux/drivers/clocksource/arm_arch_timer.c
+ *
+ *  Copyright (C) 2019 ARM Ltd.
+ *  All Rights Reserved
+ */
+
+int kvm_arch_ptp_init(void);
+int kvm_arch_ptp_get_clock(struct timespec64 *ts);
+int kvm_arch_ptp_get_clock_fn(unsigned long *cycle,
+                struct timespec64 *tspec, void *cs);
+diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
+index 18863d56273c..10e99c82d098 100644
+--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
+@@ -75,6 +75,11 @@
+ 			   ARM_SMCCC_SMC_32,				\
+ 			   0, 1)
+ 
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   0, 2)
+
+ #define ARM_SMCCC_ARCH_WORKAROUND_1					\
+ 	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+ 			   ARM_SMCCC_SMC_32,				\
+diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
+index 9b73d3ad918a..9b9999bdeab7 100644
+--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
+@@ -407,6 +407,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
+ 	u32 func_id = smccc_get_function(vcpu);
+ 	u32 val = SMCCC_RET_NOT_SUPPORTED;
+ 	u32 feature;
+	struct timespec64 ts;
+	u64 cycles, cycle_high, cycle_low;
+	struct system_time_snapshot systime_snapshot;
+ 
+ 	switch (func_id) {
+ 	case ARM_SMCCC_VERSION_FUNC_ID:
+@@ -435,6 +438,15 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
+ 			break;
+ 		}
+ 		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+		ktime_get_real_ts64(&ts);
+		ktime_get_snapshot(&systime_snapshot);
+		cycles = systime_snapshot.cycles - vcpu_vtimer(vcpu)->cntvoff;
+		cycle_high = cycles >> 32;
+		cycle_low = cycles << 32 >> 32;
+
+		smccc_set_retval(vcpu, ts.tv_sec, ts.tv_nsec, cycle_high, cycle_low);
+		return 1;
+ 	default:
+ 		return kvm_psci_call(vcpu);
+ 	}
+-- 
+2.17.1
+
--- a/tools/packaging/kernel/patches/4.19.x/0002-Enable-memory-hotplug-using-probe-for-arm64.patch
+++ b/tools/packaging/kernel/patches/4.19.x/0002-Enable-memory-hotplug-using-probe-for-arm64.patch
@@ -0,0 +1,98 @@
+From 33ffc9a93a1d9e72594d5eb3e4fc583a1a2911d1 Mon Sep 17 00:00:00 2001
+From: Jianyong Wu <jianyong.wu@arm.com>
+Date: Tue, 19 Feb 2019 01:15:32 -0500
+Subject: [PATCH 2/5] Enable memory-hotplug using probe for arm64
+
+---
+ arch/arm64/Kconfig   |  7 +++++++
+ arch/arm64/mm/init.c |  9 ++++++++-
+ arch/arm64/mm/mmu.c  | 17 +++++++++++++++++
+ arch/arm64/mm/numa.c | 10 ++++++++++
+ 4 files changed, 42 insertions(+), 1 deletion(-)
+
+diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
+index 1b1a0e95c751..881bea194d53 100644
+--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
+@@ -740,6 +740,13 @@ config NUMA
+ 	  local memory of the CPU and add some more
+ 	  NUMA awareness to the kernel.
+ 
+config ARCH_MEMORY_PROBE
+	def_bool y
+	depends on MEMORY_HOTPLUG
+
+config ARCH_ENABLE_MEMORY_HOTPLUG
+	def_bool y
+
+ config NODES_SHIFT
+ 	int "Maximum NUMA Nodes (as a power of 2)"
+ 	range 1 10
+diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
+index 787e27964ab9..e66e44b7bafe 100644
+--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
+@@ -288,9 +288,16 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
+ int pfn_valid(unsigned long pfn)
+ {
+ 	phys_addr_t addr = pfn << PAGE_SHIFT;
+-
+ 	if ((addr >> PAGE_SHIFT) != pfn)
+ 		return 0;
+
+#ifdef CONFIG_SPARSEMEM
+	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+		return 0;
+
+	if (!valid_section(__nr_to_section(pfn_to_section_nr(pfn))))
+		return 0;
+#endif
+ 	return memblock_is_map_memory(addr);
+ }
+ EXPORT_SYMBOL(pfn_valid);
+diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
+index 8080c9f489c3..c393b37597af 100644
+--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
+@@ -1028,3 +1028,20 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
+ 	pmd_free(NULL, table);
+ 	return 1;
+ }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		    bool want_memblock)
+{
+	int flags = 0;
+
+	if (debug_pagealloc_enabled())
+		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
+
+	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
+			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
+
+	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+			   altmap, want_memblock);
+}
+#endif
+diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
+index 146c04ceaa51..d276bd4d38b5 100644
+--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
+@@ -464,3 +464,13 @@ void __init arm64_numa_init(void)
+ 
+ 	numa_init(dummy_numa_init);
+ }
+
+/*
+ * We hope that we will be hotplugging memory on nodes we already know about,
+ * such that acpi_get_node() succeeds and we never fall back to this...
+ */
+int memory_add_physaddr_to_nid(u64 addr)
+{
+	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
+	return 0;
+}
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/4.19.x/0003-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
+++ b/tools/packaging/kernel/patches/4.19.x/0003-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
@@ -0,0 +1,47 @@
+From cab495651e8f71c39e87a08abbe051916110b3ca Mon Sep 17 00:00:00 2001
+From: Julio Montes <julio.montes@intel.com>
+Date: Mon, 18 Sep 2017 11:46:59 -0500
+Subject: [PATCH 3/5] NO-UPSTREAM: 9P: always use cached inode to fill in
+ v9fs_vfs_getattr
+
+In cache=none mode, this avoids a lookup on a server that
+might not support the open-unlink-fstat operation.
+
+fixes https://github.com/01org/cc-oci-runtime/issues/47
+fixes https://github.com/01org/cc-oci-runtime/issues/1062
+
+Signed-off-by: Peng Tao <bergwolf@gmail.com>
+---
+ fs/9p/vfs_inode.c      | 2 +-
+ fs/9p/vfs_inode_dotl.c | 2 +-
+ 2 files changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
+index 85ff859d3af5..efdc2a8f37bb 100644
+--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
+@@ -1080,7 +1080,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
+ 
+ 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
+ 	v9ses = v9fs_dentry2v9ses(dentry);
+-	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+	if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+ 		generic_fillattr(d_inode(dentry), stat);
+ 		return 0;
+ 	}
+diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
+index 4823e1c46999..daa5e6a41864 100644
+--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
+@@ -480,7 +480,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
+ 
+ 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
+ 	v9ses = v9fs_dentry2v9ses(dentry);
+-	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+	if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+ 		generic_fillattr(d_inode(dentry), stat);
+ 		return 0;
+ 	}
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/4.19.x/0004-Compile-in-evged-always.patch
+++ b/tools/packaging/kernel/patches/4.19.x/0004-Compile-in-evged-always.patch
@@ -0,0 +1,29 @@
+From d78297bf9d8e41711bddc6003f460e815340a214 Mon Sep 17 00:00:00 2001
+From: Arjan van de Ven <arjan@linux.intel.com>
+Date: Fri, 10 Aug 2018 13:22:08 +0000
+Subject: [PATCH 4/5] Compile in evged always
+
+We need evged for NEMU (and in general for hw reduced)
+
+The config option cannot be set normally since it breaks all
+regular systems, and hardware reduced is really a runtime choice.
+---
+ drivers/acpi/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
+index 6d59aa109a91..97f2fbbd5014 100644
+--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
+@@ -47,7 +47,7 @@ acpi-y				+= acpi_pnp.o
+ acpi-$(CONFIG_ARM_AMBA)	+= acpi_amba.o
+ acpi-y				+= power.o
+ acpi-y				+= event.o
+-acpi-$(CONFIG_ACPI_REDUCED_HARDWARE_ONLY) += evged.o
+acpi-y				+= evged.o
+ acpi-y				+= sysfs.o
+ acpi-y				+= property.o
+ acpi-$(CONFIG_X86)		+= acpi_cmos_rtc.o
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/4.19.x/0005-arm64-backport-Arm64-KVM-Dynamic-IPA-and-52bit-IPA-s.patch
+++ b/tools/packaging/kernel/patches/4.19.x/0005-arm64-backport-Arm64-KVM-Dynamic-IPA-and-52bit-IPA-s.patch
--- a/tools/packaging/kernel/patches/4.19.x/0006-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
+++ b/tools/packaging/kernel/patches/4.19.x/0006-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
@@ -0,0 +1,49 @@
+From 267ca21784bb307babbbb2f5a4a111da4da4c015 Mon Sep 17 00:00:00 2001
+From: Sebastien Boeuf <sebastien.boeuf@intel.com>
+Date: Thu, 13 Feb 2020 08:50:38 +0100
+Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
+
+Whenever the vsock backend on the host sends a packet through the RX
+queue, it expects an answer on the TX queue. Unfortunately, there is one
+case where the host side will hang waiting for the answer and will
+effectively never recover.
+
+This issue happens when the guest side starts binding to the socket,
+which inserts a new bound socket into the list of already bound sockets.
+At this time, we expect the guest to also start listening, which will
+trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
+occurs if the host side queued an RX packet and triggered an interrupt
+right between the end of the binding process and the beginning of the
+listening process. In this specific case, the function processing the
+packet virtio_transport_recv_pkt() will find a bound socket, which means
+it will hit the switch statement checking for the sk_state, but the
+state won't be changed into TCP_LISTEN yet, which leads the code to pick
+the default statement. This default statement will only free the buffer,
+while it should also respond to the host side, by sending a packet on
+its TX queue.
+
+To fix this unfortunate chain of events, whenever the default statement
+is entered, and because at this stage we know the host side is waiting
+for an answer, we must send back a packet containing the operation
+VIRTIO_VSOCK_OP_RST.
+
+Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
+---
+ net/vmw_vsock/virtio_transport_common.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
+index 2a8651aa90c8..7d83e2c80b15 100644
+--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
+@@ -1051,6 +1051,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	default:
+		(void)virtio_transport_reset_no_sock(pkt);
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	}
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/5.4.x/0001-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
+++ b/tools/packaging/kernel/patches/5.4.x/0001-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
@@ -0,0 +1,47 @@
+From cab495651e8f71c39e87a08abbe051916110b3ca Mon Sep 17 00:00:00 2001
+From: Julio Montes <julio.montes@intel.com>
+Date: Mon, 18 Sep 2017 11:46:59 -0500
+Subject: [PATCH 3/5] NO-UPSTREAM: 9P: always use cached inode to fill in
+ v9fs_vfs_getattr
+
+In cache=none mode, this avoids a lookup on a server that
+might not support the open-unlink-fstat operation.
+
+fixes https://github.com/01org/cc-oci-runtime/issues/47
+fixes https://github.com/01org/cc-oci-runtime/issues/1062
+
+Signed-off-by: Peng Tao <bergwolf@gmail.com>
+---
+ fs/9p/vfs_inode.c      | 2 +-
+ fs/9p/vfs_inode_dotl.c | 2 +-
+ 2 files changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
+index 85ff859d3af5..efdc2a8f37bb 100644
+--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
+@@ -1080,7 +1080,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
+ 
+ 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
+ 	v9ses = v9fs_dentry2v9ses(dentry);
+-	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+	if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+ 		generic_fillattr(d_inode(dentry), stat);
+ 		return 0;
+ 	}
+diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
+index 4823e1c46999..daa5e6a41864 100644
+--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
+@@ -480,7 +480,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
+ 
+ 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
+ 	v9ses = v9fs_dentry2v9ses(dentry);
+-	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+	if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+ 		generic_fillattr(d_inode(dentry), stat);
+ 		return 0;
+ 	}
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/5.4.x/0002-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
+++ b/tools/packaging/kernel/patches/5.4.x/0002-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
@@ -0,0 +1,49 @@
+From ac1956caf20f8ac0589f69b2d5fcc81e6ba7c71a Mon Sep 17 00:00:00 2001
+From: Sebastien Boeuf <sebastien.boeuf@intel.com>
+Date: Thu, 13 Feb 2020 08:50:38 +0100
+Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
+
+Whenever the vsock backend on the host sends a packet through the RX
+queue, it expects an answer on the TX queue. Unfortunately, there is one
+case where the host side will hang waiting for the answer and will
+effectively never recover.
+
+This issue happens when the guest side starts binding to the socket,
+which inserts a new bound socket into the list of already bound sockets.
+At this time, we expect the guest to also start listening, which will
+trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
+occurs if the host side queued an RX packet and triggered an interrupt
+right between the end of the binding process and the beginning of the
+listening process. In this specific case, the function processing the
+packet virtio_transport_recv_pkt() will find a bound socket, which means
+it will hit the switch statement checking for the sk_state, but the
+state won't be changed into TCP_LISTEN yet, which leads the code to pick
+the default statement. This default statement will only free the buffer,
+while it should also respond to the host side, by sending a packet on
+its TX queue.
+
+To fix this unfortunate chain of events, whenever the default statement
+is entered, and because at this stage we know the host side is waiting
+for an answer, we must send back a packet containing the operation
+VIRTIO_VSOCK_OP_RST.
+
+Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
+---
+ net/vmw_vsock/virtio_transport_common.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
+index fb2060dffb0a..696e9a03ad0f 100644
+--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
+@@ -1127,6 +1127,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	default:
+		(void)virtio_transport_reset_no_sock(pkt);
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	}
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/5.4.x/0003-arm-arm64-Provide-a-wrapper-for-SMCCC-1.1-calls.patch
+++ b/tools/packaging/kernel/patches/5.4.x/0003-arm-arm64-Provide-a-wrapper-for-SMCCC-1.1-calls.patch
@@ -0,0 +1,81 @@
+From 3d1d7f8922ed2f080f6d8e08df0d51e22f9590ec Mon Sep 17 00:00:00 2001
+From: Jianyong Wu <jianyong.wu@arm.com>
+Date: Wed, 1 Apr 2020 15:19:29 +0800
+Subject: [PATCH 1/9] arm/arm64: Provide a wrapper for SMCCC 1.1 calls
+
+From: Steven Price <steven.price@arm.com>
+
+SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI
+conduit. Rather than coding this in every call site, provide a macro
+which uses the correct instruction. The macro also handles the case
+where no conduit is configured/available returning a not supported error
+in res, along with returning the conduit used for the call.
+
+This allows us to remove some duplicated code and will be useful later
+when adding paravirtualized time hypervisor calls.
+
+Signed-off-by: Steven Price <steven.price@arm.com>
+Acked-by: Will Deacon <will@kernel.org>
+Signed-off-by: Marc Zyngier <maz@kernel.org>
+---
+ include/linux/arm-smccc.h | 45 +++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 45 insertions(+)
+
+diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
+index 080012a6f025..131edde5d37e 100644
+--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
+@@ -302,5 +302,50 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
+ #define SMCCC_RET_NOT_SUPPORTED			-1
+ #define SMCCC_RET_NOT_REQUIRED			-2
+ 
+/*
+ * Like arm_smccc_1_1* but always returns SMCCC_RET_NOT_SUPPORTED.
+ * Used when the SMCCC conduit is not defined. The empty asm statement
+ * avoids compiler warnings about unused variables.
+ */
+#define __fail_smccc_1_1(...)                                          \
+	do {                                                            \
+		__declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \
+		asm ("" __constraints(__count_args(__VA_ARGS__)));      \
+		if (___res)                                             \
+			___res->a0 = SMCCC_RET_NOT_SUPPORTED;           \
+	} while (0)
+
+/*
+ * arm_smccc_1_1_invoke() - make an SMCCC v1.1 compliant call
+ *
+ * This is a variadic macro taking one to eight source arguments, and
+ * an optional return structure.
+ *
+ * @a0-a7: arguments passed in registers 0 to 7
+ * @res: result values from registers 0 to 3
+ *
+ * This macro will make either an HVC call or an SMC call depending on the
+ * current SMCCC conduit. If no valid conduit is available then -1
+ * (SMCCC_RET_NOT_SUPPORTED) is returned in @res.a0 (if supplied).
+ *
+ * The return value also provides the conduit that was used.
+ */
+#define arm_smccc_1_1_invoke(...) ({					\
+		int method = arm_smccc_1_1_get_conduit();		\
+		switch (method) {					\
+		case SMCCC_CONDUIT_HVC:					\
+			arm_smccc_1_1_hvc(__VA_ARGS__);			\
+			break;						\
+		case SMCCC_CONDUIT_SMC:					\
+			arm_smccc_1_1_smc(__VA_ARGS__);			\
+			break;						\
+		default:						\
+			__fail_smccc_1_1(__VA_ARGS__);			\
+			method = SMCCC_CONDUIT_NONE;			\
+			break;						\
+		}							\
+		method;							\
+	})
+
+ #endif /*__ASSEMBLY__*/
+ #endif /*__LINUX_ARM_SMCCC_H*/
+-- 
+2.17.1
+
--- a/tools/packaging/kernel/patches/5.4.x/0004-arm-arm64-smccc-psci-add-arm_smccc_1_1_get_conduit.patch
+++ b/tools/packaging/kernel/patches/5.4.x/0004-arm-arm64-smccc-psci-add-arm_smccc_1_1_get_conduit.patch
@@ -0,0 +1,81 @@
+From b830806f5cd02119be9b25812b3ea56d97cd08f3 Mon Sep 17 00:00:00 2001
+From: Mark Rutland <mark.rutland@arm.com>
+Date: Fri, 9 Aug 2019 14:22:40 +0100
+Subject: [PATCH 2/9] arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
+
+SMCCC callers are currently amassing a collection of enums for the SMCCC
+conduit, and are having to dig into the PSCI driver's internals in order
+to figure out what to do.
+
+Let's clean this up, with common SMCCC_CONDUIT_* definitions, and an
+arm_smccc_1_1_get_conduit() helper that abstracts the PSCI driver's
+internal state.
+
+We can kill off the PSCI_CONDUIT_* definitions once we've migrated users
+over to the new interface.
+
+Signed-off-by: Mark Rutland <mark.rutland@arm.com>
+Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
+Acked-by: Will Deacon <will.deacon@arm.com>
+Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
+---
+ drivers/firmware/psci/psci.c | 15 +++++++++++++++
+ include/linux/arm-smccc.h    | 16 ++++++++++++++++
+ 2 files changed, 31 insertions(+)
+
+diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
+index 84f4ff351c62..eb797081d159 100644
+--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
+@@ -57,6 +57,21 @@ struct psci_operations psci_ops = {
+ 	.smccc_version = SMCCC_VERSION_1_0,
+ };
+ 
+enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
+{
+	if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
+		return SMCCC_CONDUIT_NONE;
+
+	switch (psci_ops.conduit) {
+	case PSCI_CONDUIT_SMC:
+		return SMCCC_CONDUIT_SMC;
+	case PSCI_CONDUIT_HVC:
+		return SMCCC_CONDUIT_HVC;
+	default:
+		return SMCCC_CONDUIT_NONE;
+	}
+}
+
+ typedef unsigned long (psci_fn)(unsigned long, unsigned long,
+ 				unsigned long, unsigned long);
+ static psci_fn *invoke_psci_fn;
+diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
+index 131edde5d37e..e6d4cb4f61f1 100644
+--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
+@@ -80,6 +80,22 @@
+ 
+ #include <linux/linkage.h>
+ #include <linux/types.h>
+
+enum arm_smccc_conduit {
+	SMCCC_CONDUIT_NONE,
+	SMCCC_CONDUIT_SMC,
+	SMCCC_CONDUIT_HVC,
+};
+
+/**
+ * arm_smccc_1_1_get_conduit()
+ *
+ * Returns the conduit to be used for SMCCCv1.1 or later.
+ *
+ * When SMCCCv1.1 is not present, returns SMCCC_CONDUIT_NONE.
+ */
+enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void);
+
+ /**
+  * struct arm_smccc_res - Result from SMC/HVC call
+  * @a0-a3 result values from registers 0 to 3
+-- 
+2.17.1
+
--- a/tools/packaging/kernel/patches/5.4.x/0005-ptp-arm64-Enable-ptp_kvm-for-arm64.patch
+++ b/tools/packaging/kernel/patches/5.4.x/0005-ptp-arm64-Enable-ptp_kvm-for-arm64.patch
@@ -0,0 +1,641 @@
+From cb55878a1cecb7ef56956a28a9f1b745d0ac522b Mon Sep 17 00:00:00 2001
+From: Jianyong Wu <jianyong.wu@arm.com>
+Date: Wed, 1 Apr 2020 15:39:44 +0800
+Subject: [PATCH 3/3] ptp: arm64: Enable ptp_kvm for arm64.
+
+Currently in an arm64 virtualization environment, there is no mechanism to
+keep time in sync between guest and host. Time in the guest will drift
+from the host after boot, as each may use third-party time sources to
+correct its own time. The deviation will be on the order of milliseconds,
+but some scenarios need higher precision: in a cloud environment, for
+example, we want all the VMs running on a host to acquire the same level
+of accuracy from the host clock.
+
+The kvm ptp clock, which uses the host clock source as a reference clock
+to sync time between guest and host, has already been adopted by x86,
+improving time sync precision from milliseconds to nanoseconds.
+
+This patch enables kvm ptp on arm64.
+
+Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
+---
+ drivers/clocksource/arm_arch_timer.c        | 24 ++++++
+ drivers/firmware/psci/psci.c                |  1 +
+ drivers/ptp/Kconfig                         |  2 +-
+ drivers/ptp/Makefile                        |  1 +
+ drivers/ptp/ptp_kvm.h                       | 11 +++
+ drivers/ptp/ptp_kvm_arm64.c                 | 51 ++++++++++++
+ drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 78 +++++-------------
+ drivers/ptp/ptp_kvm_x86.c                   | 87 +++++++++++++++++++++
+ include/linux/arm-smccc.h                   |  8 ++
+ include/linux/clocksource.h                 |  6 ++
+ include/linux/clocksource_ids.h             | 13 +++
+ include/linux/timekeeping.h                 | 12 +--
+ include/uapi/linux/kvm.h                    |  1 +
+ kernel/time/clocksource.c                   |  3 +
+ kernel/time/timekeeping.c                   |  1 +
+ virt/kvm/arm/arm.c                          |  1 +
+ virt/kvm/arm/psci.c                         | 23 ++++++
+ 17 files changed, 258 insertions(+), 65 deletions(-)
+ create mode 100644 drivers/ptp/ptp_kvm.h
+ create mode 100644 drivers/ptp/ptp_kvm_arm64.c
+ rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (63%)
+ create mode 100644 drivers/ptp/ptp_kvm_x86.c
+ create mode 100644 include/linux/clocksource_ids.h
+
+diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
+index 9a5464c625b4..0c723df39b55 100644
+--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
+@@ -16,6 +16,7 @@
+ #include <linux/cpu_pm.h>
+ #include <linux/clockchips.h>
+ #include <linux/clocksource.h>
+#include <linux/clocksource_ids.h>
+ #include <linux/interrupt.h>
+ #include <linux/of_irq.h>
+ #include <linux/of_address.h>
+@@ -187,6 +188,7 @@ static u64 arch_counter_read_cc(const struct cyclecounter *cc)
+ 
+ static struct clocksource clocksource_counter = {
+ 	.name	= "arch_sys_counter",
+	.id	= CSID_ARM_ARCH_COUNTER,
+ 	.rating	= 400,
+ 	.read	= arch_counter_read,
+ 	.mask	= CLOCKSOURCE_MASK(56),
+@@ -1623,3 +1625,25 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
+ }
+ TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
+ #endif
+
+#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
+#include <linux/arm-smccc.h>
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *ts,
+			      struct clocksource **cs)
+{
+	struct arm_smccc_res hvc_res;
+	ktime_t ktime_overall;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID, &hvc_res);
+	if ((long)(hvc_res.a0) < 0)
+		return -EOPNOTSUPP;
+
+	ktime_overall = hvc_res.a0 << 32 | hvc_res.a1;
+	*ts = ktime_to_timespec64(ktime_overall);
+	*cycle = hvc_res.a2 << 32 | hvc_res.a3;
+	*cs = &clocksource_counter;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_crosststamp);
+#endif
+diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
+index eb797081d159..87a7dc18b175 100644
+--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
+@@ -71,6 +71,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
+ 		return SMCCC_CONDUIT_NONE;
+ 	}
+ }
+EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
+ 
+ typedef unsigned long (psci_fn)(unsigned long, unsigned long,
+ 				unsigned long, unsigned long);
+diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
+index 0517272a268e..6f3688e7e440 100644
+--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
+@@ -110,7 +110,7 @@ config PTP_1588_CLOCK_PCH
+ config PTP_1588_CLOCK_KVM
+ 	tristate "KVM virtual PTP clock"
+ 	depends on PTP_1588_CLOCK
+-	depends on KVM_GUEST && X86
+	depends on KVM_GUEST && X86 || ARM64 && ARM_ARCH_TIMER
+ 	default y
+ 	help
+ 	  This driver adds support for using kvm infrastructure as a PTP
+diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
+index 677d1d178a3e..3b7554f56ad9 100644
+--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
+@@ -4,6 +4,7 @@
+ #
+ 
+ ptp-y					:= ptp_clock.o ptp_chardev.o ptp_sysfs.o
+ptp_kvm-y				:= ptp_kvm_$(ARCH).o ptp_kvm_common.o
+ obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp.o
+ obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
+ obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)	+= ptp_ixp46x.o
+diff --git a/drivers/ptp/ptp_kvm.h b/drivers/ptp/ptp_kvm.h
+new file mode 100644
+index 000000000000..4bf1802bbeb8
+--- /dev/null
+++ b/drivers/ptp/ptp_kvm.h
+@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+int kvm_arch_ptp_init(void);
+int kvm_arch_ptp_get_clock(struct timespec64 *ts);
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle,
+		struct timespec64 *tspec, void *cs);
+diff --git a/drivers/ptp/ptp_kvm_arm64.c b/drivers/ptp/ptp_kvm_arm64.c
+new file mode 100644
+index 000000000000..446f2444d285
+--- /dev/null
+++ b/drivers/ptp/ptp_kvm_arm64.c
+@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Virtual PTP 1588 clock for use with KVM guests
+ *  Copyright (C) 2019 ARM Ltd.
+ *  All Rights Reserved
+ */
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <asm/hypervisor.h>
+#include <linux/module.h>
+#include <linux/psci.h>
+#include <linux/arm-smccc.h>
+#include <linux/timecounter.h>
+#include <linux/sched/clock.h>
+#include <asm/arch_timer.h>
+
+int kvm_arch_ptp_init(void)
+{
+	struct arm_smccc_res hvc_res;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID, &hvc_res);
+	if ((long)(hvc_res.a0) < 0)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock_generic(struct timespec64 *ts,
+				   struct arm_smccc_res *hvc_res)
+{
+	ktime_t ktime_overall;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID, hvc_res);
+	if ((long)(hvc_res->a0) < 0)
+		return -EOPNOTSUPP;
+
+	ktime_overall = hvc_res->a0 << 32 | hvc_res->a1;
+	*ts = ktime_to_timespec64(ktime_overall);
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+	struct arm_smccc_res hvc_res;
+	int ret;
+
+	ret = kvm_arch_ptp_get_clock_generic(ts, &hvc_res);
+	return ret;
+}
+diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
+similarity index 63%
+rename from drivers/ptp/ptp_kvm.c
+rename to drivers/ptp/ptp_kvm_common.c
+index fc7d0b77e118..60442f70d3fc 100644
+--- a/drivers/ptp/ptp_kvm.c
+++ b/drivers/ptp/ptp_kvm_common.c
+@@ -8,15 +8,16 @@
+ #include <linux/err.h>
+ #include <linux/init.h>
+ #include <linux/kernel.h>
+#include <linux/slab.h>
+ #include <linux/module.h>
+ #include <uapi/linux/kvm_para.h>
+ #include <asm/kvm_para.h>
+-#include <asm/pvclock.h>
+-#include <asm/kvmclock.h>
+ #include <uapi/asm/kvm_para.h>
+ 
+ #include <linux/ptp_clock_kernel.h>
+ 
+#include "ptp_kvm.h"
+
+ struct kvm_ptp_clock {
+ 	struct ptp_clock *ptp_clock;
+ 	struct ptp_clock_info caps;
+@@ -24,56 +25,29 @@ struct kvm_ptp_clock {
+ 
+ DEFINE_SPINLOCK(kvm_ptp_lock);
+ 
+-static struct pvclock_vsyscall_time_info *hv_clock;
+-
+-static struct kvm_clock_pairing clock_pair;
+-static phys_addr_t clock_pair_gpa;
+-
+ static int ptp_kvm_get_time_fn(ktime_t *device_time,
+ 			       struct system_counterval_t *system_counter,
+ 			       void *ctx)
+ {
+-	unsigned long ret;
+	unsigned long ret, cycle;
+ 	struct timespec64 tspec;
+-	unsigned version;
+-	int cpu;
+-	struct pvclock_vcpu_time_info *src;
+	struct clocksource *cs;
+ 
+ 	spin_lock(&kvm_ptp_lock);
+ 
+ 	preempt_disable_notrace();
+-	cpu = smp_processor_id();
+-	src = &hv_clock[cpu].pvti;
+-
+-	do {
+-		/*
+-		 * We are using a TSC value read in the hosts
+-		 * kvm_hc_clock_pairing handling.
+-		 * So any changes to tsc_to_system_mul
+-		 * and tsc_shift or any other pvclock
+-		 * data invalidate that measurement.
+-		 */
+-		version = pvclock_read_begin(src);
+-
+-		ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+-				     clock_pair_gpa,
+-				     KVM_CLOCK_PAIRING_WALLCLOCK);
+-		if (ret != 0) {
+-			pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+-			spin_unlock(&kvm_ptp_lock);
+-			preempt_enable_notrace();
+-			return -EOPNOTSUPP;
+-		}
+-
+-		tspec.tv_sec = clock_pair.sec;
+-		tspec.tv_nsec = clock_pair.nsec;
+-		ret = __pvclock_read_cycles(src, clock_pair.tsc);
+-	} while (pvclock_read_retry(src, version));
+	ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs);
+	if (ret != 0) {
+		pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+		spin_unlock(&kvm_ptp_lock);
+		preempt_enable_notrace();
+		return -EOPNOTSUPP;
+	}
+ 
+ 	preempt_enable_notrace();
+ 
+-	system_counter->cycles = ret;
+-	system_counter->cs = &kvm_clock;
+	system_counter->cycles = cycle;
+	system_counter->cs = cs;
+ 
+ 	*device_time = timespec64_to_ktime(tspec);
+ 
+@@ -116,17 +90,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
+ 
+ 	spin_lock(&kvm_ptp_lock);
+ 
+-	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+-			     clock_pair_gpa,
+-			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	ret = kvm_arch_ptp_get_clock(&tspec);
+ 	if (ret != 0) {
+ 		pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
+ 		spin_unlock(&kvm_ptp_lock);
+ 		return -EOPNOTSUPP;
+ 	}
+ 
+-	tspec.tv_sec = clock_pair.sec;
+-	tspec.tv_nsec = clock_pair.nsec;
+ 	spin_unlock(&kvm_ptp_lock);
+ 
+ 	memcpy(ts, &tspec, sizeof(struct timespec64));
+@@ -166,21 +136,11 @@ static void __exit ptp_kvm_exit(void)
+ 
+ static int __init ptp_kvm_init(void)
+ {
+-	long ret;
+-
+-	if (!kvm_para_available())
+-		return -ENODEV;
+	int ret;
+ 
+-	clock_pair_gpa = slow_virt_to_phys(&clock_pair);
+-	hv_clock = pvclock_get_pvti_cpu0_va();
+-
+-	if (!hv_clock)
+-		return -ENODEV;
+-
+-	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
+-			KVM_CLOCK_PAIRING_WALLCLOCK);
+-	if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
+-		return -ENODEV;
+	ret = kvm_arch_ptp_init();
+	if (ret)
+		return -EOPNOTSUPP;
+ 
+ 	kvm_ptp_clock.caps = ptp_kvm_caps;
+ 
+diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
+new file mode 100644
+index 000000000000..6c891d7299c6
+--- /dev/null
+++ b/drivers/ptp/ptp_kvm_x86.c
+@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+#include <asm/pvclock.h>
+#include <asm/kvmclock.h>
+#include <linux/module.h>
+#include <uapi/asm/kvm_para.h>
+#include <uapi/linux/kvm_para.h>
+#include <linux/ptp_clock_kernel.h>
+
+phys_addr_t clock_pair_gpa;
+struct kvm_clock_pairing clock_pair;
+struct pvclock_vsyscall_time_info *hv_clock;
+
+int kvm_arch_ptp_init(void)
+{
+	int ret;
+
+	if (!kvm_para_available())
+		return -ENODEV;
+
+	clock_pair_gpa = slow_virt_to_phys(&clock_pair);
+	hv_clock = pvclock_get_pvti_cpu0_va();
+	if (!hv_clock)
+		return -ENODEV;
+
+	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
+			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
+		return -ENODEV;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+	long ret;
+
+	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+			     clock_pair_gpa,
+			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	if (ret != 0)
+		return -EOPNOTSUPP;
+
+	ts->tv_sec = clock_pair.sec;
+	ts->tv_nsec = clock_pair.nsec;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *tspec,
+			      struct clocksource **cs)
+{
+	unsigned long ret;
+	unsigned int version;
+	int cpu;
+	struct pvclock_vcpu_time_info *src;
+
+	cpu = smp_processor_id();
+	src = &hv_clock[cpu].pvti;
+
+	do {
+		/*
+		 * We are using a TSC value read in the hosts
+		 * kvm_hc_clock_pairing handling.
+		 * So any changes to tsc_to_system_mul
+		 * and tsc_shift or any other pvclock
+		 * data invalidate that measurement.
+		 */
+		version = pvclock_read_begin(src);
+
+		ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+				     clock_pair_gpa,
+				     KVM_CLOCK_PAIRING_WALLCLOCK);
+		tspec->tv_sec = clock_pair.sec;
+		tspec->tv_nsec = clock_pair.nsec;
+		*cycle = __pvclock_read_cycles(src, clock_pair.tsc);
+	} while (pvclock_read_retry(src, version));
+
+	*cs = &kvm_clock;
+
+	return 0;
+}
+diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
+index e6d4cb4f61f1..32a46d564934 100644
+--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
+@@ -45,6 +45,7 @@
+ #define ARM_SMCCC_OWNER_SIP		2
+ #define ARM_SMCCC_OWNER_OEM		3
+ #define ARM_SMCCC_OWNER_STANDARD	4
+#define ARM_SMCCC_OWNER_STANDARD_HYP	5
+ #define ARM_SMCCC_OWNER_TRUSTED_APP	48
+ #define ARM_SMCCC_OWNER_TRUSTED_APP_END	49
+ #define ARM_SMCCC_OWNER_TRUSTED_OS	50
+@@ -76,6 +77,13 @@
+ 			   ARM_SMCCC_SMC_32,				\
+ 			   0, 0x7fff)
+ 
+/* PTP KVM call: the guest OS requests the clock time from the host */
+#define ARM_SMCCC_HYP_KVM_PTP_FUNC_ID					\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,		\
+			   0)
+
+ #ifndef __ASSEMBLY__
+ 
+ #include <linux/linkage.h>
+diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
+index b21db536fd52..96e85b6f9ca0 100644
+--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
+@@ -17,6 +17,7 @@
+ #include <linux/timer.h>
+ #include <linux/init.h>
+ #include <linux/of.h>
+#include <linux/clocksource_ids.h>
+ #include <asm/div64.h>
+ #include <asm/io.h>
+ 
+@@ -49,6 +50,10 @@ struct module;
+  *			400-499: Perfect
+  *				The ideal clocksource. A must-use where
+  *				available.
+ * @id:			Defaults to CSID_GENERIC. The id value is captured
+ *			in certain snapshot functions to allow callers to
+ *			validate the clocksource from which the snapshot was
+ *			taken.
+  * @read:		returns a cycle value, passes clocksource as argument
+  * @enable:		optional function to enable the clocksource
+  * @disable:		optional function to disable the clocksource
+@@ -91,6 +96,7 @@ struct clocksource {
+ 	const char *name;
+ 	struct list_head list;
+ 	int rating;
+	enum clocksource_ids id;
+ 	int (*enable)(struct clocksource *cs);
+ 	void (*disable)(struct clocksource *cs);
+ 	unsigned long flags;
+diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
+new file mode 100644
+index 000000000000..93bec8426c44
+--- /dev/null
+++ b/include/linux/clocksource_ids.h
+@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
+#define _LINUX_CLOCKSOURCE_IDS_H
+
+/* Enum to give clocksources a unique identifier */
+enum clocksource_ids {
+	CSID_GENERIC		= 0,
+	CSID_ARM_ARCH_COUNTER,
+	CSID_MAX,
+};
+
+#endif
+
+diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
+index b27e2ffa96c1..4ecc32ad3879 100644
+--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
+@@ -2,6 +2,7 @@
+ #ifndef _LINUX_TIMEKEEPING_H
+ #define _LINUX_TIMEKEEPING_H
+ 
+#include <linux/clocksource_ids.h>
+ #include <linux/errno.h>
+ 
+ /* Included from linux/ktime.h */
+@@ -232,11 +233,12 @@ extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
+  * @cs_was_changed_seq:	The sequence number of clocksource change events
+  */
+ struct system_time_snapshot {
+-	u64		cycles;
+-	ktime_t		real;
+-	ktime_t		raw;
+-	unsigned int	clock_was_set_seq;
+-	u8		cs_was_changed_seq;
+	u64			cycles;
+	ktime_t			real;
+	ktime_t			raw;
+	enum clocksource_ids	cs_id;
+	unsigned int		clock_was_set_seq;
+	u8			cs_was_changed_seq;
+ };
+ 
+ /*
+diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
+index 52641d8ca9e8..16008ebe5474 100644
+--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
+@@ -1000,6 +1000,7 @@ struct kvm_ppc_resize_hpt {
+ #define KVM_CAP_PMU_EVENT_FILTER 173
+ #define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
+ #define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
+#define KVM_CAP_ARM_KVM_PTP 176
+ 
+ #ifdef KVM_CAP_IRQ_ROUTING
+ 
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index fff5f64981c6..5fe2d61172b1 100644
+--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
+@@ -921,6 +921,9 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
+ 
+ 	clocksource_arch_init(cs);
+ 
+	if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
+		cs->id = CSID_GENERIC;
+
+ 	/* Initialize mult/shift and max_idle_ns */
+ 	__clocksource_update_freq_scale(cs, scale, freq);
+ 
+diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
+index ca69290bee2a..a8b378338b9e 100644
+--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
+@@ -979,6 +979,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
+ 	do {
+ 		seq = read_seqcount_begin(&tk_core.seq);
+ 		now = tk_clock_read(&tk->tkr_mono);
+		systime_snapshot->cs_id = tk->tkr_mono.clock->id;
+ 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
+ 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
+ 		base_real = ktime_add(tk->tkr_mono.base,
+diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
+index 86c6aa1cb58e..ee159ce9ca39 100644
+--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
+@@ -197,6 +197,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+ 	case KVM_CAP_IMMEDIATE_EXIT:
+ 	case KVM_CAP_VCPU_EVENTS:
+ 	case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
+	case KVM_CAP_ARM_KVM_PTP:
+ 		r = 1;
+ 		break;
+ 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
+diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
+index 87927f7e1ee7..6e689f9952fb 100644
+--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
+@@ -9,6 +9,7 @@
+ #include <linux/kvm_host.h>
+ #include <linux/uaccess.h>
+ #include <linux/wait.h>
+#include <linux/clocksource_ids.h>
+ 
+ #include <asm/cputype.h>
+ #include <asm/kvm_emulate.h>
+@@ -389,6 +390,9 @@ static int kvm_psci_call(struct kvm_vcpu *vcpu)
+ 
+ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
+ {
+	struct system_time_snapshot systime_snapshot;
+	long arg[4];
+	u64 cycles;
+ 	u32 func_id = smccc_get_function(vcpu);
+ 	u32 val = SMCCC_RET_NOT_SUPPORTED;
+ 	u32 feature;
+@@ -428,6 +432,25 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
+ 			break;
+ 		}
+ 		break;
+	/*
+	 * This is used for the virtual PTP KVM clock. Four values are
+	 * passed back:
+	 * reg0 stores high 32-bit host ktime;
+	 * reg1 stores low 32-bit host ktime;
+	 * reg2 stores high 32-bit difference of host cycles and cntvoff;
+	 * reg3 stores low 32-bit difference of host cycles and cntvoff.
+	 */
+	case ARM_SMCCC_HYP_KVM_PTP_FUNC_ID:
+		ktime_get_snapshot(&systime_snapshot);
+		if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+			break;
+		arg[0] = systime_snapshot.real >> 32;
+		arg[1] = systime_snapshot.real << 32 >> 32;
+		cycles = systime_snapshot.cycles - vcpu_vtimer(vcpu)->cntvoff;
+		arg[2] = cycles >> 32;
+		arg[3] = cycles << 32 >> 32;
+		smccc_set_retval(vcpu, arg[0], arg[1], arg[2], arg[3]);
+		return 1;
+ 	default:
+ 		return kvm_psci_call(vcpu);
+ 	}
+-- 
+2.17.1
+
--- a/tools/packaging/kernel/patches/5.4.x/0006-arm64-mm-Enable-memory-hot-remove.patch
+++ b/tools/packaging/kernel/patches/5.4.x/0006-arm64-mm-Enable-memory-hot-remove.patch
@@ -0,0 +1,498 @@
+From ba91422b18892bceacf3b4aa60354cf36fcabf9b Mon Sep 17 00:00:00 2001
+From: Penny Zheng <penny.zheng@arm.com>
+Date: Wed, 8 Apr 2020 10:26:52 +0800
+Subject: [PATCH] arm64/mm: Enable memory hot remove
+
+Backport Anshuman Khandual's patch series enabling memory hot
+remove on aarch64 (https://patchwork.kernel.org/cover/11419305/)
+to v5.4.x.
+The series has already been merged upstream and is queued for 5.7.
+
+Signed-off-by: Penny Zheng <penny.zheng@arm.com>
+---
+ arch/arm64/Kconfig              |   3 +
+ arch/arm64/include/asm/memory.h |   1 +
+ arch/arm64/mm/mmu.c             | 379 +++++++++++++++++++++++++++++++-
+ arch/arm64/mm/ptdump_debugfs.c  |   4 +
+ 4 files changed, 378 insertions(+), 9 deletions(-)
+
+diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
+index 6ccd2ed30963..d18b716fa569 100644
+--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
+@@ -274,6 +274,9 @@ config ZONE_DMA32
+ config ARCH_ENABLE_MEMORY_HOTPLUG
+ 	def_bool y
+ 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
+ config SMP
+ 	def_bool y
+ 
+diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
+index c23c47360664..dbba06e258f5 100644
+--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
+@@ -54,6 +54,7 @@
+ #define MODULES_VADDR		(BPF_JIT_REGION_END)
+ #define MODULES_VSIZE		(SZ_128M)
+ #define VMEMMAP_START		(-VMEMMAP_SIZE - SZ_2M)
+#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
+ #define PCI_IO_END		(VMEMMAP_START - SZ_2M)
+ #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
+ #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
+diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
+index d10247fab0fd..99fec235144e 100644
+--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
+@@ -17,6 +17,7 @@
+ #include <linux/mman.h>
+ #include <linux/nodemask.h>
+ #include <linux/memblock.h>
+#include <linux/memory.h>
+ #include <linux/fs.h>
+ #include <linux/io.h>
+ #include <linux/mm.h>
+@@ -725,6 +726,312 @@ int kern_addr_valid(unsigned long addr)
+ 
+ 	return pfn_valid(pte_pfn(pte));
+ }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, size_t size)
+{
+	WARN_ON(PageReserved(page));
+	free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+	free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
+				  unsigned long floor, unsigned long ceiling,
+				  unsigned long mask)
+{
+	start &= mask;
+	if (start < floor)
+		return false;
+
+	if (ceiling) {
+		ceiling &= mask;
+		if (!ceiling)
+			return false;
+	}
+
+	if (end - 1 > ceiling - 1)
+		return false;
+	return true;
+}
+
+static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	pte_t *ptep, pte;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+		if (pte_none(pte))
+			continue;
+
+		WARN_ON(!pte_present(pte));
+		pte_clear(&init_mm, addr, ptep);
+		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+		if (free_mapped)
+			free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	pmd_t *pmdp, pmd;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd));
+		if (pmd_sect(pmd)) {
+			pmd_clear(pmdp);
+
+			/*
+			 * One TLBI should be sufficient here as the PMD_SIZE
+			 * range is mapped with a single block entry.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+			if (free_mapped)
+				free_hotplug_page_range(pmd_page(pmd),
+							PMD_SIZE);
+			continue;
+		}
+		WARN_ON(!pmd_table(pmd));
+		unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	pud_t *pudp, pud;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(p4dp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud));
+		if (pud_sect(pud)) {
+			pud_clear(pudp);
+
+			/*
+			 * One TLBI should be sufficient here as the PUD_SIZE
+			 * range is mapped with a single block entry.
+			 */
+			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+			if (free_mapped)
+				free_hotplug_page_range(pud_page(pud),
+							PUD_SIZE);
+			continue;
+		}
+		WARN_ON(!pud_table(pud));
+		unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
+				    unsigned long end, bool free_mapped)
+{
+	unsigned long next;
+	p4d_t *p4dp, p4d;
+
+	do {
+		next = p4d_addr_end(addr, end);
+		p4dp = p4d_offset(pgdp, addr);
+		p4d = READ_ONCE(*p4dp);
+		if (p4d_none(p4d))
+			continue;
+
+		WARN_ON(!p4d_present(p4d));
+		unmap_hotplug_pud_range(p4dp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_range(unsigned long addr, unsigned long end,
+				bool free_mapped)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pte_t *ptep, pte;
+	unsigned long i, start = addr;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+
+		/*
+		 * This is just a sanity check here which verifies that
+		 * pte clearing has been done by earlier unmap loops.
+		 */
+		WARN_ON(!pte_none(pte));
+	} while (addr += PAGE_SIZE, addr < end);
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PMD_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pte page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	ptep = pte_offset_kernel(pmdp, 0UL);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (!pte_none(READ_ONCE(ptep[i])))
+			return;
+	}
+
+	pmd_clear(pmdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(ptep));
+}
+
+static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pmd_t *pmdp, pmd;
+	unsigned long i, next, start = addr;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
+		free_empty_pte_table(pmdp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+
+	if (CONFIG_PGTABLE_LEVELS <= 2)
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PUD_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pmd page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	pmdp = pmd_offset(pudp, 0UL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(READ_ONCE(pmdp[i])))
+			return;
+	}
+
+	pud_clear(pudp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(pmdp));
+}
+
+static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	pud_t *pudp, pud;
+	unsigned long i, next, start = addr;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(p4dp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
+		free_empty_pmd_table(pudp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+
+	if (CONFIG_PGTABLE_LEVELS <= 3)
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the pud page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	pudp = pud_offset(p4dp, 0UL);
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (!pud_none(READ_ONCE(pudp[i])))
+			return;
+	}
+
+	p4d_clear(p4dp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(pudp));
+}
+
+static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
+				 unsigned long end, unsigned long floor,
+				 unsigned long ceiling)
+{
+	unsigned long next;
+	p4d_t *p4dp, p4d;
+
+	do {
+		next = p4d_addr_end(addr, end);
+		p4dp = p4d_offset(pgdp, addr);
+		p4d = READ_ONCE(*p4dp);
+		if (p4d_none(p4d))
+			continue;
+
+		WARN_ON(!p4d_present(p4d));
+		free_empty_pud_table(p4dp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_tables(unsigned long addr, unsigned long end,
+			      unsigned long floor, unsigned long ceiling)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		free_empty_p4d_table(pgdp, addr, next, floor, ceiling);
+	} while (addr = next, addr < end);
+}
+#endif
+
+ #ifdef CONFIG_SPARSEMEM_VMEMMAP
+ #if !ARM64_SWAPPER_USES_SECTION_MAPS
+ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+@@ -772,6 +1079,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+ void vmemmap_free(unsigned long start, unsigned long end,
+ 		struct vmem_altmap *altmap)
+ {
+#ifdef CONFIG_MEMORY_HOTPLUG
+	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+
+	unmap_hotplug_range(start, end, true);
+	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
+#endif
+ }
+ #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
+ 
+@@ -1050,10 +1363,21 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+ }
+ 
+ #ifdef CONFIG_MEMORY_HOTPLUG
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
+{
+	unsigned long end = start + size;
+
+	WARN_ON(pgdir != init_mm.pgd);
+	WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
+
+	unmap_hotplug_range(start, end, false);
+	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
+}
+
+ int arch_add_memory(int nid, u64 start, u64 size,
+ 			struct mhp_restrictions *restrictions)
+ {
+-	int flags = 0;
+	int ret, flags = 0;
+ 
+ 	if (rodata_full || debug_pagealloc_enabled())
+ 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
+@@ -1061,22 +1385,59 @@ int arch_add_memory(int nid, u64 start, u64 size,
+ 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
+ 			     size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
+ 
+-	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+ 			   restrictions);
+	if (ret)
+		__remove_pgd_mapping(swapper_pg_dir,
+				     __phys_to_virt(start), size);
+	return ret;
+ }
+
+ void arch_remove_memory(int nid, u64 start, u64 size,
+ 			struct vmem_altmap *altmap)
+ {
+ 	unsigned long start_pfn = start >> PAGE_SHIFT;
+ 	unsigned long nr_pages = size >> PAGE_SHIFT;
+ 
+-	/*
+-	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
+-	 * adding fails). Until then, this function should only be used
+-	 * during memory hotplug (adding memory), not for memory
+-	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
+-	 * unlocked yet.
+-	 */
+ 	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
+}
+
+/*
+ * This memory hotplug notifier helps prevent boot memory from being
+ * inadvertently removed as it blocks pfn range offlining process in
+ * __offline_pages(). Hence this prevents both offlining as well as
+ * removal process for boot memory which is initially always online.
+ * In future if and when boot memory could be removed, this notifier
+ * should be dropped and free_hotplug_page_range() should handle any
+ * reserved pages allocated during boot.
+ */
+static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
+					   unsigned long action, void *data)
+{
+	struct mem_section *ms;
+	struct memory_notify *arg = data;
+	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
+	unsigned long pfn = arg->start_pfn;
+
+	if (action != MEM_GOING_OFFLINE)
+		return NOTIFY_OK;
+
+	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		ms = __pfn_to_section(pfn);
+		if (early_section(ms))
+			return NOTIFY_BAD;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block prevent_bootmem_remove_nb = {
+	.notifier_call = prevent_bootmem_remove_notifier,
+};
+
+static int __init prevent_bootmem_remove_init(void)
+{
+	return register_memory_notifier(&prevent_bootmem_remove_nb);
+ }
+device_initcall(prevent_bootmem_remove_init);
+ #endif
+diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
+index 064163f25592..b5eebc8c4924 100644
+--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
+@@ -1,5 +1,6 @@
+ // SPDX-License-Identifier: GPL-2.0
+ #include <linux/debugfs.h>
+#include <linux/memory_hotplug.h>
+ #include <linux/seq_file.h>
+ 
+ #include <asm/ptdump.h>
+@@ -7,7 +8,10 @@
+ static int ptdump_show(struct seq_file *m, void *v)
+ {
+ 	struct ptdump_info *info = m->private;
+
+	get_online_mems();
+ 	ptdump_walk_pgd(m, info);
+	put_online_mems();
+ 	return 0;
+ }
+ DEFINE_SHOW_ATTRIBUTE(ptdump);
+-- 
+2.17.1
+
--- a/tools/packaging/kernel/patches/virtio-fs-dev.virtio-fs-dev.x/0001-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
+++ b/tools/packaging/kernel/patches/virtio-fs-dev.virtio-fs-dev.x/0001-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
@@ -0,0 +1,49 @@
+From c7ec155ec5e0f573e9c3cc4eb38d47543a2f1e81 Mon Sep 17 00:00:00 2001
+From: Sebastien Boeuf <sebastien.boeuf@intel.com>
+Date: Thu, 13 Feb 2020 08:50:38 +0100
+Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
+
+Whenever the vsock backend on the host sends a packet through the RX
+queue, it expects an answer on the TX queue. Unfortunately, there is one
+case where the host side will hang waiting for the answer and will
+effectively never recover.
+
+This issue happens when the guest side starts binding to the socket,
+which inserts a new bound socket into the list of already bound sockets.
+At this time, we expect the guest to also start listening, which will
+trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
+occurs if the host side queued a RX packet and triggered an interrupt
+right between the end of the binding process and the beginning of the
+listening process. In this specific case, the function processing the
+packet virtio_transport_recv_pkt() will find a bound socket, which means
+it will hit the switch statement checking for the sk_state, but the
+state won't be changed into TCP_LISTEN yet, which leads the code to pick
+the default statement. This default statement will only free the buffer,
+while it should also respond to the host side, by sending a packet on
+its TX queue.
+
+The fix is simple: when the default statement is entered, we know the
+host side is waiting for an answer, so in addition to freeing the
+buffer we must send back a packet containing the operation
+VIRTIO_VSOCK_OP_RST.
+
+Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
+---
+ net/vmw_vsock/virtio_transport_common.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
+index 6f1a8aff65c5..0b6fb687a3e0 100644
+--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
+@@ -1048,6 +1048,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	default:
+		(void)virtio_transport_reset_no_sock(t, pkt);
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	}
+-- 
+2.20.1
+
--- a/tools/packaging/kernel/patches/virtio-fs-v0.3.x/0001-arm64-mm-Enable-memory-hot-remove.patch
+++ b/tools/packaging/kernel/patches/virtio-fs-v0.3.x/0001-arm64-mm-Enable-memory-hot-remove.patch
@@ -0,0 +1,453 @@
+From: Anshuman Khandual <anshuman.khandual@arm.com>
+Date: Mon, 15 Jul 2019 11:47:50 +0530
+Subject: [PATCH] arm64/mm: Enable memory hot remove
+
+The arch code for hot-remove must tear down portions of the linear map and
+vmemmap corresponding to memory being removed. In both cases the page
+tables mapping these regions must be freed, and when sparse vmemmap is in
+use the memory backing the vmemmap must also be freed.
+
+This patch adds a new remove_pagetable() helper which can be used to tear
+down either region, and calls it from vmemmap_free() and
+__remove_pgd_mapping(). The sparse_vmap argument determines whether the
+backing memory will be freed.
+
+remove_pagetable() makes two distinct passes over the kernel page table.
+In the first pass it unmaps, invalidates applicable TLB cache and frees
+backing memory if required (vmemmap) for each mapped leaf entry. In the
+second pass it looks for empty page table sections whose page table page
+can be unmapped, TLB invalidated and freed.
+
+While freeing intermediate level page table pages bail out if any of its
+entries are still valid. This can happen for partially filled kernel page
+table either from a previously attempted failed memory hot add or while
+removing an address range which does not span the entire page table page
+range.
+
+The vmemmap region may share levels of table with the vmalloc region.
+There can be conflicts between hot remove freeing page table pages with
+a concurrent vmalloc() walking the kernel page table. This conflict can
+not just be solved by taking the init_mm ptl because of existing locking
+scheme in vmalloc(). Hence unlike linear mapping, skip freeing page table
+pages while tearing down vmemmap mapping.
+
+While here update arch_add_memory() to handle __add_pages() failures by
+just unmapping recently added kernel linear mapping. Now enable memory hot
+remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.
+
+This implementation is broadly inspired by the kernel page table tear-down
+procedure on the x86 architecture.
+
+Acked-by: Steve Capper <steve.capper@arm.com>
+Acked-by: David Hildenbrand <david@redhat.com>
+Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
+---
+ arch/arm64/Kconfig               |   3 +
+ arch/arm64/include/asm/pgtable.h |   7 +-
+ arch/arm64/mm/mmu.c              | 290 ++++++++++++++++++++++++++++++-
+ include/linux/mmzone.h           |   1 +
+ mm/Kconfig                       |   2 +-
+ 5 files changed, 291 insertions(+), 12 deletions(-)
+
+diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
+index 3adcec05b1f6..5a1231b8b8cf 100644
+--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
+@@ -273,6 +273,9 @@ config ZONE_DMA32
+ config ARCH_ENABLE_MEMORY_HOTPLUG
+ 	def_bool y
+ 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
+ config SMP
+ 	def_bool y
+ 
+diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
+index 5fdcfe237338..e09760ece844 100644
+--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
+@@ -209,7 +209,7 @@ static inline pmd_t pmd_mkcont(pmd_t pmd)
+ 
+ static inline pte_t pte_mkdevmap(pte_t pte)
+ {
+-	return set_pte_bit(pte, __pgprot(PTE_DEVMAP));
+	return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
+ }
+ 
+ static inline void set_pte(pte_t *ptep, pte_t pte)
+@@ -396,7 +396,10 @@ static inline int pmd_protnone(pmd_t pmd)
+ #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ #define pmd_devmap(pmd)		pte_devmap(pmd_pte(pmd))
+ #endif
+-#define pmd_mkdevmap(pmd)	pte_pmd(pte_mkdevmap(pmd_pte(pmd)))
+static inline pmd_t pmd_mkdevmap(pmd_t pmd)
+{
+	return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP)));
+}
+ 
+ #define __pmd_to_phys(pmd)	__pte_to_phys(pmd_pte(pmd))
+ #define __phys_to_pmd_val(phys)	__phys_to_pte_val(phys)
+diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
+index 750a69dde39b..282a4b26218c 100644
+--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
+@@ -722,6 +722,250 @@ int kern_addr_valid(unsigned long addr)
+ 
+ 	return pfn_valid(pte_pfn(pte));
+ }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, size_t size)
+{
+	WARN_ON(!page || PageReserved(page));
+	free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+	free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static void free_pte_table(pmd_t *pmdp, unsigned long addr)
+{
+	struct page *page;
+	pte_t *ptep;
+	int i;
+
+	ptep = pte_offset_kernel(pmdp, 0UL);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (!pte_none(READ_ONCE(ptep[i])))
+			return;
+	}
+
+	page = pmd_page(READ_ONCE(*pmdp));
+	pmd_clear(pmdp);
+	__flush_tlb_kernel_pgtable(addr);
+	free_hotplug_pgtable_page(page);
+}
+
+static void free_pmd_table(pud_t *pudp, unsigned long addr)
+{
+	struct page *page;
+	pmd_t *pmdp;
+	int i;
+
+	if (CONFIG_PGTABLE_LEVELS <= 2)
+		return;
+
+	pmdp = pmd_offset(pudp, 0UL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(READ_ONCE(pmdp[i])))
+			return;
+	}
+
+	page = pud_page(READ_ONCE(*pudp));
+	pud_clear(pudp);
+	__flush_tlb_kernel_pgtable(addr);
+	free_hotplug_pgtable_page(page);
+}
+
+static void free_pud_table(pgd_t *pgdp, unsigned long addr)
+{
+	struct page *page;
+	pud_t *pudp;
+	int i;
+
+	if (CONFIG_PGTABLE_LEVELS <= 3)
+		return;
+
+	pudp = pud_offset(pgdp, 0UL);
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (!pud_none(READ_ONCE(pudp[i])))
+			return;
+	}
+
+	page = pgd_page(READ_ONCE(*pgdp));
+	pgd_clear(pgdp);
+	__flush_tlb_kernel_pgtable(addr);
+	free_hotplug_pgtable_page(page);
+}
+
+static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
+				    unsigned long end, bool sparse_vmap)
+{
+	struct page *page;
+	pte_t *ptep, pte;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+		if (pte_none(pte))
+			continue;
+
+		WARN_ON(!pte_present(pte));
+		page = sparse_vmap ? pte_page(pte) : NULL;
+		pte_clear(&init_mm, addr, ptep);
+		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+		if (sparse_vmap)
+			free_hotplug_page_range(page, PAGE_SIZE);
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
+				    unsigned long end, bool sparse_vmap)
+{
+	unsigned long next;
+	struct page *page;
+	pmd_t *pmdp, pmd;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd));
+		if (pmd_sect(pmd)) {
+			page = sparse_vmap ? pmd_page(pmd) : NULL;
+			pmd_clear(pmdp);
+			flush_tlb_kernel_range(addr, next);
+			if (sparse_vmap)
+				free_hotplug_page_range(page, PMD_SIZE);
+			continue;
+		}
+		WARN_ON(!pmd_table(pmd));
+		unmap_hotplug_pte_range(pmdp, addr, next, sparse_vmap);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_pud_range(pgd_t *pgdp, unsigned long addr,
+				    unsigned long end, bool sparse_vmap)
+{
+	unsigned long next;
+	struct page *page;
+	pud_t *pudp, pud;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(pgdp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud));
+		if (pud_sect(pud)) {
+			page = sparse_vmap ? pud_page(pud) : NULL;
+			pud_clear(pudp);
+			flush_tlb_kernel_range(addr, next);
+			if (sparse_vmap)
+				free_hotplug_page_range(page, PUD_SIZE);
+			continue;
+		}
+		WARN_ON(!pud_table(pud));
+		unmap_hotplug_pmd_range(pudp, addr, next, sparse_vmap);
+	} while (addr = next, addr < end);
+}
+
+static void unmap_hotplug_range(unsigned long addr, unsigned long end,
+				bool sparse_vmap)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		unmap_hotplug_pud_range(pgdp, addr, next, sparse_vmap);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
+				 unsigned long end)
+{
+	pte_t *ptep, pte;
+
+	do {
+		ptep = pte_offset_kernel(pmdp, addr);
+		pte = READ_ONCE(*ptep);
+		WARN_ON(!pte_none(pte));
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
+				 unsigned long end)
+{
+	unsigned long next;
+	pmd_t *pmdp, pmd;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_offset(pudp, addr);
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_none(pmd))
+			continue;
+
+		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
+		free_empty_pte_table(pmdp, addr, next);
+		free_pte_table(pmdp, addr);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_pud_table(pgd_t *pgdp, unsigned long addr,
+				 unsigned long end)
+{
+	unsigned long next;
+	pud_t *pudp, pud;
+
+	do {
+		next = pud_addr_end(addr, end);
+		pudp = pud_offset(pgdp, addr);
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
+			continue;
+
+		WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
+		free_empty_pmd_table(pudp, addr, next);
+		free_pmd_table(pudp, addr);
+	} while (addr = next, addr < end);
+}
+
+static void free_empty_tables(unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pgd_t *pgdp, pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgdp = pgd_offset_k(addr);
+		pgd = READ_ONCE(*pgdp);
+		if (pgd_none(pgd))
+			continue;
+
+		WARN_ON(!pgd_present(pgd));
+		free_empty_pud_table(pgdp, addr, next);
+		free_pud_table(pgdp, addr);
+	} while (addr = next, addr < end);
+}
+
+static void remove_pagetable(unsigned long start, unsigned long end,
+			     bool sparse_vmap)
+{
+	unmap_hotplug_range(start, end, sparse_vmap);
+	free_empty_tables(start, end);
+}
+#endif
+
+ #ifdef CONFIG_SPARSEMEM_VMEMMAP
+ #if !ARM64_SWAPPER_USES_SECTION_MAPS
+ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+@@ -769,6 +1013,27 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+ void vmemmap_free(unsigned long start, unsigned long end,
+ 		struct vmem_altmap *altmap)
+ {
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/*
+	 * FIXME: We should have called remove_pagetable(start, end, true).
+	 * vmemmap and vmalloc virtual range might share intermediate kernel
+	 * page table entries. Removing vmemmap range page table pages here
+	 * can potentially conflict with a concurrent vmalloc() allocation.
+	 *
+	 * This is primarily because vmalloc() does not take init_mm ptl for
+	 * the entire page table walk and its modification. Instead it just
+	 * takes the lock while allocating and installing page table pages
+	 * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table
+	 * entry via memory hot remove can cause vmalloc() kernel page table
+	 * walk pointers to be invalid on the fly which can cause corruption
+	 * or worse, a crash.
+	 *
+	 * To avoid this problem, let's not free empty page table pages for
+	 * given vmemmap range being hot-removed. Just unmap and free the
+	 * range instead.
+	 */
+	unmap_hotplug_range(start, end, true);
+#endif
+ }
+ #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
+ 
+@@ -1060,10 +1325,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+ }
+ 
+ #ifdef CONFIG_MEMORY_HOTPLUG
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
+{
+	unsigned long end = start + size;
+
+	WARN_ON(pgdir != init_mm.pgd);
+	remove_pagetable(start, end, false);
+}
+
+ int arch_add_memory(int nid, u64 start, u64 size,
+ 			struct mhp_restrictions *restrictions)
+ {
+-	int flags = 0;
+	int ret, flags = 0;
+ 
+ 	if (rodata_full || debug_pagealloc_enabled())
+ 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
+@@ -1071,9 +1344,14 @@ int arch_add_memory(int nid, u64 start, u64 size,
+ 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
+ 			     size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
+ 
+-	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+ 			   restrictions);
+	if (ret)
+		__remove_pgd_mapping(swapper_pg_dir,
+				     __phys_to_virt(start), size);
+	return ret;
+ }
+
+ void arch_remove_memory(int nid, u64 start, u64 size,
+ 			struct vmem_altmap *altmap)
+ {
+@@ -1081,14 +1359,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
+ 	unsigned long nr_pages = size >> PAGE_SHIFT;
+ 	struct zone *zone;
+ 
+-	/*
+-	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
+-	 * adding fails). Until then, this function should only be used
+-	 * during memory hotplug (adding memory), not for memory
+-	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
+-	 * unlocked yet.
+-	 */
+ 	zone = page_zone(pfn_to_page(start_pfn));
+ 	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
+ }
+ #endif
+diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
+index d77d717c620c..47230ebdcb01 100644
+--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
+@@ -1122,6 +1122,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
+  * PFN_SECTION_SHIFT		pfn to/from section number
+  */
+ #define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
+#define PA_SECTION_SIZE                (1UL << PA_SECTION_SHIFT)
+ #define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
+ 
+ #define NR_MEM_SECTIONS		(1UL << SECTIONS_SHIFT)
+diff --git a/mm/Kconfig b/mm/Kconfig
+index 56cec636a1fc..7c980f483a7d 100644
+--- a/mm/Kconfig
+++ b/mm/Kconfig
+@@ -677,7 +677,7 @@ config DEV_PAGEMAP_OPS
+ 
+ config HMM_MIRROR
+ 	bool "HMM mirror CPU page table into a device page table"
+-	depends on (X86_64 || PPC64)
+	depends on (X86_64 || PPC64 || ARM64)
+ 	depends on MMU && 64BIT
+ 	select MMU_NOTIFIER
+ 	help
+-- 
+2.17.1
+
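Illustrative note on the patch above: the two-pass scheme described in its commit message (first unmap the leaf entries, then free only those table pages that turned out to be completely empty) can be sketched as a small user-space model. This is not kernel code. Every `toy_` name, the two-level layout, and the table sizes are assumptions made up for this sketch; no real TLB maintenance or locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy sizes; they only stand in for PTRS_PER_PTE and an upper level. */
#define TOY_LEAF_ENTRIES 4u
#define TOY_TOP_ENTRIES  4u

struct toy_leaf {
	bool mapped[TOY_LEAF_ENTRIES];
};

struct toy_pgtable {
	struct toy_leaf *leaf[TOY_TOP_ENTRIES];
};

/* Map one page index, allocating its leaf table on demand. */
static int toy_map(struct toy_pgtable *pt, unsigned int idx)
{
	unsigned int top = idx / TOY_LEAF_ENTRIES;

	if (top >= TOY_TOP_ENTRIES)
		return -1;
	if (!pt->leaf[top]) {
		pt->leaf[top] = calloc(1, sizeof(*pt->leaf[top]));
		if (!pt->leaf[top])
			return -1;
	}
	pt->leaf[top]->mapped[idx % TOY_LEAF_ENTRIES] = true;
	return 0;
}

/* Pass 1: clear every leaf entry in [start, end). */
static void toy_unmap_range(struct toy_pgtable *pt, unsigned int start,
			    unsigned int end)
{
	unsigned int idx;

	for (idx = start; idx < end; idx++) {
		unsigned int top = idx / TOY_LEAF_ENTRIES;

		if (top >= TOY_TOP_ENTRIES)
			break;
		if (pt->leaf[top])
			pt->leaf[top]->mapped[idx % TOY_LEAF_ENTRIES] = false;
	}
}

/* Pass 2: free a leaf table only if none of its entries is still in use. */
static void toy_free_empty_tables(struct toy_pgtable *pt, unsigned int start,
				  unsigned int end)
{
	unsigned int top, i;

	if (start >= end)
		return;
	for (top = start / TOY_LEAF_ENTRIES;
	     top <= (end - 1) / TOY_LEAF_ENTRIES && top < TOY_TOP_ENTRIES;
	     top++) {
		struct toy_leaf *leaf = pt->leaf[top];
		bool in_use = false;

		if (!leaf)
			continue;
		for (i = 0; i < TOY_LEAF_ENTRIES; i++)
			if (leaf->mapped[i])
				in_use = true;
		if (in_use)
			continue;	/* bail out: table is partially filled */
		pt->leaf[top] = NULL;
		free(leaf);
	}
}

/* Both passes in order, mirroring the shape of remove_pagetable(). */
static void toy_remove_pagetable(struct toy_pgtable *pt, unsigned int start,
				 unsigned int end)
{
	toy_unmap_range(pt, start, end);
	toy_free_empty_tables(pt, start, end);
}
```

In this model, removing a range that covers only part of a leaf table leaves that table allocated, which corresponds to the "bail out if any of its entries are still valid" rule in the commit message. Calling only `toy_unmap_range()` would correspond to what the patch does for the vmemmap range.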
--- a/tools/packaging/kernel/patches/virtio-fs-v0.3.x/0001-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
+++ b/tools/packaging/kernel/patches/virtio-fs-v0.3.x/0001-net-virtio_vsock-Fix-race-condition-between-bind-and.patch
@@ -0,0 +1,49 @@
+From c7ec155ec5e0f573e9c3cc4eb38d47543a2f1e81 Mon Sep 17 00:00:00 2001
+From: Sebastien Boeuf <sebastien.boeuf@intel.com>
+Date: Thu, 13 Feb 2020 08:50:38 +0100
+Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
+
+Whenever the vsock backend on the host sends a packet through the RX
+queue, it expects an answer on the TX queue. Unfortunately, there is one
+case where the host side will hang waiting for the answer and will
+effectively never recover.
+
+This issue happens when the guest side starts binding to the socket,
+which inserts a new bound socket into the list of already bound sockets.
+At this time, we expect the guest to also start listening, which will
+trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
+occurs if the host side queued a RX packet and triggered an interrupt
+right between the end of the binding process and the beginning of the
+listening process. In this specific case, the function processing the
+packet virtio_transport_recv_pkt() will find a bound socket, which means
+it will hit the switch statement checking for the sk_state, but the
+state won't be changed into TCP_LISTEN yet, which leads the code to pick
+the default statement. This default statement will only free the buffer,
+while it should also respond to the host side, by sending a packet on
+its TX queue.
+
+To fix this unfortunate chain of events, the default statement must
+also answer the host: at this stage we know the host side is waiting
+for a reply, so we send back a packet containing the operation
+VIRTIO_VSOCK_OP_RST.
+
+Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
+---
+ net/vmw_vsock/virtio_transport_common.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
+index 6f1a8aff65c5..0b6fb687a3e0 100644
+--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
+@@ -1048,6 +1048,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	default:
+		(void)virtio_transport_reset_no_sock(pkt);
+ 		virtio_transport_free_pkt(pkt);
+ 		break;
+ 	}
+-- 
+2.20.1
+
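Illustrative note on the patch above: the race can be reduced to a state dispatch in which a request arrives for a socket that is bound but not yet listening. The sketch below is a user-space model, not the kernel's code path. The `toy_` enums, the function name, and the `patched` flag are assumptions invented for this illustration; only the idea that the default branch must reply with a reset comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified socket states for the sketch. */
enum toy_sk_state {
	TOY_BOUND_NOT_LISTENING,	/* bind() done, listen() not yet */
	TOY_LISTENING,
};

/* What the guest sends back to the host on its TX queue. */
enum toy_reply {
	TOY_NO_REPLY,		/* host keeps waiting forever */
	TOY_REPLY_RESPONSE,	/* normal answer from a listening socket */
	TOY_REPLY_RST,		/* reset, so the host can give up cleanly */
};

/*
 * Model of handling one incoming request. With patched == false the
 * default branch only drops the packet; with patched == true it also
 * answers with a reset, as the patch does.
 */
static enum toy_reply toy_recv_request(enum toy_sk_state state, bool patched)
{
	switch (state) {
	case TOY_LISTENING:
		return TOY_REPLY_RESPONSE;
	default:
		return patched ? TOY_REPLY_RST : TOY_NO_REPLY;
	}
}
```

With `patched` set to false, a request that lands between bind and listen yields `TOY_NO_REPLY`, which is the hang described in the commit message. With `patched` set to true the same request yields `TOY_REPLY_RST`.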
--- a/tools/packaging/kernel/patches/virtio-fs-v0.3.x/0002-ACPI-Always-build-evged-in.patch
+++ b/tools/packaging/kernel/patches/virtio-fs-v0.3.x/0002-ACPI-Always-build-evged-in.patch
@@ -0,0 +1,39 @@
+From ac36d37e943635fc072e9d4f47e40a48fbcdb3f0 Mon Sep 17 00:00:00 2001
+From: Arjan van de Ven <arjan@linux.intel.com>
+Date: Wed, 9 Oct 2019 15:04:33 +0200
+Subject: ACPI: Always build evged in
+
+Although the Generic Event Device is a Hardware-reduced
+platform device in principle, it should not be restricted to
+ACPI_REDUCED_HARDWARE_ONLY.
+
+Kernels supporting both fixed and hardware-reduced ACPI platforms
+should be able to probe the GED when dynamically detecting that a
+platform is hardware-reduced. For that, the driver must be
+unconditionally built in.
+
+Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
+Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
+Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+---
+ drivers/acpi/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+(limited to 'drivers/acpi/Makefile')
+
+diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
+index 5d361e4e3405..ef1ac4d127da 100644
+--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
+@@ -48,7 +48,7 @@ acpi-y				+= acpi_pnp.o
+ acpi-$(CONFIG_ARM_AMBA)	+= acpi_amba.o
+ acpi-y				+= power.o
+ acpi-y				+= event.o
+-acpi-$(CONFIG_ACPI_REDUCED_HARDWARE_ONLY) += evged.o
+acpi-y				+= evged.o
+ acpi-y				+= sysfs.o
+ acpi-y				+= property.o
+ acpi-$(CONFIG_X86)		+= acpi_cmos_rtc.o
+-- 
+cgit 1.2-0.3.lf.el7
+