mirror of
https://github.com/aljazceru/kata-containers.git
synced 2026-02-15 11:34:22 +01:00
packaging: merge packaging repository
git-subtree-dir: tools/packaging
git-subtree-mainline: f818b46a41
git-subtree-split: 1f22d72d5d
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
179
tools/packaging/kernel/README.md
Normal file
@@ -0,0 +1,179 @@
# Build Kata Containers Kernel

* [Requirements](#requirements)
* [Usage](#usage)
* [Setup kernel source code](#setup-kernel-source-code)
* [Build the kernel](#build-the-kernel)
* [Install the Kernel in the default path for Kata](#install-the-kernel-in-the-default-path-for-kata)
* [Submit Kernel Changes](#submit-kernel-changes)
* [How is it tested](#how-is-it-tested)
* [Contribute](#contribute)

This document explains how to build a kernel recommended for use with
Kata Containers. The `build-kernel.sh` script automates the process of
building a kernel for Kata Containers.

## Requirements

The `build-kernel.sh` script requires an installed Golang version matching the
[component build requirements](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#requirements-to-build-individual-components).

## Usage

```
$ ./build-kernel.sh -h
Overview:

	Build a kernel for Kata Containers

Description: This script is the *ONLY* way to build a kernel for development.


Usage:

	build-kernel.sh [options] <command> <argument>

Commands:

- setup

- build

- install

Options:

	-a <arch>       : Architecture target (defaults to the host architecture).
	-c <path>       : Path to config file to build the kernel.
	-d              : Enable bash debug.
	-e              : Enable experimental kernel.
	-f              : Enable force generate config when setup.
	-g <vendor>     : GPU vendor, intel or nvidia.
	-h              : Display this help.
	-k <path>       : Path to kernel to build.
	-p <path>       : Path to a directory with patches to apply to kernel.
	-t <hypervisor> : Hypervisor target.
	-v <version>    : Kernel version to use if kernel path not provided.
```

Example:
```
$ ./build-kernel.sh -v 4.19.86 -g nvidia -f -d setup
```
> **Note**
> - `-v 4.19.86`: Specify the guest kernel version.
> - `-g nvidia`: Build a guest kernel with NVIDIA GPU support.
> - `-f`: Force the `.config` file to be generated even if the kernel directory already exists.
> - `-d`: Enable bash debug mode.


## Setup kernel source code

```bash
$ go get -d -u github.com/kata-containers/packaging
$ cd $GOPATH/src/github.com/kata-containers/packaging/kernel
$ ./build-kernel.sh setup
```
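
The `cd` command above assumes `GOPATH` holds a single directory. The script itself falls back to `${HOME}/go` when `GOPATH` is unset and otherwise uses only the first entry of a colon-separated `GOPATH`. The following is a minimal sketch of that behaviour; `first_gopath_entry` is a hypothetical helper name used for illustration and is not part of `build-kernel.sh`.

```bash
#!/bin/bash
# Sketch only: mirrors how build-kernel.sh derives its working GOPATH.
# first_gopath_entry is a hypothetical helper name, not part of the script.
first_gopath_entry() {
	# Fall back to ${HOME}/go when no value is given, as the script does.
	local gopath="${1:-${HOME}/go}"
	# Strip everything from the first ':' onwards.
	echo "${gopath%%:*}"
}

# Example call with a two-entry GOPATH (illustrative paths).
first_gopath_entry "/home/user/go:/opt/go"
```

If your `GOPATH` contains several entries, the packaging repository is expected under the first one.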

The script `./build-kernel.sh` tries to apply the patches from
`${GOPATH}/src/github.com/kata-containers/packaging/kernel/patches/` when it
sets up a kernel. If you want to add a source modification, add a patch to this
directory.

The script also copies a kernel config file from
`${GOPATH}/src/github.com/kata-containers/packaging/kernel/configs/` to `.config`
in the kernel source tree. You can modify it as needed.
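
Patches are applied in the order given by the numeric prefix of their file names (the `git format-patch` convention), lowest number first. The sketch below illustrates that ordering on a scratch directory; the directory and the patch names are made up for the example, and it sorts base names rather than full paths to keep the numeric key unambiguous.

```bash
#!/bin/bash
# Sketch only: show the order in which numbered patches would be applied.
set -o errexit -o nounset -o pipefail

# Hypothetical scratch directory standing in for patches/<major>.<minor>.x
patches_dir="$(mktemp -d)"
touch "${patches_dir}/0002-improve-the-fix.patch" \
	"${patches_dir}/0010-correct-compiler-warnings.patch" \
	"${patches_dir}/0001-fix-the-bad-thing.patch"

# List the patch file names, then sort numerically on the field before
# the first '-' so that 0002 comes before 0010.
ordered="$(find "${patches_dir}" -name '*.patch' -type f -exec basename {} \; |
	sort -t- -k1,1n)"
echo "${ordered}"

rm -rf "${patches_dir}"
```

Naming a new patch with the next free number is therefore enough to place it at the end of the series.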

## Build the kernel

Once the kernel source code is set up, build the kernel:

```bash
$ ./build-kernel.sh build
```

## Install the Kernel in the default path for Kata

Kata Containers searches a default path for the kernel to boot. The following
command installs the kernel to that default path
(`/usr/share/kata-containers/`).

```bash
$ ./build-kernel.sh install
```
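
The install step honours the `DESTDIR` and `PREFIX` environment variables and names the installed images after the kernel version, the config version and an optional suffix. The sketch below reproduces that naming logic in isolation; `kata_install_path` and `kata_image_name` are hypothetical helper names, and `readlink -m` assumes GNU coreutils.

```bash
#!/bin/bash
# Sketch only: reproduces the naming logic of the install step.
# kata_install_path and kata_image_name are hypothetical helper names.

# Resolve the install directory from DESTDIR and PREFIX. GNU readlink -m
# normalises the path even when it does not exist yet.
kata_install_path() {
	local destdir="${1:-/}"
	local prefix="${2:-/usr}"
	readlink -m "${destdir}/${prefix}/share/kata-containers"
}

# Build the versioned image name:
#   <kind>-<kernel version>-<config version><suffix>
kata_image_name() {
	local kind="$1" kernel_version="$2" config_version="$3"
	local gpu_vendor="${4:-}" experimental="${5:-false}"
	local suffix=""
	if [ "${experimental}" = "true" ]; then
		suffix="-virtiofs"
	fi
	if [ -n "${gpu_vendor}" ]; then
		suffix="-${gpu_vendor}-gpu${suffix}"
	fi
	echo "${kind}-${kernel_version}-${config_version}${suffix}"
}

# Example calls (version numbers are illustrative).
kata_install_path "/nonexistent-stage" "/usr"
kata_image_name vmlinuz 4.19.86 25 nvidia
```

Setting `DESTDIR` to a staging directory is a convenient way to inspect what would be installed without touching `/usr/share/kata-containers/`.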

## Submit Kernel Changes

The Kata Containers packaging repository holds the kernel configs and patches.
The configs and patches can work for many kernel versions, but we only test the
kernel version defined in the [runtime versions file][runtime-versions-file].

For further details, see [the kernel configuration documentation](configs).

## How is it tested

The Kata Containers CI scripts either install the kernel from the [CI cache
job][cache-job] or build it from source.

If the kernel defined in the [runtime versions file][runtime-versions-file] has
been built and cached with the latest kernel config and patches, the cached
kernel is installed. Otherwise, the kernel is built from source.

The Kata kernel version combines the kernel version defined in the [runtime
versions file][runtime-versions-file] with the contents of the file
`kata_config_version`. This helps to identify whether a kernel build has the
latest recommended configuration.

Example:

```bash
# From https://github.com/kata-containers/runtime/blob/master/versions.yaml
$ kernel_version_in_versions_file=4.10.1
# From https://github.com/kata-containers/packaging/blob/master/kernel/kata_config_version
$ kata_config_version=25
$ latest_kernel_version=${kernel_version_in_versions_file}-${kata_config_version}
```

The resulting version is 4.10.1-25. It identifies whether the kernel configs
used by a CI build are up to date.
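
The cache-or-build decision described in this section can be sketched as a simple string comparison. This is only an illustration of the idea, not the CI implementation: `kernel_ci_action` is a hypothetical function name, and `cached_version` stands for whatever version string the cache job reports.

```bash
#!/bin/bash
# Sketch only: decide between a cached kernel and a source build.
# kernel_ci_action and its arguments are hypothetical, for illustration.
kernel_ci_action() {
	local cached_version="$1" kernel_version="$2" config_version="$3"
	# Same composition as in the example above: <kernel>-<config version>
	local latest="${kernel_version}-${config_version}"
	if [ "${cached_version}" = "${latest}" ]; then
		echo "install-from-cache"
	else
		echo "build-from-source"
	fi
}

# The cache is stale here because the config version moved from 24 to 25.
kernel_ci_action "4.10.1-24" "4.10.1" "25"
```

Bumping `kata_config_version` whenever configs or patches change is what makes this comparison detect a stale cache.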

## Contribute

There are two places to contribute Kata kernel changes:

1. [Kata runtime versions file][runtime-versions-file]: This file points to the
   recommended versions to be used by Kata. To update the kernel version, send a
   pull request that updates that version. The Kata CI will run all the use cases
   and verify that it works.

1. Kata packaging repository. This repository contains all the kernel configs
   and patches recommended for the Kata Containers kernel:

   - If you want to upload a new configuration (for a new version or a specific
     architecture), make sure the config file name has the following format:

     ```bash
     # Format:
     $ ${arch}_kata_${hypervisor_target}_${major_kernel_version}.x

     # example:
     $ arch=x86_64
     $ hypervisor_target=kvm
     $ major_kernel_version=4.19

     # Resulting file
     $ name: x86_64_kata_kvm_4.19.x
     ```

   - Kernel patches: the CI and packaging scripts apply all patches in the
     [patches directory][patches-dir].
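
As a sketch of the naming rule in the list above, the following derives a config file name from a full kernel version. `get_major_kernel_version` follows the helper of the same name in `build-kernel.sh`; `kata_config_name` is a hypothetical wrapper added only for this example.

```bash
#!/bin/bash
# Sketch only: derive the expected config file name for a kernel version.

# Reduce a full version such as 4.19.86 to <major>.<minor>, as the
# build script does.
get_major_kernel_version() {
	local version="$1"
	local major minor
	major="$(echo "${version}" | cut -d. -f1)"
	minor="$(echo "${version}" | cut -d. -f2)"
	echo "${major}.${minor}"
}

# kata_config_name is a hypothetical helper, not part of the script.
kata_config_name() {
	local arch="$1" hypervisor_target="$2" version="$3"
	echo "${arch}_kata_${hypervisor_target}_$(get_major_kernel_version "${version}").x"
}

kata_config_name x86_64 kvm 4.19.86
```

One config file therefore covers every patch release of a given `<major>.<minor>` series.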

Note: The kernel version and configuration file live in different locations,
which could result in a circular dependency on your (runtime or packaging) PR.
In this case, the PR you submit needs to be tested together with a patch from
another Kata Containers repository. To do this you have to specify which
repository and which pull request [it depends on][depends-on-docs].

[runtime-versions-file]: https://github.com/kata-containers/runtime/blob/master/versions.yaml
[patches-dir]: https://github.com/kata-containers/packaging/tree/master/kernel/patches
[depends-on-docs]: https://github.com/kata-containers/tests/blob/master/README.md#breaking-compatibility
[cache-job]: http://jenkins.katacontainers.io/job/image-nightly-x86_64/
539
tools/packaging/kernel/build-kernel.sh
Executable file
@@ -0,0 +1,539 @@
#!/bin/bash
#
# Copyright (c) 2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

description="
Description: This script is the *ONLY* way to build a kernel for development.
"

set -o errexit
set -o nounset
set -o pipefail

readonly script_name="$(basename "${BASH_SOURCE[0]}")"
readonly script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
kata_version="${kata_version:-}"

# Project name
readonly project_name="kata-containers"
[ -n "${GOPATH:-}" ] || GOPATH="${HOME}/go"
# Use the first element of GOPATH as the working directory,
# as go get only works against the first item in the GOPATH
GOPATH="${GOPATH%%:*}"
# Kernel version to be used
kernel_version=""
# Flag indicating whether the kernel source needs to be downloaded
download_kernel=false
# The repository where the versions file lives
runtime_repository="github.com/${project_name}/runtime"
# The repository where kernel configuration lives
readonly kernel_config_repo="github.com/${project_name}/packaging"
readonly patches_repo="github.com/${project_name}/packaging"
readonly patches_repo_dir="${GOPATH}/src/${patches_repo}"
# Default path to search patches to apply to kernel
readonly default_patches_dir="${patches_repo_dir}/kernel/patches/"
# Default path to search config for kata
readonly default_kernel_config_dir="${GOPATH}/src/${kernel_config_repo}/kernel/configs"
# Default path to search for kernel config fragments
readonly default_config_frags_dir="${GOPATH}/src/${kernel_config_repo}/kernel/configs/fragments"
readonly default_config_whitelist="${GOPATH}/src/${kernel_config_repo}/kernel/configs/fragments/whitelist.conf"
# GPU vendor
readonly GV_INTEL="intel"
readonly GV_NVIDIA="nvidia"

# Path to kernel directory
kernel_path=""
# Experimental kernel support. Pull from virtio-fs GitLab instead of kernel.org
experimental_kernel="false"
# Force generate config when setup
force_setup_generate_config="false"
# GPU kernel support
gpu_vendor=""
# Path to a directory with patches to apply to the kernel (-p)
patches_path=""
# Hypervisor target (-t)
hypervisor_target=""
# Architecture target (-a)
arch_target=""
# Path to the kernel config file (-c)
kernel_config_path=""
# Installation root (DESTDIR)
DESTDIR="${DESTDIR:-/}"
# Installation prefix (PREFIX)
PREFIX="${PREFIX:-/usr}"

source "${script_dir}/../scripts/lib.sh"

usage() {
	exit_code="$1"
	cat <<EOT
Overview:

	Build a kernel for Kata Containers
${description}

Usage:

	$script_name [options] <command> <argument>

Commands:

- setup

- build

- install

Options:

	-a <arch>       : Architecture target (defaults to the host architecture).
	-c <path>       : Path to config file to build the kernel.
	-d              : Enable bash debug.
	-e              : Enable experimental kernel.
	-f              : Enable force generate config when setup.
	-g <vendor>     : GPU vendor, intel or nvidia.
	-h              : Display this help.
	-k <path>       : Path to kernel to build.
	-p <path>       : Path to a directory with patches to apply to kernel.
	-t <hypervisor> : Hypervisor target.
	-v <version>    : Kernel version to use if kernel path not provided.
EOT
	exit "$exit_code"
}

# Convert architecture to the name used by the Linux kernel build system
arch_to_kernel() {
	local -r arch="$1"

	case "$arch" in
		aarch64) echo "arm64" ;;
		ppc64le) echo "powerpc" ;;
		s390x) echo "s390" ;;
		x86_64) echo "$arch" ;;
		*) die "unsupported architecture: $arch" ;;
	esac
}

get_kernel() {
	local version="${1:-}"

	local kernel_path=${2:-}
	[ -n "${kernel_path}" ] || die "kernel_path not provided"
	[ ! -d "${kernel_path}" ] || die "kernel_path already exists"

	if [[ ${experimental_kernel} == "true" ]]; then
		kernel_tarball="linux-${version}.tar.gz"
		curl --fail -OL "https://gitlab.com/virtio-fs/linux/-/archive/${version}/${kernel_tarball}"
		tar xf "${kernel_tarball}"
		mv "linux-${version}" "${kernel_path}"
	else

		# Remove extra 'v'
		version=${version#v}

		major_version=$(echo "${version}" | cut -d. -f1)
		kernel_tarball="linux-${version}.tar.xz"

		if [ ! -f sha256sums.asc ] || ! grep -q "${kernel_tarball}" sha256sums.asc; then
			info "Download kernel checksum file: sha256sums.asc"
			curl --fail -OL "https://cdn.kernel.org/pub/linux/kernel/v${major_version}.x/sha256sums.asc"
		fi
		grep "${kernel_tarball}" sha256sums.asc >"${kernel_tarball}.sha256"

		# Discard a previously downloaded tarball that fails the checksum
		if [ -f "${kernel_tarball}" ] && ! sha256sum -c "${kernel_tarball}.sha256"; then
			info "invalid kernel tarball ${kernel_tarball}, removing it"
			rm -f "${kernel_tarball}"
		fi
		if [ ! -f "${kernel_tarball}" ]; then
			info "Download kernel version ${version}"
			info "Download kernel"
			curl --fail -OL "https://www.kernel.org/pub/linux/kernel/v${major_version}.x/${kernel_tarball}"
		else
			info "kernel tarball already downloaded"
		fi

		sha256sum -c "${kernel_tarball}.sha256"

		tar xf "${kernel_tarball}"

		mv "linux-${version}" "${kernel_path}"
	fi
}

get_major_kernel_version() {
	local version="${1}"
	[ -n "${version}" ] || die "kernel version not provided"
	major_version=$(echo "${version}" | cut -d. -f1)
	minor_version=$(echo "${version}" | cut -d. -f2)
	echo "${major_version}.${minor_version}"
}

# Make a kernel config file from generic and arch specific
# fragments
# - arg1 - path to arch specific fragments
# - arg2 - path to kernel sources
# - arg3 - kernel architecture, used to exclude fragments tagged with !$arch
#
get_kernel_frag_path() {
	local arch_path="$1"
	local common_path="${arch_path}/../common"
	local gpu_path="${arch_path}/../gpu"

	local kernel_path="$2"
	local arch="$3"
	local cmdpath="${kernel_path}/scripts/kconfig/merge_config.sh"
	local config_path="${arch_path}/.config"

	local arch_configs="$(ls ${arch_path}/*.conf)"
	# Exclude configs if they have !$arch tag in the header
	local common_configs="$(grep "\!${arch}" ${common_path}/*.conf -L)"
	local experimental_configs="$(ls ${common_path}/experimental/*.conf)"

	# These are the strings that the kernel merge_config.sh script kicks out
	# when it reports an error or warning condition. We search for them in the
	# output to try and fail when we think something has been misconfigured.
	local not_in_string="not in final"
	local redefined_string="not in final"
	local redundant_string="not in final"

	# Later, if we need to add kernel version specific subdirs in order to
	# handle specific cases, then add the path definition and search/list/cat
	# here.
	local all_configs="${common_configs} ${arch_configs}"
	if [[ ${experimental_kernel} == "true" ]]; then
		all_configs="${all_configs} ${experimental_configs}"
	fi

	if [[ "${gpu_vendor}" != "" ]];then
		info "Add kernel config for GPU due to '-g ${gpu_vendor}'"
		local gpu_configs="$(ls ${gpu_path}/${gpu_vendor}.conf)"
		all_configs="${all_configs} ${gpu_configs}"
	fi

	info "Constructing config from fragments: ${config_path}"

	export KCONFIG_CONFIG=${config_path}
	export ARCH=${arch_target}
	cd ${kernel_path}

	local results
	results=$( ${cmdpath} -r -n ${all_configs} )
	# Only consider results highlighting "not in final"
	results=$(grep "${not_in_string}" <<< "$results")
	# Do not care about options that are in whitelist
	results=$(grep -v -f ${default_config_whitelist} <<< "$results")

	# Did we request any entries that did not make it?
	local missing=$(echo $results | grep -v -q "${not_in_string}"; echo $?)
	if [ ${missing} -ne 0 ]; then
		info "Some CONFIG elements failed to make the final .config:"
		info "${results}"
		info "Generated config file can be found in ${config_path}"
		die "Failed to construct requested .config file"
	fi

	# Did we define something as two different values?
	local redefined=$(echo ${results} | grep -v -q "${redefined_string}"; echo $?)
	if [ ${redefined} -ne 0 ]; then
		info "Some CONFIG elements are redefined in fragments:"
		info "${results}"
		info "Generated config file can be found in ${config_path}"
		die "Failed to construct requested .config file"
	fi

	# Did we define something twice? Nominally this may not be an error, and it
	# might be convenient to allow it, but for now, let's pick up on them.
	local redundant=$(echo ${results} | grep -v -q "${redundant_string}"; echo $?)
	if [ ${redundant} -ne 0 ]; then
		info "Some CONFIG elements are redundant in fragments:"
		info "${results}"
		info "Generated config file can be found in ${config_path}"
		die "Failed to construct requested .config file"
	fi

	echo "${config_path}"
}

# Locate and return the path to the relevant kernel config file
# - arg1: kernel version
# - arg2: hypervisor target
# - arg3: arch target
# - arg4: kernel source path
get_default_kernel_config() {
	local version="${1}"

	local hypervisor="$2"
	local kernel_arch="$3"
	local kernel_path="$4"

	[ -n "${version}" ] || die "kernel version not provided"
	[ -n "${hypervisor}" ] || die "hypervisor not provided"
	[ -n "${kernel_arch}" ] || die "kernel arch not provided"

	local kernel_ver
	kernel_ver=$(get_major_kernel_version "${version}")

	archfragdir="${default_config_frags_dir}/${kernel_arch}"
	if [ -d "${archfragdir}" ]; then
		config="$(get_kernel_frag_path "${archfragdir}" "${kernel_path}" "${kernel_arch}")"
	else
		[ "${hypervisor}" == "firecracker" ] && hypervisor="kvm"
		# Use the <major>.<minor> version computed above rather than
		# relying on a variable set by the caller.
		config="${default_kernel_config_dir}/${kernel_arch}_kata_${hypervisor}_${kernel_ver}.x"
	fi

	[ -f "${config}" ] || die "failed to find default config ${config}"
	echo "${config}"
}

get_config_and_patches() {
	if [ -z "${patches_path}" ]; then
		patches_path="${default_patches_dir}"
		if [ ! -d "${patches_path}" ]; then
			tag="${kata_version}"
			git clone -q "https://${patches_repo}.git" "${patches_repo_dir}"
			pushd "${patches_repo_dir}" >> /dev/null
			# Quote the variable: an unquoted empty value makes
			# '[ -n ]' evaluate to true.
			if [ -n "${tag}" ] ; then
				info "checking out ${tag}"
				git checkout -q "${tag}"
			fi
			popd >> /dev/null
		fi
	fi
}

get_config_version() {
	get_config_and_patches
	config_version_file="${default_patches_dir}/../kata_config_version"
	if [ -f "${config_version_file}" ]; then
		cat "${config_version_file}"
	else
		die "failed to find ${config_version_file}"
	fi
}

setup_kernel() {
	local kernel_path=${1:-}
	[ -n "${kernel_path}" ] || die "kernel_path not provided"

	if [ -d "$kernel_path" ]; then
		info "${kernel_path} already exists"
		if [[ "${force_setup_generate_config}" != "true" ]];then
			return
		else
			info "Force generate config due to '-f'"
		fi
	else
		info "kernel path does not exist, will download kernel"
		download_kernel="true"
		[ -n "$kernel_version" ] || die "failed to get kernel version: Kernel version is empty"

		if [[ ${download_kernel} == "true" ]]; then
			get_kernel "${kernel_version}" "${kernel_path}"
		fi

		[ -n "$kernel_path" ] || die "failed to find kernel source path"

		get_config_and_patches

		[ -d "${patches_path}" ] || die " patches path '${patches_path}' does not exist"
	fi

	local major_kernel
	major_kernel=$(get_major_kernel_version "${kernel_version}")
	local patches_dir_for_version="${patches_path}/${major_kernel}.x"
	local kernel_patches=""
	if [ -d "${patches_dir_for_version}" ]; then
		# Patches are expected to be named in the standard
		# git-format-patch(1) format where the first part of the
		# filename represents the patch ordering
		# (lowest numbers apply first):
		#
		#     "${number}-${dashed_description}"
		#
		# For example,
		#
		#     0001-fix-the-bad-thing.patch
		#     0002-improve-the-fix-the-bad-thing-fix.patch
		#     0003-correct-compiler-warnings.patch
		kernel_patches=$(find "${patches_dir_for_version}" -name '*.patch' -type f |\
			sort -t- -k1,1n)
	else
		info "kernel patches directory does not exist"
	fi

	[ -n "${arch_target}" ] || arch_target="$(uname -m)"
	arch_target=$(arch_to_kernel "${arch_target}")
	(
	cd "${kernel_path}" || exit 1
	for p in ${kernel_patches}; do
		info "Applying patch $p"
		patch -p1 --fuzz 0 <"$p"
	done

	[ -n "${hypervisor_target}" ] || hypervisor_target="kvm"
	[ -n "${kernel_config_path}" ] || kernel_config_path=$(get_default_kernel_config "${kernel_version}" "${hypervisor_target}" "${arch_target}" "${kernel_path}")

	info "Copying config file from: ${kernel_config_path}"
	cp "${kernel_config_path}" ./.config
	make oldconfig
	)
}

build_kernel() {
	local kernel_path=${1:-}
	[ -n "${kernel_path}" ] || die "kernel_path not provided"
	[ -d "${kernel_path}" ] || die "path to kernel does not exist, use ${script_name} setup"
	[ -n "${arch_target}" ] || arch_target="$(uname -m)"
	arch_target=$(arch_to_kernel "${arch_target}")
	pushd "${kernel_path}" >>/dev/null
	make -j $(nproc) ARCH="${arch_target}"
	[ "$arch_target" != "powerpc" ] && ([ -e "arch/${arch_target}/boot/bzImage" ] || [ -e "arch/${arch_target}/boot/Image.gz" ])
	[ -e "vmlinux" ]
	[ "${hypervisor_target}" == "firecracker" ] && [ "${arch_target}" == "arm64" ] && [ -e "arch/${arch_target}/boot/Image" ]
	popd >>/dev/null
}

install_kata() {
	local kernel_path=${1:-}
	[ -n "${kernel_path}" ] || die "kernel_path not provided"
	[ -d "${kernel_path}" ] || die "path to kernel does not exist, use ${script_name} setup"
	pushd "${kernel_path}" >>/dev/null
	config_version=$(get_config_version)
	[ -n "${config_version}" ] || die "failed to get config version"
	install_path=$(readlink -m "${DESTDIR}/${PREFIX}/share/${project_name}")

	suffix=""
	if [[ ${experimental_kernel} == "true" ]]; then
		suffix="-virtiofs"
	fi
	if [[ ${gpu_vendor} != "" ]];then
		suffix="-${gpu_vendor}-gpu${suffix}"
	fi

	vmlinuz="vmlinuz-${kernel_version}-${config_version}${suffix}"
	vmlinux="vmlinux-${kernel_version}-${config_version}${suffix}"

	if [ -e "arch/${arch_target}/boot/bzImage" ]; then
		bzImage="arch/${arch_target}/boot/bzImage"
	elif [ -e "arch/${arch_target}/boot/Image.gz" ]; then
		bzImage="arch/${arch_target}/boot/Image.gz"
	elif [ "${arch_target}" != "powerpc" ]; then
		die "failed to find image"
	fi

	# Install compressed kernel
	if [ "${arch_target}" = "powerpc" ]; then
		install --mode 0644 -D "vmlinux" "${install_path}/${vmlinuz}"
	else
		install --mode 0644 -D "${bzImage}" "${install_path}/${vmlinuz}"
	fi

	# Install uncompressed kernel
	if [ "${arch_target}" = "arm64" ]; then
		install --mode 0644 -D "arch/${arch_target}/boot/Image" "${install_path}/${vmlinux}"
	else
		install --mode 0644 -D "vmlinux" "${install_path}/${vmlinux}"
	fi

	install --mode 0644 -D ./.config "${install_path}/config-${kernel_version}"

	ln -sf "${vmlinuz}" "${install_path}/vmlinuz${suffix}.container"
	ln -sf "${vmlinux}" "${install_path}/vmlinux${suffix}.container"
	ls -la "${install_path}/vmlinux${suffix}.container"
	ls -la "${install_path}/vmlinuz${suffix}.container"
	popd >>/dev/null
}

main() {
	while getopts "a:c:defg:hk:p:t:v:" opt; do
		case "$opt" in
			a)
				arch_target="${OPTARG}"
				;;
			c)
				kernel_config_path="${OPTARG}"
				;;
			d)
				PS4=' Line ${LINENO}: '
				set -x
				;;
			e)
				experimental_kernel="true"
				;;
			f)
				force_setup_generate_config="true"
				;;
			g)
				gpu_vendor="${OPTARG}"
				[[ "${gpu_vendor}" == "${GV_INTEL}" || "${gpu_vendor}" == "${GV_NVIDIA}" ]] || die "GPU vendor only support intel and nvidia"
				;;
			h)
				usage 0
				;;
			k)
				kernel_path="${OPTARG}"
				;;
			p)
				patches_path="${OPTARG}"
				;;
			t)
				hypervisor_target="${OPTARG}"
				;;
			v)
				kernel_version="${OPTARG}"
				;;
		esac
	done

	shift $((OPTIND - 1))

	subcmd="${1:-}"

	[ -z "${subcmd}" ] && usage 1

	# If no kernel version was given, take it from versions.yaml
	if [ -z "$kernel_version" ]; then
		if [[ ${experimental_kernel} == "true" ]]; then
			kernel_version=$(get_from_kata_deps "assets.kernel-experimental.tag" "${kata_version}")
		else
			kernel_version=$(get_from_kata_deps "assets.kernel.version" "${kata_version}")
			# Remove extra 'v'
			kernel_version="${kernel_version#v}"
		fi
	fi

	if [ -z "${kernel_path}" ]; then
		config_version=$(get_config_version)
		if [[ ${experimental_kernel} == "true" ]]; then
			kernel_path="${PWD}/kata-linux-experimental-${kernel_version}-${config_version}"
		else
			kernel_path="${PWD}/kata-linux-${kernel_version}-${config_version}"
		fi
		info "Config version: ${config_version}"
	fi

	info "Kernel version: ${kernel_version}"

	case "${subcmd}" in
		build)
			build_kernel "${kernel_path}"
			;;
		install)
			build_kernel "${kernel_path}"
			install_kata "${kernel_path}"
			;;
		setup)
			setup_kernel "${kernel_path}"
			[ -d "${kernel_path}" ] || die "${kernel_path} does not exist"
			echo "Kernel source ready: ${kernel_path} "
			;;
		*)
			usage 1
			;;

	esac
}

# Quote "$@" so that arguments containing spaces are passed through intact
main "$@"
71
tools/packaging/kernel/configs/README.md
Normal file
@@ -0,0 +1,71 @@
* [Kata Containers kernel config files](#kata-containers-kernel-config-files)
    * [Types of config files](#types-of-config-files)
    * [How to use config files](#how-to-use-config-files)
    * [How to modify config files](#how-to-modify-config-files)

# Kata Containers kernel config files

This directory contains Linux Kernel config files used to configure Kata
Containers VM kernels.

## Types of config files

This directory holds config files for the Kata Linux Kernel in two forms:

- A tree of config file 'fragments' in the `fragments` sub-folder, which are
  combined into a complete config file using the kernel
  `scripts/kconfig/merge_config.sh` script.
- Complete config files that can be used as-is.

Kernel config fragments are the preferred method of constructing `.config` files
to build Kata Containers kernels, because they are clearer and easier to maintain
than single-file monolithic `.config`s.

## How to use config files

The recommended way to set up a kernel tree, populate it with a relevant `.config` file,
and build a kernel, is to use the [`build-kernel.sh`](../build-kernel.sh) script. For
example:

```bash
$ ./build-kernel.sh setup
```

The `build-kernel.sh` script understands both full and fragment based config files.

Run `./build-kernel.sh -h` for more information.

## How to modify config files

Complete config files can be modified either with an editor, or preferably
using the kernel `Kconfig` configuration tools, for example:

```
$ cp x86_64_kata_kvm_4.14.x linux-4.14.22/.config
$ pushd linux-4.14.22
$ make menuconfig
$ popd
$ cp linux-4.14.22/.config x86_64_kata_kvm_4.14.x
```

Kernel fragments are best constructed using an editor. Tools such as `grep` and
`diff` can help find the differences between two config files to be placed
into a fragment.
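
One possible way to collect fragment candidates is sketched below: it lists the lines of a new config that do not appear verbatim in an old one. The two config files and their contents are made up for the example; real `.config` files also contain `# CONFIG_X is not set` lines, which this approach treats like any other line.

```bash
#!/bin/bash
# Sketch only: find lines present in a new config but not in an old one.
set -o errexit -o nounset -o pipefail

work_dir="$(mktemp -d)"
# Hypothetical sample configs, for illustration.
printf '%s\n' 'CONFIG_A=y' 'CONFIG_B=y' > "${work_dir}/old.config"
printf '%s\n' 'CONFIG_A=y' 'CONFIG_B=m' 'CONFIG_C=y' > "${work_dir}/new.config"

# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
# grep exits non-zero when nothing differs, hence the '|| true'.
fragment="$(grep -v -x -F -f "${work_dir}/old.config" "${work_dir}/new.config" || true)"
echo "${fragment}"

rm -rf "${work_dir}"
```

The resulting lines can then be reviewed by hand and saved into a `.conf` fragment.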

If adding config entries for a new subsystem or feature, consider making a new
fragment with an appropriately descriptive name.

If you want to disable an entire fragment for a specific architecture, add the
tag `# !${arch}` in the first line of the fragment. You can also exclude
multiple architectures on the same line. Note the `#` at the beginning of the
line: it is required so that the tag is not interpreted as a configuration
option.

Example of valid exclusion:
```
# !s390x !ppc64le
```
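
The effect of the exclusion tag can be sketched with `grep -L`, which lists the files that do not contain a pattern; `build-kernel.sh` selects the common fragments in a similar way. The fragment names and contents below are made up for the example.

```bash
#!/bin/bash
# Sketch only: select the fragments that are not excluded for an architecture.
set -o nounset -o pipefail

frag_dir="$(mktemp -d)"
# Hypothetical fragments: foo.conf opts out of s390x and ppc64le.
printf '%s\n' '# !s390x !ppc64le' 'CONFIG_FOO=y' > "${frag_dir}/foo.conf"
printf '%s\n' 'CONFIG_BAR=y' > "${frag_dir}/bar.conf"

arch="s390x"
# grep -L prints the names of files with no match for the pattern.
# Its exit status differs between grep versions, hence the '|| true'.
selected="$(grep -L "!${arch}" "${frag_dir}"/*.conf || true)"
echo "${selected}"

rm -rf "${frag_dir}"
```

Only `bar.conf` is selected for `s390x` here, because `foo.conf` carries the `!s390x` tag.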

The fragment gathering tool performs some basic sanity checks, and `build-kernel.sh` will
fail and report the error in the following cases:

- A duplicate `CONFIG` symbol appears.
- A `CONFIG` symbol is in a fragment but does not appear in the final `.config`.
  This indicates that the `CONFIG` variable is not part of the kernel `Kconfig` setup,
  which can point to a typing mistake in the name of the symbol.
- A `CONFIG` symbol appears in the fragments with multiple different values.
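
To illustrate the difference between a duplicate symbol and a symbol with conflicting values, here is a small stand-alone sketch using `awk`. It is not how `build-kernel.sh` performs the checks (the script relies on the messages printed by the kernel `merge_config.sh`); the fragment files are made up, and values that themselves contain `=` are not handled.

```bash
#!/bin/bash
# Sketch only: flag CONFIG symbols that occur more than once across fragments.
set -o errexit -o nounset -o pipefail

frag_dir="$(mktemp -d)"
# Hypothetical fragments, for illustration.
printf '%s\n' 'CONFIG_EFI=y' 'CONFIG_OF=y' > "${frag_dir}/a.conf"
printf '%s\n' 'CONFIG_OF=y' 'CONFIG_EFI=n' > "${frag_dir}/b.conf"

# "redundant": same symbol, same value. "redefined": same symbol, new value.
report="$(awk -F= '/^CONFIG_/ {
	if ($1 in seen) {
		kind = (seen[$1] == $2) ? "redundant" : "redefined"
		print kind ": " $1
	}
	seen[$1] = $2
}' "${frag_dir}"/*.conf)"
echo "${report}"

rm -rf "${frag_dir}"
```

Running a check like this before `build-kernel.sh setup` can shorten the feedback loop when editing several fragments at once.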
2242
tools/packaging/kernel/configs/arm64_kata_kvm_4.14.x
Normal file
File diff suppressed because it is too large
2382
tools/packaging/kernel/configs/arm64_kata_kvm_4.19.x
Normal file
File diff suppressed because it is too large
2793
tools/packaging/kernel/configs/arm64_kata_kvm_5.4.x
Normal file
File diff suppressed because it is too large
2763
tools/packaging/kernel/configs/arm64_kata_kvm_virtio-fs-v0.3.x
Normal file
File diff suppressed because it is too large
5
tools/packaging/kernel/configs/fragments/arm64/acpi.conf
Normal file
@@ -0,0 +1,5 @@
# ACPI on arm64 depends on UEFI.
CONFIG_EFI=y
CONFIG_EFI_STUB=y
# ARM64 can run properly in ACPI hardware reduced mode.
CONFIG_ACPI_REDUCED_HARDWARE_ONLY=y
42
tools/packaging/kernel/configs/fragments/arm64/base.conf
Normal file
@@ -0,0 +1,42 @@
CONFIG_ARM64=y
CONFIG_ARM64_4K_PAGES=y

# ARM servers often have multiple cores; the following options improve
# the CPU scheduler's decision making.
CONFIG_SCHED_MC=y
CONFIG_SCHED_SMT=y

# Virtual address space size (48-bit)
CONFIG_ARM64_VA_BITS_48=y
CONFIG_ARM64_VA_BITS=48
# Physical address space size (48-bit)
CONFIG_ARM64_PA_BITS_48=y
CONFIG_ARM64_PA_BITS=48

# Use the maximum number of CPUs supported by KVM (255)
CONFIG_NR_CPUS=255

CONFIG_PERF_EVENTS=y

# No architected NMI
CONFIG_ARM64_PSEUDO_NMI=y
CONFIG_ARM64_SVE=y

# Arm64 prefers to use REFCOUNT_FULL by default.
CONFIG_REFCOUNT_FULL=y

#
# ARMv8.1 architectural features
#
CONFIG_ARM64_HW_AFDBM=y
CONFIG_ARM64_PAN=y
# end of ARMv8.1 architectural features

#
# ARMv8.2 architectural features
#
CONFIG_ARM64_CNP=y
CONFIG_ARM64_PMEM=y
CONFIG_ARM64_RAS_EXTN=y
CONFIG_ARM64_UAO=y
# end of ARMv8.2 architectural features
@@ -0,0 +1,6 @@
# ARMv8 adds cryptographic instructions that could significantly improve
# performance on tasks such as AES encryption and SHA1 and SHA256 hashing.
CONFIG_ARM64_CRYPTO=y
CONFIG_CRYPTO_AES_ARM64=y
CONFIG_CRYPTO_AES_ARM64_CE=y
CONFIG_CRYPTO_SHA256_ARM64=y
4
tools/packaging/kernel/configs/fragments/arm64/dt.conf
Normal file
@@ -0,0 +1,4 @@
# Device Tree and Open Firmware support
CONFIG_DTC=y
CONFIG_OF=y
CONFIG_OF_PMEM=y
15
tools/packaging/kernel/configs/fragments/arm64/erratum.conf
Normal file
@@ -0,0 +1,15 @@
# ARM errata workarounds via the alternatives framework.
# Vendor-specific options are left for users to decide.
CONFIG_ARM64_ERRATUM_1024718=y
CONFIG_ARM64_ERRATUM_1165522=y
CONFIG_ARM64_ERRATUM_1286807=y
CONFIG_ARM64_ERRATUM_1463225=y
CONFIG_ARM64_ERRATUM_819472=y
CONFIG_ARM64_ERRATUM_824069=y
CONFIG_ARM64_ERRATUM_826319=y
CONFIG_ARM64_ERRATUM_827319=y
CONFIG_ARM64_ERRATUM_832075=y
CONFIG_ARM64_ERRATUM_843419=y
CONFIG_ARM64_WORKAROUND_CLEAN_CACHE=y
CONFIG_ARM64_WORKAROUND_REPEAT_TLBI=y
3
tools/packaging/kernel/configs/fragments/arm64/pci.conf
Normal file
@@ -0,0 +1,3 @@
# Brings PCI support to mach-virt, based on an idealised host controller.
CONFIG_PCI_HOST_COMMON=y
CONFIG_PCI_HOST_GENERIC=y
7
tools/packaging/kernel/configs/fragments/arm64/ptp.conf
Normal file
@@ -0,0 +1,7 @@
# PTP clock support
#
# The implementation of ptp_kvm on arm is an experimental feature;
# you need to apply private patches to enable it on your host machine.
# See https://github.com/kata-containers/packaging/pull/998 for detailed info.
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_KVM=y
10
tools/packaging/kernel/configs/fragments/arm64/rtc.conf
Normal file
@@ -0,0 +1,10 @@
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_SYSTOHC=y
# RTC interfaces
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# QEMU provides an emulated ARM AMBA PrimeCell PL031 RTC.
CONFIG_RTC_DRV_PL031=y
@@ -0,0 +1,3 @@
# This option is used for all 8250 compatible serial ports
# that are probed through device tree.
CONFIG_SERIAL_OF_PLATFORM=y
17
tools/packaging/kernel/configs/fragments/common/9p.conf
Normal file
@@ -0,0 +1,17 @@
# Enable 9p(fs) support - required for Kata to mount filesystems into the workload

CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_9P_FS=y
# NOTE - 9p client caching turned off?
# FIXME: check if that is right?
# https://github.com/kata-containers/packaging/issues/483
#CONFIG_9P_FSCACHE=y
CONFIG_NETWORK_FILESYSTEMS=y
# Q. Do we use the POSIX_ACL over 9p?
# FIXME: https://github.com/kata-containers/packaging/issues/483
CONFIG_9P_FS_POSIX_ACL=y
# NOTE - this adds security labels, such as used by SELinux - we may be able to
# disable this, for now.
# FIXME: https://github.com/kata-containers/packaging/issues/483
CONFIG_9P_FS_SECURITY=y
20
tools/packaging/kernel/configs/fragments/common/acpi.conf
Normal file
@@ -0,0 +1,20 @@
# Enable ACPI support.
# This could do with REVIEW
# https://github.com/kata-containers/packaging/issues/483
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_PROCESSOR_IDLE=y
# Having trouble enabling this - disable for now.
# Would add support for ACPI CPPC power control via firmware - do we need
# that for the guest??
#CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_NFIT=y
CONFIG_HAVE_ACPI_APEI=y
52
tools/packaging/kernel/configs/fragments/common/base.conf
Normal file
@@ -0,0 +1,52 @@
# Basic necessary items!

CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_SMP=y
CONFIG_PARAVIRT=y
# Note, no nested VM support enabled here

# Turn off embedded mode, as it disabled 'too much', and we
# no longer pass all the tests. We should refine this, and
# work out which of the ~66 items it enables are really needed.
# I believe this is the actual syntax we need for a fragment to
# disable an item...
# CONFIG_EMBEDDED is not set

# Note, no virt enabled balloon yet
CONFIG_INPUT=y
CONFIG_PRINTK=y
# We use this for metrics!
CONFIG_PRINTK_TIME=y
CONFIG_UNIX98_PTYS=y
CONFIG_FUTEX=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_NO_HZ=y
CONFIG_NO_HZ_FULL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PROC_SYSCTL=y

CONFIG_SHMEM=y

# For security...
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
# FIXME - check if we should be setting this
# https://github.com/kata-containers/packaging/issues/483
# I have a feeling it affects our memory hotplug maybe?
# PHYSICAL_ALIGN=0x1000000

# This would only affect two drivers, neither of which we have enabled.
# The recommendation is to have it on, and you will see it in a diff if you
# look for differences against the frag generated config - so, add it here as
# a comment to make it clear in the future why we have not set it - as it would
# only add noise to our frags and config.
# PREVENT_FIRMWARE_BUILD=y

# Trust the hardware vendor to initialise the RNG - which can speed up boot.
# This can still be dynamically disabled on the kernel command line/kata config if needed.
# Disable for now, as it upsets the entropy test, and we need to improve those: FIXME: see:
# https://github.com/kata-containers/tests/issues/1543
# RANDOM_TRUST_CPU=y
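The comments in `common/base.conf` note that a fragment disables an option with the `# CONFIG_<NAME> is not set` form, which looks like a comment but carries meaning, whereas lines such as `# PHYSICAL_ALIGN=0x1000000` are ordinary comments. A minimal Python sketch of a reader that keeps the two apart is shown below. It is an illustration only: `parse_fragment` is a hypothetical helper, not part of the Kata tooling.

```python
import re

# "CONFIG_FOO=y", "CONFIG_NR_CPUS=255", 'CONFIG_LOCALVERSION="-x"' ...
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# The Kconfig way of explicitly disabling a symbol.
UNSET_RE = re.compile(r"^#\s*(CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_fragment(text):
    """Map each symbol to its value; explicitly disabled symbols map to None."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = UNSET_RE.match(line)
        if unset:
            settings[unset.group(1)] = None
            continue
        if not line or line.startswith("#"):
            continue  # blank line or ordinary comment
        assigned = SET_RE.match(line)
        if assigned:
            settings[assigned.group(1)] = assigned.group(2)
    return settings


if __name__ == "__main__":
    sample = (
        "# CONFIG_EMBEDDED is not set\n"
        "CONFIG_PRINTK=y\n"
        "# PHYSICAL_ALIGN=0x1000000\n"
        "#CONFIG_9P_FSCACHE=y\n"
    )
    print(parse_fragment(sample))
```

In this sketch, commented-out assignments such as `#CONFIG_9P_FSCACHE=y` are treated as plain comments and ignored; only the exact `is not set` wording counts as a request to disable a symbol.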
26
tools/packaging/kernel/configs/fragments/common/cgroup.conf
Normal file
@@ -0,0 +1,26 @@
# Add cgroup support. Needed both for the agent to place the workload into, and
# also used/looked for by systemd rootfs.
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_SOCK_CGROUP_DATA=y

# We have to enable SWAP CG, as runc/libcontainer in the agent currently fails
# to write to it, even though it does some checks to see if swap is enabled.
CONFIG_SWAP=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_SWAP_ENABLED=y

# Needed for cgroups v2
CONFIG_BPF_SYSCALL=y
CONFIG_CGROUP_BPF=y
7
tools/packaging/kernel/configs/fragments/common/cpu.conf
Normal file
@@ -0,0 +1,7 @@
# Items to do with CPU frequency, power etc.

CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_MENU=y
17
tools/packaging/kernel/configs/fragments/common/crypto.conf
Normal file
@@ -0,0 +1,17 @@
# Need decompressors for root filesystems and kernels.
# Do we need all of these?
CONFIG_CRYPTO=y
# Deflate used by IPSec and IPCOMP protocols
# Also selects ZLIB and a couple of other algos
CONFIG_CRYPTO_DEFLATE=y
CONFIG_XZ_DEC=y
CONFIG_ZLIB_DEFLATE=y
# FIXME - check, do we need gzip?
# https://github.com/kata-containers/packaging/issues/483
CONFIG_DECOMPRESS_GZIP=y
# Some items required by systemd: https://github.com/systemd/systemd/blob/master/README
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ANSI_CPRNG=y
32
tools/packaging/kernel/configs/fragments/common/dax.conf
Normal file
@@ -0,0 +1,32 @@
# Enable DAX and NVDIMM support so we can map in our rootfs

# Need HOTREMOVE, or ZONE_DEVICE will not get enabled
# We don't actually afaik remove any memory once we have plugged it in, as
# generally it is too 'expensive' an operation.
CONFIG_MEMORY_HOTREMOVE=y
# Also need this
CONFIG_SPARSEMEM_VMEMMAP=y

# Without these the pmem_should_map_pages() call in the kernel fails with new
# Related to the ARCH_HAS_HMM set in the arch files.
CONFIG_ZONE_DEVICE=y
CONFIG_DEV_PAGEMAP_OPS=y

CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y

CONFIG_BLOCK=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_BLK_DEV_RAM=y
CONFIG_LIBNVDIMM=y
CONFIG_ND_BLK=y
CONFIG_BTT=y
# FIXME: Should check if this is really needed
# https://github.com/kata-containers/packaging/issues/483
CONFIG_NVMEM=y
# Is auto selected by other options
#CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_FS_DAX=y
5
tools/packaging/kernel/configs/fragments/common/elf.conf
Normal file
@@ -0,0 +1,5 @@
# Enable ELF loading, and script loading

CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
@@ -0,0 +1,3 @@
# virtio-fs support
CONFIG_VIRTIO_FS=y
CONFIG_FUSE_FS=y
51
tools/packaging/kernel/configs/fragments/common/fs.conf
Normal file
@@ -0,0 +1,51 @@
# Enable a whole bunch of filesystem related items

CONFIG_BLK_DEV_INITRD=y

# Recommended for Docker
CONFIG_BLK_DEV_THROTTLING=y

# Required for hotplug block devices into Kata, using SCSI
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_SD=y

# support initial ramdisk
CONFIG_RD_GZIP=y
CONFIG_FS_IOMAP=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# FIXME - do we need journalling support in the container?
# https://github.com/kata-containers/packaging/issues/483
CONFIG_JBD2=y
CONFIG_FS_MBCACHE=y
CONFIG_XFS_FS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_MANDATORY_FILE_LOCKING=y
# A bunch of these are required for systemd at least.
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_TMPFS=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EPOLL=y
CONFIG_FHANDLE=y

# We should support Async IO.
CONFIG_AIO=y

# Docker in Docker support requires overlay
CONFIG_OVERLAY_FS=y
CONFIG_OVERLAY_FS_INDEX=y
CONFIG_OVERLAY_FS_REDIRECT_DIR=y
13
tools/packaging/kernel/configs/fragments/common/hotplug.conf
Normal file
@@ -0,0 +1,13 @@
# Settings to support our hotplug - memory, PCI devices and CPUs

CONFIG_MEMORY_HOTPLUG=y
CONFIG_HOTPLUG_CPU=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_PNPACPI=y

# Define hotplugs to be online immediately. Speeds things up, and makes things
# work smoother on some architectures.
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
12
tools/packaging/kernel/configs/fragments/common/huge.conf
Normal file
@@ -0,0 +1,12 @@
# Items to enable large/huge mmu pages and tlbs etc.

# Compaction is the only memory management component to form high order
# (larger physically contiguous) memory blocks reliably. The lack of the
# feature can lead to unexpected OOM killer invocations for high order memory requests.
CONFIG_COMPACTION=y

CONFIG_HUGETLBFS=y

# Enable memory page physical migration here, as it can come
# into play when trying to find space to allocate a hugepage.
CONFIG_MIGRATION=y
@@ -0,0 +1,3 @@
# mmio devices are required for firecracker
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
5
tools/packaging/kernel/configs/fragments/common/mmu.conf
Normal file
@@ -0,0 +1,5 @@
# MMU specific items

# vmap the kernel stacks - detects stack over-runs better and reduces
# the stack attack window.
CONFIG_VMAP_STACK=y
@@ -0,0 +1,11 @@
# We need namespaces to isolate the workload

# Cannot have namespaces if not multi user...
CONFIG_MULTIUSER=y
CONFIG_NAMESPACES=y
CONFIG_SYSVIPC=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
203
tools/packaging/kernel/configs/fragments/common/netfilter.conf
Normal file
@@ -0,0 +1,203 @@
# Netfilter (used by sidecars like istio)

# FIXME - this is a big file - it could probably benefit from a
# good reviewing. https://github.com/kata-containers/packaging/issues/483

CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_FAMILY_ARP=y
CONFIG_NETFILTER_NETLINK_ACCT=y
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NETFILTER_NETLINK_OSF=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_LOG_COMMON=y
CONFIG_NETFILTER_CONNCOUNT=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_H323=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_BROADCAST=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=y
CONFIG_NF_CONNTRACK_SNMP=y
CONFIG_NF_CONNTRACK_PPTP=y
CONFIG_NF_CONNTRACK_SANE=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CONNTRACK_TFTP=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NF_CT_NETLINK_TIMEOUT=y
CONFIG_NF_CT_NETLINK_HELPER=y
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_NAT=y
# NF_NAT_NEEDED is removed in newer kernels - we should drop it once we move to the next LTS (5.4).
# This is part of whitelist.conf
CONFIG_NF_NAT_NEEDED=y

# NF_NAT_PROTO_* are removed in newer kernels, but needed currently. They are part of whitelist.conf:
CONFIG_NF_NAT_PROTO_DCCP=y
CONFIG_NF_NAT_PROTO_UDPLITE=y
CONFIG_NF_NAT_PROTO_SCTP=y
CONFIG_NF_NAT_PROTO_GRE=y

CONFIG_NF_NAT_AMANDA=y
CONFIG_NF_NAT_FTP=y
CONFIG_NF_NAT_IRC=y
CONFIG_NF_NAT_SIP=y
CONFIG_NF_NAT_TFTP=y
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NETFILTER_SYNPROXY=y
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_CONNMARK=y
CONFIG_NETFILTER_XT_SET=y
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
CONFIG_NETFILTER_XT_TARGET_CT=y
CONFIG_NETFILTER_XT_TARGET_DSCP=y
CONFIG_NETFILTER_XT_TARGET_HL=y
CONFIG_NETFILTER_XT_TARGET_HMARK=y
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
CONFIG_NETFILTER_XT_TARGET_LOG=y
CONFIG_NETFILTER_XT_TARGET_MARK=y
CONFIG_NETFILTER_XT_NAT=y
CONFIG_NETFILTER_XT_TARGET_NETMAP=y
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_TARGET_RATEEST=y
CONFIG_NETFILTER_XT_TARGET_REDIRECT=y
CONFIG_NETFILTER_XT_TARGET_TEE=y
CONFIG_NETFILTER_XT_TARGET_TPROXY=y
CONFIG_NETFILTER_XT_TARGET_TRACE=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=y
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y
CONFIG_NETFILTER_XT_MATCH_BPF=y
CONFIG_NETFILTER_XT_MATCH_CGROUP=y
CONFIG_NETFILTER_XT_MATCH_CLUSTER=y
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=y
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=y
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_CPU=y
CONFIG_NETFILTER_XT_MATCH_DCCP=y
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=y
CONFIG_NETFILTER_XT_MATCH_DSCP=y
CONFIG_NETFILTER_XT_MATCH_ECN=y
CONFIG_NETFILTER_XT_MATCH_ESP=y
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
CONFIG_NETFILTER_XT_MATCH_HELPER=y
CONFIG_NETFILTER_XT_MATCH_HL=y
CONFIG_NETFILTER_XT_MATCH_IPCOMP=y
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
CONFIG_NETFILTER_XT_MATCH_IPVS=y
CONFIG_NETFILTER_XT_MATCH_L2TP=y
CONFIG_NETFILTER_XT_MATCH_LENGTH=y
CONFIG_NETFILTER_XT_MATCH_LIMIT=y
CONFIG_NETFILTER_XT_MATCH_MAC=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
CONFIG_NETFILTER_XT_MATCH_NFACCT=y
CONFIG_NETFILTER_XT_MATCH_OSF=y
CONFIG_NETFILTER_XT_MATCH_OWNER=y
CONFIG_NETFILTER_XT_MATCH_POLICY=y
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_RATEEST=y
CONFIG_NETFILTER_XT_MATCH_REALM=y
CONFIG_NETFILTER_XT_MATCH_RECENT=y
CONFIG_NETFILTER_XT_MATCH_SCTP=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
CONFIG_NETFILTER_XT_MATCH_STRING=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_TIME=y
CONFIG_NETFILTER_XT_MATCH_U32=y
CONFIG_IP_SET=y
CONFIG_IP_SET_BITMAP_IP=y
CONFIG_IP_SET_BITMAP_IPMAC=y
CONFIG_IP_SET_BITMAP_PORT=y
CONFIG_IP_SET_HASH_IP=y
CONFIG_IP_SET_HASH_IPMARK=y
CONFIG_IP_SET_HASH_IPPORT=y
CONFIG_IP_SET_HASH_IPPORTIP=y
CONFIG_IP_SET_HASH_IPPORTNET=y
CONFIG_IP_SET_HASH_MAC=y
CONFIG_IP_SET_HASH_NETPORTNET=y
CONFIG_IP_SET_HASH_NET=y
CONFIG_IP_SET_HASH_NETNET=y
CONFIG_IP_SET_HASH_NETPORT=y
CONFIG_IP_SET_HASH_NETIFACE=y
CONFIG_IP_SET_LIST_SET=y
CONFIG_IP_VS=y
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y
CONFIG_IP_VS_RR=y
CONFIG_IP_VS_WRR=y
CONFIG_IP_VS_LC=y
CONFIG_IP_VS_WLC=y
CONFIG_IP_VS_FO=y
CONFIG_IP_VS_OVF=y
CONFIG_IP_VS_LBLC=y
CONFIG_IP_VS_LBLCR=y
CONFIG_IP_VS_DH=y
CONFIG_IP_VS_SH=y
CONFIG_IP_VS_SED=y
CONFIG_IP_VS_NQ=y
CONFIG_IP_VS_FTP=y
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=y
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_TPROXY_IPV4=y
CONFIG_NF_DUP_IPV4=y
CONFIG_NF_LOG_IPV4=y
CONFIG_NF_REJECT_IPV4=y

# NF_NAT_IPV4 is removed in newer kernels, and is part of whitelist.conf:
CONFIG_NF_NAT_IPV4=y

CONFIG_NF_NAT_SNMP_BASIC=y
CONFIG_NF_NAT_PPTP=y
CONFIG_NF_NAT_H323=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_AH=y
CONFIG_IP_NF_MATCH_ECN=y
CONFIG_IP_NF_MATCH_RPFILTER=y
CONFIG_IP_NF_MATCH_TTL=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_SYNPROXY=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_TARGET_NETMAP=y
CONFIG_IP_NF_TARGET_REDIRECT=y
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_CLUSTERIP=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_IP_NF_TARGET_TTL=y
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_SECURITY=y
CONFIG_IP_NF_ARPTABLES=y
CONFIG_IP_NF_ARPFILTER=y
CONFIG_IP_NF_ARP_MANGLE=y
CONFIG_NF_DUP_IPV6=y
CONFIG_NF_LOG_IPV6=y
CONFIG_NF_DEFRAG_IPV6=y
75
tools/packaging/kernel/configs/fragments/common/network.conf
Normal file
@@ -0,0 +1,75 @@
# Our networking requirements
### FIXME - this probably needs a good review ###
# https://github.com/kata-containers/packaging/issues/483

# pre-reqs
CONFIG_NETDEVICES=y
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
CONFIG_SECURITY=y

# The list
CONFIG_NET=y
CONFIG_ETHERNET=y
CONFIG_NET_CORE=y
CONFIG_NET_INGRESS=y
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_SUB_POLICY=y
# Used for mobile ipv6 type instances, unlikely we need it
#CONFIG_XFRM_MIGRATE=y
# Developer feature - unlikely we need it
#CONFIG_XFRM_STATISTICS=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_SYN_COOKIES=y
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BBR=y
CONFIG_DEFAULT_BBR=y
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_MULTIPLE_TABLES=y

CONFIG_STP=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_HAVE_NET_DSA=y
CONFIG_LLC=y
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=y
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_FQ_CODEL=y
CONFIG_NET_SCH_FQ=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_EMATCH=y
CONFIG_NET_SCH_FIFO=y
CONFIG_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_NET_SWITCHDEV=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y
CONFIG_GRO_CELLS=y
CONFIG_FAILOVER=y
CONFIG_HAVE_EBPF_JIT=y

# We very likely need some Intel chip support
CONFIG_NET_VENDOR_INTEL=y

# Add VETH support (necessary for running Docker in the guest)
CONFIG_VETH=y
# We quite likely need to add others for passthrough and maybe SRIOV support
@@ -0,0 +1,4 @@
# enable seccomp items

CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
@@ -0,0 +1,6 @@

# Let's enable stack protection checks, and strong checks
# Estimated cost (detailed in the kernel config files)
# is maybe 2.3% for both
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
14
tools/packaging/kernel/configs/fragments/common/serial.conf
Normal file
@@ -0,0 +1,14 @@
# We need some sort of 'serial' for virtio-serial consoles - at the moment.
# We might not need all of these though...
# FIXME - https://github.com/kata-containers/packaging/issues/483
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_EARLYCON=y

# SERIO may be only for keyboards, mice etc., and not UARTs
# We likely don't need it
#CONFIG_SERIO_RAW=y
#CONFIG_SERIO=y
29
tools/packaging/kernel/configs/fragments/common/virtio.conf
Normal file
@@ -0,0 +1,29 @@
# We need virtio for 9p and serial and vsock at least

# To get VIRTIO, we need a bus - ours of choice is PCI. We need to enable
# PCI support to get VIRTIO_PCI support
CONFIG_PCI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
# To get to the VIRTIO_PCI, we need the VIRTIO_MENU enabled
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
# Without this nested-VM Kata does not work (we have not worked out exactly why)
CONFIG_VIRTIO_PCI_LEGACY=y

# This is used by the s390 arch at least. Leave it on globally.
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y

# This is required for booting from pmem
CONFIG_VIRTIO_PMEM=y

# FIXME - are we moving away from/choosing between SCSI and BLK support?
# https://github.com/kata-containers/packaging/issues/483
CONFIG_SCSI=y
CONFIG_SCSI_LOWLEVEL=y
CONFIG_SCSI_VIRTIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_TTY=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_VIRTIO_NET=y
7
tools/packaging/kernel/configs/fragments/gpu/intel.conf
Normal file
@@ -0,0 +1,7 @@
# The following i915 kernel config options need to be enabled
CONFIG_DRM=y
CONFIG_DRM_I915=y
CONFIG_DRM_I915_USERPTR=y

# Linux kernel version suffix
CONFIG_LOCALVERSION="-intel-gpu"
14
tools/packaging/kernel/configs/fragments/gpu/nvidia.conf
Normal file
@@ -0,0 +1,14 @@
# Support mmconfig PCI config space access.
# It's used to enable the MMIO access method for PCIe devices.
CONFIG_PCI_MMCONFIG=y

# Support for loading modules.
# It is used to support loading GPU drivers.
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y

# CRYPTO_FIPS requires this config when loading modules is enabled.
CONFIG_MODULE_SIG=y

# Linux kernel version suffix
CONFIG_LOCALVERSION="-nvidia-gpu"
8
tools/packaging/kernel/configs/fragments/whitelist.conf
Normal file
@@ -0,0 +1,8 @@
# configuration options which may be dropped in newer kernels
# without generating an error in fragment merging
CONFIG_NF_NAT_IPV4
CONFIG_NF_NAT_NEEDED
CONFIG_NF_NAT_PROTO_DCCP
CONFIG_NF_NAT_PROTO_GRE
CONFIG_NF_NAT_PROTO_SCTP
CONFIG_NF_NAT_PROTO_UDPLITE
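The whitelist above names symbols that may vanish from the final kernel config without that being treated as an error. How `build-kernel.sh` consumes the file is not shown here, so the following Python sketch is only an assumption about the idea: compare what the fragments asked for with what ended up in the generated config, and ignore whitelisted symbols. All function names are hypothetical.

```python
import re

ASSIGN_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def read_assignments(text):
    """Collect CONFIG_<NAME>=<value> lines into a dict, skipping comments."""
    found = {}
    for line in text.splitlines():
        match = ASSIGN_RE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


def read_whitelist(text):
    """Return the bare symbol names listed in a whitelist file."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def unexpected_drops(fragment_text, final_config_text, whitelist_text):
    """Symbols requested by the fragments but missing or changed in the final
    config, excluding symbols that the whitelist allows to disappear."""
    wanted = read_assignments(fragment_text)
    final = read_assignments(final_config_text)
    allowed = read_whitelist(whitelist_text)
    return sorted(
        symbol
        for symbol, value in wanted.items()
        if final.get(symbol) != value and symbol not in allowed
    )


if __name__ == "__main__":
    fragments = "CONFIG_NF_NAT=y\nCONFIG_NF_NAT_IPV4=y\nCONFIG_VETH=y\n"
    final_config = "CONFIG_NF_NAT=y\n"  # hypothetical result on a newer kernel
    whitelist = "# may be dropped in newer kernels\nCONFIG_NF_NAT_IPV4\n"
    print(unexpected_drops(fragments, final_config, whitelist))
```

In this example `CONFIG_NF_NAT_IPV4` is tolerated because it is whitelisted, while the missing `CONFIG_VETH` would be reported.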
14
tools/packaging/kernel/configs/fragments/x86_64/acpi.conf
Normal file
@@ -0,0 +1,14 @@
CONFIG_X86_INTEL_PSTATE=y

# For old smp systems that do not have proper acpi support.
# Firecracker needs this to support `vcpu_count`
CONFIG_X86_MPPARSE=y

CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ACPI_LPIT=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
CONFIG_HAVE_ACPI_APEI_NMI=y
20
tools/packaging/kernel/configs/fragments/x86_64/base.conf
Normal file
@@ -0,0 +1,20 @@
CONFIG_X86=y
CONFIG_X86_CPUID=y
CONFIG_X86_MSR=y
CONFIG_X86_X2APIC=y
CONFIG_X86_VERBOSE_BOOTUP=y

# Configs around linux guest support and optimizations.
CONFIG_HYPERVISOR_GUEST=y
CONFIG_KVM_GUEST=y

# Use the maximum number of CPUs supported by KVM (240)
CONFIG_NR_CPUS=240

# For security
CONFIG_LEGACY_VSYSCALL_NONE=y
CONFIG_RETPOLINE=y

# Boot directly into the uncompressed kernel
# Reduce memory footprint
CONFIG_PVH=y
5
tools/packaging/kernel/configs/fragments/x86_64/fs.conf
Normal file
@@ -0,0 +1,5 @@
# x86 specific filesystem items

# Yes, we do support unaligned word accesses
CONFIG_DCACHE_WORD_ACCESS=y
@@ -0,0 +1,5 @@
# PCI SHPC hotplug is disabled for arm64, so this config lives in the
# x86_64-specific fragments.
# See https://github.com/kata-containers/packaging/pull/498
# for detailed reasons.
CONFIG_HOTPLUG_PCI_SHPC=y
4
tools/packaging/kernel/configs/fragments/x86_64/mmu.conf
Normal file
@@ -0,0 +1,4 @@
# x86 specific mmu/memory related items

# Remove the kernel mapping from the user space - security improvement.
CONFIG_PAGE_TABLE_ISOLATION=y
@@ -0,0 +1,7 @@
# Items needed to run the NEMU cut of QEMU
# NEMU uses an EFI bios/boot, so requires a few extra bits

CONFIG_MSDOS_PARTITION=y
CONFIG_EFI=y
CONFIG_EFI_ESRT=y
CONFIG_EFI_RUNTIME_WRAPPERS=y
3152
tools/packaging/kernel/configs/powerpc_kata_kvm_4.14.x
Normal file
3152
tools/packaging/kernel/configs/powerpc_kata_kvm_4.14.x
Normal file
File diff suppressed because it is too large
Load Diff
3182
tools/packaging/kernel/configs/powerpc_kata_kvm_4.19.x
Normal file
File diff suppressed because it is too large
3182
tools/packaging/kernel/configs/powerpc_kata_kvm_5.4.x
Normal file
File diff suppressed because it is too large
2175
tools/packaging/kernel/configs/s390_kata_kvm_4.19.x
Normal file
File diff suppressed because it is too large
2838
tools/packaging/kernel/configs/s390_kata_kvm_5.4.x
Normal file
File diff suppressed because it is too large
3131
tools/packaging/kernel/configs/x86_64_kata_kvm_4.14.x
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,7 @@
|
||||
#
|
||||
# This file contains config options that were removed or modified in kernel 4.14 but
|
||||
# are still necessary for older kernels. If you are using an old kernel and
|
||||
# Kata Containers fails to start, try adding these options; they may help.
|
||||
#
|
||||
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
|
||||
|
||||
1
tools/packaging/kernel/kata_config_version
Normal file
@@ -0,0 +1 @@
|
||||
80
|
||||
@@ -0,0 +1,457 @@
|
||||
From bee1ae5587a7427dbb9e9e313f6d0a43a9e0ec2e Mon Sep 17 00:00:00 2001
|
||||
From: Jianyong Wu <jianyong.wu@arm.com>
|
||||
Date: Mon, 30 Sep 2019 09:26:22 +0800
|
||||
Subject: [PATCH] 4.19: enable ptp_kvm for arm64 in kata
|
||||
|
||||
---
|
||||
drivers/clocksource/arm_arch_timer.c | 25 ++++++
|
||||
drivers/ptp/Kconfig | 2 +-
|
||||
drivers/ptp/Makefile | 1 +
|
||||
drivers/ptp/ptp_kvm_arm64.c | 59 ++++++++++++++
|
||||
drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 89 +++++----------------
|
||||
drivers/ptp/ptp_kvm_x86.c | 87 ++++++++++++++++++++
|
||||
include/asm-generic/ptp_kvm.h | 12 +++
|
||||
include/linux/arm-smccc.h | 5 ++
|
||||
virt/kvm/arm/psci.c | 12 +++
|
||||
9 files changed, 221 insertions(+), 71 deletions(-)
|
||||
create mode 100644 drivers/ptp/ptp_kvm_arm64.c
|
||||
rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (56%)
|
||||
create mode 100644 drivers/ptp/ptp_kvm_x86.c
|
||||
create mode 100644 include/asm-generic/ptp_kvm.h
|
||||
|
||||
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
|
||||
index d8c7f5750cdb..84ba8f9e57be 100644
|
||||
--- a/drivers/clocksource/arm_arch_timer.c
|
||||
+++ b/drivers/clocksource/arm_arch_timer.c
|
||||
@@ -1571,3 +1571,28 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
|
||||
}
|
||||
TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
|
||||
#endif
|
||||
+
|
||||
+#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
|
||||
+#include <linux/arm-smccc.h>
|
||||
+int kvm_arch_ptp_get_clock_fn(long *cycle, struct timespec64 *ts,
|
||||
+ struct clocksource **cs)
|
||||
+{
|
||||
+ struct arm_smccc_res hvc_res;
|
||||
+ ktime_t ktime_overall;
|
||||
+ struct arm_smccc_quirk hvc_quirk;
|
||||
+
|
||||
+ __arm_smccc_hvc(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, 0, 0, 0, 0, 0, 0, 0, &hvc_res, &hvc_quirk);
|
||||
+
|
||||
+ if ((long)(hvc_res.a0) < 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ ts->tv_sec = hvc_res.a0;
|
||||
+ ts->tv_nsec = hvc_res.a1;
|
||||
+ *cycle = hvc_res.a2 << 32 | hvc_res.a3;
|
||||
+ *cs = &clocksource_counter;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_clock_fn);
|
||||
+#endif
|
||||
+
|
||||
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
|
||||
index d137c480db46..318b3f5df1ea 100644
|
||||
--- a/drivers/ptp/Kconfig
|
||||
+++ b/drivers/ptp/Kconfig
|
||||
@@ -109,7 +109,7 @@ config PTP_1588_CLOCK_PCH
|
||||
config PTP_1588_CLOCK_KVM
|
||||
tristate "KVM virtual PTP clock"
|
||||
depends on PTP_1588_CLOCK
|
||||
- depends on KVM_GUEST && X86
|
||||
+ depends on KVM_GUEST && X86 || ARM64
|
||||
default y
|
||||
help
|
||||
This driver adds support for using kvm infrastructure as a PTP
|
||||
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
|
||||
index 19efa9cfa950..1bf4940a88a6 100644
|
||||
--- a/drivers/ptp/Makefile
|
||||
+++ b/drivers/ptp/Makefile
|
||||
@@ -4,6 +4,7 @@
|
||||
#
|
||||
|
||||
ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o
|
||||
+ptp_kvm-y := ptp_kvm_common.o ptp_kvm_$(ARCH).o
|
||||
obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o
|
||||
obj-$(CONFIG_PTP_1588_CLOCK_DTE) += ptp_dte.o
|
||||
obj-$(CONFIG_PTP_1588_CLOCK_IXP46X) += ptp_ixp46x.o
|
||||
diff --git a/drivers/ptp/ptp_kvm_arm64.c b/drivers/ptp/ptp_kvm_arm64.c
|
||||
new file mode 100644
|
||||
index 000000000000..fcd83324c7e1
|
||||
--- /dev/null
|
||||
+++ b/drivers/ptp/ptp_kvm_arm64.c
|
||||
@@ -0,0 +1,59 @@
|
||||
+// SPDX-License-Identifier: GPL-2.0-only
|
||||
+/*
|
||||
+ * Virtual PTP 1588 clock for use with KVM guests
|
||||
+ * Copyright (C) 2019 ARM Ltd.
|
||||
+ * All Rights Reserved
|
||||
+ */
|
||||
+
|
||||
+#include <linux/kernel.h>
|
||||
+#include <linux/err.h>
|
||||
+#include <asm/hypervisor.h>
|
||||
+#include <linux/module.h>
|
||||
+#include <linux/psci.h>
|
||||
+#include <linux/arm-smccc.h>
|
||||
+#include <linux/timecounter.h>
|
||||
+#include <linux/sched/clock.h>
|
||||
+#include <asm/arch_timer.h>
|
||||
+
|
||||
+
|
||||
+void arm_smccc_1_1_invoke(u32 id, struct arm_smccc_res *res)
|
||||
+{
|
||||
+ struct arm_smccc_quirk hvc_quirk;
|
||||
+
|
||||
+ __arm_smccc_hvc(id, 0, 0, 0, 0, 0, 0, 0, res, &hvc_quirk);
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_init(void)
|
||||
+{
|
||||
+ struct arm_smccc_res hvc_res;
|
||||
+
|
||||
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
|
||||
+ &hvc_res);
|
||||
+ if ((long)(hvc_res.a0) < 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock_generic(struct timespec64 *ts,
|
||||
+ struct arm_smccc_res *hvc_res)
|
||||
+{
|
||||
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
|
||||
+ hvc_res);
|
||||
+ if ((long)(hvc_res->a0) < 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ ts->tv_sec = hvc_res->a0;
|
||||
+ ts->tv_nsec = hvc_res->a1;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
|
||||
+{
|
||||
+ struct arm_smccc_res hvc_res;
|
||||
+
|
||||
+ kvm_arch_ptp_get_clock_generic(ts, &hvc_res);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
|
||||
similarity index 56%
|
||||
rename from drivers/ptp/ptp_kvm.c
|
||||
rename to drivers/ptp/ptp_kvm_common.c
|
||||
index c67dd11e08b1..c0b445fa6144 100644
|
||||
--- a/drivers/ptp/ptp_kvm.c
|
||||
+++ b/drivers/ptp/ptp_kvm_common.c
|
||||
@@ -1,29 +1,19 @@
|
||||
+// SPDX-License-Identifier: GPL-2.0-or-later
|
||||
/*
|
||||
* Virtual PTP 1588 clock for use with KVM guests
|
||||
*
|
||||
* Copyright (C) 2017 Red Hat Inc.
|
||||
- *
|
||||
- * This program is free software; you can redistribute it and/or modify
|
||||
- * it under the terms of the GNU General Public License as published by
|
||||
- * the Free Software Foundation; either version 2 of the License, or
|
||||
- * (at your option) any later version.
|
||||
- *
|
||||
- * This program is distributed in the hope that it will be useful,
|
||||
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
- * GNU General Public License for more details.
|
||||
- *
|
||||
*/
|
||||
#include <linux/device.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/kernel.h>
|
||||
+#include <linux/slab.h>
|
||||
#include <linux/module.h>
|
||||
#include <uapi/linux/kvm_para.h>
|
||||
#include <asm/kvm_para.h>
|
||||
-#include <asm/pvclock.h>
|
||||
-#include <asm/kvmclock.h>
|
||||
#include <uapi/asm/kvm_para.h>
|
||||
+#include <asm-generic/ptp_kvm.h>
|
||||
|
||||
#include <linux/ptp_clock_kernel.h>
|
||||
|
||||
@@ -34,56 +24,29 @@ struct kvm_ptp_clock {
|
||||
|
||||
DEFINE_SPINLOCK(kvm_ptp_lock);
|
||||
|
||||
-static struct pvclock_vsyscall_time_info *hv_clock;
|
||||
-
|
||||
-static struct kvm_clock_pairing clock_pair;
|
||||
-static phys_addr_t clock_pair_gpa;
|
||||
-
|
||||
static int ptp_kvm_get_time_fn(ktime_t *device_time,
|
||||
struct system_counterval_t *system_counter,
|
||||
void *ctx)
|
||||
{
|
||||
- unsigned long ret;
|
||||
+ unsigned long ret, cycle;
|
||||
struct timespec64 tspec;
|
||||
- unsigned version;
|
||||
- int cpu;
|
||||
- struct pvclock_vcpu_time_info *src;
|
||||
+ struct clocksource *cs;
|
||||
|
||||
spin_lock(&kvm_ptp_lock);
|
||||
|
||||
preempt_disable_notrace();
|
||||
- cpu = smp_processor_id();
|
||||
- src = &hv_clock[cpu].pvti;
|
||||
-
|
||||
- do {
|
||||
- /*
|
||||
- * We are using a TSC value read in the hosts
|
||||
- * kvm_hc_clock_pairing handling.
|
||||
- * So any changes to tsc_to_system_mul
|
||||
- * and tsc_shift or any other pvclock
|
||||
- * data invalidate that measurement.
|
||||
- */
|
||||
- version = pvclock_read_begin(src);
|
||||
-
|
||||
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
- clock_pair_gpa,
|
||||
- KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
- if (ret != 0) {
|
||||
- pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
|
||||
- spin_unlock(&kvm_ptp_lock);
|
||||
- preempt_enable_notrace();
|
||||
- return -EOPNOTSUPP;
|
||||
- }
|
||||
-
|
||||
- tspec.tv_sec = clock_pair.sec;
|
||||
- tspec.tv_nsec = clock_pair.nsec;
|
||||
- ret = __pvclock_read_cycles(src, clock_pair.tsc);
|
||||
- } while (pvclock_read_retry(src, version));
|
||||
+ ret = kvm_arch_ptp_get_clock_fn(&cycle, &tspec, &cs);
|
||||
+ if (ret != 0) {
|
||||
+ pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
|
||||
+ spin_unlock(&kvm_ptp_lock);
|
||||
+ preempt_enable_notrace();
|
||||
+ return -EOPNOTSUPP;
|
||||
+ }
|
||||
|
||||
preempt_enable_notrace();
|
||||
|
||||
- system_counter->cycles = ret;
|
||||
- system_counter->cs = &kvm_clock;
|
||||
+ system_counter->cycles = cycle;
|
||||
+ system_counter->cs = cs;
|
||||
|
||||
*device_time = timespec64_to_ktime(tspec);
|
||||
|
||||
@@ -126,17 +89,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
|
||||
|
||||
spin_lock(&kvm_ptp_lock);
|
||||
|
||||
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
- clock_pair_gpa,
|
||||
- KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ ret = kvm_arch_ptp_get_clock(&tspec);
|
||||
if (ret != 0) {
|
||||
pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
|
||||
spin_unlock(&kvm_ptp_lock);
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
- tspec.tv_sec = clock_pair.sec;
|
||||
- tspec.tv_nsec = clock_pair.nsec;
|
||||
spin_unlock(&kvm_ptp_lock);
|
||||
|
||||
memcpy(ts, &tspec, sizeof(struct timespec64));
|
||||
@@ -176,21 +135,11 @@ static void __exit ptp_kvm_exit(void)
|
||||
|
||||
static int __init ptp_kvm_init(void)
|
||||
{
|
||||
- long ret;
|
||||
-
|
||||
- if (!kvm_para_available())
|
||||
- return -ENODEV;
|
||||
+ int ret;
|
||||
|
||||
- clock_pair_gpa = slow_virt_to_phys(&clock_pair);
|
||||
- hv_clock = pvclock_get_pvti_cpu0_va();
|
||||
-
|
||||
- if (!hv_clock)
|
||||
- return -ENODEV;
|
||||
-
|
||||
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
|
||||
- KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
- if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
|
||||
- return -ENODEV;
|
||||
+ ret = kvm_arch_ptp_init();
|
||||
+ if (ret)
|
||||
+ return -EOPNOTSUPP;
|
||||
|
||||
kvm_ptp_clock.caps = ptp_kvm_caps;
|
||||
|
||||
diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
|
||||
new file mode 100644
|
||||
index 000000000000..a52cf1c2990c
|
||||
--- /dev/null
|
||||
+++ b/drivers/ptp/ptp_kvm_x86.c
|
||||
@@ -0,0 +1,87 @@
|
||||
+// SPDX-License-Identifier: GPL-2.0-or-later
|
||||
+/*
|
||||
+ * Virtual PTP 1588 clock for use with KVM guests
|
||||
+ *
|
||||
+ * Copyright (C) 2017 Red Hat Inc.
|
||||
+ */
|
||||
+
|
||||
+#include <asm/pvclock.h>
|
||||
+#include <asm/kvmclock.h>
|
||||
+#include <linux/module.h>
|
||||
+#include <uapi/asm/kvm_para.h>
|
||||
+#include <uapi/linux/kvm_para.h>
|
||||
+#include <linux/ptp_clock_kernel.h>
|
||||
+
|
||||
+phys_addr_t clock_pair_gpa;
|
||||
+struct kvm_clock_pairing clock_pair;
|
||||
+struct pvclock_vsyscall_time_info *hv_clock;
|
||||
+
|
||||
+int kvm_arch_ptp_init(void)
|
||||
+{
|
||||
+ int ret;
|
||||
+
|
||||
+ if (!kvm_para_available())
|
||||
+ return -ENODEV;
|
||||
+
|
||||
+ clock_pair_gpa = slow_virt_to_phys(&clock_pair);
|
||||
+ hv_clock = pvclock_get_pvti_cpu0_va();
|
||||
+ if (!hv_clock)
|
||||
+ return -ENODEV;
|
||||
+
|
||||
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
|
||||
+ KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
|
||||
+ return -ENODEV;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
|
||||
+{
|
||||
+ long ret;
|
||||
+
|
||||
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
+ clock_pair_gpa,
|
||||
+ KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ if (ret != 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ ts->tv_sec = clock_pair.sec;
|
||||
+ ts->tv_nsec = clock_pair.nsec;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock_fn(unsigned long *cycle, struct timespec64 *tspec,
|
||||
+ struct clocksource **cs)
|
||||
+{
|
||||
+ unsigned long ret;
|
||||
+ unsigned int version;
|
||||
+ int cpu;
|
||||
+ struct pvclock_vcpu_time_info *src;
|
||||
+
|
||||
+ cpu = smp_processor_id();
|
||||
+ src = &hv_clock[cpu].pvti;
|
||||
+
|
||||
+ do {
|
||||
+ /*
|
||||
+ * We are using a TSC value read in the hosts
|
||||
+ * kvm_hc_clock_pairing handling.
|
||||
+ * So any changes to tsc_to_system_mul
|
||||
+ * and tsc_shift or any other pvclock
|
||||
+ * data invalidate that measurement.
|
||||
+ */
|
||||
+ version = pvclock_read_begin(src);
|
||||
+
|
||||
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
+ clock_pair_gpa,
|
||||
+ KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ tspec->tv_sec = clock_pair.sec;
|
||||
+ tspec->tv_nsec = clock_pair.nsec;
|
||||
+ *cycle = __pvclock_read_cycles(src, clock_pair.tsc);
|
||||
+ } while (pvclock_read_retry(src, version));
|
||||
+
|
||||
+ *cs = &kvm_clock;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
diff --git a/include/asm-generic/ptp_kvm.h b/include/asm-generic/ptp_kvm.h
|
||||
new file mode 100644
|
||||
index 000000000000..883eea494a80
|
||||
--- /dev/null
|
||||
+++ b/include/asm-generic/ptp_kvm.h
|
||||
@@ -0,0 +1,12 @@
|
||||
+// SPDX-License-Identifier: GPL-2.0-only
|
||||
+/*
|
||||
+ * linux/drivers/clocksource/arm_arch_timer.c
|
||||
+ *
|
||||
+ * Copyright (C) 2019 ARM Ltd.
|
||||
+ * All Rights Reserved
|
||||
+ */
|
||||
+
|
||||
+int kvm_arch_ptp_init(void);
|
||||
+int kvm_arch_ptp_get_clock(struct timespec64 *ts);
|
||||
+int kvm_arch_ptp_get_clock_fn(unsigned long *cycle,
|
||||
+ struct timespec64 *tspec, void *cs);
|
||||
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
|
||||
index 18863d56273c..10e99c82d098 100644
|
||||
--- a/include/linux/arm-smccc.h
|
||||
+++ b/include/linux/arm-smccc.h
|
||||
@@ -75,6 +75,11 @@
|
||||
ARM_SMCCC_SMC_32, \
|
||||
0, 1)
|
||||
|
||||
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \
|
||||
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
|
||||
+ ARM_SMCCC_SMC_32, \
|
||||
+ 0, 2)
|
||||
+
|
||||
#define ARM_SMCCC_ARCH_WORKAROUND_1 \
|
||||
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
|
||||
ARM_SMCCC_SMC_32, \
|
||||
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
|
||||
index 9b73d3ad918a..9b9999bdeab7 100644
|
||||
--- a/virt/kvm/arm/psci.c
|
||||
+++ b/virt/kvm/arm/psci.c
|
||||
@@ -407,6 +407,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
u32 func_id = smccc_get_function(vcpu);
|
||||
u32 val = SMCCC_RET_NOT_SUPPORTED;
|
||||
u32 feature;
|
||||
+ struct timespec64 ts;
|
||||
+ u64 cycles, cycle_high, cycle_low;
|
||||
+ struct system_time_snapshot systime_snapshot;
|
||||
|
||||
switch (func_id) {
|
||||
case ARM_SMCCC_VERSION_FUNC_ID:
|
||||
@@ -435,6 +438,15 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
break;
|
||||
}
|
||||
break;
|
||||
+ case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
|
||||
+ ktime_get_real_ts64(&ts);
|
||||
+ ktime_get_snapshot(&systime_snapshot);
|
||||
+ cycles = systime_snapshot.cycles - vcpu_vtimer(vcpu)->cntvoff;
|
||||
+ cycle_high = cycles >> 32;
|
||||
+ cycle_low = cycles << 32 >> 32;
|
||||
+
|
||||
+ smccc_set_retval(vcpu, ts.tv_sec, ts.tv_nsec, cycle_high, cycle_low);
|
||||
+ return 1;
|
||||
default:
|
||||
return kvm_psci_call(vcpu);
|
||||
}
|
||||
--
|
||||
2.17.1
|
||||
|
||||
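The patch above passes the 64-bit counter value from host to guest in two return registers: the host splits it into `cycle_high` and `cycle_low`, and the guest rebuilds it with a shift and an OR. The following is a minimal user-space sketch of that round trip, not kernel code. The helper names `split_cycles` and `join_cycles` are hypothetical, and the sketch uses `uint64_t` explicitly, whereas the patch relies on `unsigned long` being 64 bits wide on arm64.

```c
#include <assert.h>
#include <stdint.h>

/* Host side: split a 64-bit counter value into two 32-bit halves,
 * one per return register (mirrors cycle_high / cycle_low in the patch). */
static void split_cycles(uint64_t cycles, uint64_t *high, uint64_t *low)
{
	*high = cycles >> 32;
	*low = cycles & 0xffffffffULL;	/* same result as cycles << 32 >> 32 */
}

/* Guest side: rebuild the 64-bit value from the two halves. */
static uint64_t join_cycles(uint64_t high, uint64_t low)
{
	/* Each half must fit in 32 bits, otherwise the OR would mix bits. */
	assert(high <= UINT32_MAX && low <= UINT32_MAX);
	return (high << 32) | low;
}
```

Calling `split_cycles()` on a value and feeding both halves to `join_cycles()` should return the original value. Masking with `0xffffffff` is used here instead of the double shift only because it states the intent more directly.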
@@ -0,0 +1,98 @@
|
||||
From 33ffc9a93a1d9e72594d5eb3e4fc583a1a2911d1 Mon Sep 17 00:00:00 2001
|
||||
From: Jianyong Wu <jianyong.wu@arm.com>
|
||||
Date: Tue, 19 Feb 2019 01:15:32 -0500
|
||||
Subject: [PATCH 2/5] Enable memory-hotplug using probe for arm64
|
||||
|
||||
---
|
||||
arch/arm64/Kconfig | 7 +++++++
|
||||
arch/arm64/mm/init.c | 9 ++++++++-
|
||||
arch/arm64/mm/mmu.c | 17 +++++++++++++++++
|
||||
arch/arm64/mm/numa.c | 10 ++++++++++
|
||||
4 files changed, 42 insertions(+), 1 deletion(-)
|
||||
|
||||
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
|
||||
index 1b1a0e95c751..881bea194d53 100644
|
||||
--- a/arch/arm64/Kconfig
|
||||
+++ b/arch/arm64/Kconfig
|
||||
@@ -740,6 +740,13 @@ config NUMA
|
||||
local memory of the CPU and add some more
|
||||
NUMA awareness to the kernel.
|
||||
|
||||
+config ARCH_MEMORY_PROBE
|
||||
+ def_bool y
|
||||
+ depends on MEMORY_HOTPLUG
|
||||
+
|
||||
+config ARCH_ENABLE_MEMORY_HOTPLUG
|
||||
+ def_bool y
|
||||
+
|
||||
config NODES_SHIFT
|
||||
int "Maximum NUMA Nodes (as a power of 2)"
|
||||
range 1 10
|
||||
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
|
||||
index 787e27964ab9..e66e44b7bafe 100644
|
||||
--- a/arch/arm64/mm/init.c
|
||||
+++ b/arch/arm64/mm/init.c
|
||||
@@ -288,9 +288,16 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
|
||||
int pfn_valid(unsigned long pfn)
|
||||
{
|
||||
phys_addr_t addr = pfn << PAGE_SHIFT;
|
||||
-
|
||||
if ((addr >> PAGE_SHIFT) != pfn)
|
||||
return 0;
|
||||
+
|
||||
+#ifdef CONFIG_SPARSEMEM
|
||||
+ if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
|
||||
+ return 0;
|
||||
+
|
||||
+ if (!valid_section(__nr_to_section(pfn_to_section_nr(pfn))))
|
||||
+ return 0;
|
||||
+#endif
|
||||
return memblock_is_map_memory(addr);
|
||||
}
|
||||
EXPORT_SYMBOL(pfn_valid);
|
||||
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
|
||||
index 8080c9f489c3..c393b37597af 100644
|
||||
--- a/arch/arm64/mm/mmu.c
|
||||
+++ b/arch/arm64/mm/mmu.c
|
||||
@@ -1028,3 +1028,20 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
|
||||
pmd_free(NULL, table);
|
||||
return 1;
|
||||
}
|
||||
+
|
||||
+#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
|
||||
+ bool want_memblock)
|
||||
+{
|
||||
+ int flags = 0;
|
||||
+
|
||||
+ if (debug_pagealloc_enabled())
|
||||
+ flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
|
||||
+
|
||||
+ __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
|
||||
+ size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
|
||||
+
|
||||
+ return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
|
||||
+ altmap, want_memblock);
|
||||
+}
|
||||
+#endif
|
||||
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
|
||||
index 146c04ceaa51..d276bd4d38b5 100644
|
||||
--- a/arch/arm64/mm/numa.c
|
||||
+++ b/arch/arm64/mm/numa.c
|
||||
@@ -464,3 +464,13 @@ void __init arm64_numa_init(void)
|
||||
|
||||
numa_init(dummy_numa_init);
|
||||
}
|
||||
+
|
||||
+/*
|
||||
+ * We hope that we will be hotplugging memory on nodes we already know about,
|
||||
+ * such that acpi_get_node() succeeds and we never fall back to this...
|
||||
+ */
|
||||
+int memory_add_physaddr_to_nid(u64 addr)
|
||||
+{
|
||||
+ pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
|
||||
+ return 0;
|
||||
+}
|
||||
--
|
||||
2.20.1
|
||||
|
||||
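The `pfn_valid()` hunk in the patch above keeps an existing guard: it shifts the page frame number left by `PAGE_SHIFT` and checks that shifting back yields the same number, which rejects frame numbers whose physical address would not fit. The sketch below shows that round-trip check in isolation, in user space. The function name `pfn_fits` and the explicit `page_shift` parameter are assumptions made for illustration; the kernel uses `phys_addr_t` and the `PAGE_SHIFT` constant instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when pfn << page_shift loses no high bits, that is, when
 * the frame number maps to an address representable in 64 bits. */
static bool pfn_fits(uint64_t pfn, unsigned int page_shift)
{
	uint64_t addr;

	assert(page_shift < 64);	/* shifting by >= 64 is undefined in C */
	addr = pfn << page_shift;

	return (addr >> page_shift) == pfn;
}
```

With 4 KiB pages (`page_shift` of 12), any frame number below 2^52 passes the check and anything with higher bits set fails it.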
@@ -0,0 +1,47 @@
|
||||
From cab495651e8f71c39e87a08abbe051916110b3ca Mon Sep 17 00:00:00 2001
|
||||
From: Julio Montes <julio.montes@intel.com>
|
||||
Date: Mon, 18 Sep 2017 11:46:59 -0500
|
||||
Subject: [PATCH 3/5] NO-UPSTREAM: 9P: always use cached inode to fill in
|
||||
v9fs_vfs_getattr
|
||||
|
||||
So that if in cache=none mode, we don't have to lookup server that

|
||||
might not support open-unlink-fstat operation.
|
||||
|
||||
fixes https://github.com/01org/cc-oci-runtime/issues/47
|
||||
fixes https://github.com/01org/cc-oci-runtime/issues/1062
|
||||
|
||||
Signed-off-by: Peng Tao <bergwolf@gmail.com>
|
||||
---
|
||||
fs/9p/vfs_inode.c | 2 +-
|
||||
fs/9p/vfs_inode_dotl.c | 2 +-
|
||||
2 files changed, 2 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
|
||||
index 85ff859d3af5..efdc2a8f37bb 100644
|
||||
--- a/fs/9p/vfs_inode.c
|
||||
+++ b/fs/9p/vfs_inode.c
|
||||
@@ -1080,7 +1080,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
|
||||
|
||||
p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
|
||||
v9ses = v9fs_dentry2v9ses(dentry);
|
||||
- if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
+ if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
generic_fillattr(d_inode(dentry), stat);
|
||||
return 0;
|
||||
}
|
||||
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
|
||||
index 4823e1c46999..daa5e6a41864 100644
|
||||
--- a/fs/9p/vfs_inode_dotl.c
|
||||
+++ b/fs/9p/vfs_inode_dotl.c
|
||||
@@ -480,7 +480,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
|
||||
|
||||
p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
|
||||
v9ses = v9fs_dentry2v9ses(dentry);
|
||||
- if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
+ if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
generic_fillattr(d_inode(dentry), stat);
|
||||
return 0;
|
||||
}
|
||||
--
|
||||
2.20.1
|
||||
|
||||
@@ -0,0 +1,29 @@
|
||||
From d78297bf9d8e41711bddc6003f460e815340a214 Mon Sep 17 00:00:00 2001
|
||||
From: Arjan van de Ven <arjan@linux.intel.com>
|
||||
Date: Fri, 10 Aug 2018 13:22:08 +0000
|
||||
Subject: [PATCH 4/5] Compile in evged always
|
||||
|
||||
We need evged for NEMU (and in general for hw reduced)
|
||||
|
||||
The config option cannot be set normally since it breaks all
|
||||
regular systems, and hardware reduced is really a runtime choice.
|
||||
---
|
||||
drivers/acpi/Makefile | 2 +-
|
||||
1 file changed, 1 insertion(+), 1 deletion(-)
|
||||
|
||||
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
|
||||
index 6d59aa109a91..97f2fbbd5014 100644
|
||||
--- a/drivers/acpi/Makefile
|
||||
+++ b/drivers/acpi/Makefile
|
||||
@@ -47,7 +47,7 @@ acpi-y += acpi_pnp.o
|
||||
acpi-$(CONFIG_ARM_AMBA) += acpi_amba.o
|
||||
acpi-y += power.o
|
||||
acpi-y += event.o
|
||||
-acpi-$(CONFIG_ACPI_REDUCED_HARDWARE_ONLY) += evged.o
|
||||
+acpi-y += evged.o
|
||||
acpi-y += sysfs.o
|
||||
acpi-y += property.o
|
||||
acpi-$(CONFIG_X86) += acpi_cmos_rtc.o
|
||||
--
|
||||
2.20.1
|
||||
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,49 @@
|
||||
From 267ca21784bb307babbbb2f5a4a111da4da4c015 Mon Sep 17 00:00:00 2001
|
||||
From: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
Date: Thu, 13 Feb 2020 08:50:38 +0100
|
||||
Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
|
||||
|
||||
Whenever the vsock backend on the host sends a packet through the RX
|
||||
queue, it expects an answer on the TX queue. Unfortunately, there is one
|
||||
case where the host side will hang waiting for the answer and will
|
||||
effectively never recover.
|
||||
|
||||
This issue happens when the guest side starts binding to the socket,
|
||||
which inserts a new bound socket into the list of already bound sockets.
|
||||
At this time, we expect the guest to also start listening, which will
|
||||
trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
|
||||
occurs if the host side queued an RX packet and triggered an interrupt
|
||||
right between the end of the binding process and the beginning of the
|
||||
listening process. In this specific case, the function processing the
|
||||
packet virtio_transport_recv_pkt() will find a bound socket, which means
|
||||
it will hit the switch statement checking for the sk_state, but the
|
||||
state won't be changed into TCP_LISTEN yet, which leads the code to pick
|
||||
the default statement. This default statement will only free the buffer,
|
||||
while it should also respond to the host side, by sending a packet on
|
||||
its TX queue.
|
||||
|
||||
To fix this chain of events, whenever the default statement is entered
|
||||
we must send back a packet containing the operation
|
||||
VIRTIO_VSOCK_OP_RST, because at this stage we know that the host side
|
||||
is waiting for an answer.
|
||||
|
||||
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
---
|
||||
net/vmw_vsock/virtio_transport_common.c | 1 +
|
||||
1 file changed, 1 insertion(+)
|
||||
|
||||
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
|
||||
index 2a8651aa90c8..7d83e2c80b15 100644
|
||||
--- a/net/vmw_vsock/virtio_transport_common.c
|
||||
+++ b/net/vmw_vsock/virtio_transport_common.c
|
||||
@@ -1051,6 +1051,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
default:
|
||||
+ (void)virtio_transport_reset_no_sock(pkt);
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
}
|
||||
--
|
||||
2.20.1
|
||||
|
||||
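The commit message above describes a receive path that dispatches on the socket state and, before the fix, only freed the packet in the default case. The following user-space sketch models that control flow so the effect of the one-line change is easier to see. Every name in it (`sock_state`, `reply_log`, `handle_packet`) is hypothetical; it counts actions instead of touching real virtqueues and is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified socket states; the kernel uses TCP_CLOSE, TCP_LISTEN, etc. */
enum sock_state { STATE_CLOSE, STATE_LISTEN, STATE_ESTABLISHED };

/* Records what the handler did, in place of real TX queue activity. */
struct reply_log {
	int resets_sent;
	int packets_freed;
};

static void handle_packet(enum sock_state state, struct reply_log *log)
{
	assert(log != NULL);

	switch (state) {
	case STATE_LISTEN:
	case STATE_ESTABLISHED:
		/* Normal processing is elided; the buffer is released after it. */
		log->packets_freed++;
		break;
	default:
		/* Bound but not yet listening: the peer still waits for an
		 * answer, so send a reset before releasing the buffer. */
		log->resets_sent++;
		log->packets_freed++;
		break;
	}
}
```

A packet arriving while the socket is still in the closed state now produces exactly one reset and one free, so the sender is never left waiting.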
@@ -0,0 +1,47 @@
|
||||
From cab495651e8f71c39e87a08abbe051916110b3ca Mon Sep 17 00:00:00 2001
|
||||
From: Julio Montes <julio.montes@intel.com>
|
||||
Date: Mon, 18 Sep 2017 11:46:59 -0500
|
||||
Subject: [PATCH 3/5] NO-UPSTREAM: 9P: always use cached inode to fill in
|
||||
v9fs_vfs_getattr
|
||||
|
||||
So that if in cache=none mode, we don't have to lookup server that
|
||||
might not support open-unlink-fstat operation.
|
||||
|
||||
fixes https://github.com/01org/cc-oci-runtime/issues/47
|
||||
fixes https://github.com/01org/cc-oci-runtime/issues/1062
|
||||
|
||||
Signed-off-by: Peng Tao <bergwolf@gmail.com>
|
||||
---
|
||||
fs/9p/vfs_inode.c | 2 +-
|
||||
fs/9p/vfs_inode_dotl.c | 2 +-
|
||||
2 files changed, 2 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
|
||||
index 85ff859d3af5..efdc2a8f37bb 100644
|
||||
--- a/fs/9p/vfs_inode.c
|
||||
+++ b/fs/9p/vfs_inode.c
|
||||
@@ -1080,7 +1080,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
|
||||
|
||||
p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
|
||||
v9ses = v9fs_dentry2v9ses(dentry);
|
||||
- if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
+ if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
generic_fillattr(d_inode(dentry), stat);
|
||||
return 0;
|
||||
}
|
||||
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
|
||||
index 4823e1c46999..daa5e6a41864 100644
|
||||
--- a/fs/9p/vfs_inode_dotl.c
|
||||
+++ b/fs/9p/vfs_inode_dotl.c
|
||||
@@ -480,7 +480,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
|
||||
|
||||
p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
|
||||
v9ses = v9fs_dentry2v9ses(dentry);
|
||||
- if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
+ if (!d_really_is_negative(dentry) || v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
|
||||
generic_fillattr(d_inode(dentry), stat);
|
||||
return 0;
|
||||
}
|
||||
--
|
||||
2.20.1
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
From ac1956caf20f8ac0589f69b2d5fcc81e6ba7c71a Mon Sep 17 00:00:00 2001
|
||||
From: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
Date: Thu, 13 Feb 2020 08:50:38 +0100
|
||||
Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
|
||||
|
||||
Whenever the vsock backend on the host sends a packet through the RX
|
||||
queue, it expects an answer on the TX queue. Unfortunately, there is one
|
||||
case where the host side will hang waiting for the answer and will
|
||||
effectively never recover.
|
||||
|
||||
This issue happens when the guest side starts binding to the socket,
|
||||
which inserts a new bound socket into the list of already bound sockets.
|
||||
At this time, we expect the guest to also start listening, which will
|
||||
trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
|
||||
occurs if the host side queued an RX packet and triggered an interrupt
|
||||
right between the end of the binding process and the beginning of the
|
||||
listening process. In this specific case, the function processing the
|
||||
packet virtio_transport_recv_pkt() will find a bound socket, which means
|
||||
it will hit the switch statement checking for the sk_state, but the
|
||||
state won't be changed into TCP_LISTEN yet, which leads the code to pick
|
||||
the default statement. This default statement will only free the buffer,
|
||||
while it should also respond to the host side, by sending a packet on
|
||||
its TX queue.
|
||||
|
||||
To fix this chain of events, whenever the default statement is entered
|
||||
we must send back a packet containing the operation
|
||||
VIRTIO_VSOCK_OP_RST, because at this stage we know that the host side
|
||||
is waiting for an answer.
|
||||
|
||||
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
---
|
||||
net/vmw_vsock/virtio_transport_common.c | 1 +
|
||||
1 file changed, 1 insertion(+)
|
||||
|
||||
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
|
||||
index fb2060dffb0a..696e9a03ad0f 100644
|
||||
--- a/net/vmw_vsock/virtio_transport_common.c
|
||||
+++ b/net/vmw_vsock/virtio_transport_common.c
|
||||
@@ -1127,6 +1127,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
default:
|
||||
+ (void)virtio_transport_reset_no_sock(pkt);
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
}
|
||||
--
|
||||
2.20.1
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
From 3d1d7f8922ed2f080f6d8e08df0d51e22f9590ec Mon Sep 17 00:00:00 2001
|
||||
From: Jianyong Wu <jianyong.wu@arm.com>
|
||||
Date: Wed, 1 Apr 2020 15:19:29 +0800
|
||||
Subject: [PATCH 1/9] arm/arm64: Provide a wrapper for SMCCC 1.1 calls
|
||||
|
||||
From: Steven Price <steven.price@arm.com>
|
||||
|
||||
SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI
|
||||
conduit. Rather than coding this in every call site, provide a macro
|
||||
which uses the correct instruction. The macro also handles the case
|
||||
where no conduit is configured/available returning a not supported error
|
||||
in res, along with returning the conduit used for the call.
|
||||
|
||||
This allows us to remove some duplicated code and will be useful later
|
||||
when adding paravirtualized time hypervisor calls.
|
||||
|
||||
Signed-off-by: Steven Price <steven.price@arm.com>
|
||||
Acked-by: Will Deacon <will@kernel.org>
|
||||
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
||||
---
|
||||
include/linux/arm-smccc.h | 45 +++++++++++++++++++++++++++++++++++++++
|
||||
1 file changed, 45 insertions(+)
|
||||
|
||||
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
|
||||
index 080012a6f025..131edde5d37e 100644
|
||||
--- a/include/linux/arm-smccc.h
|
||||
+++ b/include/linux/arm-smccc.h
|
||||
@@ -302,5 +302,50 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
|
||||
#define SMCCC_RET_NOT_SUPPORTED -1
|
||||
#define SMCCC_RET_NOT_REQUIRED -2
|
||||
|
||||
+/*
|
||||
+ * Like arm_smccc_1_1* but always returns SMCCC_RET_NOT_SUPPORTED.
|
||||
+ * Used when the SMCCC conduit is not defined. The empty asm statement
|
||||
+ * avoids compiler warnings about unused variables.
|
||||
+ */
|
||||
+#define __fail_smccc_1_1(...) \
|
||||
+ do { \
|
||||
+ __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \
|
||||
+ asm ("" __constraints(__count_args(__VA_ARGS__))); \
|
||||
+ if (___res) \
|
||||
+ ___res->a0 = SMCCC_RET_NOT_SUPPORTED; \
|
||||
+ } while (0)
|
||||
+
|
||||
+/*
|
||||
+ * arm_smccc_1_1_invoke() - make an SMCCC v1.1 compliant call
|
||||
+ *
|
||||
+ * This is a variadic macro taking one to eight source arguments, and
|
||||
+ * an optional return structure.
|
||||
+ *
|
||||
+ * @a0-a7: arguments passed in registers 0 to 7
|
||||
+ * @res: result values from registers 0 to 3
|
||||
+ *
|
||||
+ * This macro will make either an HVC call or an SMC call depending on the
|
||||
+ * current SMCCC conduit. If no valid conduit is available then -1
|
||||
+ * (SMCCC_RET_NOT_SUPPORTED) is returned in @res.a0 (if supplied).
|
||||
+ *
|
||||
+ * The return value also provides the conduit that was used.
|
||||
+ */
|
||||
+#define arm_smccc_1_1_invoke(...) ({ \
|
||||
+ int method = arm_smccc_1_1_get_conduit(); \
|
||||
+ switch (method) { \
|
||||
+ case SMCCC_CONDUIT_HVC: \
|
||||
+ arm_smccc_1_1_hvc(__VA_ARGS__); \
|
||||
+ break; \
|
||||
+ case SMCCC_CONDUIT_SMC: \
|
||||
+ arm_smccc_1_1_smc(__VA_ARGS__); \
|
||||
+ break; \
|
||||
+ default: \
|
||||
+ __fail_smccc_1_1(__VA_ARGS__); \
|
||||
+ method = SMCCC_CONDUIT_NONE; \
|
||||
+ break; \
|
||||
+ } \
|
||||
+ method; \
|
||||
+ })
|
||||
+
|
||||
#endif /*__ASSEMBLY__*/
|
||||
#endif /*__LINUX_ARM_SMCCC_H*/
|
||||
--
|
||||
2.17.1
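
Reviewer note: the dispatch performed by `arm_smccc_1_1_invoke()` in the patch above can be modelled in user space. The following is only a sketch under stated assumptions: `model_invoke()`, `model_hvc()`, `model_smc()`, `struct model_res`, and the `MODEL_*` constants are hypothetical stand-ins invented for illustration, and the two stubs merely write a marker value instead of executing a real HVC or SMC instruction.

```c
#include <stddef.h>

/* Hypothetical stand-ins that mirror the names in the patch; not kernel code. */
enum model_conduit { MODEL_CONDUIT_NONE, MODEL_CONDUIT_SMC, MODEL_CONDUIT_HVC };

#define MODEL_RET_NOT_SUPPORTED (-1L)

struct model_res {
	unsigned long a0, a1, a2, a3;
};

/* Stub for the HVC path: writes an arbitrary marker so callers can tell it ran. */
void model_hvc(unsigned long func_id, struct model_res *res)
{
	if (res)
		res->a0 = func_id + 1;
}

/* Stub for the SMC path: writes a different arbitrary marker. */
void model_smc(unsigned long func_id, struct model_res *res)
{
	if (res)
		res->a0 = func_id + 2;
}

/*
 * Dispatch on the conduit. When no usable conduit exists, report
 * "not supported" in res->a0 (if a result struct was supplied) and
 * return MODEL_CONDUIT_NONE, so the caller learns which path was taken.
 */
enum model_conduit model_invoke(enum model_conduit method,
				unsigned long func_id,
				struct model_res *res)
{
	switch (method) {
	case MODEL_CONDUIT_HVC:
		model_hvc(func_id, res);
		break;
	case MODEL_CONDUIT_SMC:
		model_smc(func_id, res);
		break;
	default:
		if (res)
			res->a0 = (unsigned long)MODEL_RET_NOT_SUPPORTED;
		method = MODEL_CONDUIT_NONE;
		break;
	}
	return method;
}
```

As in the macro, a caller can either inspect the returned conduit or test `(long)res.a0 < 0`, and the result pointer may be `NULL` when the caller does not need the register values.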
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
From b830806f5cd02119be9b25812b3ea56d97cd08f3 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Fri, 9 Aug 2019 14:22:40 +0100
Subject: [PATCH 2/9] arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()

SMCCC callers are currently amassing a collection of enums for the SMCCC
conduit, and are having to dig into the PSCI driver's internals in order
to figure out what to do.

Let's clean this up, with common SMCCC_CONDUIT_* definitions, and an
arm_smccc_1_1_get_conduit() helper that abstracts the PSCI driver's
internal state.

We can kill off the PSCI_CONDUIT_* definitions once we've migrated users
over to the new interface.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
||||
---
|
||||
drivers/firmware/psci/psci.c | 15 +++++++++++++++
|
||||
include/linux/arm-smccc.h | 16 ++++++++++++++++
|
||||
2 files changed, 31 insertions(+)
|
||||
|
||||
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
|
||||
index 84f4ff351c62..eb797081d159 100644
|
||||
--- a/drivers/firmware/psci/psci.c
|
||||
+++ b/drivers/firmware/psci/psci.c
|
||||
@@ -57,6 +57,21 @@ struct psci_operations psci_ops = {
|
||||
.smccc_version = SMCCC_VERSION_1_0,
|
||||
};
|
||||
|
||||
+enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
|
||||
+{
|
||||
+ if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
|
||||
+ return SMCCC_CONDUIT_NONE;
|
||||
+
|
||||
+ switch (psci_ops.conduit) {
|
||||
+ case PSCI_CONDUIT_SMC:
|
||||
+ return SMCCC_CONDUIT_SMC;
|
||||
+ case PSCI_CONDUIT_HVC:
|
||||
+ return SMCCC_CONDUIT_HVC;
|
||||
+ default:
|
||||
+ return SMCCC_CONDUIT_NONE;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
typedef unsigned long (psci_fn)(unsigned long, unsigned long,
|
||||
unsigned long, unsigned long);
|
||||
static psci_fn *invoke_psci_fn;
|
||||
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
|
||||
index 131edde5d37e..e6d4cb4f61f1 100644
|
||||
--- a/include/linux/arm-smccc.h
|
||||
+++ b/include/linux/arm-smccc.h
|
||||
@@ -80,6 +80,22 @@
|
||||
|
||||
#include <linux/linkage.h>
|
||||
#include <linux/types.h>
|
||||
+
|
||||
+enum arm_smccc_conduit {
|
||||
+ SMCCC_CONDUIT_NONE,
|
||||
+ SMCCC_CONDUIT_SMC,
|
||||
+ SMCCC_CONDUIT_HVC,
|
||||
+};
|
||||
+
|
||||
+/**
|
||||
+ * arm_smccc_1_1_get_conduit()
|
||||
+ *
|
||||
+ * Returns the conduit to be used for SMCCCv1.1 or later.
|
||||
+ *
|
||||
+ * When SMCCCv1.1 is not present, returns SMCCC_CONDUIT_NONE.
|
||||
+ */
|
||||
+enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void);
|
||||
+
|
||||
/**
|
||||
* struct arm_smccc_res - Result from SMC/HVC call
|
||||
* @a0-a3 result values from registers 0 to 3
|
||||
--
|
||||
2.17.1
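
Reviewer note: the helper added by the patch above first gates on the SMCCC version and only then translates the PSCI conduit. A minimal user-space sketch of that two-step check follows. Everything prefixed `model_` or `MODEL_` is hypothetical, and the numeric version values are assumptions chosen for the sketch; only their ordering (1.0 below 1.1) matters here.

```c
/* Assumed values for illustration; only the ordering 1.0 < 1.1 is relied on. */
#define MODEL_SMCCC_VERSION_1_0 0x10000u
#define MODEL_SMCCC_VERSION_1_1 0x10001u

enum model_psci_conduit {
	MODEL_PSCI_CONDUIT_NONE,
	MODEL_PSCI_CONDUIT_SMC,
	MODEL_PSCI_CONDUIT_HVC,
};

enum model_smccc_conduit {
	MODEL_SMCCC_CONDUIT_NONE,
	MODEL_SMCCC_CONDUIT_SMC,
	MODEL_SMCCC_CONDUIT_HVC,
};

/* Stand-in for the two psci_ops fields the helper reads. */
struct model_psci_ops {
	enum model_psci_conduit conduit;
	unsigned int smccc_version;
};

/*
 * Step 1: anything older than SMCCC 1.1 has no usable conduit.
 * Step 2: otherwise map the driver-private conduit to the common enum.
 */
enum model_smccc_conduit model_get_conduit(const struct model_psci_ops *ops)
{
	if (ops->smccc_version < MODEL_SMCCC_VERSION_1_1)
		return MODEL_SMCCC_CONDUIT_NONE;

	switch (ops->conduit) {
	case MODEL_PSCI_CONDUIT_SMC:
		return MODEL_SMCCC_CONDUIT_SMC;
	case MODEL_PSCI_CONDUIT_HVC:
		return MODEL_SMCCC_CONDUIT_HVC;
	default:
		return MODEL_SMCCC_CONDUIT_NONE;
	}
}
```

Keeping the version gate ahead of the switch means callers see `NONE` on pre-1.1 firmware even when a PSCI conduit is configured, which is the behaviour the header comment in the patch describes.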
|
||||
|
||||
@@ -0,0 +1,641 @@
|
||||
From cb55878a1cecb7ef56956a28a9f1b745d0ac522b Mon Sep 17 00:00:00 2001
From: Jianyong Wu <jianyong.wu@arm.com>
Date: Wed, 1 Apr 2020 15:39:44 +0800
Subject: [PATCH 3/3] ptp: arm64: Enable ptp_kvm for arm64.

Currently, in an arm64 virtualization environment, there is no mechanism to
keep time in sync between guest and host. Time in the guest will drift relative
to the host after boot, as each may use third-party time sources
to correct its own time. The deviation is on the order
of milliseconds, but some scenarios ask for higher precision: for example,
in a cloud environment, we want all the VMs running on a host to acquire the
same level of accuracy from the host clock.

The kvm ptp clock, which uses the host clock source as a
reference clock to sync time between guest and host, has already been adopted
by x86 and improves time sync precision from milliseconds to nanoseconds.

This patch enables kvm ptp on arm64.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
|
||||
---
|
||||
drivers/clocksource/arm_arch_timer.c | 24 ++++++
|
||||
drivers/firmware/psci/psci.c | 1 +
|
||||
drivers/ptp/Kconfig | 2 +-
|
||||
drivers/ptp/Makefile | 1 +
|
||||
drivers/ptp/ptp_kvm.h | 11 +++
|
||||
drivers/ptp/ptp_kvm_arm64.c | 51 ++++++++++++
|
||||
drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 78 +++++-------------
|
||||
drivers/ptp/ptp_kvm_x86.c | 87 +++++++++++++++++++++
|
||||
include/linux/arm-smccc.h | 8 ++
|
||||
include/linux/clocksource.h | 6 ++
|
||||
include/linux/clocksource_ids.h | 13 +++
|
||||
include/linux/timekeeping.h | 12 +--
|
||||
include/uapi/linux/kvm.h | 1 +
|
||||
kernel/time/clocksource.c | 3 +
|
||||
kernel/time/timekeeping.c | 1 +
|
||||
virt/kvm/arm/arm.c | 1 +
|
||||
virt/kvm/arm/psci.c | 23 ++++++
|
||||
17 files changed, 258 insertions(+), 65 deletions(-)
|
||||
create mode 100644 drivers/ptp/ptp_kvm.h
|
||||
create mode 100644 drivers/ptp/ptp_kvm_arm64.c
|
||||
rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (63%)
|
||||
create mode 100644 drivers/ptp/ptp_kvm_x86.c
|
||||
create mode 100644 include/linux/clocksource_ids.h
|
||||
|
||||
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
|
||||
index 9a5464c625b4..0c723df39b55 100644
|
||||
--- a/drivers/clocksource/arm_arch_timer.c
|
||||
+++ b/drivers/clocksource/arm_arch_timer.c
|
||||
@@ -16,6 +16,7 @@
|
||||
#include <linux/cpu_pm.h>
|
||||
#include <linux/clockchips.h>
|
||||
#include <linux/clocksource.h>
|
||||
+#include <linux/clocksource_ids.h>
|
||||
#include <linux/interrupt.h>
|
||||
#include <linux/of_irq.h>
|
||||
#include <linux/of_address.h>
|
||||
@@ -187,6 +188,7 @@ static u64 arch_counter_read_cc(const struct cyclecounter *cc)
|
||||
|
||||
static struct clocksource clocksource_counter = {
|
||||
.name = "arch_sys_counter",
|
||||
+ .id = CSID_ARM_ARCH_COUNTER,
|
||||
.rating = 400,
|
||||
.read = arch_counter_read,
|
||||
.mask = CLOCKSOURCE_MASK(56),
|
||||
@@ -1623,3 +1625,25 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
|
||||
}
|
||||
TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
|
||||
#endif
|
||||
+
|
||||
+#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
|
||||
+#include <linux/arm-smccc.h>
|
||||
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *ts,
|
||||
+ struct clocksource **cs)
|
||||
+{
|
||||
+ struct arm_smccc_res hvc_res;
|
||||
+ ktime_t ktime_overall;
|
||||
+
|
||||
+ arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID, &hvc_res);
|
||||
+ if ((long)(hvc_res.a0) < 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ ktime_overall = hvc_res.a0 << 32 | hvc_res.a1;
|
||||
+ *ts = ktime_to_timespec64(ktime_overall);
|
||||
+ *cycle = hvc_res.a2 << 32 | hvc_res.a3;
|
||||
+ *cs = &clocksource_counter;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_crosststamp);
|
||||
+#endif
|
||||
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
|
||||
index eb797081d159..87a7dc18b175 100644
|
||||
--- a/drivers/firmware/psci/psci.c
|
||||
+++ b/drivers/firmware/psci/psci.c
|
||||
@@ -71,6 +71,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
|
||||
return SMCCC_CONDUIT_NONE;
|
||||
}
|
||||
}
|
||||
+EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
|
||||
|
||||
typedef unsigned long (psci_fn)(unsigned long, unsigned long,
|
||||
unsigned long, unsigned long);
|
||||
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
|
||||
index 0517272a268e..6f3688e7e440 100644
|
||||
--- a/drivers/ptp/Kconfig
|
||||
+++ b/drivers/ptp/Kconfig
|
||||
@@ -110,7 +110,7 @@ config PTP_1588_CLOCK_PCH
|
||||
config PTP_1588_CLOCK_KVM
|
||||
tristate "KVM virtual PTP clock"
|
||||
depends on PTP_1588_CLOCK
|
||||
- depends on KVM_GUEST && X86
|
||||
+ depends on KVM_GUEST && X86 || ARM64 && ARM_ARCH_TIMER
|
||||
default y
|
||||
help
|
||||
This driver adds support for using kvm infrastructure as a PTP
|
||||
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
|
||||
index 677d1d178a3e..3b7554f56ad9 100644
|
||||
--- a/drivers/ptp/Makefile
|
||||
+++ b/drivers/ptp/Makefile
|
||||
@@ -4,6 +4,7 @@
|
||||
#
|
||||
|
||||
ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o
|
||||
+ptp_kvm-y := ptp_kvm_$(ARCH).o ptp_kvm_common.o
|
||||
obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o
|
||||
obj-$(CONFIG_PTP_1588_CLOCK_DTE) += ptp_dte.o
|
||||
obj-$(CONFIG_PTP_1588_CLOCK_IXP46X) += ptp_ixp46x.o
|
||||
diff --git a/drivers/ptp/ptp_kvm.h b/drivers/ptp/ptp_kvm.h
|
||||
new file mode 100644
|
||||
index 000000000000..4bf1802bbeb8
|
||||
--- /dev/null
|
||||
+++ b/drivers/ptp/ptp_kvm.h
|
||||
@@ -0,0 +1,11 @@
|
||||
+/* SPDX-License-Identifier: GPL-2.0-or-later */
|
||||
+/*
|
||||
+ * Virtual PTP 1588 clock for use with KVM guests
|
||||
+ *
|
||||
+ * Copyright (C) 2017 Red Hat Inc.
|
||||
+ */
|
||||
+
|
||||
+int kvm_arch_ptp_init(void);
|
||||
+int kvm_arch_ptp_get_clock(struct timespec64 *ts);
|
||||
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle,
|
||||
+ struct timespec64 *tspec, void *cs);
|
||||
diff --git a/drivers/ptp/ptp_kvm_arm64.c b/drivers/ptp/ptp_kvm_arm64.c
|
||||
new file mode 100644
|
||||
index 000000000000..446f2444d285
|
||||
--- /dev/null
|
||||
+++ b/drivers/ptp/ptp_kvm_arm64.c
|
||||
@@ -0,0 +1,51 @@
|
||||
+// SPDX-License-Identifier: GPL-2.0-only
|
||||
+/*
|
||||
+ * Virtual PTP 1588 clock for use with KVM guests
|
||||
+ * Copyright (C) 2019 ARM Ltd.
|
||||
+ * All Rights Reserved
|
||||
+ */
|
||||
+
|
||||
+#include <linux/kernel.h>
|
||||
+#include <linux/err.h>
|
||||
+#include <asm/hypervisor.h>
|
||||
+#include <linux/module.h>
|
||||
+#include <linux/psci.h>
|
||||
+#include <linux/arm-smccc.h>
|
||||
+#include <linux/timecounter.h>
|
||||
+#include <linux/sched/clock.h>
|
||||
+#include <asm/arch_timer.h>
|
||||
+
|
||||
+int kvm_arch_ptp_init(void)
|
||||
+{
|
||||
+ struct arm_smccc_res hvc_res;
|
||||
+
|
||||
+ arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID, &hvc_res);
|
||||
+ if ((long)(hvc_res.a0) < 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock_generic(struct timespec64 *ts,
|
||||
+ struct arm_smccc_res *hvc_res)
|
||||
+{
|
||||
+ ktime_t ktime_overall;
|
||||
+
|
||||
+ arm_smccc_1_1_invoke(ARM_SMCCC_HYP_KVM_PTP_FUNC_ID, hvc_res);
|
||||
+ if ((long)(hvc_res->a0) < 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ ktime_overall = hvc_res->a0 << 32 | hvc_res->a1;
|
||||
+ *ts = ktime_to_timespec64(ktime_overall);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
|
||||
+{
|
||||
+ struct arm_smccc_res hvc_res;
|
||||
+
|
||||
+ kvm_arch_ptp_get_clock_generic(ts, &hvc_res);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
|
||||
similarity index 63%
|
||||
rename from drivers/ptp/ptp_kvm.c
|
||||
rename to drivers/ptp/ptp_kvm_common.c
|
||||
index fc7d0b77e118..60442f70d3fc 100644
|
||||
--- a/drivers/ptp/ptp_kvm.c
|
||||
+++ b/drivers/ptp/ptp_kvm_common.c
|
||||
@@ -8,15 +8,16 @@
|
||||
#include <linux/err.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/kernel.h>
|
||||
+#include <linux/slab.h>
|
||||
#include <linux/module.h>
|
||||
#include <uapi/linux/kvm_para.h>
|
||||
#include <asm/kvm_para.h>
|
||||
-#include <asm/pvclock.h>
|
||||
-#include <asm/kvmclock.h>
|
||||
#include <uapi/asm/kvm_para.h>
|
||||
|
||||
#include <linux/ptp_clock_kernel.h>
|
||||
|
||||
+#include "ptp_kvm.h"
|
||||
+
|
||||
struct kvm_ptp_clock {
|
||||
struct ptp_clock *ptp_clock;
|
||||
struct ptp_clock_info caps;
|
||||
@@ -24,56 +25,29 @@ struct kvm_ptp_clock {
|
||||
|
||||
DEFINE_SPINLOCK(kvm_ptp_lock);
|
||||
|
||||
-static struct pvclock_vsyscall_time_info *hv_clock;
|
||||
-
|
||||
-static struct kvm_clock_pairing clock_pair;
|
||||
-static phys_addr_t clock_pair_gpa;
|
||||
-
|
||||
static int ptp_kvm_get_time_fn(ktime_t *device_time,
|
||||
struct system_counterval_t *system_counter,
|
||||
void *ctx)
|
||||
{
|
||||
- unsigned long ret;
|
||||
+ unsigned long ret, cycle;
|
||||
struct timespec64 tspec;
|
||||
- unsigned version;
|
||||
- int cpu;
|
||||
- struct pvclock_vcpu_time_info *src;
|
||||
+ struct clocksource *cs;
|
||||
|
||||
spin_lock(&kvm_ptp_lock);
|
||||
|
||||
preempt_disable_notrace();
|
||||
- cpu = smp_processor_id();
|
||||
- src = &hv_clock[cpu].pvti;
|
||||
-
|
||||
- do {
|
||||
- /*
|
||||
- * We are using a TSC value read in the hosts
|
||||
- * kvm_hc_clock_pairing handling.
|
||||
- * So any changes to tsc_to_system_mul
|
||||
- * and tsc_shift or any other pvclock
|
||||
- * data invalidate that measurement.
|
||||
- */
|
||||
- version = pvclock_read_begin(src);
|
||||
-
|
||||
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
- clock_pair_gpa,
|
||||
- KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
- if (ret != 0) {
|
||||
- pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
|
||||
- spin_unlock(&kvm_ptp_lock);
|
||||
- preempt_enable_notrace();
|
||||
- return -EOPNOTSUPP;
|
||||
- }
|
||||
-
|
||||
- tspec.tv_sec = clock_pair.sec;
|
||||
- tspec.tv_nsec = clock_pair.nsec;
|
||||
- ret = __pvclock_read_cycles(src, clock_pair.tsc);
|
||||
- } while (pvclock_read_retry(src, version));
|
||||
+ ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs);
|
||||
+ if (ret != 0) {
|
||||
+ pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
|
||||
+ spin_unlock(&kvm_ptp_lock);
|
||||
+ preempt_enable_notrace();
|
||||
+ return -EOPNOTSUPP;
|
||||
+ }
|
||||
|
||||
preempt_enable_notrace();
|
||||
|
||||
- system_counter->cycles = ret;
|
||||
- system_counter->cs = &kvm_clock;
|
||||
+ system_counter->cycles = cycle;
|
||||
+ system_counter->cs = cs;
|
||||
|
||||
*device_time = timespec64_to_ktime(tspec);
|
||||
|
||||
@@ -116,17 +90,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
|
||||
|
||||
spin_lock(&kvm_ptp_lock);
|
||||
|
||||
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
- clock_pair_gpa,
|
||||
- KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ ret = kvm_arch_ptp_get_clock(&tspec);
|
||||
if (ret != 0) {
|
||||
pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
|
||||
spin_unlock(&kvm_ptp_lock);
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
- tspec.tv_sec = clock_pair.sec;
|
||||
- tspec.tv_nsec = clock_pair.nsec;
|
||||
spin_unlock(&kvm_ptp_lock);
|
||||
|
||||
memcpy(ts, &tspec, sizeof(struct timespec64));
|
||||
@@ -166,21 +136,11 @@ static void __exit ptp_kvm_exit(void)
|
||||
|
||||
static int __init ptp_kvm_init(void)
|
||||
{
|
||||
- long ret;
|
||||
-
|
||||
- if (!kvm_para_available())
|
||||
- return -ENODEV;
|
||||
+ int ret;
|
||||
|
||||
- clock_pair_gpa = slow_virt_to_phys(&clock_pair);
|
||||
- hv_clock = pvclock_get_pvti_cpu0_va();
|
||||
-
|
||||
- if (!hv_clock)
|
||||
- return -ENODEV;
|
||||
-
|
||||
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
|
||||
- KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
- if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
|
||||
- return -ENODEV;
|
||||
+ ret = kvm_arch_ptp_init();
|
||||
+ if (ret)
|
||||
+ return -EOPNOTSUPP;
|
||||
|
||||
kvm_ptp_clock.caps = ptp_kvm_caps;
|
||||
|
||||
diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
|
||||
new file mode 100644
|
||||
index 000000000000..6c891d7299c6
|
||||
--- /dev/null
|
||||
+++ b/drivers/ptp/ptp_kvm_x86.c
|
||||
@@ -0,0 +1,87 @@
|
||||
+// SPDX-License-Identifier: GPL-2.0-or-later
|
||||
+/*
|
||||
+ * Virtual PTP 1588 clock for use with KVM guests
|
||||
+ *
|
||||
+ * Copyright (C) 2017 Red Hat Inc.
|
||||
+ */
|
||||
+
|
||||
+#include <asm/pvclock.h>
|
||||
+#include <asm/kvmclock.h>
|
||||
+#include <linux/module.h>
|
||||
+#include <uapi/asm/kvm_para.h>
|
||||
+#include <uapi/linux/kvm_para.h>
|
||||
+#include <linux/ptp_clock_kernel.h>
|
||||
+
|
||||
+phys_addr_t clock_pair_gpa;
|
||||
+struct kvm_clock_pairing clock_pair;
|
||||
+struct pvclock_vsyscall_time_info *hv_clock;
|
||||
+
|
||||
+int kvm_arch_ptp_init(void)
|
||||
+{
|
||||
+ int ret;
|
||||
+
|
||||
+ if (!kvm_para_available())
|
||||
+ return -ENODEV;
|
||||
+
|
||||
+ clock_pair_gpa = slow_virt_to_phys(&clock_pair);
|
||||
+ hv_clock = pvclock_get_pvti_cpu0_va();
|
||||
+ if (!hv_clock)
|
||||
+ return -ENODEV;
|
||||
+
|
||||
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
|
||||
+ KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
|
||||
+ return -ENODEV;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
|
||||
+{
|
||||
+ long ret;
|
||||
+
|
||||
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
+ clock_pair_gpa,
|
||||
+ KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ if (ret != 0)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ ts->tv_sec = clock_pair.sec;
|
||||
+ ts->tv_nsec = clock_pair.nsec;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *tspec,
|
||||
+ struct clocksource **cs)
|
||||
+{
|
||||
+ unsigned long ret;
|
||||
+ unsigned int version;
|
||||
+ int cpu;
|
||||
+ struct pvclock_vcpu_time_info *src;
|
||||
+
|
||||
+ cpu = smp_processor_id();
|
||||
+ src = &hv_clock[cpu].pvti;
|
||||
+
|
||||
+ do {
|
||||
+ /*
|
||||
+ * We are using a TSC value read in the hosts
|
||||
+ * kvm_hc_clock_pairing handling.
|
||||
+ * So any changes to tsc_to_system_mul
|
||||
+ * and tsc_shift or any other pvclock
|
||||
+ * data invalidate that measurement.
|
||||
+ */
|
||||
+ version = pvclock_read_begin(src);
|
||||
+
|
||||
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
|
||||
+ clock_pair_gpa,
|
||||
+ KVM_CLOCK_PAIRING_WALLCLOCK);
|
||||
+ tspec->tv_sec = clock_pair.sec;
|
||||
+ tspec->tv_nsec = clock_pair.nsec;
|
||||
+ *cycle = __pvclock_read_cycles(src, clock_pair.tsc);
|
||||
+ } while (pvclock_read_retry(src, version));
|
||||
+
|
||||
+ *cs = &kvm_clock;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
|
||||
index e6d4cb4f61f1..32a46d564934 100644
|
||||
--- a/include/linux/arm-smccc.h
|
||||
+++ b/include/linux/arm-smccc.h
|
||||
@@ -45,6 +45,7 @@
|
||||
#define ARM_SMCCC_OWNER_SIP 2
|
||||
#define ARM_SMCCC_OWNER_OEM 3
|
||||
#define ARM_SMCCC_OWNER_STANDARD 4
|
||||
+#define ARM_SMCCC_OWNER_STANDARD_HYP 5
|
||||
#define ARM_SMCCC_OWNER_TRUSTED_APP 48
|
||||
#define ARM_SMCCC_OWNER_TRUSTED_APP_END 49
|
||||
#define ARM_SMCCC_OWNER_TRUSTED_OS 50
|
||||
@@ -76,6 +77,13 @@
|
||||
ARM_SMCCC_SMC_32, \
|
||||
0, 0x7fff)
|
||||
|
||||
+/* PTP KVM call requests clock time from guest OS to host */
|
||||
+#define ARM_SMCCC_HYP_KVM_PTP_FUNC_ID \
|
||||
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
|
||||
+ ARM_SMCCC_SMC_32, \
|
||||
+ ARM_SMCCC_OWNER_STANDARD_HYP, \
|
||||
+ 0)
|
||||
+
|
||||
#ifndef __ASSEMBLY__
|
||||
|
||||
#include <linux/linkage.h>
|
||||
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
|
||||
index b21db536fd52..96e85b6f9ca0 100644
|
||||
--- a/include/linux/clocksource.h
|
||||
+++ b/include/linux/clocksource.h
|
||||
@@ -17,6 +17,7 @@
|
||||
#include <linux/timer.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/of.h>
|
||||
+#include <linux/clocksource_ids.h>
|
||||
#include <asm/div64.h>
|
||||
#include <asm/io.h>
|
||||
|
||||
@@ -49,6 +50,10 @@ struct module;
|
||||
* 400-499: Perfect
|
||||
* The ideal clocksource. A must-use where
|
||||
* available.
|
||||
+ * @id: Defaults to CSID_GENERIC. The id value is captured
|
||||
+ * in certain snapshot functions to allow callers to
|
||||
+ * validate the clocksource from which the snapshot was
|
||||
+ * taken.
|
||||
* @read: returns a cycle value, passes clocksource as argument
|
||||
* @enable: optional function to enable the clocksource
|
||||
* @disable: optional function to disable the clocksource
|
||||
@@ -91,6 +96,7 @@ struct clocksource {
|
||||
const char *name;
|
||||
struct list_head list;
|
||||
int rating;
|
||||
+ enum clocksource_ids id;
|
||||
int (*enable)(struct clocksource *cs);
|
||||
void (*disable)(struct clocksource *cs);
|
||||
unsigned long flags;
|
||||
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
|
||||
new file mode 100644
|
||||
index 000000000000..93bec8426c44
|
||||
--- /dev/null
|
||||
+++ b/include/linux/clocksource_ids.h
|
||||
@@ -0,0 +1,13 @@
|
||||
+/* SPDX-License-Identifier: GPL-2.0 */
|
||||
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
|
||||
+#define _LINUX_CLOCKSOURCE_IDS_H
|
||||
+
|
||||
+/* Enum to give clocksources a unique identifier */
|
||||
+enum clocksource_ids {
|
||||
+ CSID_GENERIC = 0,
|
||||
+ CSID_ARM_ARCH_COUNTER,
|
||||
+ CSID_MAX,
|
||||
+};
|
||||
+
|
||||
+#endif
|
||||
+
|
||||
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
|
||||
index b27e2ffa96c1..4ecc32ad3879 100644
|
||||
--- a/include/linux/timekeeping.h
|
||||
+++ b/include/linux/timekeeping.h
|
||||
@@ -2,6 +2,7 @@
|
||||
#ifndef _LINUX_TIMEKEEPING_H
|
||||
#define _LINUX_TIMEKEEPING_H
|
||||
|
||||
+#include <linux/clocksource_ids.h>
|
||||
#include <linux/errno.h>
|
||||
|
||||
/* Included from linux/ktime.h */
|
||||
@@ -232,11 +233,12 @@ extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
|
||||
* @cs_was_changed_seq: The sequence number of clocksource change events
|
||||
*/
|
||||
struct system_time_snapshot {
|
||||
- u64 cycles;
|
||||
- ktime_t real;
|
||||
- ktime_t raw;
|
||||
- unsigned int clock_was_set_seq;
|
||||
- u8 cs_was_changed_seq;
|
||||
+ u64 cycles;
|
||||
+ ktime_t real;
|
||||
+ ktime_t raw;
|
||||
+ enum clocksource_ids cs_id;
|
||||
+ unsigned int clock_was_set_seq;
|
||||
+ u8 cs_was_changed_seq;
|
||||
};
|
||||
|
||||
/*
|
||||
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
|
||||
index 52641d8ca9e8..16008ebe5474 100644
|
||||
--- a/include/uapi/linux/kvm.h
|
||||
+++ b/include/uapi/linux/kvm.h
|
||||
@@ -1000,6 +1000,7 @@ struct kvm_ppc_resize_hpt {
|
||||
#define KVM_CAP_PMU_EVENT_FILTER 173
|
||||
#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
|
||||
#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
|
||||
+#define KVM_CAP_ARM_KVM_PTP 176
|
||||
|
||||
#ifdef KVM_CAP_IRQ_ROUTING
|
||||
|
||||
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
|
||||
index fff5f64981c6..5fe2d61172b1 100644
|
||||
--- a/kernel/time/clocksource.c
|
||||
+++ b/kernel/time/clocksource.c
|
||||
@@ -921,6 +921,9 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
|
||||
|
||||
clocksource_arch_init(cs);
|
||||
|
||||
+ if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
|
||||
+ cs->id = CSID_GENERIC;
|
||||
+
|
||||
/* Initialize mult/shift and max_idle_ns */
|
||||
__clocksource_update_freq_scale(cs, scale, freq);
|
||||
|
||||
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
|
||||
index ca69290bee2a..a8b378338b9e 100644
|
||||
--- a/kernel/time/timekeeping.c
|
||||
+++ b/kernel/time/timekeeping.c
|
||||
@@ -979,6 +979,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
|
||||
do {
|
||||
seq = read_seqcount_begin(&tk_core.seq);
|
||||
now = tk_clock_read(&tk->tkr_mono);
|
||||
+ systime_snapshot->cs_id = tk->tkr_mono.clock->id;
|
||||
systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
|
||||
systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
|
||||
base_real = ktime_add(tk->tkr_mono.base,
|
||||
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
|
||||
index 86c6aa1cb58e..ee159ce9ca39 100644
|
||||
--- a/virt/kvm/arm/arm.c
|
||||
+++ b/virt/kvm/arm/arm.c
|
||||
@@ -197,6 +197,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
case KVM_CAP_IMMEDIATE_EXIT:
|
||||
case KVM_CAP_VCPU_EVENTS:
|
||||
case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
|
||||
+ case KVM_CAP_ARM_KVM_PTP:
|
||||
r = 1;
|
||||
break;
|
||||
case KVM_CAP_ARM_SET_DEVICE_ADDR:
|
||||
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
|
||||
index 87927f7e1ee7..6e689f9952fb 100644
|
||||
--- a/virt/kvm/arm/psci.c
|
||||
+++ b/virt/kvm/arm/psci.c
|
||||
@@ -9,6 +9,7 @@
|
||||
#include <linux/kvm_host.h>
|
||||
#include <linux/uaccess.h>
|
||||
#include <linux/wait.h>
|
||||
+#include <linux/clocksource_ids.h>
|
||||
|
||||
#include <asm/cputype.h>
|
||||
#include <asm/kvm_emulate.h>
|
||||
@@ -389,6 +390,9 @@ static int kvm_psci_call(struct kvm_vcpu *vcpu)
|
||||
|
||||
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
+ struct system_time_snapshot systime_snapshot;
|
||||
+ long arg[4];
|
||||
+ u64 cycles;
|
||||
u32 func_id = smccc_get_function(vcpu);
|
||||
u32 val = SMCCC_RET_NOT_SUPPORTED;
|
||||
u32 feature;
|
||||
@@ -428,6 +432,25 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
break;
|
||||
}
|
||||
break;
|
||||
+ /*
|
||||
+ * This will be used for the virtual ptp kvm clock. Four values will be
|
||||
+ * passed back.
|
||||
+ * reg0 stores high 32-bit host ktime;
|
||||
+ * reg1 stores low 32-bit host ktime;
|
||||
+ * reg2 stores high 32-bit difference of host cycles and cntvoff;
|
||||
+ * reg3 stores low 32-bit difference of host cycles and cntvoff.
|
||||
+ */
|
||||
+ case ARM_SMCCC_HYP_KVM_PTP_FUNC_ID:
|
||||
+ ktime_get_snapshot(&systime_snapshot);
|
||||
+ if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
|
||||
+ break;
|
||||
+ arg[0] = systime_snapshot.real >> 32;
|
||||
+ arg[1] = systime_snapshot.real << 32 >> 32;
|
||||
+ cycles = systime_snapshot.cycles - vcpu_vtimer(vcpu)->cntvoff;
|
||||
+ arg[2] = cycles >> 32;
|
||||
+ arg[3] = cycles << 32 >> 32;
|
||||
+ smccc_set_retval(vcpu, arg[0], arg[1], arg[2], arg[3]);
|
||||
+ return 1;
|
||||
default:
|
||||
return kvm_psci_call(vcpu);
|
||||
}
|
||||
--
|
||||
2.17.1
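
Reviewer note: the hypercall in the patch above returns two 64-bit quantities (host ktime and the counter value adjusted by cntvoff) as four 32-bit halves in registers 0 to 3, and the guest rebuilds them with `hi << 32 | lo`. The sketch below models that round trip in user space. `struct model_regs`, `model_host_pack()` and `model_join()` are hypothetical names, and the sketch deliberately uses unsigned arithmetic with an explicit mask for the low half, an assumption made here so that the result does not depend on how signed shifts behave.

```c
#include <stdint.h>

/* Hypothetical model of the four return registers. */
struct model_regs {
	uint64_t a0, a1, a2, a3;
};

/* Host side: split each 64-bit value into a high and a low 32-bit half. */
void model_host_pack(uint64_t ktime_ns, uint64_t cycles, struct model_regs *r)
{
	r->a0 = ktime_ns >> 32;			/* high half of host time */
	r->a1 = ktime_ns & 0xffffffffu;		/* low half of host time */
	r->a2 = cycles >> 32;			/* high half of counter value */
	r->a3 = cycles & 0xffffffffu;		/* low half of counter value */
}

/* Guest side: rebuild a 64-bit value from its two halves. */
uint64_t model_join(uint64_t hi, uint64_t lo)
{
	return (hi << 32) | lo;
}
```

A guest-side caller would compute `model_join(r.a0, r.a1)` for the time and `model_join(r.a2, r.a3)` for the cycle count, which corresponds to the two assignments in `kvm_arch_ptp_get_crosststamp()` on arm64.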
|
||||
|
||||
@@ -0,0 +1,498 @@
|
||||
From ba91422b18892bceacf3b4aa60354cf36fcabf9b Mon Sep 17 00:00:00 2001
From: Penny Zheng <penny.zheng@arm.com>
Date: Wed, 8 Apr 2020 10:26:52 +0800
Subject: [PATCH] arm64/mm: Enable memory hot remove

Backport Anshuman Khandual's patch series enabling memory hot
remove on aarch64 (https://patchwork.kernel.org/cover/11419305/)
to v5.4.x.
This patch series has already been merged and queued for 5.7.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
|
||||
---
|
||||
arch/arm64/Kconfig | 3 +
|
||||
arch/arm64/include/asm/memory.h | 1 +
|
||||
arch/arm64/mm/mmu.c | 379 +++++++++++++++++++++++++++++++-
|
||||
arch/arm64/mm/ptdump_debugfs.c | 4 +
|
||||
4 files changed, 378 insertions(+), 9 deletions(-)
|
||||
|
||||
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
|
||||
index 6ccd2ed30963..d18b716fa569 100644
|
||||
--- a/arch/arm64/Kconfig
|
||||
+++ b/arch/arm64/Kconfig
|
||||
@@ -274,6 +274,9 @@ config ZONE_DMA32
|
||||
config ARCH_ENABLE_MEMORY_HOTPLUG
|
||||
def_bool y
|
||||
|
||||
+config ARCH_ENABLE_MEMORY_HOTREMOVE
|
||||
+ def_bool y
|
||||
+
|
||||
config SMP
|
||||
def_bool y
|
||||
|
||||
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
|
||||
index c23c47360664..dbba06e258f5 100644
|
||||
--- a/arch/arm64/include/asm/memory.h
|
||||
+++ b/arch/arm64/include/asm/memory.h
|
||||
@@ -54,6 +54,7 @@
|
||||
#define MODULES_VADDR (BPF_JIT_REGION_END)
|
||||
#define MODULES_VSIZE (SZ_128M)
|
||||
#define VMEMMAP_START (-VMEMMAP_SIZE - SZ_2M)
|
||||
+#define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE)
|
||||
#define PCI_IO_END (VMEMMAP_START - SZ_2M)
|
||||
#define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE)
|
||||
#define FIXADDR_TOP (PCI_IO_START - SZ_2M)
|
||||
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
|
||||
index d10247fab0fd..99fec235144e 100644
|
||||
--- a/arch/arm64/mm/mmu.c
|
||||
+++ b/arch/arm64/mm/mmu.c
|
||||
@@ -17,6 +17,7 @@
|
||||
#include <linux/mman.h>
|
||||
#include <linux/nodemask.h>
|
||||
#include <linux/memblock.h>
|
||||
+#include <linux/memory.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/io.h>
|
||||
#include <linux/mm.h>
|
||||
@@ -725,6 +726,312 @@ int kern_addr_valid(unsigned long addr)
|
||||
|
||||
return pfn_valid(pte_pfn(pte));
|
||||
}
|
||||
+
|
||||
+#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+static void free_hotplug_page_range(struct page *page, size_t size)
|
||||
+{
|
||||
+ WARN_ON(PageReserved(page));
|
||||
+ free_pages((unsigned long)page_address(page), get_order(size));
|
||||
+}
|
||||
+
|
||||
+static void free_hotplug_pgtable_page(struct page *page)
|
||||
+{
|
||||
+ free_hotplug_page_range(page, PAGE_SIZE);
|
||||
+}
|
||||
+
|
||||
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
|
||||
+ unsigned long floor, unsigned long ceiling,
|
||||
+ unsigned long mask)
|
||||
+{
|
||||
+ start &= mask;
|
||||
+ if (start < floor)
|
||||
+ return false;
|
||||
+
|
||||
+ if (ceiling) {
|
||||
+ ceiling &= mask;
|
||||
+ if (!ceiling)
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
+ if (end - 1 > ceiling - 1)
|
||||
+ return false;
|
||||
+ return true;
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
|
||||
+ unsigned long end, bool free_mapped)
|
||||
+{
|
||||
+ pte_t *ptep, pte;
|
||||
+
|
||||
+ do {
|
||||
+ ptep = pte_offset_kernel(pmdp, addr);
|
||||
+ pte = READ_ONCE(*ptep);
|
||||
+ if (pte_none(pte))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pte_present(pte));
|
||||
+ pte_clear(&init_mm, addr, ptep);
|
||||
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
|
||||
+ if (free_mapped)
|
||||
+ free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
|
||||
+ } while (addr += PAGE_SIZE, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
|
||||
+ unsigned long end, bool free_mapped)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pmd_t *pmdp, pmd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pmd_addr_end(addr, end);
|
||||
+ pmdp = pmd_offset(pudp, addr);
|
||||
+ pmd = READ_ONCE(*pmdp);
|
||||
+ if (pmd_none(pmd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pmd_present(pmd));
|
||||
+ if (pmd_sect(pmd)) {
|
||||
+ pmd_clear(pmdp);
|
||||
+
|
||||
+ /*
|
||||
+ * One TLBI should be sufficient here as the PMD_SIZE
|
||||
+ * range is mapped with a single block entry.
|
||||
+ */
|
||||
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
|
||||
+ if (free_mapped)
|
||||
+ free_hotplug_page_range(pmd_page(pmd),
|
||||
+ PMD_SIZE);
|
||||
+ continue;
|
||||
+ }
|
||||
+ WARN_ON(!pmd_table(pmd));
|
||||
+ unmap_hotplug_pte_range(pmdp, addr, next, free_mapped);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
|
||||
+ unsigned long end, bool free_mapped)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pud_t *pudp, pud;
|
||||
+
|
||||
+ do {
|
||||
+ next = pud_addr_end(addr, end);
|
||||
+ pudp = pud_offset(p4dp, addr);
|
||||
+ pud = READ_ONCE(*pudp);
|
||||
+ if (pud_none(pud))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pud_present(pud));
|
||||
+ if (pud_sect(pud)) {
|
||||
+ pud_clear(pudp);
|
||||
+
|
||||
+ /*
|
||||
+ * One TLBI should be sufficient here as the PUD_SIZE
|
||||
+ * range is mapped with a single block entry.
|
||||
+ */
|
||||
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
|
||||
+ if (free_mapped)
|
||||
+ free_hotplug_page_range(pud_page(pud),
|
||||
+ PUD_SIZE);
|
||||
+ continue;
|
||||
+ }
|
||||
+ WARN_ON(!pud_table(pud));
|
||||
+ unmap_hotplug_pmd_range(pudp, addr, next, free_mapped);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
|
||||
+ unsigned long end, bool free_mapped)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ p4d_t *p4dp, p4d;
|
||||
+
|
||||
+ do {
|
||||
+ next = p4d_addr_end(addr, end);
|
||||
+ p4dp = p4d_offset(pgdp, addr);
|
||||
+ p4d = READ_ONCE(*p4dp);
|
||||
+ if (p4d_none(p4d))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!p4d_present(p4d));
|
||||
+ unmap_hotplug_pud_range(p4dp, addr, next, free_mapped);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_range(unsigned long addr, unsigned long end,
|
||||
+ bool free_mapped)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pgd_t *pgdp, pgd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pgd_addr_end(addr, end);
|
||||
+ pgdp = pgd_offset_k(addr);
|
||||
+ pgd = READ_ONCE(*pgdp);
|
||||
+ if (pgd_none(pgd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pgd_present(pgd));
|
||||
+ unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
|
||||
+ unsigned long end, unsigned long floor,
|
||||
+ unsigned long ceiling)
|
||||
+{
|
||||
+ pte_t *ptep, pte;
|
||||
+ unsigned long i, start = addr;
|
||||
+
|
||||
+ do {
|
||||
+ ptep = pte_offset_kernel(pmdp, addr);
|
||||
+ pte = READ_ONCE(*ptep);
|
||||
+
|
||||
+ /*
|
||||
+ * This is just a sanity check here which verifies that
|
||||
+ * pte clearing has been done by earlier unmap loops.
|
||||
+ */
|
||||
+ WARN_ON(!pte_none(pte));
|
||||
+ } while (addr += PAGE_SIZE, addr < end);
|
||||
+
|
||||
+ if (!pgtable_range_aligned(start, end, floor, ceiling, PMD_MASK))
|
||||
+ return;
|
||||
+
|
||||
+ /*
+ * Check whether we can free the pte page if the rest of the
+ * entries are empty. Overlap with other regions has been
+ * handled by the floor/ceiling check.
+ */
|
||||
+ ptep = pte_offset_kernel(pmdp, 0UL);
|
||||
+ for (i = 0; i < PTRS_PER_PTE; i++) {
|
||||
+ if (!pte_none(READ_ONCE(ptep[i])))
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ pmd_clear(pmdp);
|
||||
+ __flush_tlb_kernel_pgtable(start);
|
||||
+ free_hotplug_pgtable_page(virt_to_page(ptep));
|
||||
+}
|
||||
+
|
||||
+static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
|
||||
+ unsigned long end, unsigned long floor,
|
||||
+ unsigned long ceiling)
|
||||
+{
|
||||
+ pmd_t *pmdp, pmd;
|
||||
+ unsigned long i, next, start = addr;
|
||||
+
|
||||
+ do {
|
||||
+ next = pmd_addr_end(addr, end);
|
||||
+ pmdp = pmd_offset(pudp, addr);
|
||||
+ pmd = READ_ONCE(*pmdp);
|
||||
+ if (pmd_none(pmd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
|
||||
+ free_empty_pte_table(pmdp, addr, next, floor, ceiling);
|
||||
+ } while (addr = next, addr < end);
|
||||
+
|
||||
+ if (CONFIG_PGTABLE_LEVELS <= 2)
|
||||
+ return;
|
||||
+
|
||||
+ if (!pgtable_range_aligned(start, end, floor, ceiling, PUD_MASK))
|
||||
+ return;
|
||||
+
|
||||
+ /*
+ * Check whether we can free the pmd page if the rest of the
+ * entries are empty. Overlap with other regions has been
+ * handled by the floor/ceiling check.
+ */
|
||||
+ pmdp = pmd_offset(pudp, 0UL);
|
||||
+ for (i = 0; i < PTRS_PER_PMD; i++) {
|
||||
+ if (!pmd_none(READ_ONCE(pmdp[i])))
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ pud_clear(pudp);
|
||||
+ __flush_tlb_kernel_pgtable(start);
|
||||
+ free_hotplug_pgtable_page(virt_to_page(pmdp));
|
||||
+}
|
||||
+
|
||||
+static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
|
||||
+ unsigned long end, unsigned long floor,
|
||||
+ unsigned long ceiling)
|
||||
+{
|
||||
+ pud_t *pudp, pud;
|
||||
+ unsigned long i, next, start = addr;
|
||||
+
|
||||
+ do {
|
||||
+ next = pud_addr_end(addr, end);
|
||||
+ pudp = pud_offset(p4dp, addr);
|
||||
+ pud = READ_ONCE(*pudp);
|
||||
+ if (pud_none(pud))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
|
||||
+ free_empty_pmd_table(pudp, addr, next, floor, ceiling);
|
||||
+ } while (addr = next, addr < end);
|
||||
+
|
||||
+ if (CONFIG_PGTABLE_LEVELS <= 3)
|
||||
+ return;
|
||||
+
|
||||
+ if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
|
||||
+ return;
|
||||
+
|
||||
+ /*
+ * Check whether we can free the pud page if the rest of the
+ * entries are empty. Overlap with other regions has been
+ * handled by the floor/ceiling check.
+ */
|
||||
+ pudp = pud_offset(p4dp, 0UL);
|
||||
+ for (i = 0; i < PTRS_PER_PUD; i++) {
|
||||
+ if (!pud_none(READ_ONCE(pudp[i])))
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ p4d_clear(p4dp);
|
||||
+ __flush_tlb_kernel_pgtable(start);
|
||||
+ free_hotplug_pgtable_page(virt_to_page(pudp));
|
||||
+}
|
||||
+
|
||||
+static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
|
||||
+ unsigned long end, unsigned long floor,
|
||||
+ unsigned long ceiling)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ p4d_t *p4dp, p4d;
|
||||
+
|
||||
+ do {
|
||||
+ next = p4d_addr_end(addr, end);
|
||||
+ p4dp = p4d_offset(pgdp, addr);
|
||||
+ p4d = READ_ONCE(*p4dp);
|
||||
+ if (p4d_none(p4d))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!p4d_present(p4d));
|
||||
+ free_empty_pud_table(p4dp, addr, next, floor, ceiling);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void free_empty_tables(unsigned long addr, unsigned long end,
|
||||
+ unsigned long floor, unsigned long ceiling)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pgd_t *pgdp, pgd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pgd_addr_end(addr, end);
|
||||
+ pgdp = pgd_offset_k(addr);
|
||||
+ pgd = READ_ONCE(*pgdp);
|
||||
+ if (pgd_none(pgd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pgd_present(pgd));
|
||||
+ free_empty_p4d_table(pgdp, addr, next, floor, ceiling);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+#endif
|
||||
+
|
||||
#ifdef CONFIG_SPARSEMEM_VMEMMAP
|
||||
#if !ARM64_SWAPPER_USES_SECTION_MAPS
|
||||
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
|
||||
@@ -772,6 +1079,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
|
||||
void vmemmap_free(unsigned long start, unsigned long end,
|
||||
struct vmem_altmap *altmap)
|
||||
{
|
||||
+#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+ WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
|
||||
+
|
||||
+ unmap_hotplug_range(start, end, true);
|
||||
+ free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
|
||||
+#endif
|
||||
}
|
||||
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
|
||||
|
||||
@@ -1050,10 +1363,21 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
|
||||
}
|
||||
|
||||
#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
|
||||
+{
|
||||
+ unsigned long end = start + size;
|
||||
+
|
||||
+ WARN_ON(pgdir != init_mm.pgd);
|
||||
+ WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END));
|
||||
+
|
||||
+ unmap_hotplug_range(start, end, false);
|
||||
+ free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
|
||||
+}
|
||||
+
|
||||
int arch_add_memory(int nid, u64 start, u64 size,
|
||||
struct mhp_restrictions *restrictions)
|
||||
{
|
||||
- int flags = 0;
|
||||
+ int ret, flags = 0;
|
||||
|
||||
if (rodata_full || debug_pagealloc_enabled())
|
||||
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
|
||||
@@ -1061,22 +1385,59 @@ int arch_add_memory(int nid, u64 start, u64 size,
|
||||
__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
|
||||
size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
|
||||
|
||||
- return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
|
||||
+ ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
|
||||
restrictions);
|
||||
+ if (ret)
|
||||
+ __remove_pgd_mapping(swapper_pg_dir,
|
||||
+ __phys_to_virt(start), size);
|
||||
+ return ret;
|
||||
}
|
||||
+
|
||||
void arch_remove_memory(int nid, u64 start, u64 size,
|
||||
struct vmem_altmap *altmap)
|
||||
{
|
||||
unsigned long start_pfn = start >> PAGE_SHIFT;
|
||||
unsigned long nr_pages = size >> PAGE_SHIFT;
|
||||
|
||||
- /*
|
||||
- * FIXME: Cleanup page tables (also in arch_add_memory() in case
|
||||
- * adding fails). Until then, this function should only be used
|
||||
- * during memory hotplug (adding memory), not for memory
|
||||
- * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
|
||||
- * unlocked yet.
|
||||
- */
|
||||
__remove_pages(start_pfn, nr_pages, altmap);
|
||||
+ __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
|
||||
+}
|
||||
+
|
||||
+/*
+ * This memory hotplug notifier helps prevent boot memory from being
+ * inadvertently removed, as it blocks the pfn range offlining process in
+ * __offline_pages(). Hence this prevents both the offlining and the
+ * removal of boot memory, which is initially always online.
+ * In future, if and when boot memory can be removed, this notifier
+ * should be dropped and free_hotplug_page_range() should handle any
+ * reserved pages allocated during boot.
+ */
|
||||
+static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
|
||||
+ unsigned long action, void *data)
|
||||
+{
|
||||
+ struct mem_section *ms;
|
||||
+ struct memory_notify *arg = data;
|
||||
+ unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
|
||||
+ unsigned long pfn = arg->start_pfn;
|
||||
+
|
||||
+ if (action != MEM_GOING_OFFLINE)
|
||||
+ return NOTIFY_OK;
|
||||
+
|
||||
+ for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
|
||||
+ ms = __pfn_to_section(pfn);
|
||||
+ if (early_section(ms))
|
||||
+ return NOTIFY_BAD;
|
||||
+ }
|
||||
+ return NOTIFY_OK;
|
||||
+}
|
||||
+
|
||||
+static struct notifier_block prevent_bootmem_remove_nb = {
|
||||
+ .notifier_call = prevent_bootmem_remove_notifier,
|
||||
+};
|
||||
+
|
||||
+static int __init prevent_bootmem_remove_init(void)
|
||||
+{
|
||||
+ return register_memory_notifier(&prevent_bootmem_remove_nb);
|
||||
}
|
||||
+device_initcall(prevent_bootmem_remove_init);
|
||||
#endif
|
||||
diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
|
||||
index 064163f25592..b5eebc8c4924 100644
|
||||
--- a/arch/arm64/mm/ptdump_debugfs.c
|
||||
+++ b/arch/arm64/mm/ptdump_debugfs.c
|
||||
@@ -1,5 +1,6 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
#include <linux/debugfs.h>
|
||||
+#include <linux/memory_hotplug.h>
|
||||
#include <linux/seq_file.h>
|
||||
|
||||
#include <asm/ptdump.h>
|
||||
@@ -7,7 +8,10 @@
|
||||
static int ptdump_show(struct seq_file *m, void *v)
|
||||
{
|
||||
struct ptdump_info *info = m->private;
|
||||
+
|
||||
+ get_online_mems();
|
||||
ptdump_walk_pgd(m, info);
|
||||
+ put_online_mems();
|
||||
return 0;
|
||||
}
|
||||
DEFINE_SHOW_ATTRIBUTE(ptdump);
|
||||
--
|
||||
2.17.1
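The floor/ceiling test in `pgtable_range_aligned()` from the patch above is compact and easy to misread, so here is a user-space sketch that mirrors its logic on plain integers. The 2 MiB table size, the `TOY_` names, and the sample addresses are assumptions made for illustration only; they are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy geometry (assumption): one lower-level table maps 2 MiB. */
#define TOY_TABLE_SIZE (UINT64_C(1) << 21)
#define TOY_TABLE_MASK (~(TOY_TABLE_SIZE - 1))

/*
 * Mirrors the checks in the patch: the table covering [start, end) may be
 * freed only if the whole span it maps lies inside [floor, ceiling).
 * A ceiling of 0 stands for "top of the address space".
 */
static bool range_aligned(uint64_t start, uint64_t end,
                          uint64_t floor, uint64_t ceiling, uint64_t mask)
{
    start &= mask;              /* round down to the table boundary */
    if (start < floor)
        return false;           /* table also maps addresses below floor */

    if (ceiling) {
        ceiling &= mask;        /* round the ceiling down as well */
        if (!ceiling)
            return false;       /* ceiling sits inside the first table */
    }

    /* With ceiling == 0, ceiling - 1 wraps to the maximum value. */
    if (end - 1 > ceiling - 1)
        return false;           /* table also maps addresses above ceiling */
    return true;
}

/* Hypothetical sample ranges exercising each branch. */
static void range_aligned_examples(void)
{
    const uint64_t floor = UINT64_C(0x40000000);
    const uint64_t ceiling = UINT64_C(0x80000000);

    /* Fully inside the region: the table can be freed. */
    assert(range_aligned(UINT64_C(0x40200000), UINT64_C(0x40400000),
                         floor, ceiling, TOY_TABLE_MASK));

    /* Region starts mid-table: the table is shared with whatever is below. */
    assert(!range_aligned(UINT64_C(0x40001000), UINT64_C(0x40002000),
                          UINT64_C(0x40001000), ceiling, TOY_TABLE_MASK));

    /* Unaligned ceiling: the last table is shared with whatever is above. */
    assert(!range_aligned(UINT64_C(0x40200000), UINT64_C(0x40300000),
                          floor, UINT64_C(0x40300000), TOY_TABLE_MASK));

    /* ceiling == 0 disables the upper bound. */
    assert(range_aligned(UINT64_C(0x40200000), UINT64_C(0x40400000),
                         floor, 0, TOY_TABLE_MASK));
}
```

Calling `range_aligned_examples()` from a small `main()` runs all four cases. The unsigned wrap-around in `ceiling - 1` is what lets a single comparison handle both a real ceiling and the "no ceiling" case.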
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
From c7ec155ec5e0f573e9c3cc4eb38d47543a2f1e81 Mon Sep 17 00:00:00 2001
|
||||
From: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
Date: Thu, 13 Feb 2020 08:50:38 +0100
|
||||
Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
|
||||
|
||||
Whenever the vsock backend on the host sends a packet through the RX
queue, it expects an answer on the TX queue. Unfortunately, there is one
case where the host side will hang waiting for the answer and will
effectively never recover.

This issue happens when the guest side starts binding to the socket,
which inserts a new bound socket into the list of already bound sockets.
At this time, we expect the guest to also start listening, which will
trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
occurs if the host side queued an RX packet and triggered an interrupt
right between the end of the binding process and the beginning of the
listening process. In this specific case, the function processing the
packet, virtio_transport_recv_pkt(), will find a bound socket, which means
it will hit the switch statement checking for the sk_state, but the
state won't have changed to TCP_LISTEN yet, which leads the code to pick
the default statement. This default statement will only free the buffer,
while it should also respond to the host side by sending a packet on
its TX queue.

To fix this unfortunate chain of events, whenever the default statement
is entered we must send back a packet containing the operation
VIRTIO_VSOCK_OP_RST, because at this stage we know the host side is
waiting for an answer.
|
||||
|
||||
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
---
|
||||
net/vmw_vsock/virtio_transport_common.c | 1 +
|
||||
1 file changed, 1 insertion(+)
|
||||
|
||||
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
|
||||
index 6f1a8aff65c5..0b6fb687a3e0 100644
|
||||
--- a/net/vmw_vsock/virtio_transport_common.c
|
||||
+++ b/net/vmw_vsock/virtio_transport_common.c
|
||||
@@ -1048,6 +1048,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
default:
|
||||
+ (void)virtio_transport_reset_no_sock(t, pkt);
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
}
|
||||
--
|
||||
2.20.1
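The race described in the vsock patch above can be modeled in a few lines of user-space C. This is only a sketch of the control flow, not the kernel code: the `toy_` types and functions are hypothetical, the socket is reduced to its state, and the `patched` flag stands in for the one-line change that adds the reset reply to the default branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins (assumptions) for the socket states named in the patch. */
enum toy_sk_state { TOY_CLOSE, TOY_LISTEN };

/* What the guest puts on its TX queue in response to an RX packet. */
enum toy_reply { TOY_REPLY_NONE, TOY_REPLY_RESPONSE, TOY_REPLY_RST };

/*
 * Models the switch on sk_state in the receive path for a socket that is
 * already bound. Without the fix, the default branch only frees the packet
 * and sends nothing back.
 */
static enum toy_reply toy_recv_pkt(enum toy_sk_state state, bool patched)
{
    switch (state) {
    case TOY_LISTEN:
        return TOY_REPLY_RESPONSE;  /* normal handling of the request */
    default:
        /* Bound but not yet listening: reply with a reset if patched. */
        return patched ? TOY_REPLY_RST : TOY_REPLY_NONE;
    }
}

/* The host waits for an answer to every packet it queued. */
static bool toy_host_hangs(enum toy_reply reply)
{
    return reply == TOY_REPLY_NONE;
}

/* Replays the window between bind() and listen(). */
static void toy_bind_listen_race(void)
{
    enum toy_sk_state state = TOY_CLOSE;    /* bind() done, listen() not yet */

    assert(toy_host_hangs(toy_recv_pkt(state, false)));    /* before the fix */
    assert(toy_recv_pkt(state, true) == TOY_REPLY_RST);    /* after the fix */

    state = TOY_LISTEN;                     /* listen() has now run */
    assert(toy_recv_pkt(state, true) == TOY_REPLY_RESPONSE);
}
```

Calling `toy_bind_listen_race()` walks through the three situations. The point of the model is that every path through the switch now produces some reply, so the host is never left waiting.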
|
||||
|
||||
@@ -0,0 +1,453 @@
|
||||
From: Anshuman Khandual <anshuman.khandual@arm.com>
|
||||
Date: Mon, 15 Jul 2019 11:47:50 +0530
|
||||
Subject: [PATCH] arm64/mm: Enable memory hot remove
|
||||
|
||||
The arch code for hot-remove must tear down portions of the linear map and
vmemmap corresponding to memory being removed. In both cases the page
tables mapping these regions must be freed, and when sparse vmemmap is in
use the memory backing the vmemmap must also be freed.

This patch adds a new remove_pagetable() helper which can be used to tear
down either region, and calls it from vmemmap_free() and
__remove_pgd_mapping(). The sparse_vmap argument determines whether the
backing memory will be freed.

remove_pagetable() makes two distinct passes over the kernel page table.
In the first pass it unmaps each mapped leaf entry, invalidates the
applicable TLB entries and, if required (vmemmap), frees the backing
memory. In the second pass it looks for empty page table sections whose
page table page can be unmapped, TLB invalidated and freed.

While freeing an intermediate level page table page, bail out if any of its
entries are still valid. This can happen for a partially filled kernel page
table, either from a previously attempted failed memory hot add or while
removing an address range which does not span the entire page table page
range.

The vmemmap region may share levels of table with the vmalloc region.
There can be conflicts between hot remove freeing page table pages and
a concurrent vmalloc() walking the kernel page table. This conflict
cannot be solved just by taking the init_mm ptl because of the existing
locking scheme in vmalloc(). Hence, unlike the linear mapping, skip freeing
page table pages while tearing down the vmemmap mapping.

While here, update arch_add_memory() to handle __add_pages() failures by
just unmapping the recently added kernel linear mapping. Now enable memory
hot remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.

This implementation is broadly inspired by the kernel page table tear down
procedure on the X86 architecture.
|
||||
|
||||
Acked-by: Steve Capper <steve.capper@arm.com>
|
||||
Acked-by: David Hildenbrand <david@redhat.com>
|
||||
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
|
||||
---
|
||||
arch/arm64/Kconfig | 3 +
|
||||
arch/arm64/include/asm/pgtable.h | 7 +-
|
||||
arch/arm64/mm/mmu.c | 290 ++++++++++++++++++++++++++++++-
|
||||
include/linux/mmzone.h | 1 +
|
||||
mm/Kconfig | 2 +-
|
||||
5 files changed, 291 insertions(+), 12 deletions(-)
|
||||
|
||||
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
|
||||
index 3adcec05b1f6..5a1231b8b8cf 100644
|
||||
--- a/arch/arm64/Kconfig
|
||||
+++ b/arch/arm64/Kconfig
|
||||
@@ -273,6 +273,9 @@ config ZONE_DMA32
|
||||
config ARCH_ENABLE_MEMORY_HOTPLUG
|
||||
def_bool y
|
||||
|
||||
+config ARCH_ENABLE_MEMORY_HOTREMOVE
|
||||
+ def_bool y
|
||||
+
|
||||
config SMP
|
||||
def_bool y
|
||||
|
||||
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
|
||||
index 5fdcfe237338..e09760ece844 100644
|
||||
--- a/arch/arm64/include/asm/pgtable.h
|
||||
+++ b/arch/arm64/include/asm/pgtable.h
|
||||
@@ -209,7 +209,7 @@ static inline pmd_t pmd_mkcont(pmd_t pmd)
|
||||
|
||||
static inline pte_t pte_mkdevmap(pte_t pte)
|
||||
{
|
||||
- return set_pte_bit(pte, __pgprot(PTE_DEVMAP));
|
||||
+ return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
|
||||
}
|
||||
|
||||
static inline void set_pte(pte_t *ptep, pte_t pte)
|
||||
@@ -396,7 +396,10 @@ static inline int pmd_protnone(pmd_t pmd)
|
||||
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
#define pmd_devmap(pmd) pte_devmap(pmd_pte(pmd))
|
||||
#endif
|
||||
-#define pmd_mkdevmap(pmd) pte_pmd(pte_mkdevmap(pmd_pte(pmd)))
|
||||
+static inline pmd_t pmd_mkdevmap(pmd_t pmd)
|
||||
+{
|
||||
+ return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP)));
|
||||
+}
|
||||
|
||||
#define __pmd_to_phys(pmd) __pte_to_phys(pmd_pte(pmd))
|
||||
#define __phys_to_pmd_val(phys) __phys_to_pte_val(phys)
|
||||
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
|
||||
index 750a69dde39b..282a4b26218c 100644
|
||||
--- a/arch/arm64/mm/mmu.c
|
||||
+++ b/arch/arm64/mm/mmu.c
|
||||
@@ -722,6 +722,250 @@ int kern_addr_valid(unsigned long addr)
|
||||
|
||||
return pfn_valid(pte_pfn(pte));
|
||||
}
|
||||
+
|
||||
+#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+static void free_hotplug_page_range(struct page *page, size_t size)
|
||||
+{
|
||||
+ WARN_ON(!page || PageReserved(page));
|
||||
+ free_pages((unsigned long)page_address(page), get_order(size));
|
||||
+}
|
||||
+
|
||||
+static void free_hotplug_pgtable_page(struct page *page)
|
||||
+{
|
||||
+ free_hotplug_page_range(page, PAGE_SIZE);
|
||||
+}
|
||||
+
|
||||
+static void free_pte_table(pmd_t *pmdp, unsigned long addr)
|
||||
+{
|
||||
+ struct page *page;
|
||||
+ pte_t *ptep;
|
||||
+ int i;
|
||||
+
|
||||
+ ptep = pte_offset_kernel(pmdp, 0UL);
|
||||
+ for (i = 0; i < PTRS_PER_PTE; i++) {
|
||||
+ if (!pte_none(READ_ONCE(ptep[i])))
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ page = pmd_page(READ_ONCE(*pmdp));
|
||||
+ pmd_clear(pmdp);
|
||||
+ __flush_tlb_kernel_pgtable(addr);
|
||||
+ free_hotplug_pgtable_page(page);
|
||||
+}
|
||||
+
|
||||
+static void free_pmd_table(pud_t *pudp, unsigned long addr)
|
||||
+{
|
||||
+ struct page *page;
|
||||
+ pmd_t *pmdp;
|
||||
+ int i;
|
||||
+
|
||||
+ if (CONFIG_PGTABLE_LEVELS <= 2)
|
||||
+ return;
|
||||
+
|
||||
+ pmdp = pmd_offset(pudp, 0UL);
|
||||
+ for (i = 0; i < PTRS_PER_PMD; i++) {
|
||||
+ if (!pmd_none(READ_ONCE(pmdp[i])))
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ page = pud_page(READ_ONCE(*pudp));
|
||||
+ pud_clear(pudp);
|
||||
+ __flush_tlb_kernel_pgtable(addr);
|
||||
+ free_hotplug_pgtable_page(page);
|
||||
+}
|
||||
+
|
||||
+static void free_pud_table(pgd_t *pgdp, unsigned long addr)
|
||||
+{
|
||||
+ struct page *page;
|
||||
+ pud_t *pudp;
|
||||
+ int i;
|
||||
+
|
||||
+ if (CONFIG_PGTABLE_LEVELS <= 3)
|
||||
+ return;
|
||||
+
|
||||
+ pudp = pud_offset(pgdp, 0UL);
|
||||
+ for (i = 0; i < PTRS_PER_PUD; i++) {
|
||||
+ if (!pud_none(READ_ONCE(pudp[i])))
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ page = pgd_page(READ_ONCE(*pgdp));
|
||||
+ pgd_clear(pgdp);
|
||||
+ __flush_tlb_kernel_pgtable(addr);
|
||||
+ free_hotplug_pgtable_page(page);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
|
||||
+ unsigned long end, bool sparse_vmap)
|
||||
+{
|
||||
+ struct page *page;
|
||||
+ pte_t *ptep, pte;
|
||||
+
|
||||
+ do {
|
||||
+ ptep = pte_offset_kernel(pmdp, addr);
|
||||
+ pte = READ_ONCE(*ptep);
|
||||
+ if (pte_none(pte))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pte_present(pte));
|
||||
+ page = sparse_vmap ? pte_page(pte) : NULL;
|
||||
+ pte_clear(&init_mm, addr, ptep);
|
||||
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
|
||||
+ if (sparse_vmap)
|
||||
+ free_hotplug_page_range(page, PAGE_SIZE);
|
||||
+ } while (addr += PAGE_SIZE, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
|
||||
+ unsigned long end, bool sparse_vmap)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ struct page *page;
|
||||
+ pmd_t *pmdp, pmd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pmd_addr_end(addr, end);
|
||||
+ pmdp = pmd_offset(pudp, addr);
|
||||
+ pmd = READ_ONCE(*pmdp);
|
||||
+ if (pmd_none(pmd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pmd_present(pmd));
|
||||
+ if (pmd_sect(pmd)) {
|
||||
+ page = sparse_vmap ? pmd_page(pmd) : NULL;
|
||||
+ pmd_clear(pmdp);
|
||||
+ flush_tlb_kernel_range(addr, next);
|
||||
+ if (sparse_vmap)
|
||||
+ free_hotplug_page_range(page, PMD_SIZE);
|
||||
+ continue;
|
||||
+ }
|
||||
+ WARN_ON(!pmd_table(pmd));
|
||||
+ unmap_hotplug_pte_range(pmdp, addr, next, sparse_vmap);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_pud_range(pgd_t *pgdp, unsigned long addr,
|
||||
+ unsigned long end, bool sparse_vmap)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ struct page *page;
|
||||
+ pud_t *pudp, pud;
|
||||
+
|
||||
+ do {
|
||||
+ next = pud_addr_end(addr, end);
|
||||
+ pudp = pud_offset(pgdp, addr);
|
||||
+ pud = READ_ONCE(*pudp);
|
||||
+ if (pud_none(pud))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pud_present(pud));
|
||||
+ if (pud_sect(pud)) {
|
||||
+ page = sparse_vmap ? pud_page(pud) : NULL;
|
||||
+ pud_clear(pudp);
|
||||
+ flush_tlb_kernel_range(addr, next);
|
||||
+ if (sparse_vmap)
|
||||
+ free_hotplug_page_range(page, PUD_SIZE);
|
||||
+ continue;
|
||||
+ }
|
||||
+ WARN_ON(!pud_table(pud));
|
||||
+ unmap_hotplug_pmd_range(pudp, addr, next, sparse_vmap);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void unmap_hotplug_range(unsigned long addr, unsigned long end,
|
||||
+ bool sparse_vmap)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pgd_t *pgdp, pgd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pgd_addr_end(addr, end);
|
||||
+ pgdp = pgd_offset_k(addr);
|
||||
+ pgd = READ_ONCE(*pgdp);
|
||||
+ if (pgd_none(pgd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pgd_present(pgd));
|
||||
+ unmap_hotplug_pud_range(pgdp, addr, next, sparse_vmap);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
|
||||
+ unsigned long end)
|
||||
+{
|
||||
+ pte_t *ptep, pte;
|
||||
+
|
||||
+ do {
|
||||
+ ptep = pte_offset_kernel(pmdp, addr);
|
||||
+ pte = READ_ONCE(*ptep);
|
||||
+ WARN_ON(!pte_none(pte));
|
||||
+ } while (addr += PAGE_SIZE, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
|
||||
+ unsigned long end)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pmd_t *pmdp, pmd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pmd_addr_end(addr, end);
|
||||
+ pmdp = pmd_offset(pudp, addr);
|
||||
+ pmd = READ_ONCE(*pmdp);
|
||||
+ if (pmd_none(pmd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
|
||||
+ free_empty_pte_table(pmdp, addr, next);
|
||||
+ free_pte_table(pmdp, addr);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void free_empty_pud_table(pgd_t *pgdp, unsigned long addr,
|
||||
+ unsigned long end)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pud_t *pudp, pud;
|
||||
+
|
||||
+ do {
|
||||
+ next = pud_addr_end(addr, end);
|
||||
+ pudp = pud_offset(pgdp, addr);
|
||||
+ pud = READ_ONCE(*pudp);
|
||||
+ if (pud_none(pud))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
|
||||
+ free_empty_pmd_table(pudp, addr, next);
|
||||
+ free_pmd_table(pudp, addr);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void free_empty_tables(unsigned long addr, unsigned long end)
|
||||
+{
|
||||
+ unsigned long next;
|
||||
+ pgd_t *pgdp, pgd;
|
||||
+
|
||||
+ do {
|
||||
+ next = pgd_addr_end(addr, end);
|
||||
+ pgdp = pgd_offset_k(addr);
|
||||
+ pgd = READ_ONCE(*pgdp);
|
||||
+ if (pgd_none(pgd))
|
||||
+ continue;
|
||||
+
|
||||
+ WARN_ON(!pgd_present(pgd));
|
||||
+ free_empty_pud_table(pgdp, addr, next);
|
||||
+ free_pud_table(pgdp, addr);
|
||||
+ } while (addr = next, addr < end);
|
||||
+}
|
||||
+
|
||||
+static void remove_pagetable(unsigned long start, unsigned long end,
|
||||
+ bool sparse_vmap)
|
||||
+{
|
||||
+ unmap_hotplug_range(start, end, sparse_vmap);
|
||||
+ free_empty_tables(start, end);
|
||||
+}
|
||||
+#endif
|
||||
+
|
||||
#ifdef CONFIG_SPARSEMEM_VMEMMAP
|
||||
#if !ARM64_SWAPPER_USES_SECTION_MAPS
|
||||
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
|
||||
@@ -769,6 +1013,27 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
|
||||
void vmemmap_free(unsigned long start, unsigned long end,
|
||||
struct vmem_altmap *altmap)
|
||||
{
|
||||
+#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+ /*
+ * FIXME: We should have called remove_pagetable(start, end, true).
+ * vmemmap and vmalloc virtual range might share intermediate kernel
+ * page table entries. Removing vmemmap range page table pages here
+ * can potentially conflict with a concurrent vmalloc() allocation.
+ *
+ * This is primarily because vmalloc() does not take init_mm ptl for
+ * the entire page table walk and its modification. Instead it just
+ * takes the lock while allocating and installing page table pages
+ * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table
+ * entry via memory hot remove can cause vmalloc() kernel page table
+ * walk pointers to be invalid on the fly which can cause corruption
+ * or worse, a crash.
+ *
+ * To avoid this problem, let's not free empty page table pages for
+ * given vmemmap range being hot-removed. Just unmap and free the
+ * range instead.
+ */
|
||||
+ unmap_hotplug_range(start, end, true);
|
||||
+#endif
|
||||
}
|
||||
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
|
||||
|
||||
@@ -1060,10 +1325,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
|
||||
}
|
||||
|
||||
#ifdef CONFIG_MEMORY_HOTPLUG
|
||||
+static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
|
||||
+{
|
||||
+ unsigned long end = start + size;
|
||||
+
|
||||
+ WARN_ON(pgdir != init_mm.pgd);
|
||||
+ remove_pagetable(start, end, false);
|
||||
+}
|
||||
+
|
||||
int arch_add_memory(int nid, u64 start, u64 size,
|
||||
struct mhp_restrictions *restrictions)
|
||||
{
|
||||
- int flags = 0;
|
||||
+ int ret, flags = 0;
|
||||
|
||||
if (rodata_full || debug_pagealloc_enabled())
|
||||
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
|
||||
@@ -1071,9 +1344,14 @@ int arch_add_memory(int nid, u64 start, u64 size,
|
||||
__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
|
||||
size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
|
||||
|
||||
- return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
|
||||
+ ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
|
||||
restrictions);
|
||||
+ if (ret)
|
||||
+ __remove_pgd_mapping(swapper_pg_dir,
|
||||
+ __phys_to_virt(start), size);
|
||||
+ return ret;
|
||||
}
|
||||
+
|
||||
void arch_remove_memory(int nid, u64 start, u64 size,
|
||||
struct vmem_altmap *altmap)
|
||||
{
|
||||
@@ -1081,14 +1359,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
|
||||
unsigned long nr_pages = size >> PAGE_SHIFT;
|
||||
struct zone *zone;
|
||||
|
||||
- /*
|
||||
- * FIXME: Cleanup page tables (also in arch_add_memory() in case
|
||||
- * adding fails). Until then, this function should only be used
|
||||
- * during memory hotplug (adding memory), not for memory
|
||||
- * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
|
||||
- * unlocked yet.
|
||||
- */
|
||||
zone = page_zone(pfn_to_page(start_pfn));
|
||||
__remove_pages(zone, start_pfn, nr_pages, altmap);
|
||||
+ __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
|
||||
}
|
||||
#endif
|
||||
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
|
||||
index d77d717c620c..47230ebdcb01 100644
|
||||
--- a/include/linux/mmzone.h
|
||||
+++ b/include/linux/mmzone.h
|
||||
@@ -1122,6 +1122,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
|
||||
* PFN_SECTION_SHIFT pfn to/from section number
|
||||
*/
|
||||
#define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
|
||||
+#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
|
||||
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
|
||||
|
||||
#define NR_MEM_SECTIONS (1UL << SECTIONS_SHIFT)
|
||||
diff --git a/mm/Kconfig b/mm/Kconfig
|
||||
index 56cec636a1fc..7c980f483a7d 100644
|
||||
--- a/mm/Kconfig
|
||||
+++ b/mm/Kconfig
|
||||
@@ -677,7 +677,7 @@ config DEV_PAGEMAP_OPS
|
||||
|
||||
config HMM_MIRROR
|
||||
bool "HMM mirror CPU page table into a device page table"
|
||||
- depends on (X86_64 || PPC64)
|
||||
+ depends on (X86_64 || PPC64 || ARM64)
|
||||
depends on MMU && 64BIT
|
||||
select MMU_NOTIFIER
|
||||
help
|
||||
--
|
||||
2.17.1
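The two-pass teardown described in the hot-remove commit message above can be illustrated on a toy two-level table in user space. This is a sketch under heavy simplifying assumptions: the `toy_` names, the four-entry tables, and the use of page indices instead of virtual addresses are all invented for illustration, and there is no TLB to invalidate. It only shows the ordering (unmap leaves first, then free tables that ended up empty) and the bail-out when a table still holds valid entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ENTRIES 4   /* leaf entries per table (assumption) */
#define TOY_TABLES  4   /* top-level entries (assumption) */

/* leaf[t][i] != 0 means page (t * TOY_ENTRIES + i) is mapped. */
struct toy_pgd {
    int *leaf[TOY_TABLES];
};

/* Map pages [start, end), allocating leaf tables on demand. */
static void toy_map(struct toy_pgd *pgd, size_t start, size_t end)
{
    for (size_t p = start; p < end; p++) {
        size_t t = p / TOY_ENTRIES;

        if (!pgd->leaf[t]) {
            pgd->leaf[t] = calloc(TOY_ENTRIES, sizeof(int));
            assert(pgd->leaf[t]);
        }
        pgd->leaf[t][p % TOY_ENTRIES] = 1;
    }
}

/* Pass 1: clear every mapped leaf entry in [start, end). */
static void toy_unmap_range(struct toy_pgd *pgd, size_t start, size_t end)
{
    for (size_t p = start; p < end; p++) {
        int *leaf = pgd->leaf[p / TOY_ENTRIES];

        if (leaf)
            leaf[p % TOY_ENTRIES] = 0;
    }
}

static bool toy_leaf_empty(const int *leaf)
{
    for (size_t i = 0; i < TOY_ENTRIES; i++) {
        if (leaf[i])
            return false;
    }
    return true;
}

/* Pass 2: free leaf tables touched by [start, end) that are now empty. */
static void toy_free_empty_tables(struct toy_pgd *pgd, size_t start, size_t end)
{
    for (size_t t = start / TOY_ENTRIES; t <= (end - 1) / TOY_ENTRIES; t++) {
        int *leaf = pgd->leaf[t];

        /* Bail out for this table if any entry is still valid. */
        if (!leaf || !toy_leaf_empty(leaf))
            continue;

        pgd->leaf[t] = NULL;    /* clear the upper-level entry first */
        free(leaf);             /* then release the table page */
    }
}

static void toy_remove_pagetable(struct toy_pgd *pgd, size_t start, size_t end)
{
    assert(start < end && end <= (size_t)TOY_TABLES * TOY_ENTRIES);
    toy_unmap_range(pgd, start, end);
    toy_free_empty_tables(pgd, start, end);
}
```

As a usage example, mapping pages 2 to 9 and then removing pages 4 to 8 frees the middle table, which became empty, while the tables on either side survive because each still holds a mapped page outside the removed range.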
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
From c7ec155ec5e0f573e9c3cc4eb38d47543a2f1e81 Mon Sep 17 00:00:00 2001
|
||||
From: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
Date: Thu, 13 Feb 2020 08:50:38 +0100
|
||||
Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
|
||||
|
||||
Whenever the vsock backend on the host sends a packet through the RX
|
||||
queue, it expects an answer on the TX queue. Unfortunately, there is one
|
||||
case where the host side will hang waiting for the answer and will
|
||||
effectively never recover.
|
||||
|
||||
This issue happens when the guest side starts binding to the socket,
|
||||
which inserts a new bound socket into the list of already bound sockets.
|
||||
At this time, we expect the guest to also start listening, which will
|
||||
trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
|
||||
occurs if the host side queued a RX packet and triggered an interrupt
|
||||
right between the end of the binding process and the beginning of the
|
||||
listening process. In this specific case, the function processing the
|
||||
packet virtio_transport_recv_pkt() will find a bound socket, which means
|
||||
it will hit the switch statement checking for the sk_state, but the
|
||||
state won't be changed into TCP_LISTEN yet, which leads the code to pick
|
||||
the default statement. This default statement will only free the buffer,
|
||||
while it should also respond to the host side, by sending a packet on
|
||||
its TX queue.
|
||||
|
||||
In order to simply fix this unfortunate chain of events, it is important
|
||||
that in case the default statement is entered, and because at this stage
|
||||
we know the host side is waiting for an answer, we must send back a
|
||||
packet containing the operation VIRTIO_VSOCK_OP_RST.
|
||||
|
||||
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
|
||||
---
|
||||
net/vmw_vsock/virtio_transport_common.c | 1 +
|
||||
1 file changed, 1 insertion(+)
|
||||
|
||||
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
|
||||
index 6f1a8aff65c5..0b6fb687a3e0 100644
|
||||
--- a/net/vmw_vsock/virtio_transport_common.c
|
||||
+++ b/net/vmw_vsock/virtio_transport_common.c
|
||||
@@ -1048,6 +1048,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
default:
|
||||
+ (void)virtio_transport_reset_no_sock(pkt);
|
||||
virtio_transport_free_pkt(pkt);
|
||||
break;
|
||||
}
|
||||
--
|
||||
2.20.1
|
||||
|
||||
@@ -0,0 +1,39 @@
|
||||
From ac36d37e943635fc072e9d4f47e40a48fbcdb3f0 Mon Sep 17 00:00:00 2001
|
||||
From: Arjan van de Ven <arjan@linux.intel.com>
|
||||
Date: Wed, 9 Oct 2019 15:04:33 +0200
|
||||
Subject: ACPI: Always build evged in
|
||||
|
||||
Although the Generic Event Device is a Hardware-reduced
|
||||
platform device in principle, it should not be restricted to
|
||||
ACPI_REDUCED_HARDWARE_ONLY.
|
||||
|
||||
Kernels supporting both fixed and hardware-reduced ACPI platforms
|
||||
should be able to probe the GED when dynamically detecting that a
|
||||
platform is hardware-reduced. For that, the driver must be
|
||||
unconditionally built in.
|
||||
|
||||
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
|
||||
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
||||
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||
---
|
||||
drivers/acpi/Makefile | 2 +-
|
||||
1 file changed, 1 insertion(+), 1 deletion(-)
|
||||
|
||||
(limited to 'drivers/acpi/Makefile')
|
||||
|
||||
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
|
||||
index 5d361e4e3405..ef1ac4d127da 100644
|
||||
--- a/drivers/acpi/Makefile
|
||||
+++ b/drivers/acpi/Makefile
|
||||
@@ -48,7 +48,7 @@ acpi-y += acpi_pnp.o
|
||||
acpi-$(CONFIG_ARM_AMBA) += acpi_amba.o
|
||||
acpi-y += power.o
|
||||
acpi-y += event.o
|
||||
-acpi-$(CONFIG_ACPI_REDUCED_HARDWARE_ONLY) += evged.o
|
||||
+acpi-y += evged.o
|
||||
acpi-y += sysfs.o
|
||||
acpi-y += property.o
|
||||
acpi-$(CONFIG_X86) += acpi_cmos_rtc.o
|
||||
--
|
||||
cgit 1.2-0.3.lf.el7
|
||||
|
||||