diff --git a/.github/workflows/kata-deploy-push.yaml b/.github/workflows/kata-deploy-push.yaml index 88030b8e9..54110d2c7 100644 --- a/.github/workflows/kata-deploy-push.yaml +++ b/.github/workflows/kata-deploy-push.yaml @@ -18,7 +18,6 @@ jobs: matrix: asset: - kernel - - kernel-experimental - shim-v2 - qemu - cloud-hypervisor diff --git a/README.md b/README.md index e93c5f740..0d8c5d2b8 100644 --- a/README.md +++ b/README.md @@ -17,16 +17,73 @@ standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs. +## License + +The code is licensed under the Apache 2.0 license. +See [the license file](LICENSE) for further details. + +## Platform support + +Kata Containers currently runs on 64-bit systems supporting the following +technologies: + +| Architecture | Virtualization technology | +|-|-| +| `x86_64`, `amd64` | [Intel](https://www.intel.com) VT-x, AMD SVM | +| `aarch64` ("`arm64`")| [ARM](https://www.arm.com) Hyp | +| `ppc64le` | [IBM](https://www.ibm.com) Power | +| `s390x` | [IBM](https://www.ibm.com) Z & LinuxONE SIE | + +### Hardware requirements + +The [Kata Containers runtime](src/runtime) provides a command to +determine if your host system is capable of running and creating a +Kata Container: + +```bash +$ kata-runtime check +``` + +> **Notes:** +> +> - This command runs a number of checks including connecting to the +> network to determine if a newer release of Kata Containers is +> available on GitHub. If you do not want this check to run, add +> the `--no-network-checks` option. +> +> - By default, only a brief success / failure message is printed. +> If more details are needed, the `--verbose` flag can be used to display the +> list of all the checks performed. +> +> - If the command is run as the `root` user, additional checks are +> run (including checking if another incompatible hypervisor is running).
+> When running as `root`, network checks are automatically disabled. + ## Getting started See the [installation documentation](docs/install). ## Documentation -See the [official documentation](docs) -(including [installation guides](docs/install), -[the developer guide](docs/Developer-Guide.md), -[design documents](docs/design) and more). +See the [official documentation](docs) including: + +- [Installation guides](docs/install) +- [Developer guide](docs/Developer-Guide.md) +- [Design documents](docs/design) + - [Architecture overview](docs/design/architecture) + +## Configuration + +Kata Containers uses a single +[configuration file](src/runtime/README.md#configuration) +which contains a number of sections for various parts of the Kata +Containers system including the [runtime](src/runtime), the +[agent](src/agent) and the [hypervisor](#hypervisors). + +## Hypervisors + +See the [hypervisors document](docs/hypervisors.md) and the +[Hypervisor specific configuration details](src/runtime/README.md#hypervisor-specific-configuration). ## Community @@ -48,6 +105,8 @@ Please raise an issue ## Developers +See the [developer guide](docs/Developer-Guide.md). + ### Components ### Main components @@ -84,8 +143,4 @@ the [components](#components) section for further details. ## Glossary of Terms -See the [glossary of terms](Glossary.md) related to Kata Containers. ---- - -[kernel]: https://www.kernel.org -[github-katacontainers.io]: https://github.com/kata-containers/www.katacontainers.io +See the [glossary of terms](https://github.com/kata-containers/kata-containers/wiki/Glossary) related to Kata Containers. diff --git a/docs/Limitations.md b/docs/Limitations.md index 9ffd5833a..d7b73776c 100644 --- a/docs/Limitations.md +++ b/docs/Limitations.md @@ -57,6 +57,13 @@ for advice on which repository to raise the issue against. This section lists items that might be possible to fix. 
+## OCI CLI commands + +### Docker and Podman support +Currently Kata Containers does not support Docker or Podman. + +See issue https://github.com/kata-containers/kata-containers/issues/722 for more information. + ## Runtime commands ### checkpoint and restore @@ -97,57 +104,12 @@ See issue https://github.com/clearcontainers/runtime/issues/341 and [the constra For CPUs resource management see [CPU constraints](design/vcpu-handling.md). -### docker run and shared memory - -The runtime does not implement the `docker run --shm-size` command to -set the size of the `/dev/shm tmpfs` within the container. It is possible to pass this configuration value into the VM container so the appropriate mount command happens at launch time. - -See issue https://github.com/kata-containers/kata-containers/issues/21 for more information. - # Architectural limitations This section lists items that might not be fixed due to fundamental architectural differences between "soft containers" (i.e. traditional Linux* containers) and those based on VMs. -## Networking limitations - -### Support for joining an existing VM network - -Docker supports the ability for containers to join another containers -namespace with the `docker run --net=containers` syntax. This allows -multiple containers to share a common network namespace and the network -interfaces placed in the network namespace. Kata Containers does not -support network namespace sharing. If a Kata Container is setup to -share the network namespace of a `runc` container, the runtime -effectively takes over all the network interfaces assigned to the -namespace and binds them to the VM. Consequently, the `runc` container loses -its network connectivity. - -### docker --net=host - -Docker host network support (`docker --net=host run`) is not supported. -It is not possible to directly access the host networking configuration -from within the VM. 
- -The `--net=host` option can still be used with `runc` containers and -inter-mixed with running Kata Containers, thus enabling use of `--net=host` -when necessary. - -It should be noted, currently passing the `--net=host` option into a -Kata Container may result in the Kata Container networking setup -modifying, re-configuring and therefore possibly breaking the host -networking setup. Do not use `--net=host` with Kata Containers. - -### docker run --link - -The runtime does not support the `docker run --link` command. This -command is now deprecated by docker and we have no intention of adding support. -Equivalent functionality can be achieved with the newer docker networking commands. - -See more documentation at -[docs.docker.com](https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/). - ## Storage limitations ### Kubernetes `volumeMounts.subPaths` @@ -158,15 +120,11 @@ moment. See [this issue](https://github.com/kata-containers/runtime/issues/2812) for more details. [Another issue](https://github.com/kata-containers/kata-containers/issues/1728) focuses on the case of `emptyDir`. - ## Host resource sharing -### docker run --privileged +### Privileged containers Privileged support in Kata is essentially different from `runc` containers. -Kata does support `docker run --privileged` command, but in this case full access -to the guest VM is provided in addition to some host access. - The container runs with elevated capabilities within the guest and is granted access to guest devices instead of the host devices. This is also true with using `securityContext privileged=true` with Kubernetes. @@ -176,17 +134,6 @@ The container may also be granted full access to a subset of host devices See [Privileged Kata Containers](how-to/privileged.md) for how to configure some of this behavior. -# Miscellaneous - -This section lists limitations where the possible solutions are uncertain. 
- -## Docker --security-opt option partially supported - -The `--security-opt=` option used by Docker is partially supported. -We only support `--security-opt=no-new-privileges` and `--security-opt seccomp=/path/to/seccomp/profile.json` -option as of today. - -Note: The `--security-opt apparmor=your_profile` is not yet supported. See https://github.com/kata-containers/runtime/issues/707. # Appendices ## The constraints challenge diff --git a/docs/how-to/how-to-use-virtio-fs-nydus-with-kata.md b/docs/how-to/how-to-use-virtio-fs-nydus-with-kata.md index bbc177e0f..9b04d49cf 100644 --- a/docs/how-to/how-to-use-virtio-fs-nydus-with-kata.md +++ b/docs/how-to/how-to-use-virtio-fs-nydus-with-kata.md @@ -2,7 +2,7 @@ ## Introduction -Refer to [kata-`nydus`-design](../design/kata-nydus-design.md) +Refer to [kata-`nydus`-design](../design/kata-nydus-design.md) for introduction and `nydus` has supported Kata Containers with hypervisor `QEMU` and `CLH` currently. ## How to @@ -16,7 +16,7 @@ You can use Kata Containers with `nydus` as follows, 4. Use [kata-containers](https://github.com/kata-containers/kata-containers) `latest` branch to compile and build `kata-containers.img`; -5. Update `configuration-qemu.toml` to include: +5. Update `configuration-qemu.toml` or `configuration-clh.toml` to include: ```toml shared_fs = "virtio-fs-nydus" @@ -24,7 +24,7 @@ virtio_fs_daemon = "" virtio_fs_extra_args = [] ``` -6. run `crictl run -r kata-qemu nydus-container.yaml nydus-sandbox.yaml`; +6.
run `crictl run -r kata nydus-container.yaml nydus-sandbox.yaml`; The `nydus-sandbox.yaml` looks like below: diff --git a/snap/snapcraft.yaml b/snap/snapcraft.yaml index d8819736f..4f90329c1 100644 --- a/snap/snapcraft.yaml +++ b/snap/snapcraft.yaml @@ -204,14 +204,7 @@ parts: kernel_dir_prefix="kata-linux-" # Setup and build kernel - if [ "$(uname -m)" = "x86_64" ]; then - kernel_version="$(${yq} r $versions_file assets.kernel-experimental.tag)" - kernel_version=${kernel_version#v} - kernel_dir_prefix="kata-linux-experimental-" - ./build-kernel.sh -e -v ${kernel_version} -d setup - else - ./build-kernel.sh -v ${kernel_version} -d setup - fi + ./build-kernel.sh -v ${kernel_version} -d setup cd ${kernel_dir_prefix}* make -j $(($(nproc)-1)) EXTRAVERSION=".container" diff --git a/src/agent/Makefile b/src/agent/Makefile index 07439b09b..a0f330ea1 100644 --- a/src/agent/Makefile +++ b/src/agent/Makefile @@ -98,6 +98,8 @@ define INSTALL_FILE install -D -m 644 $1 $(DESTDIR)$2/$1 || exit 1; endef +.DEFAULT_GOAL := default + ##TARGET default: build code default: $(TARGET) show-header @@ -116,17 +118,6 @@ $(GENERATED_FILES): %: %.in optimize: $(SOURCES) | show-summary show-header @RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES) -##TARGET clippy: run clippy linter -clippy: $(GENERATED_CODE) - cargo clippy --all-targets --all-features --release \ - -- \ - -Aclippy::redundant_allocation \ - -D warnings - -format: - cargo fmt -- --check - - ##TARGET install: install agent install: install-services @install -D $(TARGET_PATH) $(DESTDIR)/$(BINDIR)/$(TARGET) @@ -146,7 +137,7 @@ test: @cargo test --all --target $(TRIPLE) $(EXTRA_RUSTFEATURES) -- --nocapture ##TARGET check: run test -check: clippy format +check: $(GENERATED_FILES) standard_rust_check ##TARGET run: build and run agent run: diff --git a/src/agent/rustjail/src/container.rs b/src/agent/rustjail/src/container.rs index 088364c7c..942a625d5 
100644 --- a/src/agent/rustjail/src/container.rs +++ b/src/agent/rustjail/src/container.rs @@ -1488,14 +1488,9 @@ async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> { if args.len() > 1 { args.remove(0); } - let env: HashMap<String, String> = h - .env - .iter() - .map(|e| { - let v: Vec<&str> = e.split('=').collect(); - (v[0].to_string(), v[1].to_string()) - }) - .collect(); + + // all invalid envs will be omitted, only valid envs will be passed to hook. + let env: HashMap<&str, &str> = h.env.iter().filter_map(|e| valid_env(e)).collect(); // Avoid the exit signal to be reaped by the global reaper. let _wait_locker = WAIT_PID_LOCKER.lock().await; @@ -1506,8 +1501,7 @@ async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> { .stdin(Stdio::piped()) .stdout(Stdio::piped()) .stderr(Stdio::piped()) - .spawn() - .unwrap(); + .spawn()?; // default timeout 10s let mut timeout: u64 = 10; @@ -1523,37 +1517,39 @@ async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> { let path = h.path.clone(); let join_handle = tokio::spawn(async move { - child - .stdin - .as_mut() - .unwrap() - .write_all(state.as_bytes()) - .await - .unwrap(); - - // Close stdin so that hook program could receive EOF - child.stdin.take(); + if let Some(mut stdin) = child.stdin.take() { + match stdin.write_all(state.as_bytes()).await { + Ok(_) => {} + Err(e) => { + info!(logger, "write to child stdin failed: {:?}", e); + } + } + } // read something from stdout and stderr for debug - let mut out = String::new(); - child - .stdout - .as_mut() - .unwrap() - .read_to_string(&mut out) - .await - .unwrap(); - info!(logger, "child stdout: {}", out.as_str()); + if let Some(stdout) = child.stdout.as_mut() { + let mut out = String::new(); + match stdout.read_to_string(&mut out).await { + Ok(_) => { + info!(logger, "child stdout: {}", out.as_str()); + } + Err(e) => { + info!(logger, "read from child stdout failed: {:?}", e); + } + } + } let mut err =
String::new(); - child - .stderr - .as_mut() - .unwrap() - .read_to_string(&mut err) - .await - .unwrap(); - info!(logger, "child stderr: {}", err.as_str()); + if let Some(stderr) = child.stderr.as_mut() { + match stderr.read_to_string(&mut err).await { + Ok(_) => { + info!(logger, "child stderr: {}", err.as_str()); + } + Err(e) => { + info!(logger, "read from child stderr failed: {:?}", e); + } + } + } match child.wait().await { Ok(exit) => { @@ -1647,13 +1643,16 @@ mod tests { let touch = which("touch").await; defer!(fs::remove_file(temp_file).unwrap();); + let invalid_str = vec![97, b'\0', 98]; + let invalid_string = std::str::from_utf8(&invalid_str).unwrap(); + let invalid_env = format!("{}=value", invalid_string); execute_hook( &slog_scope::logger(), &Hook { path: touch, args: vec!["touch".to_string(), temp_file.to_string()], - env: vec![], + env: vec![invalid_env], timeout: Some(10), }, &OCIState { diff --git a/src/agent/src/device.rs b/src/agent/src/device.rs index 0f3eca513..7d89d0124 100644 --- a/src/agent/src/device.rs +++ b/src/agent/src/device.rs @@ -52,6 +52,7 @@ pub const DRIVER_VFIO_GK_TYPE: &str = "vfio-gk"; // container as a VFIO device node pub const DRIVER_VFIO_TYPE: &str = "vfio"; pub const DRIVER_OVERLAYFS_TYPE: &str = "overlayfs"; +pub const FS_TYPE_HUGETLB: &str = "hugetlbfs"; #[instrument] pub fn online_device(path: &str) -> Result<()> { diff --git a/src/agent/src/mount.rs b/src/agent/src/mount.rs index 9618ae94c..d7dbc08ef 100644 --- a/src/agent/src/mount.rs +++ b/src/agent/src/mount.rs @@ -5,8 +5,8 @@ use std::collections::HashMap; use std::fs; -use std::fs::File; -use std::io::{BufRead, BufReader}; +use std::fs::{File, OpenOptions}; +use std::io::{BufRead, BufReader, Write}; use std::iter; use std::os::unix::fs::{MetadataExt, PermissionsExt}; use std::path::Path; @@ -24,7 +24,7 @@ use crate::device::{ get_scsi_device_name, get_virtio_blk_pci_device_name, online_device, wait_for_pmem_device, DRIVER_9P_TYPE, DRIVER_BLK_CCW_TYPE, 
DRIVER_BLK_TYPE, DRIVER_EPHEMERAL_TYPE, DRIVER_LOCAL_TYPE, DRIVER_MMIO_BLK_TYPE, DRIVER_NVDIMM_TYPE, DRIVER_OVERLAYFS_TYPE, DRIVER_SCSI_TYPE, - DRIVER_VIRTIOFS_TYPE, DRIVER_WATCHABLE_BIND_TYPE, + DRIVER_VIRTIOFS_TYPE, DRIVER_WATCHABLE_BIND_TYPE, FS_TYPE_HUGETLB, }; use crate::linux_abi::*; use crate::pci; @@ -37,7 +37,7 @@ use slog::Logger; use tracing::instrument; pub const TYPE_ROOTFS: &str = "rootfs"; - +const SYS_FS_HUGEPAGES_PREFIX: &str = "/sys/kernel/mm/hugepages"; pub const MOUNT_GUEST_TAG: &str = "kataShared"; // Allocating an FSGroup that owns the pod's volumes @@ -200,6 +200,12 @@ async fn ephemeral_storage_handler( return Ok("".to_string()); } + // hugetlbfs + if storage.fstype == FS_TYPE_HUGETLB { + return handle_hugetlbfs_storage(logger, storage).await; + } + + // normal ephemeral storage fs::create_dir_all(Path::new(&storage.mount_point))?; // By now we only support one option field: "fsGroup" which @@ -299,6 +305,97 @@ async fn virtio9p_storage_handler( common_storage_handler(logger, storage) } +#[instrument] +async fn handle_hugetlbfs_storage(logger: &Logger, storage: &Storage) -> Result<String> { + info!(logger, "handle hugetlbfs storage"); + // Allocate hugepages before mount + // /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages + // /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + // options eg "pagesize=2097152,size=524288000"(2M, 500M) + allocate_hugepages(logger, &storage.options.to_vec()).context("allocate hugepages")?; + + common_storage_handler(logger, storage)?; + + // hugetlbfs returns an empty string, as ephemeral_storage_handler does. + // This is sandbox-level storage, not a container-level mount.
+ Ok("".to_string()) +} + +// Allocate hugepages by writing to sysfs +fn allocate_hugepages(logger: &Logger, options: &[String]) -> Result<()> { + info!(logger, "mounting hugePages storage options: {:?}", options); + + let (pagesize, size) = get_pagesize_and_size_from_option(options) + .context(format!("parse mount options: {:?}", &options))?; + + info!( + logger, + "allocate hugepages. pageSize: {}, size: {}", pagesize, size + ); + + // sysfs entry is always of the form hugepages-${pagesize}kB + // Ref: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt + let path = Path::new(SYS_FS_HUGEPAGES_PREFIX).join(format!("hugepages-{}kB", pagesize / 1024)); + + if !path.exists() { + fs::create_dir_all(&path).context("create hugepages-size directory")?; + } + + // write numpages to nr_hugepages file. + let path = path.join("nr_hugepages"); + let numpages = format!("{}", size / pagesize); + info!(logger, "write {} pages to {:?}", &numpages, &path); + + let mut file = OpenOptions::new() + .write(true) + .create(true) + .open(&path) + .context(format!("open nr_hugepages directory {:?}", &path))?; + + file.write_all(numpages.as_bytes()) + .context(format!("write nr_hugepages failed: {:?}", &path))?; + + Ok(()) +} + +// Parse filesystem options string to retrieve hugepage details +// options eg "pagesize=2048,size=107374182" +fn get_pagesize_and_size_from_option(options: &[String]) -> Result<(u64, u64)> { + let mut pagesize_str: Option<&str> = None; + let mut size_str: Option<&str> = None; + + for option in options { + let vars: Vec<&str> = option.trim().split(',').collect(); + + for var in vars { + if let Some(stripped) = var.strip_prefix("pagesize=") { + pagesize_str = Some(stripped); + } else if let Some(stripped) = var.strip_prefix("size=") { + size_str = Some(stripped); + } + + if pagesize_str.is_some() && size_str.is_some() { + break; + } + } + } + + if pagesize_str.is_none() || size_str.is_none() { + return Err(anyhow!("no pagesize/size options found")); + } + + 
let pagesize = pagesize_str + .unwrap() + .parse::<u64>() + .context(format!("parse pagesize: {:?}", &pagesize_str))?; + let size = size_str + .unwrap() + .parse::<u64>() + .context(format!("parse size: {:?}", &size_str))?; + + Ok((pagesize, size)) +} + // virtiommio_blk_storage_handler handles the storage for mmio blk driver. #[instrument] async fn virtiommio_blk_storage_handler( @@ -1392,4 +1489,60 @@ mod tests { assert!(testfile.is_file()); } + + #[test] + fn test_get_pagesize_and_size_from_option() { + let expected_pagesize = 2048; + let expected_size = 107374182; + let expected = (expected_pagesize, expected_size); + + let data = vec![ + // (input, expected, is_ok) + ("size-1=107374182,pagesize-1=2048", expected, false), + ("size-1=107374182,pagesize=2048", expected, false), + ("size=107374182,pagesize-1=2048", expected, false), + ("size=107374182,pagesize=abc", expected, false), + ("size=abc,pagesize=2048", expected, false), + ("size=,pagesize=2048", expected, false), + ("size=107374182,pagesize=", expected, false), + ("size=107374182,pagesize=2048", expected, true), + ("pagesize=2048,size=107374182", expected, true), + ("foo=bar,pagesize=2048,size=107374182", expected, true), + ( + "foo=bar,pagesize=2048,foo1=bar1,size=107374182", + expected, + true, + ), + ( + "pagesize=2048,foo1=bar1,foo=bar,size=107374182", + expected, + true, + ), + ( + "foo=bar,pagesize=2048,foo1=bar1,size=107374182,foo2=bar2", + expected, + true, + ), + ( + "foo=bar,size=107374182,foo1=bar1,pagesize=2048", + expected, + true, + ), + ]; + + for case in data { + let input = case.0; + let r = get_pagesize_and_size_from_option(&[input.to_string()]); + + let is_ok = case.2; + if is_ok { + let expected = case.1; + let (pagesize, size) = r.unwrap(); + assert_eq!(expected.0, pagesize); + assert_eq!(expected.1, size); + } else { + assert!(r.is_err()); + } + } + } } diff --git a/src/runtime/Makefile b/src/runtime/Makefile index f936bd796..ff0f1fb38 100644 --- a/src/runtime/Makefile +++
b/src/runtime/Makefile @@ -163,6 +163,7 @@ DEFENTROPYSOURCE := /dev/urandom DEFVALIDENTROPYSOURCES := [\"/dev/urandom\",\"/dev/random\",\"\"] DEFDISABLEBLOCK := false +DEFSHAREDFS_CLH_VIRTIOFS := virtio-fs DEFSHAREDFS_QEMU_VIRTIOFS := virtio-fs DEFVIRTIOFSDAEMON := $(LIBEXECDIR)/kata-qemu/virtiofsd DEFVALIDVIRTIOFSDAEMONPATHS := [\"$(DEFVIRTIOFSDAEMON)\"] @@ -175,7 +176,7 @@ DEFVIRTIOFSCACHE ?= auto # # see `virtiofsd -h` for possible options. # Make sure you quote args. -DEFVIRTIOFSEXTRAARGS ?= [\"--thread-pool-size=1\"] +DEFVIRTIOFSEXTRAARGS ?= [\"--thread-pool-size=1\", \"-o\", \"announce_submounts\"] DEFENABLEIOTHREADS := false DEFENABLEVHOSTUSERSTORE := false DEFVHOSTUSERSTOREPATH := $(PKGRUNDIR)/vhost-user @@ -437,6 +438,7 @@ USER_VARS += DEFDISABLEBLOCK USER_VARS += DEFBLOCKSTORAGEDRIVER_ACRN USER_VARS += DEFBLOCKSTORAGEDRIVER_FC USER_VARS += DEFBLOCKSTORAGEDRIVER_QEMU +USER_VARS += DEFSHAREDFS_CLH_VIRTIOFS USER_VARS += DEFSHAREDFS_QEMU_VIRTIOFS USER_VARS += DEFVIRTIOFSDAEMON USER_VARS += DEFVALIDVIRTIOFSDAEMONPATHS diff --git a/src/runtime/README.md b/src/runtime/README.md index 217cd2d44..fcffc238a 100644 --- a/src/runtime/README.md +++ b/src/runtime/README.md @@ -2,19 +2,25 @@ # Runtime -This repository contains the runtime for the -[Kata Containers](https://github.com/kata-containers) project. +## Binary names + +This repository contains the following components: + +| Binary name | Description | +|-|-| +| `containerd-shim-kata-v2` | The [shimv2 runtime](../../docs/design/architecture/README.md#runtime) | +| `kata-runtime` | [utility program](../../docs/design/architecture/README.md#utility-program) | For details of the other Kata Containers repositories, see the [repository summary](https://github.com/kata-containers/kata-containers). ## Introduction -`kata-runtime`, referred to as "the runtime", is the Command-Line Interface -(CLI) part of the Kata Containers runtime component. 
It leverages the +The `containerd-shim-kata-v2` [binary](#binary-names) is the Kata +Containers [shimv2](../../docs/design/architecture/README.md#shim-v2-architecture) runtime. It leverages the [virtcontainers](virtcontainers) package to provide a high-performance standards-compliant runtime that creates -hardware-virtualized [Linux](https://www.kernel.org/) containers running on Linux hosts. +hardware-virtualized [Linux](https://www.kernel.org) containers running on Linux hosts. The runtime is [OCI](https://github.com/opencontainers/runtime-spec)-compatible, @@ -23,39 +29,6 @@ The runtime is allowing it to work seamlessly with both Docker and Kubernetes respectively. -## License - -The code is licensed under an Apache 2.0 license. -See [the license file](../../LICENSE) for further details. - -## Platform support - -Kata Containers currently works on systems supporting the following -technologies: - -- [Intel](https://www.intel.com) VT-x technology. -- [ARM](https://www.arm.com) Hyp mode (virtualization extension). -- [IBM](https://www.ibm.com) Power Systems. -- [IBM](https://www.ibm.com) Z mainframes. -### Hardware requirements - -The runtime has a built-in command to determine if your host system is capable -of running and creating a Kata Container: - -```bash -$ kata-runtime check -``` - -> **Note:** -> -> - By default, only a brief success / failure message is printed. -> If more details are needed, the `--verbose` flag can be used to display the -> list of all the checks performed. -> -> - `root` permission is needed to check if the system is capable of running -> Kata containers. In this case, additional checks are performed (e.g., if another -> incompatible hypervisor is running). 
- ## Download and install [![Get it from the Snap Store](https://snapcraft.io/static/images/badges/en/snap-store-black.svg)](https://snapcraft.io/kata-containers) @@ -63,11 +36,6 @@ $ kata-runtime check See the [installation guides](../../docs/install/README.md) available for various operating systems. -## Quick start for developers - -See the -[developer guide](../../docs/Developer-Guide.md). - ## Architecture overview See the [architecture overview](../../docs/design/architecture) @@ -76,7 +44,11 @@ for details on the Kata Containers design. ## Configuration The runtime uses a TOML format configuration file called `configuration.toml`. -The file contains comments explaining all options. +The file is divided into sections for settings related to various +parts of the system including the runtime itself, the [agent](../agent) and +the [hypervisor](#hypervisor-specific-configuration). + +Each option has a comment explaining its use. > **Note:** > @@ -84,6 +56,36 @@ The file contains comments explaining all options. > You may need to modify this file to optimise or tailor your system, or if you have > specific requirements. +### Configuration file location + +#### Runtime configuration file location + +The shimv2 runtime looks for its configuration in the following places (in order): + +- The `io.data containers.config.config_path` annotation specified + in the OCI configuration file (`config.json` file) used to create the pod sandbox. + +- The containerd + [shimv2](/docs/design/architecture/README.md#shim-v2-architecture) + options passed to the runtime. + +- The value of the `KATA_CONF_FILE` environment variable. + +- The [default configuration paths](#stateless-systems). + +#### Utility program configuration file location + +The `kata-runtime` utility program looks for its configuration in the +following locations (in order): + +- The path specified by the `--config` command-line option. + +- The value of the `KATA_CONF_FILE` environment variable. 
+ +- The [default configuration paths](#stateless-systems). + +> **Note:** For both binaries, the first path that exists will be used. + ### Hypervisor specific configuration Kata Containers supports multiple hypervisors so your `configuration.toml` @@ -108,13 +110,6 @@ runtime attempts to load. The first path that exists will be used: $ kata-runtime --show-default-config-paths ``` -Aside from the built-in locations, it is possible to specify the path to a -custom configuration file using the `--config` option: - -```bash -$ kata-runtime --config=/some/where/configuration.toml ... -``` - The runtime will log the full path to the configuration file it is using. See the [logging](#logging) section for further details. @@ -132,27 +127,15 @@ components, see the documentation for the [`kata-log-parser`](https://github.com/kata-containers/tests/tree/main/cmd/log-parser) tool. -For runtime logs, see the following sections for the CRI-O and containerd shimv2 based runtimes. - -### Kata OCI - -The Kata OCI runtime (including when used with CRI-O), provides `--log=` and `--log-format=` options. -However, the runtime also always logs to the system log (`syslog` or `journald`). - -To view runtime log output: - -```bash -$ sudo journalctl -t kata-runtime -``` - ### Kata containerd shimv2 The Kata containerd shimv2 runtime logs through `containerd`, and its logs will be sent to wherever the `containerd` logs are directed. However, the -shimv2 runtime also always logs to the system log (`syslog` or `journald`) under the -identifier name of `kata`. +shimv2 runtime also always logs to the system log (`syslog` or `journald`) using the `kata` identifier. -To view the `shimv2` runtime log output: +> **Note:** Kata logging [requires containerd debug to be enabled](../../docs/Developer-Guide.md#enabling-full-containerd-debug). 
+ +To view the `shimv2` runtime logs: ```bash $ sudo journalctl -t kata diff --git a/src/runtime/config/configuration-clh.toml.in b/src/runtime/config/configuration-clh.toml.in index 7f06dd8eb..07e3f31a4 100644 --- a/src/runtime/config/configuration-clh.toml.in +++ b/src/runtime/config/configuration-clh.toml.in @@ -70,6 +70,11 @@ default_memory = @DEFMEMSZ@ # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = @DEFMEMSLOTS@ +# Shared file system type: +# - virtio-fs (default) +# - virtio-fs-nydus +shared_fs = "@DEFSHAREDFS_CLH_VIRTIOFS@" + # Path to vhost-user-fs daemon. virtio_fs_daemon = "@DEFVIRTIOFSDAEMON@" diff --git a/src/runtime/go.mod b/src/runtime/go.mod index ee5778488..1176012f5 100644 --- a/src/runtime/go.mod +++ b/src/runtime/go.mod @@ -16,6 +16,7 @@ require ( github.com/containernetworking/plugins v1.0.1 github.com/coreos/go-systemd/v22 v22.3.2 github.com/cri-o/cri-o v1.0.0-rc2.0.20170928185954-3394b3b2d6af + github.com/docker/go-units v0.4.0 github.com/fsnotify/fsnotify v1.4.9 github.com/go-ini/ini v1.28.2 github.com/go-openapi/errors v0.18.0 @@ -27,7 +28,7 @@ require ( github.com/gogo/protobuf v1.3.2 github.com/hashicorp/go-multierror v1.1.1 github.com/intel-go/cpuid v0.0.0-20210602155658-5747e5cec0d9 - github.com/mdlayher/vsock v0.0.0-20191108225356-d9c65923cb8f + github.com/mdlayher/vsock v1.1.0 github.com/opencontainers/image-spec v1.0.2 // indirect github.com/opencontainers/runc v1.1.0 github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417 @@ -47,9 +48,9 @@ require ( go.opentelemetry.io/otel/exporters/jaeger v1.0.0 go.opentelemetry.io/otel/sdk v1.3.0 go.opentelemetry.io/otel/trace v1.3.0 - golang.org/x/net v0.0.0-20211216030914-fe4d6282115f + golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd golang.org/x/oauth2 v0.0.0-20210819190943-2bc19b11175f - golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e + golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a google.golang.org/grpc v1.43.0 
k8s.io/apimachinery v0.22.5 k8s.io/cri-api v0.23.1 diff --git a/src/runtime/go.sum b/src/runtime/go.sum index f7f583a5a..dcd31a4b0 100644 --- a/src/runtime/go.sum +++ b/src/runtime/go.sum @@ -363,8 +363,9 @@ github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/ github.com/google/go-cmp v0.5.3/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= -github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ= github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.7 h1:81/ik6ipDQS2aGcBfIN5dHDB36BwrStyeAQquSYCV4o= +github.com/google/go-cmp v0.5.7/go.mod h1:n+brtR0CgQNWTVd5ZUFpTBC8YFBDLK/h/bpaJ8/DtOE= github.com/google/go-containerregistry v0.5.1/go.mod h1:Ct15B4yir3PLOP5jsy0GNeYVaIZs/MK/Jz5any1wFW0= github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= github.com/google/gofuzz v1.1.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= @@ -494,8 +495,10 @@ github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5 github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 h1:I0XW9+e1XWDxdcEniV4rQAIOPUGDq67JSCiRCgGCZLI= github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369/go.mod h1:BSXmuO+STAnVfrANrmjBb36TMTDstsz7MSK+HVaYKv4= github.com/maxbrunsfeld/counterfeiter/v6 v6.2.2/go.mod h1:eD9eIE7cdwcMi9rYluz88Jz2VyhSmden33/aXg4oVIY= -github.com/mdlayher/vsock v0.0.0-20191108225356-d9c65923cb8f h1:t9bhAC/9+wqdIb49Jamux+Sxqa7MhkyuTtsHkmVg6tk= -github.com/mdlayher/vsock v0.0.0-20191108225356-d9c65923cb8f/go.mod h1:4GtNxrXX+cNil8xnCdz0zGYemDZDDHSsXbopCRZrRRw= +github.com/mdlayher/socket v0.2.0 h1:EY4YQd6hTAg2tcXF84p5DTHazShE50u5HeBzBaNgjkA= +github.com/mdlayher/socket v0.2.0/go.mod 
h1:QLlNPkFR88mRUNQIzRBMfXxwKal8H7u1h3bL1CV+f0E= +github.com/mdlayher/vsock v1.1.0 h1:2k9udP/hUkLUOboGxXMHOk4f0GWWZwS3IuE3Ee/YYfk= +github.com/mdlayher/vsock v1.1.0/go.mod h1:nsVhPsVuBBwAKh6i6PzdNoke6/TNYTjkxoRKAp/+pXs= github.com/miekg/dns v1.0.14/go.mod h1:W1PPwlIAgtquWBMBEV9nkV9Cazfe8ScdGz/Lj7v3Nrg= github.com/miekg/pkcs11 v1.0.3/go.mod h1:XsNlhZGX73bx86s2hdc/FuaLm2CPZJemRLMA+WTFxgs= github.com/mistifyio/go-zfs v2.1.2-0.20190413222219-f784269be439+incompatible/go.mod h1:8AuVvqP/mXw1px98n46wfvcGfQ4ci2FwoAjKYxuo3Z4= @@ -809,7 +812,6 @@ golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLL golang.org/x/net v0.0.0-20190628185345-da137c7871d7/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20190724013045-ca1201d0de80/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20190827160401-ba9fcec4b297/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20191108221443-4ba9e2ef068c/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20191209160850-c0dbc17a3553/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20200114155413-6afb5195e5aa/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20200202094626-16171245cfb2/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= @@ -840,8 +842,9 @@ golang.org/x/net v0.0.0-20210520170846-37e1c6afe023/go.mod h1:9nx3DQGgdP8bBQD5qx golang.org/x/net v0.0.0-20210525063256-abc453219eb5/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/net v0.0.0-20210825183410-e898025ed96a/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/net v0.0.0-20211209124913-491a49abca63/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= -golang.org/x/net v0.0.0-20211216030914-fe4d6282115f h1:hEYJvxw1lSnWIl8X9ofsYMklzaDs90JI2az5YMd4fPM= golang.org/x/net v0.0.0-20211216030914-fe4d6282115f/go.mod 
h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= +golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd h1:O7DYs+zxREGLKzKoMQrtrEacpb0ZVXA5rIwylE2Xchk= +golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd/go.mod h1:CfG3xpIq0wQ8r1q4Su4UZFWDARRcnwPjda9FqA0JpMk= golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= @@ -892,7 +895,6 @@ golang.org/x/sys v0.0.0-20190904154756-749cb33beabd/go.mod h1:h1NjWce9XRLGQEsW7w golang.org/x/sys v0.0.0-20191001151750-bb3f8db39f24/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191005200804-aed5e4c7ecf9/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20191105231009-c1f44814a5cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191115151921-52ab43148777/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191120155948-bd437916bb0e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= @@ -947,13 +949,16 @@ golang.org/x/sys v0.0.0-20210616094352-59db8d763f22/go.mod h1:oPkhp1MJrh7nUepCBc golang.org/x/sys v0.0.0-20210809222454-d867a43fc93e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20210831042530-f4d43177bf5e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20210903071746-97244b99971b/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys 
v0.0.0-20211025201205-69cdffdb9359/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= -golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e h1:fLOSk5Q00efkSvAm+4xcoXD+RRmLmmulPn5I3Y9F2EM= golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a h1:ppl5mZgokTT8uPkmYOyEUmPTr3ypaKkg5eFOGrAmxxE= +golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/term v0.0.0-20201117132131-f5c789dd3221/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/term v0.0.0-20210220032956-6a3ed077a48d/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/term v0.0.0-20210615171337-6886f2dfbf5b/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8= +golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8= golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= diff --git a/src/runtime/pkg/kata-monitor/cri.go b/src/runtime/pkg/kata-monitor/cri.go index e973e9caf..206726521 100644 --- a/src/runtime/pkg/kata-monitor/cri.go +++ b/src/runtime/pkg/kata-monitor/cri.go @@ -40,7 +40,6 @@ func getAddressAndDialer(endpoint string) (string, func(ctx context.Context, add func getConnection(endPoint string) (*grpc.ClientConn, error) { var conn *grpc.ClientConn - monitorLog.Debugf("connect using endpoint '%s' with '%s' timeout", endPoint, defaultTimeout) addr, dialer, err := getAddressAndDialer(endPoint) if err != nil { return nil, err @@ -52,7 +51,7 @@ func getConnection(endPoint string) (*grpc.ClientConn, error) { 
errMsg := errors.Wrapf(err, "connect endpoint '%s', make sure you are running as root and the endpoint has been started", endPoint) return nil, errMsg } - monitorLog.Debugf("connected successfully using endpoint: %s", endPoint) + monitorLog.Tracef("connected successfully using endpoint: %s", endPoint) return conn, nil } @@ -115,15 +114,15 @@ func parseEndpoint(endpoint string) (string, string, error) { } } -// getSandboxes gets ready sandboxes from the container engine and returns an updated sandboxMap -func (km *KataMonitor) getSandboxes(sandboxMap map[string]bool) (map[string]bool, error) { - newMap := make(map[string]bool) +// syncSandboxes gets pod metadata from the container manager and updates the sandbox cache. +func (km *KataMonitor) syncSandboxes(sandboxList []string) ([]string, error) { runtimeClient, runtimeConn, err := getRuntimeClient(km.runtimeEndpoint) if err != nil { - return newMap, err + return sandboxList, err } defer closeConnection(runtimeConn) + // TODO: if len(sandboxList) is 1, it would be better to just call runtimeClient.PodSandboxStatus(...)
targeting the single sandbox filter := &pb.PodSandboxFilter{ State: &pb.PodSandboxStateValue{ State: pb.PodSandboxState_SANDBOX_READY, @@ -133,29 +132,35 @@ func (km *KataMonitor) getSandboxes(sandboxMap map[string]bool) (map[string]bool request := &pb.ListPodSandboxRequest{ Filter: filter, } - monitorLog.Debugf("ListPodSandboxRequest: %v", request) + monitorLog.Tracef("ListPodSandboxRequest: %v", request) r, err := runtimeClient.ListPodSandbox(context.Background(), request) if err != nil { - return newMap, err + return sandboxList, err } - monitorLog.Debugf("ListPodSandboxResponse: %v", r) + monitorLog.Tracef("ListPodSandboxResponse: %v", r) for _, pod := range r.Items { - // Use the cached data if available - if isKata, ok := sandboxMap[pod.Id]; ok { - newMap[pod.Id] = isKata - continue + for _, sandbox := range sandboxList { + if pod.Id == sandbox { + km.sandboxCache.setMetadata(sandbox, sandboxKubeData{ + uid: pod.Metadata.Uid, + name: pod.Metadata.Name, + namespace: pod.Metadata.Namespace, + }) + + sandboxList = removeFromSandboxList(sandboxList, sandbox) + + monitorLog.WithFields(logrus.Fields{ + "Pod Name": pod.Metadata.Name, + "Pod Namespace": pod.Metadata.Namespace, + "Pod UID": pod.Metadata.Uid, + }).Debugf("Synced KATA POD %s", pod.Id) + + break + } } - - // Check if a directory associated with the POD ID exist on the kata fs: - // if so we know that the POD is a kata one. - newMap[pod.Id] = checkSandboxFSExists(pod.Id) - monitorLog.WithFields(logrus.Fields{ - "id": pod.Id, - "is kata": newMap[pod.Id], - "pod": pod, - }).Debug("") } - - return newMap, nil + // TODO: here we should mark the sandboxes we failed to retrieve info from: we should try a finite number of times + // to retrieve their metadata; if we still fail, give up and remove them from the sandbox cache (with a Warning log).
+ return sandboxList, nil } diff --git a/src/runtime/pkg/kata-monitor/metrics.go b/src/runtime/pkg/kata-monitor/metrics.go index b1788de46..0216969cb 100644 --- a/src/runtime/pkg/kata-monitor/metrics.go +++ b/src/runtime/pkg/kata-monitor/metrics.go @@ -141,7 +141,7 @@ func encodeMetricFamily(mfs []*dto.MetricFamily, encoder expfmt.Encoder) error { // aggregateSandboxMetrics will get metrics from one sandbox and do some process func (km *KataMonitor) aggregateSandboxMetrics(encoder expfmt.Encoder) error { // get all kata sandboxes from cache - sandboxes := km.sandboxCache.getKataSandboxes() + sandboxes := km.sandboxCache.getSandboxList() // save running kata pods as a metrics. runningShimCount.Set(float64(len(sandboxes))) @@ -156,13 +156,17 @@ func (km *KataMonitor) aggregateSandboxMetrics(encoder expfmt.Encoder) error { // used to receive response results := make(chan []*dto.MetricFamily, len(sandboxes)) - monitorLog.WithField("sandbox_count", len(sandboxes)).Debugf("sandboxes count") + monitorLog.WithField("sandboxes count", len(sandboxes)).Debugf("aggregate sandbox metrics") // get metrics from sandbox's shim for _, sandboxID := range sandboxes { + sandboxMetadata, ok := km.sandboxCache.getMetadata(sandboxID) + if !ok { // likely the sandbox has been just removed + continue + } wg.Add(1) - go func(sandboxID string, results chan<- []*dto.MetricFamily) { - sandboxMetrics, err := getParsedMetrics(sandboxID) + go func(sandboxID string, sandboxMetadata sandboxKubeData, results chan<- []*dto.MetricFamily) { + sandboxMetrics, err := getParsedMetrics(sandboxID, sandboxMetadata) if err != nil { monitorLog.WithError(err).WithField("sandbox_id", sandboxID).Errorf("failed to get metrics for sandbox") } @@ -170,7 +174,7 @@ func (km *KataMonitor) aggregateSandboxMetrics(encoder expfmt.Encoder) error { results <- sandboxMetrics wg.Done() monitorLog.WithField("sandbox_id", sandboxID).Debug("job finished") - }(sandboxID, results) + }(sandboxID, sandboxMetadata, results) 
monitorLog.WithField("sandbox_id", sandboxID).Debug("job started") } @@ -219,13 +223,13 @@ func (km *KataMonitor) aggregateSandboxMetrics(encoder expfmt.Encoder) error { } -func getParsedMetrics(sandboxID string) ([]*dto.MetricFamily, error) { +func getParsedMetrics(sandboxID string, sandboxMetadata sandboxKubeData) ([]*dto.MetricFamily, error) { body, err := doGet(sandboxID, defaultTimeout, "metrics") if err != nil { return nil, err } - return parsePrometheusMetrics(sandboxID, body) + return parsePrometheusMetrics(sandboxID, sandboxMetadata, body) } // GetSandboxMetrics will get sandbox's metrics from shim @@ -240,7 +244,7 @@ func GetSandboxMetrics(sandboxID string) (string, error) { // parsePrometheusMetrics will decode metrics from Prometheus text format // and return array of *dto.MetricFamily with an ASC order -func parsePrometheusMetrics(sandboxID string, body []byte) ([]*dto.MetricFamily, error) { +func parsePrometheusMetrics(sandboxID string, sandboxMetadata sandboxKubeData, body []byte) ([]*dto.MetricFamily, error) { reader := bytes.NewReader(body) decoder := expfmt.NewDecoder(reader, expfmt.FmtText) @@ -258,10 +262,24 @@ func parsePrometheusMetrics(sandboxID string, body []byte) ([]*dto.MetricFamily, metricList := mf.Metric for j := range metricList { metric := metricList[j] - metric.Label = append(metric.Label, &dto.LabelPair{ - Name: mutils.String2Pointer("sandbox_id"), - Value: mutils.String2Pointer(sandboxID), - }) + metric.Label = append(metric.Label, + &dto.LabelPair{ + Name: mutils.String2Pointer("sandbox_id"), + Value: mutils.String2Pointer(sandboxID), + }, + &dto.LabelPair{ + Name: mutils.String2Pointer("kube_uid"), + Value: mutils.String2Pointer(sandboxMetadata.uid), + }, + &dto.LabelPair{ + Name: mutils.String2Pointer("kube_name"), + Value: mutils.String2Pointer(sandboxMetadata.name), + }, + &dto.LabelPair{ + Name: mutils.String2Pointer("kube_namespace"), + Value: mutils.String2Pointer(sandboxMetadata.namespace), + }, + ) } // Kata shim are 
using prometheus go client, add a prefix for metric name to avoid confusing diff --git a/src/runtime/pkg/kata-monitor/metrics_test.go b/src/runtime/pkg/kata-monitor/metrics_test.go index 5263d2a93..1055a6d36 100644 --- a/src/runtime/pkg/kata-monitor/metrics_test.go +++ b/src/runtime/pkg/kata-monitor/metrics_test.go @@ -40,9 +40,10 @@ ttt 999 func TestParsePrometheusMetrics(t *testing.T) { assert := assert.New(t) sandboxID := "sandboxID-abc" + sandboxMetadata := sandboxKubeData{"123", "pod-name", "pod-namespace"} // parse metrics - list, err := parsePrometheusMetrics(sandboxID, []byte(shimMetricBody)) + list, err := parsePrometheusMetrics(sandboxID, sandboxMetadata, []byte(shimMetricBody)) assert.Nil(err, "parsePrometheusMetrics should not return error") assert.Equal(4, len(list), "should return 3 metric families") @@ -56,9 +57,16 @@ func TestParsePrometheusMetrics(t *testing.T) { // get the metric m := mf.Metric[0] - assert.Equal(1, len(m.Label), "should have only 1 labels") + assert.Equal(4, len(m.Label), "should have 4 labels") assert.Equal("sandbox_id", *m.Label[0].Name, "label name should be sandbox_id") assert.Equal(sandboxID, *m.Label[0].Value, "label value should be", sandboxID) + assert.Equal("kube_uid", *m.Label[1].Name, "label name should be kube_uid") + assert.Equal(sandboxMetadata.uid, *m.Label[1].Value, "label value should be", sandboxMetadata.uid) + + assert.Equal("kube_name", *m.Label[2].Name, "label name should be kube_name") + assert.Equal(sandboxMetadata.name, *m.Label[2].Value, "label value should be", sandboxMetadata.name) + assert.Equal("kube_namespace", *m.Label[3].Name, "label name should be kube_namespace") + assert.Equal(sandboxMetadata.namespace, *m.Label[3].Value, "label value should be", sandboxMetadata.namespace) summary := m.Summary assert.NotNil(summary, "summary should not be nil") diff --git a/src/runtime/pkg/kata-monitor/monitor.go b/src/runtime/pkg/kata-monitor/monitor.go index 6d0003af5..caacfb222 100644 --- 
a/src/runtime/pkg/kata-monitor/monitor.go +++ b/src/runtime/pkg/kata-monitor/monitor.go @@ -53,7 +53,7 @@ func NewKataMonitor(runtimeEndpoint string) (*KataMonitor, error) { runtimeEndpoint: runtimeEndpoint, sandboxCache: &sandboxCache{ Mutex: &sync.Mutex{}, - sandboxes: make(map[string]bool), + sandboxes: make(map[string]sandboxKubeData), }, } @@ -65,6 +65,15 @@ func NewKataMonitor(runtimeEndpoint string) (*KataMonitor, error) { return km, nil } +func removeFromSandboxList(sandboxList []string, sandboxToRemove string) []string { + for i, sandbox := range sandboxList { + if sandbox == sandboxToRemove { + return append(sandboxList[:i], sandboxList[i+1:]...) + } + } + return sandboxList +} + // startPodCacheUpdater will boot a thread to manage sandbox cache func (km *KataMonitor) startPodCacheUpdater() { sbsWatcher, err := fsnotify.NewWatcher() @@ -84,9 +93,24 @@ func (km *KataMonitor) startPodCacheUpdater() { monitorLog.Debugf("started fs monitoring @%s", getSandboxFS()) break } - // we refresh the pod cache once if we get multiple add/delete pod events in a short time (< podCacheRefreshDelaySeconds) + // Initial sync with the kata sandboxes already running + sbsFile, err := os.Open(getSandboxFS()) + if err != nil { + monitorLog.WithError(err).Fatal("cannot open sandboxes fs") + os.Exit(1) + } + sandboxList, err := sbsFile.Readdirnames(0) + if err != nil { + monitorLog.WithError(err).Fatal("cannot read sandboxes fs") + os.Exit(1) + } + monitorLog.Debug("initial sync of sbs directory completed") + monitorLog.Tracef("pod list from sbs: %v", sandboxList) + + // We should get kubernetes metadata from the container manager for each new kata sandbox we detect. + // It may take a while for data to be available, so we always wait podCacheRefreshDelaySeconds before checking. 
cacheUpdateTimer := time.NewTimer(podCacheRefreshDelaySeconds * time.Second) - cacheUpdateTimerWasSet := false + cacheUpdateTimerIsSet := true for { select { case event, ok := <-sbsWatcher.Events: @@ -99,11 +123,18 @@ func (km *KataMonitor) startPodCacheUpdater() { case fsnotify.Create: splitPath := strings.Split(event.Name, string(os.PathSeparator)) id := splitPath[len(splitPath)-1] - if !km.sandboxCache.putIfNotExists(id, true) { + if !km.sandboxCache.putIfNotExists(id, sandboxKubeData{}) { monitorLog.WithField("pod", id).Warn( "CREATE event but pod already present in the sandbox cache") } + sandboxList = append(sandboxList, id) monitorLog.WithField("pod", id).Info("sandbox cache: added pod") + if !cacheUpdateTimerIsSet { + cacheUpdateTimer.Reset(podCacheRefreshDelaySeconds * time.Second) + cacheUpdateTimerIsSet = true + monitorLog.Debugf( + "cache update timer fires in %d secs", podCacheRefreshDelaySeconds) + } case fsnotify.Remove: splitPath := strings.Split(event.Name, string(os.PathSeparator)) @@ -112,28 +143,27 @@ func (km *KataMonitor) startPodCacheUpdater() { monitorLog.WithField("pod", id).Warn( "REMOVE event but pod was missing from the sandbox cache") } + sandboxList = removeFromSandboxList(sandboxList, id) monitorLog.WithField("pod", id).Info("sandbox cache: removed pod") - - default: - monitorLog.WithField("event", event).Warn("got unexpected fs event") } - // While we process fs events directly to update the sandbox cache we need to sync with the - // container engine to ensure we are on sync with it: we can get out of sync in environments - // where kata workloads can be started by other processes than the container engine. 
- cacheUpdateTimerWasSet = cacheUpdateTimer.Reset(podCacheRefreshDelaySeconds * time.Second) - monitorLog.WithField("was reset", cacheUpdateTimerWasSet).Debugf( - "cache update timer fires in %d secs", podCacheRefreshDelaySeconds) - case <-cacheUpdateTimer.C: - sandboxes, err := km.getSandboxes(km.sandboxCache.getAllSandboxes()) + cacheUpdateTimerIsSet = false + monitorLog.WithField("pod list", sandboxList).Debugf( + "retrieve pod metadata from the container manager") + sandboxList, err = km.syncSandboxes(sandboxList) if err != nil { - monitorLog.WithError(err).Error("failed to get sandboxes") + monitorLog.WithError(err).Error("failed to get sandboxes metadata") continue } - monitorLog.WithField("count", len(sandboxes)).Info("synced sandbox cache with the container engine") - monitorLog.WithField("sandboxes", sandboxes).Debug("dump sandbox cache") - km.sandboxCache.set(sandboxes) + if len(sandboxList) > 0 { + monitorLog.WithField("sandboxes", sandboxList).Debugf( + "%d sandboxes are still missing metadata", len(sandboxList)) + cacheUpdateTimer.Reset(podCacheRefreshDelaySeconds * time.Second) + cacheUpdateTimerIsSet = true + } + + monitorLog.WithField("sandboxes", km.sandboxCache.getSandboxList()).Trace("dump sandbox cache") } } } @@ -157,7 +187,7 @@ func (km *KataMonitor) GetAgentURL(w http.ResponseWriter, r *http.Request) { // ListSandboxes list all sandboxes running in Kata func (km *KataMonitor) ListSandboxes(w http.ResponseWriter, r *http.Request) { - sandboxes := km.sandboxCache.getKataSandboxes() + sandboxes := km.sandboxCache.getSandboxList() for _, s := range sandboxes { w.Write([]byte(fmt.Sprintf("%s\n", s))) } diff --git a/src/runtime/pkg/kata-monitor/sandbox_cache.go b/src/runtime/pkg/kata-monitor/sandbox_cache.go index 98978f193..ba98a121f 100644 --- a/src/runtime/pkg/kata-monitor/sandbox_cache.go +++ b/src/runtime/pkg/kata-monitor/sandbox_cache.go @@ -9,28 +9,25 @@ import ( "sync" ) +type sandboxKubeData struct { + uid string + name string + namespace string
+} type sandboxCache struct { *sync.Mutex - // the bool value tracks if the pod is a kata one (true) or not (false) - sandboxes map[string]bool + // the sandboxKubeData links the sandbox id from the container manager to the pod metadata of kubernetes + sandboxes map[string]sandboxKubeData } -func (sc *sandboxCache) getAllSandboxes() map[string]bool { +func (sc *sandboxCache) getSandboxList() []string { sc.Lock() defer sc.Unlock() - return sc.sandboxes -} - -func (sc *sandboxCache) getKataSandboxes() []string { - sc.Lock() - defer sc.Unlock() - var katasandboxes []string - for id, isKata := range sc.sandboxes { - if isKata { - katasandboxes = append(katasandboxes, id) - } + var sandboxList []string + for id := range sc.sandboxes { + sandboxList = append(sandboxList, id) } - return katasandboxes + return sandboxList } func (sc *sandboxCache) deleteIfExists(id string) bool { @@ -46,7 +43,7 @@ func (sc *sandboxCache) deleteIfExists(id string) bool { return false } -func (sc *sandboxCache) putIfNotExists(id string, value bool) bool { +func (sc *sandboxCache) putIfNotExists(id string, value sandboxKubeData) bool { sc.Lock() defer sc.Unlock() @@ -59,8 +56,17 @@ func (sc *sandboxCache) putIfNotExists(id string, value bool) bool { return false } -func (sc *sandboxCache) set(sandboxes map[string]bool) { +func (sc *sandboxCache) setMetadata(id string, value sandboxKubeData) { sc.Lock() defer sc.Unlock() - sc.sandboxes = sandboxes + + sc.sandboxes[id] = value +} + +func (sc *sandboxCache) getMetadata(id string) (sandboxKubeData, bool) { + sc.Lock() + defer sc.Unlock() + + metadata, ok := sc.sandboxes[id] + return metadata, ok } diff --git a/src/runtime/pkg/kata-monitor/sandbox_cache_test.go b/src/runtime/pkg/kata-monitor/sandbox_cache_test.go index 43b8c8c99..4eedf778a 100644 --- a/src/runtime/pkg/kata-monitor/sandbox_cache_test.go +++ b/src/runtime/pkg/kata-monitor/sandbox_cache_test.go @@ -16,31 +16,26 @@ func TestSandboxCache(t *testing.T) { assert := assert.New(t) sc := 
&sandboxCache{ Mutex: &sync.Mutex{}, - sandboxes: make(map[string]bool), + sandboxes: map[string]sandboxKubeData{"111": {"1-2-3", "test-name", "test-namespace"}}, } - scMap := map[string]bool{"111": true} - - sc.set(scMap) - - scMap = sc.getAllSandboxes() - assert.Equal(1, len(scMap)) + assert.Equal(1, len(sc.getSandboxList())) // put new item id := "new-id" - b := sc.putIfNotExists(id, true) + b := sc.putIfNotExists(id, sandboxKubeData{}) assert.Equal(true, b) - assert.Equal(2, len(scMap)) + assert.Equal(2, len(sc.getSandboxList())) // put key that alreay exists - b = sc.putIfNotExists(id, true) + b = sc.putIfNotExists(id, sandboxKubeData{}) assert.Equal(false, b) b = sc.deleteIfExists(id) assert.Equal(true, b) - assert.Equal(1, len(scMap)) + assert.Equal(1, len(sc.getSandboxList())) b = sc.deleteIfExists(id) assert.Equal(false, b) - assert.Equal(1, len(scMap)) + assert.Equal(1, len(sc.getSandboxList())) } diff --git a/src/runtime/pkg/kata-monitor/shim_client.go b/src/runtime/pkg/kata-monitor/shim_client.go index 31043c847..bdb62d401 100644 --- a/src/runtime/pkg/kata-monitor/shim_client.go +++ b/src/runtime/pkg/kata-monitor/shim_client.go @@ -10,8 +10,6 @@ import ( "io" "net" "net/http" - "os" - "path/filepath" "time" cdshim "github.com/containerd/containerd/runtime/v2/shim" @@ -43,13 +41,6 @@ func getSandboxFS() string { return shim.GetSandboxesStoragePath() } -func checkSandboxFSExists(sandboxID string) bool { - sbsPath := filepath.Join(string(filepath.Separator), getSandboxFS(), sandboxID) - _, err := os.Stat(sbsPath) - - return !os.IsNotExist(err) -} - // BuildShimClient builds and returns an http client for communicating with the provided sandbox func BuildShimClient(sandboxID string, timeout time.Duration) (*http.Client, error) { return buildUnixSocketClient(shim.SocketAddress(sandboxID), timeout) diff --git a/src/runtime/pkg/katautils/config.go b/src/runtime/pkg/katautils/config.go index 6bc426be0..5874c0344 100644 --- a/src/runtime/pkg/katautils/config.go 
+++ b/src/runtime/pkg/katautils/config.go @@ -426,7 +426,7 @@ func (h hypervisor) sharedFS() (string, error) { supportedSharedFS := []string{config.Virtio9P, config.VirtioFS, config.VirtioFSNydus} if h.SharedFS == "" { - return config.Virtio9P, nil + return config.VirtioFS, nil } for _, fs := range supportedSharedFS { @@ -644,14 +644,9 @@ func newQemuHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) { return vc.HypervisorConfig{}, err } - if sharedFS == config.VirtioFS && h.VirtioFSDaemon == "" { + if (sharedFS == config.VirtioFS || sharedFS == config.VirtioFSNydus) && h.VirtioFSDaemon == "" { return vc.HypervisorConfig{}, - errors.New("cannot enable virtio-fs without daemon path in configuration file") - } - - if sharedFS == config.VirtioFSNydus && h.VirtioFSDaemon == "" { - return vc.HypervisorConfig{}, - errors.New("cannot enable virtio nydus without nydusd daemon path in configuration file") + fmt.Errorf("cannot enable %s without daemon path in configuration file", sharedFS) } if vSock, err := utils.SupportsVsocks(); !vSock { @@ -822,11 +817,18 @@ func newClhHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) { return vc.HypervisorConfig{}, err } - sharedFS := config.VirtioFS + sharedFS, err := h.sharedFS() + if err != nil { + return vc.HypervisorConfig{}, err + } + + if sharedFS != config.VirtioFS && sharedFS != config.VirtioFSNydus { + return vc.HypervisorConfig{}, errors.New("clh only supports virtio-fs or virtio-fs-nydus") + } if h.VirtioFSDaemon == "" { return vc.HypervisorConfig{}, - errors.New("virtio-fs daemon path is missing in configuration file") + fmt.Errorf("cannot enable %s without daemon path in configuration file", sharedFS) } return vc.HypervisorConfig{ diff --git a/src/runtime/pkg/katautils/config_test.go b/src/runtime/pkg/katautils/config_test.go index 5bfd61250..268e50276 100644 --- a/src/runtime/pkg/katautils/config_test.go +++ b/src/runtime/pkg/katautils/config_test.go @@ -633,6 +633,8 @@ func TestNewQemuHypervisorConfig(t
*testing.T) { PCIeRootPort: pcieRootPort, RxRateLimiterMaxRate: rxRateLimiterMaxRate, TxRateLimiterMaxRate: txRateLimiterMaxRate, + SharedFS: "virtio-fs", + VirtioFSDaemon: filepath.Join(dir, "virtiofsd"), } files := []string{hypervisorPath, kernelPath, imagePath} @@ -1388,6 +1390,8 @@ func TestUpdateRuntimeConfigurationVMConfig(t *testing.T) { Image: "/", Firmware: "/", FirmwareVolume: "/", + SharedFS: "virtio-fs", + VirtioFSDaemon: "/usr/libexec/kata-qemu/virtiofsd", }, }, } diff --git a/src/runtime/vendor/github.com/mdlayher/socket/CHANGELOG.md b/src/runtime/vendor/github.com/mdlayher/socket/CHANGELOG.md new file mode 100644 index 000000000..d16ae090b --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/CHANGELOG.md @@ -0,0 +1,36 @@ +# CHANGELOG + +## v0.2.0 + +- [New API] [commit](https://github.com/mdlayher/socket/commit/6e912a68523c45e5fd899239f4b46c402dd856da): + `socket.FileConn` can be used to create a `socket.Conn` from an existing + `os.File`, which may be provided by systemd socket activation or another + external mechanism. +- [API change] [commit](https://github.com/mdlayher/socket/commit/66d61f565188c23fe02b24099ddc856d538bf1a7): + `socket.Conn.Connect` now returns the `unix.Sockaddr` value provided by + `getpeername(2)`, since we have to invoke that system call anyway to verify + that a connection to a remote peer was successfully established. +- [Bug Fix] [commit](https://github.com/mdlayher/socket/commit/b60b2dbe0ac3caff2338446a150083bde8c5c19c): + check the correct error from `unix.GetsockoptInt` in the `socket.Conn.Connect` + method. Thanks @vcabbage! + +## v0.1.2 + +- [Bug Fix]: `socket.Conn.Connect` now properly checks the `SO_ERROR` socket + option value after calling `connect(2)` to verify whether or not a connection + could successfully be established. This means that `Connect` should now report + an error for an `AF_INET` TCP connection refused or `AF_VSOCK` connection + reset by peer. 
+- [New API]: add `socket.Conn.Getpeername` for use in `Connect`, but also for + use by external callers. + +## v0.1.1 + +- [New API]: `socket.Conn` now has `CloseRead`, `CloseWrite`, and `Shutdown` + methods. +- [Improvement]: internal rework to more robustly handle various errors. + +## v0.1.0 + +- Initial unstable release. Most functionality has been developed and ported +from package [`netlink`](https://github.com/mdlayher/netlink). diff --git a/src/runtime/vendor/github.com/mdlayher/socket/LICENSE.md b/src/runtime/vendor/github.com/mdlayher/socket/LICENSE.md new file mode 100644 index 000000000..3ccdb75b2 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/LICENSE.md @@ -0,0 +1,9 @@ +# MIT License + +Copyright (C) 2021 Matt Layher + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
diff --git a/src/runtime/vendor/github.com/mdlayher/socket/README.md b/src/runtime/vendor/github.com/mdlayher/socket/README.md new file mode 100644 index 000000000..97ddcd617 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/README.md @@ -0,0 +1,14 @@ +# socket [![Test Status](https://github.com/mdlayher/socket/workflows/Test/badge.svg)](https://github.com/mdlayher/socket/actions) [![Go Reference](https://pkg.go.dev/badge/github.com/mdlayher/socket.svg)](https://pkg.go.dev/github.com/mdlayher/socket) [![Go Report Card](https://goreportcard.com/badge/github.com/mdlayher/socket)](https://goreportcard.com/report/github.com/mdlayher/socket) + +Package `socket` provides a low-level network connection type which integrates +with Go's runtime network poller to provide asynchronous I/O and deadline +support. MIT Licensed. + +This package focuses on UNIX-like operating systems which make use of BSD +sockets system call APIs. It is meant to be used as a foundation for the +creation of operating system-specific socket packages, for socket families such +as Linux's `AF_NETLINK`, `AF_PACKET`, or `AF_VSOCK`. This package should not be +used directly in end user applications. + +Any use of package socket should be guarded by build tags, as one would also +use when importing the `syscall` or `golang.org/x/sys` packages. diff --git a/src/runtime/vendor/github.com/mdlayher/socket/accept.go b/src/runtime/vendor/github.com/mdlayher/socket/accept.go new file mode 100644 index 000000000..47e9d897e --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/accept.go @@ -0,0 +1,23 @@ +//go:build !dragonfly && !freebsd && !illumos && !linux +// +build !dragonfly,!freebsd,!illumos,!linux + +package socket + +import ( + "fmt" + "runtime" + + "golang.org/x/sys/unix" +) + +const sysAccept = "accept" + +// accept wraps accept(2). +func accept(fd, flags int) (int, unix.Sockaddr, error) { + if flags != 0 { + // These operating systems have no support for flags to accept(2). 
+ return 0, nil, fmt.Errorf("socket: Conn.Accept flags are ineffective on %s", runtime.GOOS) + } + + return unix.Accept(fd) +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/accept4.go b/src/runtime/vendor/github.com/mdlayher/socket/accept4.go new file mode 100644 index 000000000..e1016b206 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/accept4.go @@ -0,0 +1,15 @@ +//go:build dragonfly || freebsd || illumos || linux +// +build dragonfly freebsd illumos linux + +package socket + +import ( + "golang.org/x/sys/unix" +) + +const sysAccept = "accept4" + +// accept wraps accept4(2). +func accept(fd, flags int) (int, unix.Sockaddr, error) { + return unix.Accept4(fd, flags) +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/conn.go b/src/runtime/vendor/github.com/mdlayher/socket/conn.go new file mode 100644 index 000000000..d5ddcbf4a --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/conn.go @@ -0,0 +1,675 @@ +package socket + +import ( + "os" + "sync/atomic" + "syscall" + "time" + + "golang.org/x/sys/unix" +) + +// A Conn is a low-level network connection which integrates with Go's runtime +// network poller to provide asynchronous I/O and deadline support. +type Conn struct { + // Indicates whether or not Conn.Close has been called. Must be accessed + // atomically. Atomics definitions must come first in the Conn struct. + closed uint32 + + // A unique name for the Conn which is also associated with derived file + // descriptors such as those created by accept(2). + name string + + // Provides access to the underlying file registered with the runtime + // network poller, and arbitrary raw I/O calls. + fd *os.File + rc syscall.RawConn +} + +// A Config contains options for a Conn. +type Config struct { + // NetNS specifies the Linux network namespace the Conn will operate in. + // This option is unsupported on other operating systems. 
+ // + // If set (non-zero), Conn will enter the specified network namespace and an + // error will occur in Socket if the operation fails. + // + // If not set (zero), a best-effort attempt will be made to enter the + // network namespace of the calling thread: this means that any changes made + // to the calling thread's network namespace will also be reflected in Conn. + // If this operation fails (due to lack of permissions or because network + // namespaces are disabled by kernel configuration), Socket will not return + // an error, and the Conn will operate in the default network namespace of + // the process. This enables non-privileged use of Conn in applications + // which do not require elevated privileges. + // + // Entering a network namespace is a privileged operation (root or + // CAP_SYS_ADMIN are required), and most applications should leave this set + // to 0. + NetNS int +} + +// High-level methods which provide convenience over raw system calls. + +// Close closes the underlying file descriptor for the Conn, which also causes +// all in-flight I/O operations to immediately unblock and return errors. Any +// subsequent uses of Conn will result in EBADF. +func (c *Conn) Close() error { + // The caller has expressed an intent to close the socket, so immediately + // increment c.closed to force further calls to result in EBADF before also + // closing the file descriptor to unblock any outstanding operations. + // + // Because other operations simply check for c.closed != 0, we will permit + // double Close, which would increment c.closed beyond 1. + if atomic.AddUint32(&c.closed, 1) != 1 { + // Multiple Close calls. + return nil + } + + return os.NewSyscallError("close", c.fd.Close()) +} + +// CloseRead shuts down the reading side of the Conn. Most callers should just +// use Close. +func (c *Conn) CloseRead() error { return c.Shutdown(unix.SHUT_RD) } + +// CloseWrite shuts down the writing side of the Conn. Most callers should just +// use Close.
+func (c *Conn) CloseWrite() error { return c.Shutdown(unix.SHUT_WR) } + +// Read implements io.Reader by reading directly from the underlying file +// descriptor. +func (c *Conn) Read(b []byte) (int, error) { return c.fd.Read(b) } + +// Write implements io.Writer by writing directly to the underlying file +// descriptor. +func (c *Conn) Write(b []byte) (int, error) { return c.fd.Write(b) } + +// SetDeadline sets both the read and write deadlines associated with the Conn. +func (c *Conn) SetDeadline(t time.Time) error { return c.fd.SetDeadline(t) } + +// SetReadDeadline sets the read deadline associated with the Conn. +func (c *Conn) SetReadDeadline(t time.Time) error { return c.fd.SetReadDeadline(t) } + +// SetWriteDeadline sets the write deadline associated with the Conn. +func (c *Conn) SetWriteDeadline(t time.Time) error { return c.fd.SetWriteDeadline(t) } + +// ReadBuffer gets the size of the operating system's receive buffer associated +// with the Conn. +func (c *Conn) ReadBuffer() (int, error) { + return c.GetsockoptInt(unix.SOL_SOCKET, unix.SO_RCVBUF) +} + +// WriteBuffer gets the size of the operating system's transmit buffer +// associated with the Conn. +func (c *Conn) WriteBuffer() (int, error) { + return c.GetsockoptInt(unix.SOL_SOCKET, unix.SO_SNDBUF) +} + +// SetReadBuffer sets the size of the operating system's receive buffer +// associated with the Conn. +// +// When called with elevated privileges on Linux, the SO_RCVBUFFORCE option will +// be used to override operating system limits. Otherwise SO_RCVBUF is used +// (which obeys operating system limits). +func (c *Conn) SetReadBuffer(bytes int) error { return c.setReadBuffer(bytes) } + +// SetWriteBuffer sets the size of the operating system's transmit buffer +// associated with the Conn. +// +// When called with elevated privileges on Linux, the SO_SNDBUFFORCE option will +// be used to override operating system limits. Otherwise SO_SNDBUF is used +// (which obeys operating system limits). 
+func (c *Conn) SetWriteBuffer(bytes int) error { return c.setWriteBuffer(bytes) } + +// SyscallConn returns a raw network connection. This implements the +// syscall.Conn interface. +// +// SyscallConn is intended for advanced use cases, such as getting and setting +// arbitrary socket options using the socket's file descriptor. If possible, +// those operations should be performed using methods on Conn instead. +// +// Once invoked, it is the caller's responsibility to ensure that operations +// performed using Conn and the syscall.RawConn do not conflict with each other. +func (c *Conn) SyscallConn() (syscall.RawConn, error) { + if atomic.LoadUint32(&c.closed) != 0 { + return nil, os.NewSyscallError("syscallconn", unix.EBADF) + } + + // TODO(mdlayher): mutex or similar to enforce syscall.RawConn contract of + // FD remaining valid for duration of calls? + return c.rc, nil +} + +// Socket wraps the socket(2) system call to produce a Conn. domain, typ, and +// proto are passed directly to socket(2), and name should be a unique name for +// the socket type such as "netlink" or "vsock". +// +// The cfg parameter specifies optional configuration for the Conn. If nil, no +// additional configuration will be applied. +// +// If the operating system supports SOCK_CLOEXEC and SOCK_NONBLOCK, they are +// automatically applied to typ to mirror the standard library's socket flag +// behaviors. +func Socket(domain, typ, proto int, name string, cfg *Config) (*Conn, error) { + if cfg == nil { + cfg = &Config{} + } + + if cfg.NetNS == 0 { + // Non-Linux or no network namespace. + return socket(domain, typ, proto, name) + } + + // Linux only: create Conn in the specified network namespace. + return withNetNS(cfg.NetNS, func() (*Conn, error) { + return socket(domain, typ, proto, name) + }) +} + +// socket is the internal, cross-platform entry point for socket(2). 
+func socket(domain, typ, proto int, name string) (*Conn, error) { + var ( + fd int + err error + ) + + for { + fd, err = unix.Socket(domain, typ|socketFlags, proto) + switch { + case err == nil: + // Some OSes already set CLOEXEC with typ. + if !flagCLOEXEC { + unix.CloseOnExec(fd) + } + + // No error, prepare the Conn. + return newConn(fd, name) + case !ready(err): + // System call interrupted or not ready, try again. + continue + case err == unix.EINVAL, err == unix.EPROTONOSUPPORT: + // On Linux, SOCK_NONBLOCK and SOCK_CLOEXEC were introduced in + // 2.6.27. On FreeBSD, both flags were introduced in FreeBSD 10. + // EINVAL and EPROTONOSUPPORT check for earlier versions of these + // OSes respectively. + // + // Mirror what the standard library does when creating file + // descriptors: avoid racing a fork/exec with the creation of new + // file descriptors, so that child processes do not inherit socket + // file descriptors unexpectedly. + // + // For a more thorough explanation, see similar work in the Go tree: + // func sysSocket in net/sock_cloexec.go, as well as the detailed + // comment in syscall/exec_unix.go. + syscall.ForkLock.RLock() + fd, err = unix.Socket(domain, typ, proto) + if err != nil { + syscall.ForkLock.RUnlock() + return nil, os.NewSyscallError("socket", err) + } + unix.CloseOnExec(fd) + syscall.ForkLock.RUnlock() + + return newConn(fd, name) + default: + // Unhandled error. + return nil, os.NewSyscallError("socket", err) + } + } +} + +// FileConn returns a copy of the network connection corresponding to the open +// file. It is the caller's responsibility to close the file when finished. +// Closing the Conn does not affect the File, and closing the File does not +// affect the Conn. +func FileConn(f *os.File, name string) (*Conn, error) { + // First we'll try to do fcntl(2) with F_DUPFD_CLOEXEC because we can dup + // the file descriptor and set the flag in one syscall.
+ fd, err := unix.FcntlInt(f.Fd(), unix.F_DUPFD_CLOEXEC, 0) + switch err { + case nil: + // OK, ready to set up non-blocking I/O. + return newConn(fd, name) + case unix.EINVAL: + // The kernel rejected our fcntl(2), fall back to separate dup(2) and + // setting close on exec. + // + // Mirror what the standard library does when creating file descriptors: + // avoid racing a fork/exec with the creation of new file descriptors, + // so that child processes do not inherit socket file descriptors + // unexpectedly. + syscall.ForkLock.RLock() + fd, err := unix.Dup(int(f.Fd())) + if err != nil { + syscall.ForkLock.RUnlock() + return nil, os.NewSyscallError("dup", err) + } + unix.CloseOnExec(fd) + syscall.ForkLock.RUnlock() + + return newConn(fd, name) + default: + // Any other errors. + return nil, os.NewSyscallError("fcntl", err) + } +} + +// TODO(mdlayher): consider exporting newConn as New? + +// newConn wraps an existing file descriptor to create a Conn. name should be a +// unique name for the socket type such as "netlink" or "vsock". +func newConn(fd int, name string) (*Conn, error) { + // All Conn I/O is nonblocking for integration with Go's runtime network + // poller. Depending on the OS this might already be set but it can't hurt + // to set it again. + if err := unix.SetNonblock(fd, true); err != nil { + return nil, os.NewSyscallError("setnonblock", err) + } + + // os.NewFile registers the non-blocking file descriptor with the runtime + // poller, which is then used for most subsequent operations except those + // that require raw I/O via SyscallConn. + // + // See also: https://golang.org/pkg/os/#NewFile + f := os.NewFile(uintptr(fd), name) + rc, err := f.SyscallConn() + if err != nil { + return nil, err + } + + return &Conn{ + name: name, + fd: f, + rc: rc, + }, nil +} + +// Low-level methods which provide raw system call access.
+ +// Accept wraps accept(2) or accept4(2) depending on the operating system, but +// returns a Conn for the accepted connection rather than a raw file descriptor. +// +// If the operating system supports accept4(2) (which allows flags), +// SOCK_CLOEXEC and SOCK_NONBLOCK are automatically applied to flags to mirror +// the standard library's socket flag behaviors. +// +// If the operating system only supports accept(2) (which does not allow flags) +// and flags is not zero, an error will be returned. +func (c *Conn) Accept(flags int) (*Conn, unix.Sockaddr, error) { + var ( + nfd int + sa unix.Sockaddr + err error + ) + + doErr := c.read(sysAccept, func(fd int) error { + // Either accept(2) or accept4(2) depending on the OS. + nfd, sa, err = accept(fd, flags|socketFlags) + return err + }) + if doErr != nil { + return nil, nil, doErr + } + if err != nil { + // sysAccept is either "accept" or "accept4" depending on the OS. + return nil, nil, os.NewSyscallError(sysAccept, err) + } + + // Successfully accepted a connection, wrap it in a Conn for use by the + // caller. + ac, err := newConn(nfd, c.name) + if err != nil { + return nil, nil, err + } + + return ac, sa, nil +} + +// Bind wraps bind(2). +func (c *Conn) Bind(sa unix.Sockaddr) error { + const op = "bind" + + var err error + doErr := c.control(op, func(fd int) error { + err = unix.Bind(fd, sa) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// Connect wraps connect(2). In order to verify that the underlying socket is +// connected to a remote peer, Connect calls getpeername(2) and returns the +// unix.Sockaddr from that call. +func (c *Conn) Connect(sa unix.Sockaddr) (unix.Sockaddr, error) { + const op = "connect" + + // TODO(mdlayher): it would seem that trying to connect to unbound vsock + // listeners by calling Connect multiple times results in ECONNRESET for the + // first and nil error for subsequent calls. Do we need to memoize the + // error? 
Check what the stdlib behavior is. + + var ( + // Track progress between invocations of the write closure. We don't + // have an explicit WaitWrite call like internal/poll does, so we have + // to wait until the runtime calls the closure again to indicate we can + // write. + progress uint32 + + // Capture closure sockaddr and error. + rsa unix.Sockaddr + err error + ) + + doErr := c.write(op, func(fd int) error { + if atomic.AddUint32(&progress, 1) == 1 { + // First call: initiate connect. + return unix.Connect(fd, sa) + } + + // Subsequent calls: the runtime network poller indicates fd is + // writable. Check for errno. + errno, gerr := c.GetsockoptInt(unix.SOL_SOCKET, unix.SO_ERROR) + if gerr != nil { + return gerr + } + if errno != 0 { + // Connection is still not ready or failed. If errno indicates + // the socket is not ready, we will wait for the next write + // event. Otherwise we propagate this errno back to the caller as a + // permanent error. + uerr := unix.Errno(errno) + err = uerr + return uerr + } + + // According to internal/poll, it's possible for the runtime network + // poller to spuriously wake us and return errno 0 for SO_ERROR. + // Make sure we are actually connected to a peer. + peer, err := c.Getpeername() + if err != nil { + // internal/poll unconditionally goes back to WaitWrite. + // Synthesize an error that will do the same for us. + return unix.EAGAIN + } + + // Connection complete. + rsa = peer + return nil + }) + if doErr != nil { + return nil, doErr + } + + if err == unix.EISCONN { + // TODO(mdlayher): is this block obsolete with the addition of the + // getsockopt SO_ERROR check above? + // + // EISCONN is reported if the socket is already established and should + // not be treated as an error. + // - Darwin reports this for at least TCP sockets + // - Linux reports this for at least AF_VSOCK sockets + return rsa, nil + } + + return rsa, os.NewSyscallError(op, err) +} + +// Getsockname wraps getsockname(2).
+func (c *Conn) Getsockname() (unix.Sockaddr, error) { + const op = "getsockname" + + var ( + sa unix.Sockaddr + err error + ) + + doErr := c.control(op, func(fd int) error { + sa, err = unix.Getsockname(fd) + return err + }) + if doErr != nil { + return nil, doErr + } + + return sa, os.NewSyscallError(op, err) +} + +// Getpeername wraps getpeername(2). +func (c *Conn) Getpeername() (unix.Sockaddr, error) { + const op = "getpeername" + + var ( + sa unix.Sockaddr + err error + ) + + doErr := c.control(op, func(fd int) error { + sa, err = unix.Getpeername(fd) + return err + }) + if doErr != nil { + return nil, doErr + } + + return sa, os.NewSyscallError(op, err) +} + +// GetsockoptInt wraps getsockopt(2) for integer values. +func (c *Conn) GetsockoptInt(level, opt int) (int, error) { + const op = "getsockopt" + + var ( + value int + err error + ) + + doErr := c.control(op, func(fd int) error { + value, err = unix.GetsockoptInt(fd, level, opt) + return err + }) + if doErr != nil { + return 0, doErr + } + + return value, os.NewSyscallError(op, err) +} + +// Listen wraps listen(2). +func (c *Conn) Listen(n int) error { + const op = "listen" + + var err error + doErr := c.control(op, func(fd int) error { + err = unix.Listen(fd, n) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// Recvmsg wraps recvmsg(2). 
+func (c *Conn) Recvmsg(p, oob []byte, flags int) (int, int, int, unix.Sockaddr, error) { + const op = "recvmsg" + + var ( + n, oobn, recvflags int + from unix.Sockaddr + err error + ) + + doErr := c.read(op, func(fd int) error { + n, oobn, recvflags, from, err = unix.Recvmsg(fd, p, oob, flags) + return err + }) + if doErr != nil { + return 0, 0, 0, nil, doErr + } + + return n, oobn, recvflags, from, os.NewSyscallError(op, err) +} + +// Recvfrom wraps recvfrom(2) +func (c *Conn) Recvfrom(p []byte, flags int) (int, unix.Sockaddr, error) { + const op = "recvfrom" + + var ( + n int + addr unix.Sockaddr + err error + ) + + doErr := c.read(op, func(fd int) error { + n, addr, err = unix.Recvfrom(fd, p, flags) + return err + }) + if doErr != nil { + return 0, nil, doErr + } + + return n, addr, os.NewSyscallError(op, err) +} + +// Sendmsg wraps sendmsg(2). +func (c *Conn) Sendmsg(p, oob []byte, to unix.Sockaddr, flags int) error { + const op = "sendmsg" + + var err error + doErr := c.write(op, func(fd int) error { + err = unix.Sendmsg(fd, p, oob, to, flags) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// Sendto wraps sendto(2). +func (c *Conn) Sendto(b []byte, to unix.Sockaddr, flags int) error { + const op = "sendto" + + var err error + doErr := c.write(op, func(fd int) error { + err = unix.Sendto(fd, b, flags, to) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// SetsockoptInt wraps setsockopt(2) for integer values. +func (c *Conn) SetsockoptInt(level, opt, value int) error { + const op = "setsockopt" + + var err error + doErr := c.control(op, func(fd int) error { + err = unix.SetsockoptInt(fd, level, opt, value) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// Shutdown wraps shutdown(2). 
+func (c *Conn) Shutdown(how int) error { + const op = "shutdown" + + var err error + doErr := c.control(op, func(fd int) error { + err = unix.Shutdown(fd, how) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// Conn low-level read/write/control functions. These functions mirror the +// syscall.RawConn APIs but the input closures return errors rather than +// booleans. Any syscalls invoked within f should return their error to allow +// the Conn to check for readiness with the runtime network poller, or to retry +// operations which may have been interrupted by EINTR or similar. +// +// Note that errors from the input closure functions are not propagated to the +// error return values of read/write/control, and the caller is still +// responsible for error handling. + +// read executes f, a read function, against the associated file descriptor. +// op is used to create an *os.SyscallError if the file descriptor is closed. +func (c *Conn) read(op string, f func(fd int) error) error { + if atomic.LoadUint32(&c.closed) != 0 { + return os.NewSyscallError(op, unix.EBADF) + } + + return c.rc.Read(func(fd uintptr) bool { + return ready(f(int(fd))) + }) +} + +// write executes f, a write function, against the associated file descriptor. +// op is used to create an *os.SyscallError if the file descriptor is closed. +func (c *Conn) write(op string, f func(fd int) error) error { + if atomic.LoadUint32(&c.closed) != 0 { + return os.NewSyscallError(op, unix.EBADF) + } + + return c.rc.Write(func(fd uintptr) bool { + return ready(f(int(fd))) + }) +} + +// control executes f, a control function, against the associated file +// descriptor. op is used to create an *os.SyscallError if the file descriptor +// is closed. 
+func (c *Conn) control(op string, f func(fd int) error) error { + if atomic.LoadUint32(&c.closed) != 0 { + return os.NewSyscallError(op, unix.EBADF) + } + + return c.rc.Control(func(fd uintptr) { + // Repeatedly attempt the syscall(s) invoked by f until completion is + // indicated by the return value of ready. + for { + if ready(f(int(fd))) { + return + } + } + }) +} + +// ready indicates readiness based on the value of err. +func ready(err error) bool { + // When a socket is in non-blocking mode, we might see a variety of errors: + // - EAGAIN: most common case for a socket read not being ready + // - EINPROGRESS: reported by some sockets when first calling connect + // - EINTR: system call interrupted, more frequently occurs in Go 1.14+ + // because goroutines can be asynchronously preempted + // + // Return false to let the poller wait for readiness. See the source code + // for internal/poll.FD.RawRead for more details. + switch err { + case unix.EAGAIN, unix.EINPROGRESS, unix.EINTR: + // Not ready. + return false + default: + // Ready regardless of whether there was an error or no error. + return true + } +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/conn_linux.go b/src/runtime/vendor/github.com/mdlayher/socket/conn_linux.go new file mode 100644 index 000000000..275f641c1 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/conn_linux.go @@ -0,0 +1,88 @@ +//go:build linux +// +build linux + +package socket + +import ( + "os" + "unsafe" + + "golang.org/x/net/bpf" + "golang.org/x/sys/unix" +) + +// SetBPF attaches an assembled BPF program to a Conn. +func (c *Conn) SetBPF(filter []bpf.RawInstruction) error { + // We can't point to the first instruction in the array if no instructions + // are present. 
+ if len(filter) == 0 { + return os.NewSyscallError("setsockopt", unix.EINVAL) + } + + prog := unix.SockFprog{ + Len: uint16(len(filter)), + Filter: (*unix.SockFilter)(unsafe.Pointer(&filter[0])), + } + + return c.SetsockoptSockFprog(unix.SOL_SOCKET, unix.SO_ATTACH_FILTER, &prog) +} + +// RemoveBPF removes a BPF filter from a Conn. +func (c *Conn) RemoveBPF() error { + // 0 argument is ignored. + return c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_DETACH_FILTER, 0) +} + +// SetsockoptSockFprog wraps setsockopt(2) for unix.SockFprog values. +func (c *Conn) SetsockoptSockFprog(level, opt int, fprog *unix.SockFprog) error { + const op = "setsockopt" + + var err error + doErr := c.control(op, func(fd int) error { + err = unix.SetsockoptSockFprog(fd, level, opt, fprog) + return err + }) + if doErr != nil { + return doErr + } + + return os.NewSyscallError(op, err) +} + +// GetSockoptTpacketStats wraps getsockopt(2) for getting TpacketStats +func (c *Conn) GetSockoptTpacketStats(level, name int) (*unix.TpacketStats, error) { + const op = "getsockopt" + + var ( + stats *unix.TpacketStats + err error + ) + + doErr := c.control(op, func(fd int) error { + stats, err = unix.GetsockoptTpacketStats(fd, level, name) + return err + }) + if doErr != nil { + return stats, doErr + } + return stats, os.NewSyscallError(op, err) +} + +// GetSockoptTpacketStatsV3 wraps getsockopt(2) for getting TpacketStatsV3 +func (c *Conn) GetSockoptTpacketStatsV3(level, name int) (*unix.TpacketStatsV3, error) { + const op = "getsockopt" + + var ( + stats *unix.TpacketStatsV3 + err error + ) + + doErr := c.control(op, func(fd int) error { + stats, err = unix.GetsockoptTpacketStatsV3(fd, level, name) + return err + }) + if doErr != nil { + return stats, doErr + } + return stats, os.NewSyscallError(op, err) +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/doc.go b/src/runtime/vendor/github.com/mdlayher/socket/doc.go new file mode 100644 index 000000000..7d4566c90 --- /dev/null +++ 
b/src/runtime/vendor/github.com/mdlayher/socket/doc.go @@ -0,0 +1,13 @@ +// Package socket provides a low-level network connection type which integrates +// with Go's runtime network poller to provide asynchronous I/O and deadline +// support. +// +// This package focuses on UNIX-like operating systems which make use of BSD +// sockets system call APIs. It is meant to be used as a foundation for the +// creation of operating system-specific socket packages, for socket families +// such as Linux's AF_NETLINK, AF_PACKET, or AF_VSOCK. This package should not +// be used directly in end user applications. +// +// Any use of package socket should be guarded by build tags, as one would also +// use when importing the syscall or golang.org/x/sys packages. +package socket diff --git a/src/runtime/vendor/github.com/mdlayher/socket/go.mod b/src/runtime/vendor/github.com/mdlayher/socket/go.mod new file mode 100644 index 000000000..ead5e027b --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/go.mod @@ -0,0 +1,10 @@ +module github.com/mdlayher/socket + +go 1.17 + +require ( + github.com/google/go-cmp v0.5.6 + golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c + golang.org/x/sync v0.0.0-20210220032951-036812b2e83c + golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6 +) diff --git a/src/runtime/vendor/github.com/mdlayher/socket/go.sum b/src/runtime/vendor/github.com/mdlayher/socket/go.sum new file mode 100644 index 000000000..8a92791e8 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/go.sum @@ -0,0 +1,13 @@ +github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ= +github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= +golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c h1:uOCk1iQW6Vc18bnC13MfzScl+wdKBmM9Y9kU7Z83/lw= +golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod 
h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6 h1:foEbQz/B0Oz6YIqu/69kfXPYeFQAuuMYFkjaqXzl5Wo= +golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= diff --git a/src/runtime/vendor/github.com/mdlayher/socket/netns_linux.go b/src/runtime/vendor/github.com/mdlayher/socket/netns_linux.go new file mode 100644 index 000000000..b29115ad1 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/netns_linux.go @@ -0,0 +1,150 @@ +//go:build linux +// +build linux + +package socket + +import ( + "errors" + "fmt" + "os" + "runtime" + + "golang.org/x/sync/errgroup" + "golang.org/x/sys/unix" +) + +// errNetNSDisabled is returned when network namespaces are unavailable on +// a given system. +var errNetNSDisabled = errors.New("socket: Linux network namespaces are not enabled on this system") + +// withNetNS invokes fn within the context of the network namespace specified by +// fd, while also managing the logic required to safely do so by manipulating +// thread-local state. 
+func withNetNS(fd int, fn func() (*Conn, error)) (*Conn, error) { + var ( + eg errgroup.Group + conn *Conn + ) + + eg.Go(func() error { + // Retrieve and store the calling OS thread's network namespace so the + // thread can be reassigned to it after creating a socket in another network + // namespace. + runtime.LockOSThread() + + ns, err := threadNetNS() + if err != nil { + // No thread-local manipulation, unlock. + runtime.UnlockOSThread() + return err + } + defer ns.Close() + + // Beyond this point, the thread's network namespace is poisoned. Do not + // unlock the OS thread until all network namespace manipulation completes + // to avoid returning to the caller with altered thread-local state. + + // Move the OS thread this goroutine is locked to into the given + // network namespace. + if err := ns.Set(fd); err != nil { + return err + } + + // Attempt Conn creation and unconditionally restore the original namespace. + c, err := fn() + if nerr := ns.Restore(); nerr != nil { + // Failed to restore original namespace. Return an error and allow the + // runtime to terminate the thread. + if err == nil { + _ = c.Close() + } + + return nerr + } + + // No more thread-local state manipulation; return the new Conn and any error from fn. + runtime.UnlockOSThread() + conn = c + return err + }) + + if err := eg.Wait(); err != nil { + return nil, err + } + + return conn, nil +} + +// A netNS is a handle that can manipulate network namespaces. +// +// Operations performed on a netNS must use runtime.LockOSThread before +// manipulating any network namespaces. +type netNS struct { + // The handle to a network namespace. + f *os.File + + // Indicates if network namespaces are disabled on this system, and thus + // operations should become a no-op or return errors. + disabled bool +} + +// threadNetNS constructs a netNS using the network namespace of the calling +// thread. If the namespace is not the default namespace, runtime.LockOSThread +// should be invoked first.
+func threadNetNS() (*netNS, error) { + return fileNetNS(fmt.Sprintf("/proc/self/task/%d/ns/net", unix.Gettid())) +} + +// fileNetNS opens file and creates a netNS. fileNetNS should only be called +// directly in tests. +func fileNetNS(file string) (*netNS, error) { + f, err := os.Open(file) + switch { + case err == nil: + return &netNS{f: f}, nil + case os.IsNotExist(err): + // Network namespaces are not enabled on this system. Use this signal + // to return errors elsewhere if the caller explicitly asks for a + // network namespace to be set. + return &netNS{disabled: true}, nil + default: + return nil, err + } +} + +// Close releases the handle to a network namespace. +func (n *netNS) Close() error { + return n.do(func() error { return n.f.Close() }) +} + +// FD returns a file descriptor which represents the network namespace. +func (n *netNS) FD() int { + if n.disabled { + // No reasonable file descriptor value in this case, so specify a + // non-existent one. + return -1 + } + + return int(n.f.Fd()) +} + +// Restore restores the original network namespace for the calling thread. +func (n *netNS) Restore() error { + return n.do(func() error { return n.Set(n.FD()) }) +} + +// Set sets a new network namespace for the current thread using fd. +func (n *netNS) Set(fd int) error { + return n.do(func() error { + return os.NewSyscallError("setns", unix.Setns(fd, unix.CLONE_NEWNET)) + }) +} + +// do runs fn if network namespaces are enabled on this system. 
+func (n *netNS) do(fn func() error) error { + if n.disabled { + return errNetNSDisabled + } + + return fn() +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/netns_others.go b/src/runtime/vendor/github.com/mdlayher/socket/netns_others.go new file mode 100644 index 000000000..4cceb3d04 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/netns_others.go @@ -0,0 +1,14 @@ +//go:build !linux +// +build !linux + +package socket + +import ( + "fmt" + "runtime" +) + +// withNetNS returns an error on non-Linux systems. +func withNetNS(_ int, _ func() (*Conn, error)) (*Conn, error) { + return nil, fmt.Errorf("socket: Linux network namespace support is not available on %s", runtime.GOOS) +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/setbuffer_linux.go b/src/runtime/vendor/github.com/mdlayher/socket/setbuffer_linux.go new file mode 100644 index 000000000..0d4aa4417 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/setbuffer_linux.go @@ -0,0 +1,24 @@ +//go:build linux +// +build linux + +package socket + +import "golang.org/x/sys/unix" + +// setReadBuffer wraps the SO_RCVBUF{,FORCE} setsockopt(2) options. +func (c *Conn) setReadBuffer(bytes int) error { + err := c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_RCVBUFFORCE, bytes) + if err != nil { + err = c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_RCVBUF, bytes) + } + return err +} + +// setWriteBuffer wraps the SO_SNDBUF{,FORCE} setsockopt(2) options. 
+func (c *Conn) setWriteBuffer(bytes int) error { + err := c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_SNDBUFFORCE, bytes) + if err != nil { + err = c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_SNDBUF, bytes) + } + return err +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/setbuffer_others.go b/src/runtime/vendor/github.com/mdlayher/socket/setbuffer_others.go new file mode 100644 index 000000000..72b36dbe3 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/setbuffer_others.go @@ -0,0 +1,16 @@ +//go:build !linux +// +build !linux + +package socket + +import "golang.org/x/sys/unix" + +// setReadBuffer wraps the SO_RCVBUF setsockopt(2) option. +func (c *Conn) setReadBuffer(bytes int) error { + return c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_RCVBUF, bytes) +} + +// setWriteBuffer wraps the SO_SNDBUF setsockopt(2) option. +func (c *Conn) setWriteBuffer(bytes int) error { + return c.SetsockoptInt(unix.SOL_SOCKET, unix.SO_SNDBUF, bytes) +} diff --git a/src/runtime/vendor/github.com/mdlayher/socket/typ_cloexec_nonblock.go b/src/runtime/vendor/github.com/mdlayher/socket/typ_cloexec_nonblock.go new file mode 100644 index 000000000..40e834310 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/typ_cloexec_nonblock.go @@ -0,0 +1,12 @@ +//go:build !darwin +// +build !darwin + +package socket + +import "golang.org/x/sys/unix" + +const ( + // These operating systems support CLOEXEC and NONBLOCK socket options. + flagCLOEXEC = true + socketFlags = unix.SOCK_CLOEXEC | unix.SOCK_NONBLOCK +) diff --git a/src/runtime/vendor/github.com/mdlayher/socket/typ_none.go b/src/runtime/vendor/github.com/mdlayher/socket/typ_none.go new file mode 100644 index 000000000..9bbb1aab5 --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/socket/typ_none.go @@ -0,0 +1,11 @@ +//go:build darwin +// +build darwin + +package socket + +const ( + // These operating systems do not support CLOEXEC and NONBLOCK socket + // options. 
+ flagCLOEXEC = false + socketFlags = 0 +) diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/CHANGELOG.md b/src/runtime/vendor/github.com/mdlayher/vsock/CHANGELOG.md new file mode 100644 index 000000000..61be08d8b --- /dev/null +++ b/src/runtime/vendor/github.com/mdlayher/vsock/CHANGELOG.md @@ -0,0 +1,35 @@ +# CHANGELOG + +## Unreleased + +## v1.1.0 + +- [New API] [commit](https://github.com/mdlayher/vsock/commit/44cd82dc5f7de644436f22236b111ab97fa9a14f): + `vsock.FileListener` can be used to create a `vsock.Listener` from an existing + `os.File`, which may be provided by systemd socket activation or another + external mechanism. + +## v1.0.1 + +- [Bug Fix] [commit](https://github.com/mdlayher/vsock/commit/99a6dccdebad21d1fa5f757d228d677ccb1412dc): + upgrade `github.com/mdlayher/socket` to handle non-blocking `connect(2)` + errors (called in `vsock.Dial`) properly by checking the `SO_ERROR` socket + option. Lock in this behavior with a new test. +- [Improvement] [commit](https://github.com/mdlayher/vsock/commit/375f3bbcc363500daf367ec511638a4655471719): + downgrade the version of `golang.org/x/net` in use to support Go 1.12. We + don't need the latest version for this package. + +## v1.0.0 + +**This is the first release of package vsock that only supports Go 1.12+. +Users on older versions of Go must use an unstable release.** + +- Initial stable commit! +- [API change]: the `vsock.Dial` and `vsock.Listen` constructors now accept an + optional `*vsock.Config` parameter to enable future expansion in v1.x.x + without prompting further breaking API changes. Because `vsock.Config` has no + options as of this release, `nil` may be passed in all call sites to fix + existing code upon upgrading to v1.0.0. +- [New API]: the `vsock.ListenContextID` function can be used to create a + `*vsock.Listener` which is bound to an explicit context ID address, rather + than inferring one automatically as `vsock.Listen` does. 
diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/LICENSE.md b/src/runtime/vendor/github.com/mdlayher/vsock/LICENSE.md index ffcdf89c9..9fa6774b1 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/LICENSE.md +++ b/src/runtime/vendor/github.com/mdlayher/vsock/LICENSE.md @@ -1,7 +1,6 @@ -MIT License -=========== +# MIT License -Copyright (C) 2017 Matt Layher +Copyright (C) 2017-2022 Matt Layher Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/README.md b/src/runtime/vendor/github.com/mdlayher/vsock/README.md index dd0ffe805..6395502bf 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/README.md +++ b/src/runtime/vendor/github.com/mdlayher/vsock/README.md @@ -1,32 +1,30 @@ -# vsock [![builds.sr.ht status](https://builds.sr.ht/~mdlayher/vsock.svg)](https://builds.sr.ht/~mdlayher/vsock?) 
[![GoDoc](https://godoc.org/github.com/mdlayher/vsock?status.svg)](https://godoc.org/github.com/mdlayher/vsock) [![Go Report Card](https://goreportcard.com/badge/github.com/mdlayher/vsock)](https://goreportcard.com/report/github.com/mdlayher/vsock) +# vsock [![Test Status](https://github.com/mdlayher/vsock/workflows/Linux%20Test/badge.svg)](https://github.com/mdlayher/vsock/actions) [![Go Reference](https://pkg.go.dev/badge/github.com/mdlayher/vsock.svg)](https://pkg.go.dev/github.com/mdlayher/vsock) [![Go Report Card](https://goreportcard.com/badge/github.com/mdlayher/vsock)](https://goreportcard.com/report/github.com/mdlayher/vsock) Package `vsock` provides access to Linux VM sockets (`AF_VSOCK`) for communication between a hypervisor and its virtual machines. MIT Licensed. For more information about VM sockets, check out my blog about -[Linux VM sockets in Go](https://medium.com/@mdlayher/linux-vm-sockets-in-go-ea11768e9e67). - -## Go version support - -This package supports varying levels of functionality depending on the version -of Go used during compilation. The `Listener` and `Conn` types produced by this -package are backed by non-blocking I/O, in order to integrate with Go's runtime -network poller in Go 1.11+. Additional functionality is available starting in Go -1.12+. The older Go 1.10 is only supported in a blocking-only mode. - -A comprehensive list of functionality for supported Go versions can be found on -[package vsock's GoDoc page](https://godoc.org/github.com/mdlayher/vsock#hdr-Go_version_support). +[Linux VM sockets in Go](https://mdlayher.com/blog/linux-vm-sockets-in-go/). ## Stability -At this time, package `vsock` is in a pre-v1.0.0 state. Changes are being made -which may impact the exported API of this package and others in its ecosystem. +See the [CHANGELOG](./CHANGELOG.md) file for a description of changes between +releases. 
-**If you depend on this package in your application, please use Go modules when -building your application.** +This package has a stable v1 API and any future breaking changes will prompt +the release of a new major version. Features and bug fixes will continue to +occur in the v1.x.x series. + +In order to reduce the maintenance burden, this package is only supported on +Go 1.12+. Older versions of Go lack critical features and APIs which are +necessary for this package to function correctly. + +**If you depend on this package in your applications, please use Go modules.** ## Requirements +**It's possible these requirements are out of date. PRs are welcome.** + To make use of VM sockets with QEMU and virtio-vsock, you must have: - a Linux hypervisor with kernel 4.8+ @@ -53,10 +51,3 @@ Check out the [QEMU wiki page on virtio-vsock](http://wiki.qemu-project.org/Features/VirtioVsock) for more details. More detail on setting up this environment will be provided in the future. - -## Usage - -To try out VM sockets and see an example of how they work, see -[cmd/vscp](https://github.com/mdlayher/vsock/tree/master/cmd/vscp). -This command shows usage of the `vsock.ListenStream` and `vsock.DialStream` -APIs, and allows users to easily test VM sockets on their systems. diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/conn_linux.go b/src/runtime/vendor/github.com/mdlayher/vsock/conn_linux.go index d9558cd5d..684cf4c98 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/conn_linux.go +++ b/src/runtime/vendor/github.com/mdlayher/vsock/conn_linux.go @@ -1,72 +1,55 @@ -//+build linux +//go:build linux +// +build linux package vsock import ( + "github.com/mdlayher/socket" "golang.org/x/sys/unix" ) -// newConn creates a Conn using a connFD, immediately setting the connFD to -// non-blocking mode for use with the runtime network poller. 
-func newConn(cfd connFD, local, remote *Addr) (*Conn, error) { - // Note: if any calls fail after this point, cfd.Close should be invoked - // for cleanup because the socket is now non-blocking. - if err := cfd.SetNonblocking(local.fileName()); err != nil { - return nil, err - } - - return &Conn{ - fd: cfd, - local: local, - remote: remote, - }, nil -} - // dial is the entry point for Dial on Linux. -func dial(cid, port uint32) (*Conn, error) { - cfd, err := newConnFD() +func dial(cid, port uint32, _ *Config) (*Conn, error) { + // TODO(mdlayher): Config default nil check and initialize. Pass options to + // socket.Config where necessary. + + c, err := socket.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0, "vsock", nil) if err != nil { return nil, err } - return dialLinux(cfd, cid, port) -} - -// dialLinux is the entry point for tests on Linux. -func dialLinux(cfd connFD, cid, port uint32) (c *Conn, err error) { - defer func() { - if err != nil { - // If any system calls fail during setup, the socket must be closed - // to avoid file descriptor leaks. - _ = cfd.EarlyClose() - } - }() - - rsa := &unix.SockaddrVM{ - CID: cid, - Port: port, - } - - if err := cfd.Connect(rsa); err != nil { + sa := &unix.SockaddrVM{CID: cid, Port: port} + rsa, err := c.Connect(sa) + if err != nil { + _ = c.Close() return nil, err } - lsa, err := cfd.Getsockname() + // TODO(mdlayher): getpeername(2) appears to return nil in the GitHub CI + // environment, so in the event of a nil sockaddr, fall back to the previous + // method of synthesizing the remote address. 
+ if rsa == nil { + rsa = sa + } + + lsa, err := c.Getsockname() if err != nil { + _ = c.Close() return nil, err } lsavm := lsa.(*unix.SockaddrVM) + rsavm := rsa.(*unix.SockaddrVM) - local := &Addr{ - ContextID: lsavm.CID, - Port: lsavm.Port, - } - - remote := &Addr{ - ContextID: cid, - Port: port, - } - - return newConn(cfd, local, remote) + return &Conn{ + c: c, + local: &Addr{ + ContextID: lsavm.CID, + Port: lsavm.Port, + }, + remote: &Addr{ + ContextID: rsavm.CID, + Port: rsavm.Port, + }, + }, nil } diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/doc.go b/src/runtime/vendor/github.com/mdlayher/vsock/doc.go index a26534146..e158b1836 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/doc.go +++ b/src/runtime/vendor/github.com/mdlayher/vsock/doc.go @@ -7,54 +7,4 @@ // - *Addr implements net.Addr // - *Conn implements net.Conn // - *Listener implements net.Listener -// -// Go version support -// -// This package supports varying levels of functionality depending on the version -// of Go used during compilation. The Listener and Conn types produced by this -// package are backed by non-blocking I/O, in order to integrate with Go's -// runtime network poller in Go 1.11+. Additional functionality is available -// starting in Go 1.12+. 
-// -// Go 1.12+ (recommended): -// - *Listener: -// - Accept blocks until a connection is received -// - Close can interrupt Accept and make it return a permanent error -// - SetDeadline can set timeouts which can interrupt Accept and make it return a -// temporary error -// - *Conn: -// - SetDeadline family of methods are fully supported -// - CloseRead and CloseWrite can close the reading or writing sides of a -// Conn, respectively -// - SyscallConn provides access to raw network control/read/write functionality -// -// Go 1.11 (not recommended): -// - *Listener: -// - Accept is non-blocking and should be called in a loop, checking for -// net.Error.Temporary() == true and sleeping for a short period to avoid wasteful -// CPU cycle consumption -// - Close makes Accept return a permanent error on the next loop iteration -// - SetDeadline is not supported and will always return an error -// - *Conn: -// - SetDeadline family of methods are fully supported -// - CloseRead and CloseWrite are not supported and will always return an error -// - SyscallConn is not supported and will always return an error -// -// Go 1.10 (not recommended): -// - *Listener: -// - Accept blocks until a connection is received -// - Close cannot unblock Accept -// - SetDeadline is not supported and will always return an error -// - *Conn: -// - SetDeadline is not supported and will always return an error -// - CloseRead and CloseWrite are not supported and will always return an error -// - SyscallConn is not supported and will always return an error -// -// Stability -// -// At this time, package vsock is in a pre-v1.0.0 state. Changes are being made -// which may impact the exported API of this package and others in its ecosystem. -// -// If you depend on this package in your application, please use Go modules when -// building your application. 
package vsock diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux.go b/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux.go index 4c593047b..531e53f92 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux.go +++ b/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux.go @@ -2,10 +2,7 @@ package vsock import ( "fmt" - "io" "os" - "syscall" - "time" "golang.org/x/sys/unix" ) @@ -21,192 +18,6 @@ func contextID() (uint32, error) { return unix.IoctlGetUint32(int(f.Fd()), unix.IOCTL_VM_SOCKETS_GET_LOCAL_CID) } -// A listenFD is a type that wraps a file descriptor used to implement -// net.Listener. -type listenFD interface { - io.Closer - EarlyClose() error - Accept4(flags int) (connFD, unix.Sockaddr, error) - Bind(sa unix.Sockaddr) error - Listen(n int) error - Getsockname() (unix.Sockaddr, error) - SetNonblocking(name string) error - SetDeadline(t time.Time) error -} - -var _ listenFD = &sysListenFD{} - -// A sysListenFD is the system call implementation of listenFD. -type sysListenFD struct { - // These fields should never be non-zero at the same time. - fd int // Used in blocking mode. - f *os.File // Used in non-blocking mode. -} - -// newListenFD creates a sysListenFD in its default blocking mode. -func newListenFD() (*sysListenFD, error) { - fd, err := socket() - if err != nil { - return nil, err - } - - return &sysListenFD{ - fd: fd, - }, nil -} - -// Blocking mode methods. - -func (lfd *sysListenFD) Bind(sa unix.Sockaddr) error { return unix.Bind(lfd.fd, sa) } -func (lfd *sysListenFD) Getsockname() (unix.Sockaddr, error) { return unix.Getsockname(lfd.fd) } -func (lfd *sysListenFD) Listen(n int) error { return unix.Listen(lfd.fd, n) } - -func (lfd *sysListenFD) SetNonblocking(name string) error { - return lfd.setNonblocking(name) -} - -// EarlyClose is a blocking version of Close, only used for cleanup before -// entering non-blocking mode. 
-func (lfd *sysListenFD) EarlyClose() error { return unix.Close(lfd.fd) } - -// Non-blocking mode methods. - -func (lfd *sysListenFD) Accept4(flags int) (connFD, unix.Sockaddr, error) { - // Invoke Go version-specific logic for accept. - newFD, sa, err := lfd.accept4(flags) - if err != nil { - return nil, nil, err - } - - // Create a non-blocking connFD which will be used to implement net.Conn. - cfd := &sysConnFD{fd: newFD} - return cfd, sa, nil -} - -func (lfd *sysListenFD) Close() error { - // In Go 1.12+, *os.File.Close will also close the runtime network poller - // file descriptor, so that net.Listener.Accept can stop blocking. - return lfd.f.Close() -} - -func (lfd *sysListenFD) SetDeadline(t time.Time) error { - // Invoke Go version-specific logic for setDeadline. - return lfd.setDeadline(t) -} - -// A connFD is a type that wraps a file descriptor used to implement net.Conn. -type connFD interface { - io.ReadWriteCloser - EarlyClose() error - Connect(sa unix.Sockaddr) error - Getsockname() (unix.Sockaddr, error) - Shutdown(how int) error - SetNonblocking(name string) error - SetDeadline(t time.Time, typ deadlineType) error - SyscallConn() (syscall.RawConn, error) -} - -var _ connFD = &sysConnFD{} - -// newConnFD creates a sysConnFD in its default blocking mode. -func newConnFD() (*sysConnFD, error) { - fd, err := socket() - if err != nil { - return nil, err - } - - return &sysConnFD{ - fd: fd, - }, nil -} - -// A sysConnFD is the system call implementation of connFD. -type sysConnFD struct { - // These fields should never be non-zero at the same time. - fd int // Used in blocking mode. - f *os.File // Used in non-blocking mode. -} - -// Blocking mode methods. - -func (cfd *sysConnFD) Connect(sa unix.Sockaddr) error { return unix.Connect(cfd.fd, sa) } -func (cfd *sysConnFD) Getsockname() (unix.Sockaddr, error) { return unix.Getsockname(cfd.fd) } - -// EarlyClose is a blocking version of Close, only used for cleanup before -// entering non-blocking mode. 
-func (cfd *sysConnFD) EarlyClose() error { return unix.Close(cfd.fd) } - -func (cfd *sysConnFD) SetNonblocking(name string) error { - return cfd.setNonblocking(name) -} - -// Non-blocking mode methods. - -func (cfd *sysConnFD) Close() error { - // *os.File.Close will also close the runtime network poller file descriptor, - // so that read/write can stop blocking. - return cfd.f.Close() -} - -func (cfd *sysConnFD) Read(b []byte) (int, error) { return cfd.f.Read(b) } -func (cfd *sysConnFD) Write(b []byte) (int, error) { return cfd.f.Write(b) } - -func (cfd *sysConnFD) Shutdown(how int) error { - switch how { - case unix.SHUT_RD, unix.SHUT_WR: - return cfd.shutdown(how) - default: - panicf("vsock: sysConnFD.Shutdown method invoked with invalid how constant: %d", how) - return nil - } -} - -func (cfd *sysConnFD) SetDeadline(t time.Time, typ deadlineType) error { - return cfd.setDeadline(t, typ) -} - -func (cfd *sysConnFD) SyscallConn() (syscall.RawConn, error) { return cfd.syscallConn() } - -// socket invokes unix.Socket with the correct arguments to produce a vsock -// file descriptor. -func socket() (int, error) { - // "Mirror what the standard library does when creating file - // descriptors: avoid racing a fork/exec with the creation - // of new file descriptors, so that child processes do not - // inherit [socket] file descriptors unexpectedly. - // - // On Linux, SOCK_CLOEXEC was introduced in 2.6.27. OTOH, - // Go supports Linux 2.6.23 and above. If we get EINVAL on - // the first try, it may be that we are running on a kernel - // older than 2.6.27. In that case, take syscall.ForkLock - // and try again without SOCK_CLOEXEC. - // - // For a more thorough explanation, see similar work in the - // Go tree: func sysSocket in net/sock_cloexec.go, as well - // as the detailed comment in syscall/exec_unix.go." - // - // Explanation copied from netlink, courtesy of acln: - // https://github.com/mdlayher/netlink/pull/138. 
- fd, err := unix.Socket(unix.AF_VSOCK, unix.SOCK_STREAM|unix.SOCK_CLOEXEC, 0) - switch err { - case nil: - return fd, nil - case unix.EINVAL: - syscall.ForkLock.RLock() - defer syscall.ForkLock.RUnlock() - - fd, err = unix.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0) - if err != nil { - return 0, err - } - unix.CloseOnExec(fd) - - return fd, nil - default: - return 0, err - } -} - // isErrno determines if an error a matches UNIX error number. func isErrno(err error, errno int) bool { switch errno { diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_1.10.go b/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_1.10.go deleted file mode 100644 index aa214feb2..000000000 --- a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_1.10.go +++ /dev/null @@ -1,63 +0,0 @@ -//+build go1.10,!go1.11,linux - -package vsock - -import ( - "fmt" - "os" - "runtime" - "syscall" - "time" - - "golang.org/x/sys/unix" -) - -func (lfd *sysListenFD) accept4(flags int) (int, unix.Sockaddr, error) { - // In Go 1.11, accept on the raw file descriptor directly, because lfd.f - // may be attached to the runtime network poller, forcing this call to block - // even if Close is called. - return unix.Accept4(lfd.fd, flags) -} - -func (lfd *sysListenFD) setNonblocking(name string) error { - // Go 1.10 doesn't support non-blocking I/O. - if err := unix.SetNonblock(lfd.fd, false); err != nil { - return err - } - - lfd.f = os.NewFile(uintptr(lfd.fd), name) - - return nil -} - -func (*sysListenFD) setDeadline(_ time.Time) error { - // Listener deadlines won't work as expected in this version of Go, so - // return an early error. - return fmt.Errorf("vsock: listener deadlines not supported on %s", runtime.Version()) -} - -func (*sysConnFD) shutdown(_ int) error { - // Shutdown functionality is not available in this version on Go. 
- return fmt.Errorf("vsock: close conn read/write not supported on %s", runtime.Version()) -} - -func (*sysConnFD) syscallConn() (syscall.RawConn, error) { - // SyscallConn functionality is not available in this version on Go. - return nil, fmt.Errorf("vsock: syscall conn not supported on %s", runtime.Version()) -} - -func (cfd *sysConnFD) setNonblocking(name string) error { - // Go 1.10 doesn't support non-blocking I/O. - if err := unix.SetNonblock(cfd.fd, false); err != nil { - return err - } - - cfd.f = os.NewFile(uintptr(cfd.fd), name) - - return nil -} - -func (cfd *sysConnFD) setDeadline(t time.Time, typ deadlineType) error { - // Deadline functionality is not available in this version on Go. - return fmt.Errorf("vsock: connection deadlines not supported on %s", runtime.Version()) -} diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_1.11.go b/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_1.11.go deleted file mode 100644 index 90991a0f5..000000000 --- a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_1.11.go +++ /dev/null @@ -1,76 +0,0 @@ -//+build go1.11,!go1.12,linux - -package vsock - -import ( - "fmt" - "os" - "runtime" - "syscall" - "time" - - "golang.org/x/sys/unix" -) - -func (lfd *sysListenFD) accept4(flags int) (int, unix.Sockaddr, error) { - // In Go 1.11, accept on the raw file descriptor directly, because lfd.f - // may be attached to the runtime network poller, forcing this call to block - // even if Close is called. - return unix.Accept4(lfd.fd, flags) -} - -func (lfd *sysListenFD) setNonblocking(name string) error { - // From now on, we must perform non-blocking I/O, so that our - // net.Listener.Accept method can be interrupted by closing the socket. - if err := unix.SetNonblock(lfd.fd, true); err != nil { - return err - } - - // Transition from blocking mode to non-blocking mode. 
- lfd.f = os.NewFile(uintptr(lfd.fd), name) - - return nil -} - -func (*sysListenFD) setDeadline(_ time.Time) error { - // Listener deadlines won't work as expected in this version of Go, so - // return an early error. - return fmt.Errorf("vsock: listener deadlines not supported on %s", runtime.Version()) -} - -func (*sysConnFD) shutdown(_ int) error { - // Shutdown functionality is not available in this version on Go. - return fmt.Errorf("vsock: close conn read/write not supported on %s", runtime.Version()) -} - -func (*sysConnFD) syscallConn() (syscall.RawConn, error) { - // SyscallConn functionality is not available in this version on Go. - return nil, fmt.Errorf("vsock: syscall conn not supported on %s", runtime.Version()) -} - -func (cfd *sysConnFD) setNonblocking(name string) error { - // From now on, we must perform non-blocking I/O, so that our deadline - // methods work, and the connection can be interrupted by net.Conn.Close. - if err := unix.SetNonblock(cfd.fd, true); err != nil { - return err - } - - // Transition from blocking mode to non-blocking mode. 
- cfd.f = os.NewFile(uintptr(cfd.fd), name) - - return nil -} - -func (cfd *sysConnFD) setDeadline(t time.Time, typ deadlineType) error { - switch typ { - case deadline: - return cfd.f.SetDeadline(t) - case readDeadline: - return cfd.f.SetReadDeadline(t) - case writeDeadline: - return cfd.f.SetWriteDeadline(t) - } - - panicf("vsock: sysConnFD.SetDeadline method invoked with invalid deadline type constant: %d", typ) - return nil -} diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_gteq_1.12.go b/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_gteq_1.12.go deleted file mode 100644 index b90a45e03..000000000 --- a/src/runtime/vendor/github.com/mdlayher/vsock/fd_linux_gteq_1.12.go +++ /dev/null @@ -1,112 +0,0 @@ -//+build go1.12,linux - -package vsock - -import ( - "os" - "syscall" - "time" - - "golang.org/x/sys/unix" -) - -func (lfd *sysListenFD) accept4(flags int) (int, unix.Sockaddr, error) { - // In Go 1.12+, we make use of runtime network poller integration to allow - // net.Listener.Accept to be unblocked by a call to net.Listener.Close. - rc, err := lfd.f.SyscallConn() - if err != nil { - return 0, nil, err - } - - var ( - newFD int - sa unix.Sockaddr - ) - - doErr := rc.Read(func(fd uintptr) bool { - newFD, sa, err = unix.Accept4(int(fd), flags) - - switch err { - case unix.EAGAIN, unix.ECONNABORTED: - // Return false to let the poller wait for readiness. See the - // source code for internal/poll.FD.RawRead for more details. - // - // When the socket is in non-blocking mode, we might see EAGAIN if - // the socket is not ready for reading. - // - // In addition, the network poller's accept implementation also - // deals with ECONNABORTED, in case a socket is closed before it is - // pulled from our listen queue. - return false - default: - // No error or some unrecognized error, treat this Read operation - // as completed. 
- return true - } - }) - if doErr != nil { - return 0, nil, doErr - } - - return newFD, sa, err -} - -func (lfd *sysListenFD) setDeadline(t time.Time) error { return lfd.f.SetDeadline(t) } - -func (lfd *sysListenFD) setNonblocking(name string) error { - // From now on, we must perform non-blocking I/O, so that our - // net.Listener.Accept method can be interrupted by closing the socket. - if err := unix.SetNonblock(lfd.fd, true); err != nil { - return err - } - - // Transition from blocking mode to non-blocking mode. - lfd.f = os.NewFile(uintptr(lfd.fd), name) - - return nil -} - -func (cfd *sysConnFD) shutdown(how int) error { - rc, err := cfd.f.SyscallConn() - if err != nil { - return err - } - - doErr := rc.Control(func(fd uintptr) { - err = unix.Shutdown(int(fd), how) - }) - if doErr != nil { - return doErr - } - - return err -} - -func (cfd *sysConnFD) syscallConn() (syscall.RawConn, error) { return cfd.f.SyscallConn() } - -func (cfd *sysConnFD) setNonblocking(name string) error { - // From now on, we must perform non-blocking I/O, so that our deadline - // methods work, and the connection can be interrupted by net.Conn.Close. - if err := unix.SetNonblock(cfd.fd, true); err != nil { - return err - } - - // Transition from blocking mode to non-blocking mode. 
- cfd.f = os.NewFile(uintptr(cfd.fd), name) - - return nil -} - -func (cfd *sysConnFD) setDeadline(t time.Time, typ deadlineType) error { - switch typ { - case deadline: - return cfd.f.SetDeadline(t) - case readDeadline: - return cfd.f.SetReadDeadline(t) - case writeDeadline: - return cfd.f.SetWriteDeadline(t) - } - - panicf("vsock: sysConnFD.SetDeadline method invoked with invalid deadline type constant: %d", typ) - return nil -} diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/go.mod b/src/runtime/vendor/github.com/mdlayher/vsock/go.mod index bf837d952..0beb9b692 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/go.mod +++ b/src/runtime/vendor/github.com/mdlayher/vsock/go.mod @@ -3,7 +3,8 @@ module github.com/mdlayher/vsock go 1.13 require ( - github.com/google/go-cmp v0.3.1 - golang.org/x/net v0.0.0-20191108221443-4ba9e2ef068c - golang.org/x/sys v0.0.0-20191105231009-c1f44814a5cd + github.com/google/go-cmp v0.5.7 + github.com/mdlayher/socket v0.2.0 + golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c + golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a ) diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/go.sum b/src/runtime/vendor/github.com/mdlayher/vsock/go.sum index 373a00e3b..54e861d3c 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/go.sum +++ b/src/runtime/vendor/github.com/mdlayher/vsock/go.sum @@ -1,16 +1,17 @@ -github.com/google/go-cmp v0.3.0 h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY= -github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= -github.com/google/go-cmp v0.3.1 h1:Xye71clBPdm5HgqGwUkwhbynsUJZhDbS20FvLhQ2izg= -github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= +github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.7 h1:81/ik6ipDQS2aGcBfIN5dHDB36BwrStyeAQquSYCV4o= +github.com/google/go-cmp v0.5.7/go.mod h1:n+brtR0CgQNWTVd5ZUFpTBC8YFBDLK/h/bpaJ8/DtOE= 
+github.com/mdlayher/socket v0.2.0 h1:EY4YQd6hTAg2tcXF84p5DTHazShE50u5HeBzBaNgjkA= +github.com/mdlayher/socket v0.2.0/go.mod h1:QLlNPkFR88mRUNQIzRBMfXxwKal8H7u1h3bL1CV+f0E= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c h1:uOCk1iQW6Vc18bnC13MfzScl+wdKBmM9Y9kU7Z83/lw= golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= -golang.org/x/net v0.0.0-20191108221443-4ba9e2ef068c h1:SRpq/kuj/xNci/RdvEs+RSvpfxqvLAzTKuKGlzoGdZQ= -golang.org/x/net v0.0.0-20191108221443-4ba9e2ef068c/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a h1:1BGLXjeY4akVXGgbC9HugT3Jv3hCI0z56oJR5vAMgBU= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= -golang.org/x/sys v0.0.0-20190509141414-a5b02f93d862 h1:rM0ROo5vb9AdYJi1110yjWGMej9ITfKddS89P3Fkhug= -golang.org/x/sys v0.0.0-20190509141414-a5b02f93d862/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20191105231009-c1f44814a5cd h1:3x5uuvBgE6oaXJjCOvpCC1IpgJogqQ+PqGGU3ZxAgII= -golang.org/x/sys v0.0.0-20191105231009-c1f44814a5cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a h1:ppl5mZgokTT8uPkmYOyEUmPTr3ypaKkg5eFOGrAmxxE= +golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 
h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/listener_linux.go b/src/runtime/vendor/github.com/mdlayher/vsock/listener_linux.go index 2c16a336a..458449ad1 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/listener_linux.go +++ b/src/runtime/vendor/github.com/mdlayher/vsock/listener_linux.go @@ -1,11 +1,14 @@ -//+build linux +//go:build linux +// +build linux package vsock import ( "net" + "os" "time" + "github.com/mdlayher/socket" "golang.org/x/sys/unix" ) @@ -14,93 +17,107 @@ var _ net.Listener = &listener{} // A listener is the net.Listener implementation for connection-oriented // VM sockets. type listener struct { - fd listenFD + c *socket.Conn addr *Addr } // Addr and Close implement the net.Listener interface for listener. func (l *listener) Addr() net.Addr { return l.addr } -func (l *listener) Close() error { return l.fd.Close() } -func (l *listener) SetDeadline(t time.Time) error { return l.fd.SetDeadline(t) } +func (l *listener) Close() error { return l.c.Close() } +func (l *listener) SetDeadline(t time.Time) error { return l.c.SetDeadline(t) } // Accept accepts a single connection from the listener, and sets up // a net.Conn backed by conn. func (l *listener) Accept() (net.Conn, error) { - // Mimic what internal/poll does and close on exec, but leave it up to - // newConn to set non-blocking mode. - // See: https://golang.org/src/internal/poll/sock_cloexec.go. - // - // TODO(mdlayher): acquire syscall.ForkLock.RLock here once the Go 1.11 - // code can be removed and we're fully using the runtime network poller in - // non-blocking mode. 
- cfd, sa, err := l.fd.Accept4(unix.SOCK_CLOEXEC) + c, rsa, err := l.c.Accept(0) if err != nil { return nil, err } - savm := sa.(*unix.SockaddrVM) + savm := rsa.(*unix.SockaddrVM) remote := &Addr{ ContextID: savm.CID, Port: savm.Port, } - return newConn(cfd, l.addr, remote) + return &Conn{ + c: c, + local: l.addr, + remote: remote, + }, nil } +// name is the socket name passed to package socket. +const name = "vsock" + // listen is the entry point for Listen on Linux. -func listen(cid, port uint32) (*Listener, error) { - lfd, err := newListenFD() +func listen(cid, port uint32, _ *Config) (*Listener, error) { + // TODO(mdlayher): Config default nil check and initialize. Pass options to + // socket.Config where necessary. + + c, err := socket.Socket(unix.AF_VSOCK, unix.SOCK_STREAM, 0, name, nil) if err != nil { return nil, err } - return listenLinux(lfd, cid, port) -} + // Be sure to close the Conn if any of the system calls fail before we + // return the Conn to the caller. -// listenLinux is the entry point for tests on Linux. -func listenLinux(lfd listenFD, cid, port uint32) (l *Listener, err error) { - defer func() { - if err != nil { - // If any system calls fail during setup, the socket must be closed - // to avoid file descriptor leaks. - _ = lfd.EarlyClose() - } - }() - - // Zero-value for "any port" is friendlier in Go than a constant. if port == 0 { port = unix.VMADDR_PORT_ANY } - sa := &unix.SockaddrVM{ - CID: cid, - Port: port, - } - - if err := lfd.Bind(sa); err != nil { + if err := c.Bind(&unix.SockaddrVM{CID: cid, Port: port}); err != nil { + _ = c.Close() return nil, err } - if err := lfd.Listen(unix.SOMAXCONN); err != nil { + if err := c.Listen(unix.SOMAXCONN); err != nil { + _ = c.Close() return nil, err } - lsa, err := lfd.Getsockname() + l, err := newListener(c) + if err != nil { + _ = c.Close() + return nil, err + } + + return l, nil +} + +// fileListener is the entry point for FileListener on Linux. 
+func fileListener(f *os.File) (*Listener, error) { + c, err := socket.FileConn(f, name) if err != nil { return nil, err } - // Done with blocking mode setup, transition to non-blocking before the - // caller has a chance to start calling things concurrently that might make - // the locking situation tricky. - // - // Note: if any calls fail after this point, lfd.Close should be invoked - // for cleanup because the socket is now non-blocking. - if err := lfd.SetNonblocking("vsock-listen"); err != nil { + l, err := newListener(c) + if err != nil { + _ = c.Close() return nil, err } - lsavm := lsa.(*unix.SockaddrVM) + return l, nil +} + +// newListener creates a Listener from a raw socket.Conn. +func newListener(c *socket.Conn) (*Listener, error) { + lsa, err := c.Getsockname() + if err != nil { + return nil, err + } + + // Now that the library can also accept arbitrary os.Files, we have to + // verify the address family so we don't accidentally create a + // *vsock.Listener backed by TCP or some other socket type. + lsavm, ok := lsa.(*unix.SockaddrVM) + if !ok { + // All errors should wrapped with os.SyscallError. + return nil, os.NewSyscallError("listen", unix.EINVAL) + } + addr := &Addr{ ContextID: lsavm.CID, Port: lsavm.Port, @@ -108,7 +125,7 @@ func listenLinux(lfd listenFD, cid, port uint32) (l *Listener, err error) { return &Listener{ l: &listener{ - fd: lfd, + c: c, addr: addr, }, }, nil diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/vsock.go b/src/runtime/vendor/github.com/mdlayher/vsock/vsock.go index 004263efb..32223982a 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/vsock.go +++ b/src/runtime/vendor/github.com/mdlayher/vsock/vsock.go @@ -9,26 +9,27 @@ import ( "strings" "syscall" "time" + + "github.com/mdlayher/socket" ) const ( // Hypervisor specifies that a socket should communicate with the hypervisor - // process. + // process. Note that this is _not_ the same as a socket owned by a process + // running on the hypervisor. 
Most users should probably use Host instead. Hypervisor = 0x0 + // Local specifies that a socket should communicate with a matching socket + // on the same machine. This provides an alternative to UNIX sockets or + // similar and may be useful in testing VM sockets applications. + Local = 0x1 + // Host specifies that a socket should communicate with processes other than - // the hypervisor on the host machine. + // the hypervisor on the host machine. This is the correct choice to + // communicate with a process running on a hypervisor using a socket dialed + // from a guest. Host = 0x2 - // cidReserved is a reserved context ID that is no longer in use, - // and cannot be used for socket communications. - cidReserved = 0x1 - - // shutRd and shutWr are arguments for unix.Shutdown, copied here to avoid - // importing x/sys/unix in cross-platform code. - shutRd = 0 // unix.SHUT_RD - shutWr = 1 // unix.SHUT_WR - // Error numbers we recognize, copied here to avoid importing x/sys/unix in // cross-platform code. ebadf = 9 @@ -55,25 +56,47 @@ const ( opWrite = "write" ) +// TODO(mdlayher): plumb through socket.Config.NetNS if it makes sense. + +// Config contains options for a Conn or Listener. +type Config struct{} + // Listen opens a connection-oriented net.Listener for incoming VM sockets -// connections. The port parameter specifies the port for the Listener. +// connections. The port parameter specifies the port for the Listener. Config +// specifies optional configuration for the Listener. If config is nil, a +// default configuration will be used. // -// To allow the server to assign a port automatically, specify 0 for port. -// The address of the server can be retrieved using the Addr method. +// To allow the server to assign a port automatically, specify 0 for port. The +// address of the server can be retrieved using the Addr method. // -// When the Listener is no longer needed, Close must be called to free resources. 
-func Listen(port uint32) (*Listener, error) { +// Listen automatically infers the appropriate context ID for this machine by +// calling ContextID and passing that value to ListenContextID. Callers with +// advanced use cases (such as using the Local context ID) may wish to use +// ListenContextID directly. +// +// When the Listener is no longer needed, Close must be called to free +// resources. +func Listen(port uint32, cfg *Config) (*Listener, error) { cid, err := ContextID() if err != nil { // No addresses available. return nil, opError(opListen, err, nil, nil) } - l, err := listen(cid, port) + return ListenContextID(cid, port, cfg) +} + +// ListenContextID is the same as Listen, but also accepts an explicit context +// ID parameter. This function is intended for advanced use cases and most +// callers should use Listen instead. +// +// See the documentation of Listen for more details. +func ListenContextID(contextID, port uint32, cfg *Config) (*Listener, error) { + l, err := listen(contextID, port, cfg) if err != nil { // No remote address available. return nil, opError(opListen, err, &Addr{ - ContextID: cid, + ContextID: contextID, Port: port, }, nil) } @@ -81,6 +104,23 @@ func Listen(port uint32) (*Listener, error) { return l, nil } +// FileListener returns a copy of the network listener corresponding to an open +// os.File. It is the caller's responsibility to close the Listener when +// finished. Closing the Listener does not affect the os.File, and closing the +// os.File does not affect the Listener. +// +// This function is intended for advanced use cases and most callers should use +// Listen instead. +func FileListener(f *os.File) (*Listener, error) { + l, err := fileListener(f) + if err != nil { + // No addresses available. + return nil, opError(opListen, err, nil, nil) + } + + return l, nil +} + var _ net.Listener = &Listener{} // A Listener is a VM sockets implementation of a net.Listener. 
@@ -112,8 +152,6 @@ func (l *Listener) Close() error { // SetDeadline sets the deadline associated with the listener. A zero time value // disables the deadline. -// -// SetDeadline only works with Go 1.12+. func (l *Listener) SetDeadline(t time.Time) error { return l.opError(opSet, l.l.SetDeadline(t)) } @@ -125,8 +163,10 @@ func (l *Listener) opError(op string, err error) error { return opError(op, err, l.Addr(), nil) } -// Dial dials a connection-oriented net.Conn to a VM sockets server. -// The contextID and port parameters specify the address of the server. +// Dial dials a connection-oriented net.Conn to a VM sockets listener. The +// context ID and port parameters specify the address of the listener. Config +// specifies optional configuration for the Conn. If config is nil, a default +// configuration will be used. // // If dialing a connection from the hypervisor to a virtual machine, the VM's // context ID should be specified. @@ -135,9 +175,10 @@ func (l *Listener) opError(op string, err error) error { // communicate with the hypervisor process, or Host should be used to // communicate with other processes on the host machine. // -// When the connection is no longer needed, Close must be called to free resources. -func Dial(contextID, port uint32) (*Conn, error) { - c, err := dial(contextID, port) +// When the connection is no longer needed, Close must be called to free +// resources. +func Dial(contextID, port uint32, cfg *Config) (*Conn, error) { + c, err := dial(contextID, port, cfg) if err != nil { // No local address, but we have a remote address we can return. return nil, opError(opDial, err, nil, &Addr{ @@ -149,35 +190,33 @@ func Dial(contextID, port uint32) (*Conn, error) { return c, nil } -var _ net.Conn = &Conn{} -var _ syscall.Conn = &Conn{} +var ( + _ net.Conn = &Conn{} + _ syscall.Conn = &Conn{} +) // A Conn is a VM sockets implementation of a net.Conn. 
type Conn struct { - fd connFD + c *socket.Conn local *Addr remote *Addr } // Close closes the connection. func (c *Conn) Close() error { - return c.opError(opClose, c.fd.Close()) + return c.opError(opClose, c.c.Close()) } // CloseRead shuts down the reading side of the VM sockets connection. Most // callers should just use Close. -// -// CloseRead only works with Go 1.12+. func (c *Conn) CloseRead() error { - return c.opError(opClose, c.fd.Shutdown(shutRd)) + return c.opError(opClose, c.c.CloseRead()) } // CloseWrite shuts down the writing side of the VM sockets connection. Most // callers should just use Close. -// -// CloseWrite only works with Go 1.12+. func (c *Conn) CloseWrite() error { - return c.opError(opClose, c.fd.Shutdown(shutWr)) + return c.opError(opClose, c.c.CloseWrite()) } // LocalAddr returns the local network address. The Addr returned is shared by @@ -190,7 +229,7 @@ func (c *Conn) RemoteAddr() net.Addr { return c.remote } // Read implements the net.Conn Read method. func (c *Conn) Read(b []byte) (int, error) { - n, err := c.fd.Read(b) + n, err := c.c.Read(b) if err != nil { return n, c.opError(opRead, err) } @@ -200,7 +239,7 @@ func (c *Conn) Read(b []byte) (int, error) { // Write implements the net.Conn Write method. func (c *Conn) Write(b []byte) (int, error) { - n, err := c.fd.Write(b) + n, err := c.c.Write(b) if err != nil { return n, c.opError(opWrite, err) } @@ -208,35 +247,25 @@ func (c *Conn) Write(b []byte) (int, error) { return n, nil } -// A deadlineType specifies the type of deadline to set for a Conn. -type deadlineType int - -// Possible deadlineType values. -const ( - deadline deadlineType = iota - readDeadline - writeDeadline -) - // SetDeadline implements the net.Conn SetDeadline method. func (c *Conn) SetDeadline(t time.Time) error { - return c.opError(opSet, c.fd.SetDeadline(t, deadline)) + return c.opError(opSet, c.c.SetDeadline(t)) } // SetReadDeadline implements the net.Conn SetReadDeadline method. 
func (c *Conn) SetReadDeadline(t time.Time) error { - return c.opError(opSet, c.fd.SetDeadline(t, readDeadline)) + return c.opError(opSet, c.c.SetReadDeadline(t)) } // SetWriteDeadline implements the net.Conn SetWriteDeadline method. func (c *Conn) SetWriteDeadline(t time.Time) error { - return c.opError(opSet, c.fd.SetDeadline(t, writeDeadline)) + return c.opError(opSet, c.c.SetWriteDeadline(t)) } // SyscallConn returns a raw network connection. This implements the // syscall.Conn interface. func (c *Conn) SyscallConn() (syscall.RawConn, error) { - rc, err := c.fd.SyscallConn() + rc, err := c.c.SyscallConn() if err != nil { return nil, c.opError(opSyscallConn, err) } @@ -254,14 +283,17 @@ func (c *Conn) opError(op string, err error) error { return opError(op, err, c.local, c.remote) } +// TODO(mdlayher): see if we can port smarter net.OpError with local/remote +// address error logic into socket.Conn's SyscallConn type to avoid the need for +// this wrapper. + var _ syscall.RawConn = &rawConn{} // A rawConn is a syscall.RawConn that wraps an internal syscall.RawConn in order // to produce net.OpError error values. type rawConn struct { - rc syscall.RawConn - local *Addr - remote *Addr + rc syscall.RawConn + local, remote *Addr } // Control implements the syscall.RawConn Control method. @@ -289,8 +321,7 @@ var _ net.Addr = &Addr{} // An Addr is the address of a VM sockets endpoint. type Addr struct { - ContextID uint32 - Port uint32 + ContextID, Port uint32 } // Network returns the address's network name, "vsock". 
@@ -304,8 +335,8 @@ func (a *Addr) String() string { switch a.ContextID { case Hypervisor: host = fmt.Sprintf("hypervisor(%d)", a.ContextID) - case cidReserved: - host = fmt.Sprintf("reserved(%d)", a.ContextID) + case Local: + host = fmt.Sprintf("local(%d)", a.ContextID) case Host: host = fmt.Sprintf("host(%d)", a.ContextID) default: @@ -338,6 +369,13 @@ func opError(op string, err error, local, remote net.Addr) error { return nil } + // TODO(mdlayher): this entire function is suspect and should probably be + // looked at carefully, especially with Go 1.13+ error wrapping. + // + // Eventually this *net.OpError logic should probably be ported into + // mdlayher/socket because similar checks are necessary to comply with + // nettest.TestConn. + // Unwrap inner errors from error types. // // TODO(mdlayher): errors.Cause or similar in Go 1.13. diff --git a/src/runtime/vendor/github.com/mdlayher/vsock/vsock_others.go b/src/runtime/vendor/github.com/mdlayher/vsock/vsock_others.go index a246de959..a5bd110aa 100644 --- a/src/runtime/vendor/github.com/mdlayher/vsock/vsock_others.go +++ b/src/runtime/vendor/github.com/mdlayher/vsock/vsock_others.go @@ -1,23 +1,23 @@ -//+build !linux +//go:build !linux +// +build !linux package vsock import ( "fmt" "net" + "os" "runtime" - "syscall" "time" ) -var ( - // errUnimplemented is returned by all functions on platforms that - // cannot make use of VM sockets. - errUnimplemented = fmt.Errorf("vsock: not implemented on %s/%s", - runtime.GOOS, runtime.GOARCH) -) +// errUnimplemented is returned by all functions on platforms that +// cannot make use of VM sockets. 
+var errUnimplemented = fmt.Errorf("vsock: not implemented on %s/%s", + runtime.GOOS, runtime.GOARCH) -func listen(_, _ uint32) (*Listener, error) { return nil, errUnimplemented } +func fileListener(_ *os.File) (*Listener, error) { return nil, errUnimplemented } +func listen(_, _ uint32, _ *Config) (*Listener, error) { return nil, errUnimplemented } type listener struct{} @@ -26,18 +26,7 @@ func (*listener) Addr() net.Addr { return nil } func (*listener) Close() error { return errUnimplemented } func (*listener) SetDeadline(_ time.Time) error { return errUnimplemented } -func dial(_, _ uint32) (*Conn, error) { return nil, errUnimplemented } - -type connFD struct{} - -func (*connFD) LocalAddr() net.Addr { return nil } -func (*connFD) RemoteAddr() net.Addr { return nil } -func (*connFD) SetDeadline(_ time.Time, _ deadlineType) error { return errUnimplemented } -func (*connFD) Read(_ []byte) (int, error) { return 0, errUnimplemented } -func (*connFD) Write(_ []byte) (int, error) { return 0, errUnimplemented } -func (*connFD) Close() error { return errUnimplemented } -func (*connFD) Shutdown(_ int) error { return errUnimplemented } -func (*connFD) SyscallConn() (syscall.RawConn, error) { return nil, errUnimplemented } +func dial(_, _ uint32, _ *Config) (*Conn, error) { return nil, errUnimplemented } func contextID() (uint32, error) { return 0, errUnimplemented } diff --git a/src/runtime/vendor/golang.org/x/net/bpf/asm.go b/src/runtime/vendor/golang.org/x/net/bpf/asm.go new file mode 100644 index 000000000..15e21b181 --- /dev/null +++ b/src/runtime/vendor/golang.org/x/net/bpf/asm.go @@ -0,0 +1,41 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package bpf + +import "fmt" + +// Assemble converts insts into raw instructions suitable for loading +// into a BPF virtual machine. 
+// +// Currently, no optimization is attempted, the assembled program flow +// is exactly as provided. +func Assemble(insts []Instruction) ([]RawInstruction, error) { + ret := make([]RawInstruction, len(insts)) + var err error + for i, inst := range insts { + ret[i], err = inst.Assemble() + if err != nil { + return nil, fmt.Errorf("assembling instruction %d: %s", i+1, err) + } + } + return ret, nil +} + +// Disassemble attempts to parse raw back into +// Instructions. Unrecognized RawInstructions are assumed to be an +// extension not implemented by this package, and are passed through +// unchanged to the output. The allDecoded value reports whether insts +// contains no RawInstructions. +func Disassemble(raw []RawInstruction) (insts []Instruction, allDecoded bool) { + insts = make([]Instruction, len(raw)) + allDecoded = true + for i, r := range raw { + insts[i] = r.Disassemble() + if _, ok := insts[i].(RawInstruction); ok { + allDecoded = false + } + } + return insts, allDecoded +} diff --git a/src/runtime/vendor/golang.org/x/net/bpf/constants.go b/src/runtime/vendor/golang.org/x/net/bpf/constants.go new file mode 100644 index 000000000..12f3ee835 --- /dev/null +++ b/src/runtime/vendor/golang.org/x/net/bpf/constants.go @@ -0,0 +1,222 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package bpf + +// A Register is a register of the BPF virtual machine. +type Register uint16 + +const ( + // RegA is the accumulator register. RegA is always the + // destination register of ALU operations. + RegA Register = iota + // RegX is the indirection register, used by LoadIndirect + // operations. + RegX +) + +// An ALUOp is an arithmetic or logic operation. +type ALUOp uint16 + +// ALU binary operation types. 
+const ( + ALUOpAdd ALUOp = iota << 4 + ALUOpSub + ALUOpMul + ALUOpDiv + ALUOpOr + ALUOpAnd + ALUOpShiftLeft + ALUOpShiftRight + aluOpNeg // Not exported because it's the only unary ALU operation, and gets its own instruction type. + ALUOpMod + ALUOpXor +) + +// A JumpTest is a comparison operator used in conditional jumps. +type JumpTest uint16 + +// Supported operators for conditional jumps. +// K can be RegX for JumpIfX +const ( + // K == A + JumpEqual JumpTest = iota + // K != A + JumpNotEqual + // K > A + JumpGreaterThan + // K < A + JumpLessThan + // K >= A + JumpGreaterOrEqual + // K <= A + JumpLessOrEqual + // K & A != 0 + JumpBitsSet + // K & A == 0 + JumpBitsNotSet +) + +// An Extension is a function call provided by the kernel that +// performs advanced operations that are expensive or impossible +// within the BPF virtual machine. +// +// Extensions are only implemented by the Linux kernel. +// +// TODO: should we prune this list? Some of these extensions seem +// either broken or near-impossible to use correctly, whereas other +// (len, random, ifindex) are quite useful. +type Extension int + +// Extension functions available in the Linux kernel. +const ( + // extOffset is the negative maximum number of instructions used + // to load instructions by overloading the K argument. + extOffset = -0x1000 + // ExtLen returns the length of the packet. + ExtLen Extension = 1 + // ExtProto returns the packet's L3 protocol type. + ExtProto Extension = 0 + // ExtType returns the packet's type (skb->pkt_type in the kernel) + // + // TODO: better documentation. How nice an API do we want to + // provide for these esoteric extensions? + ExtType Extension = 4 + // ExtPayloadOffset returns the offset of the packet payload, or + // the first protocol header that the kernel does not know how to + // parse. + ExtPayloadOffset Extension = 52 + // ExtInterfaceIndex returns the index of the interface on which + // the packet was received. 
+ ExtInterfaceIndex Extension = 8 + // ExtNetlinkAttr returns the netlink attribute of type X at + // offset A. + ExtNetlinkAttr Extension = 12 + // ExtNetlinkAttrNested returns the nested netlink attribute of + // type X at offset A. + ExtNetlinkAttrNested Extension = 16 + // ExtMark returns the packet's mark value. + ExtMark Extension = 20 + // ExtQueue returns the packet's assigned hardware queue. + ExtQueue Extension = 24 + // ExtLinkLayerType returns the packet's hardware address type + // (e.g. Ethernet, Infiniband). + ExtLinkLayerType Extension = 28 + // ExtRXHash returns the packets receive hash. + // + // TODO: figure out what this rxhash actually is. + ExtRXHash Extension = 32 + // ExtCPUID returns the ID of the CPU processing the current + // packet. + ExtCPUID Extension = 36 + // ExtVLANTag returns the packet's VLAN tag. + ExtVLANTag Extension = 44 + // ExtVLANTagPresent returns non-zero if the packet has a VLAN + // tag. + // + // TODO: I think this might be a lie: it reads bit 0x1000 of the + // VLAN header, which changed meaning in recent revisions of the + // spec - this extension may now return meaningless information. + ExtVLANTagPresent Extension = 48 + // ExtVLANProto returns 0x8100 if the frame has a VLAN header, + // 0x88a8 if the frame has a "Q-in-Q" double VLAN header, or some + // other value if no VLAN information is present. + ExtVLANProto Extension = 60 + // ExtRand returns a uniformly random uint32. + ExtRand Extension = 56 +) + +// The following gives names to various bit patterns used in opcode construction. 
+ +const ( + opMaskCls uint16 = 0x7 + // opClsLoad masks + opMaskLoadDest = 0x01 + opMaskLoadWidth = 0x18 + opMaskLoadMode = 0xe0 + // opClsALU & opClsJump + opMaskOperand = 0x08 + opMaskOperator = 0xf0 +) + +const ( + // +---------------+-----------------+---+---+---+ + // | AddrMode (3b) | LoadWidth (2b) | 0 | 0 | 0 | + // +---------------+-----------------+---+---+---+ + opClsLoadA uint16 = iota + // +---------------+-----------------+---+---+---+ + // | AddrMode (3b) | LoadWidth (2b) | 0 | 0 | 1 | + // +---------------+-----------------+---+---+---+ + opClsLoadX + // +---+---+---+---+---+---+---+---+ + // | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | + // +---+---+---+---+---+---+---+---+ + opClsStoreA + // +---+---+---+---+---+---+---+---+ + // | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | + // +---+---+---+---+---+---+---+---+ + opClsStoreX + // +---------------+-----------------+---+---+---+ + // | Operator (4b) | OperandSrc (1b) | 1 | 0 | 0 | + // +---------------+-----------------+---+---+---+ + opClsALU + // +-----------------------------+---+---+---+---+ + // | TestOperator (4b) | 0 | 1 | 0 | 1 | + // +-----------------------------+---+---+---+---+ + opClsJump + // +---+-------------------------+---+---+---+---+ + // | 0 | 0 | 0 | RetSrc (1b) | 0 | 1 | 1 | 0 | + // +---+-------------------------+---+---+---+---+ + opClsReturn + // +---+-------------------------+---+---+---+---+ + // | 0 | 0 | 0 | TXAorTAX (1b) | 0 | 1 | 1 | 1 | + // +---+-------------------------+---+---+---+---+ + opClsMisc +) + +const ( + opAddrModeImmediate uint16 = iota << 5 + opAddrModeAbsolute + opAddrModeIndirect + opAddrModeScratch + opAddrModePacketLen // actually an extension, not an addressing mode. + opAddrModeMemShift +) + +const ( + opLoadWidth4 uint16 = iota << 3 + opLoadWidth2 + opLoadWidth1 +) + +// Operand for ALU and Jump instructions +type opOperand uint16 + +// Supported operand sources. 
+const ( + opOperandConstant opOperand = iota << 3 + opOperandX +) + +// An jumpOp is a conditional jump condition. +type jumpOp uint16 + +// Supported jump conditions. +const ( + opJumpAlways jumpOp = iota << 4 + opJumpEqual + opJumpGT + opJumpGE + opJumpSet +) + +const ( + opRetSrcConstant uint16 = iota << 4 + opRetSrcA +) + +const ( + opMiscTAX = 0x00 + opMiscTXA = 0x80 +) diff --git a/src/runtime/vendor/golang.org/x/net/bpf/doc.go b/src/runtime/vendor/golang.org/x/net/bpf/doc.go new file mode 100644 index 000000000..ae62feb53 --- /dev/null +++ b/src/runtime/vendor/golang.org/x/net/bpf/doc.go @@ -0,0 +1,82 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +/* + +Package bpf implements marshaling and unmarshaling of programs for the +Berkeley Packet Filter virtual machine, and provides a Go implementation +of the virtual machine. + +BPF's main use is to specify a packet filter for network taps, so that +the kernel doesn't have to expensively copy every packet it sees to +userspace. However, it's been repurposed to other areas where running +user code in-kernel is needed. For example, Linux's seccomp uses BPF +to apply security policies to system calls. For simplicity, this +documentation refers only to packets, but other uses of BPF have their +own data payloads. + +BPF programs run in a restricted virtual machine. It has almost no +access to kernel functions, and while conditional branches are +allowed, they can only jump forwards, to guarantee that there are no +infinite loops. + +The virtual machine + +The BPF VM is an accumulator machine. Its main register, called +register A, is an implicit source and destination in all arithmetic +and logic operations. The machine also has 16 scratch registers for +temporary storage, and an indirection register (register X) for +indirect memory access. All registers are 32 bits wide. 
+ +Each run of a BPF program is given one packet, which is placed in the +VM's read-only "main memory". LoadAbsolute and LoadIndirect +instructions can fetch up to 32 bits at a time into register A for +examination. + +The goal of a BPF program is to produce and return a verdict (uint32), +which tells the kernel what to do with the packet. In the context of +packet filtering, the returned value is the number of bytes of the +packet to forward to userspace, or 0 to ignore the packet. Other +contexts like seccomp define their own return values. + +In order to simplify programs, attempts to read past the end of the +packet terminate the program execution with a verdict of 0 (ignore +packet). This means that the vast majority of BPF programs don't need +to do any explicit bounds checking. + +In addition to the bytes of the packet, some BPF programs have access +to extensions, which are essentially calls to kernel utility +functions. Currently, the only extensions supported by this package +are the Linux packet filter extensions. + +Examples + +This packet filter selects all ARP packets. + + bpf.Assemble([]bpf.Instruction{ + // Load "EtherType" field from the ethernet header. + bpf.LoadAbsolute{Off: 12, Size: 2}, + // Skip over the next instruction if EtherType is not ARP. + bpf.JumpIf{Cond: bpf.JumpNotEqual, Val: 0x0806, SkipTrue: 1}, + // Verdict is "send up to 4k of the packet to userspace." + bpf.RetConstant{Val: 4096}, + // Verdict is "ignore packet." + bpf.RetConstant{Val: 0}, + }) + +This packet filter captures a random 1% sample of traffic. + + bpf.Assemble([]bpf.Instruction{ + // Get a 32-bit random number from the Linux kernel. + bpf.LoadExtension{Num: bpf.ExtRand}, + // 1% dice roll? + bpf.JumpIf{Cond: bpf.JumpLessThan, Val: 2^32/100, SkipFalse: 1}, + // Capture. + bpf.RetConstant{Val: 4096}, + // Ignore. 
+ bpf.RetConstant{Val: 0}, + }) + +*/ +package bpf // import "golang.org/x/net/bpf" diff --git a/src/runtime/vendor/golang.org/x/net/bpf/instructions.go b/src/runtime/vendor/golang.org/x/net/bpf/instructions.go new file mode 100644 index 000000000..3cffcaa01 --- /dev/null +++ b/src/runtime/vendor/golang.org/x/net/bpf/instructions.go @@ -0,0 +1,726 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package bpf + +import "fmt" + +// An Instruction is one instruction executed by the BPF virtual +// machine. +type Instruction interface { + // Assemble assembles the Instruction into a RawInstruction. + Assemble() (RawInstruction, error) +} + +// A RawInstruction is a raw BPF virtual machine instruction. +type RawInstruction struct { + // Operation to execute. + Op uint16 + // For conditional jump instructions, the number of instructions + // to skip if the condition is true/false. + Jt uint8 + Jf uint8 + // Constant parameter. The meaning depends on the Op. + K uint32 +} + +// Assemble implements the Instruction Assemble method. +func (ri RawInstruction) Assemble() (RawInstruction, error) { return ri, nil } + +// Disassemble parses ri into an Instruction and returns it. If ri is +// not recognized by this package, ri itself is returned. 
+func (ri RawInstruction) Disassemble() Instruction { + switch ri.Op & opMaskCls { + case opClsLoadA, opClsLoadX: + reg := Register(ri.Op & opMaskLoadDest) + sz := 0 + switch ri.Op & opMaskLoadWidth { + case opLoadWidth4: + sz = 4 + case opLoadWidth2: + sz = 2 + case opLoadWidth1: + sz = 1 + default: + return ri + } + switch ri.Op & opMaskLoadMode { + case opAddrModeImmediate: + if sz != 4 { + return ri + } + return LoadConstant{Dst: reg, Val: ri.K} + case opAddrModeScratch: + if sz != 4 || ri.K > 15 { + return ri + } + return LoadScratch{Dst: reg, N: int(ri.K)} + case opAddrModeAbsolute: + if ri.K > extOffset+0xffffffff { + return LoadExtension{Num: Extension(-extOffset + ri.K)} + } + return LoadAbsolute{Size: sz, Off: ri.K} + case opAddrModeIndirect: + return LoadIndirect{Size: sz, Off: ri.K} + case opAddrModePacketLen: + if sz != 4 { + return ri + } + return LoadExtension{Num: ExtLen} + case opAddrModeMemShift: + return LoadMemShift{Off: ri.K} + default: + return ri + } + + case opClsStoreA: + if ri.Op != opClsStoreA || ri.K > 15 { + return ri + } + return StoreScratch{Src: RegA, N: int(ri.K)} + + case opClsStoreX: + if ri.Op != opClsStoreX || ri.K > 15 { + return ri + } + return StoreScratch{Src: RegX, N: int(ri.K)} + + case opClsALU: + switch op := ALUOp(ri.Op & opMaskOperator); op { + case ALUOpAdd, ALUOpSub, ALUOpMul, ALUOpDiv, ALUOpOr, ALUOpAnd, ALUOpShiftLeft, ALUOpShiftRight, ALUOpMod, ALUOpXor: + switch operand := opOperand(ri.Op & opMaskOperand); operand { + case opOperandX: + return ALUOpX{Op: op} + case opOperandConstant: + return ALUOpConstant{Op: op, Val: ri.K} + default: + return ri + } + case aluOpNeg: + return NegateA{} + default: + return ri + } + + case opClsJump: + switch op := jumpOp(ri.Op & opMaskOperator); op { + case opJumpAlways: + return Jump{Skip: ri.K} + case opJumpEqual, opJumpGT, opJumpGE, opJumpSet: + cond, skipTrue, skipFalse := jumpOpToTest(op, ri.Jt, ri.Jf) + switch operand := opOperand(ri.Op & opMaskOperand); operand { + case 
opOperandX: + return JumpIfX{Cond: cond, SkipTrue: skipTrue, SkipFalse: skipFalse} + case opOperandConstant: + return JumpIf{Cond: cond, Val: ri.K, SkipTrue: skipTrue, SkipFalse: skipFalse} + default: + return ri + } + default: + return ri + } + + case opClsReturn: + switch ri.Op { + case opClsReturn | opRetSrcA: + return RetA{} + case opClsReturn | opRetSrcConstant: + return RetConstant{Val: ri.K} + default: + return ri + } + + case opClsMisc: + switch ri.Op { + case opClsMisc | opMiscTAX: + return TAX{} + case opClsMisc | opMiscTXA: + return TXA{} + default: + return ri + } + + default: + panic("unreachable") // switch is exhaustive on the bit pattern + } +} + +func jumpOpToTest(op jumpOp, skipTrue uint8, skipFalse uint8) (JumpTest, uint8, uint8) { + var test JumpTest + + // Decode "fake" jump conditions that don't appear in machine code + // Ensures the Assemble -> Disassemble stage recreates the same instructions + // See https://github.com/golang/go/issues/18470 + if skipTrue == 0 { + switch op { + case opJumpEqual: + test = JumpNotEqual + case opJumpGT: + test = JumpLessOrEqual + case opJumpGE: + test = JumpLessThan + case opJumpSet: + test = JumpBitsNotSet + } + + return test, skipFalse, 0 + } + + switch op { + case opJumpEqual: + test = JumpEqual + case opJumpGT: + test = JumpGreaterThan + case opJumpGE: + test = JumpGreaterOrEqual + case opJumpSet: + test = JumpBitsSet + } + + return test, skipTrue, skipFalse +} + +// LoadConstant loads Val into register Dst. +type LoadConstant struct { + Dst Register + Val uint32 +} + +// Assemble implements the Instruction Assemble method. +func (a LoadConstant) Assemble() (RawInstruction, error) { + return assembleLoad(a.Dst, 4, opAddrModeImmediate, a.Val) +} + +// String returns the instruction in assembler notation. 
+func (a LoadConstant) String() string { + switch a.Dst { + case RegA: + return fmt.Sprintf("ld #%d", a.Val) + case RegX: + return fmt.Sprintf("ldx #%d", a.Val) + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// LoadScratch loads scratch[N] into register Dst. +type LoadScratch struct { + Dst Register + N int // 0-15 +} + +// Assemble implements the Instruction Assemble method. +func (a LoadScratch) Assemble() (RawInstruction, error) { + if a.N < 0 || a.N > 15 { + return RawInstruction{}, fmt.Errorf("invalid scratch slot %d", a.N) + } + return assembleLoad(a.Dst, 4, opAddrModeScratch, uint32(a.N)) +} + +// String returns the instruction in assembler notation. +func (a LoadScratch) String() string { + switch a.Dst { + case RegA: + return fmt.Sprintf("ld M[%d]", a.N) + case RegX: + return fmt.Sprintf("ldx M[%d]", a.N) + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// LoadAbsolute loads packet[Off:Off+Size] as an integer value into +// register A. +type LoadAbsolute struct { + Off uint32 + Size int // 1, 2 or 4 +} + +// Assemble implements the Instruction Assemble method. +func (a LoadAbsolute) Assemble() (RawInstruction, error) { + return assembleLoad(RegA, a.Size, opAddrModeAbsolute, a.Off) +} + +// String returns the instruction in assembler notation. +func (a LoadAbsolute) String() string { + switch a.Size { + case 1: // byte + return fmt.Sprintf("ldb [%d]", a.Off) + case 2: // half word + return fmt.Sprintf("ldh [%d]", a.Off) + case 4: // word + if a.Off > extOffset+0xffffffff { + return LoadExtension{Num: Extension(a.Off + 0x1000)}.String() + } + return fmt.Sprintf("ld [%d]", a.Off) + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// LoadIndirect loads packet[X+Off:X+Off+Size] as an integer value +// into register A. +type LoadIndirect struct { + Off uint32 + Size int // 1, 2 or 4 +} + +// Assemble implements the Instruction Assemble method. 
+func (a LoadIndirect) Assemble() (RawInstruction, error) { + return assembleLoad(RegA, a.Size, opAddrModeIndirect, a.Off) +} + +// String returns the instruction in assembler notation. +func (a LoadIndirect) String() string { + switch a.Size { + case 1: // byte + return fmt.Sprintf("ldb [x + %d]", a.Off) + case 2: // half word + return fmt.Sprintf("ldh [x + %d]", a.Off) + case 4: // word + return fmt.Sprintf("ld [x + %d]", a.Off) + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// LoadMemShift multiplies the first 4 bits of the byte at packet[Off] +// by 4 and stores the result in register X. +// +// This instruction is mainly useful to load into X the length of an +// IPv4 packet header in a single instruction, rather than have to do +// the arithmetic on the header's first byte by hand. +type LoadMemShift struct { + Off uint32 +} + +// Assemble implements the Instruction Assemble method. +func (a LoadMemShift) Assemble() (RawInstruction, error) { + return assembleLoad(RegX, 1, opAddrModeMemShift, a.Off) +} + +// String returns the instruction in assembler notation. +func (a LoadMemShift) String() string { + return fmt.Sprintf("ldx 4*([%d]&0xf)", a.Off) +} + +// LoadExtension invokes a linux-specific extension and stores the +// result in register A. +type LoadExtension struct { + Num Extension +} + +// Assemble implements the Instruction Assemble method. +func (a LoadExtension) Assemble() (RawInstruction, error) { + if a.Num == ExtLen { + return assembleLoad(RegA, 4, opAddrModePacketLen, 0) + } + return assembleLoad(RegA, 4, opAddrModeAbsolute, uint32(extOffset+a.Num)) +} + +// String returns the instruction in assembler notation. 
+func (a LoadExtension) String() string { + switch a.Num { + case ExtLen: + return "ld #len" + case ExtProto: + return "ld #proto" + case ExtType: + return "ld #type" + case ExtPayloadOffset: + return "ld #poff" + case ExtInterfaceIndex: + return "ld #ifidx" + case ExtNetlinkAttr: + return "ld #nla" + case ExtNetlinkAttrNested: + return "ld #nlan" + case ExtMark: + return "ld #mark" + case ExtQueue: + return "ld #queue" + case ExtLinkLayerType: + return "ld #hatype" + case ExtRXHash: + return "ld #rxhash" + case ExtCPUID: + return "ld #cpu" + case ExtVLANTag: + return "ld #vlan_tci" + case ExtVLANTagPresent: + return "ld #vlan_avail" + case ExtVLANProto: + return "ld #vlan_tpid" + case ExtRand: + return "ld #rand" + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// StoreScratch stores register Src into scratch[N]. +type StoreScratch struct { + Src Register + N int // 0-15 +} + +// Assemble implements the Instruction Assemble method. +func (a StoreScratch) Assemble() (RawInstruction, error) { + if a.N < 0 || a.N > 15 { + return RawInstruction{}, fmt.Errorf("invalid scratch slot %d", a.N) + } + var op uint16 + switch a.Src { + case RegA: + op = opClsStoreA + case RegX: + op = opClsStoreX + default: + return RawInstruction{}, fmt.Errorf("invalid source register %v", a.Src) + } + + return RawInstruction{ + Op: op, + K: uint32(a.N), + }, nil +} + +// String returns the instruction in assembler notation. +func (a StoreScratch) String() string { + switch a.Src { + case RegA: + return fmt.Sprintf("st M[%d]", a.N) + case RegX: + return fmt.Sprintf("stx M[%d]", a.N) + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// ALUOpConstant executes A = A <Op> Val. +type ALUOpConstant struct { + Op ALUOp + Val uint32 +} + +// Assemble implements the Instruction Assemble method.
+func (a ALUOpConstant) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsALU | uint16(opOperandConstant) | uint16(a.Op), + K: a.Val, + }, nil +} + +// String returns the instruction in assembler notation. +func (a ALUOpConstant) String() string { + switch a.Op { + case ALUOpAdd: + return fmt.Sprintf("add #%d", a.Val) + case ALUOpSub: + return fmt.Sprintf("sub #%d", a.Val) + case ALUOpMul: + return fmt.Sprintf("mul #%d", a.Val) + case ALUOpDiv: + return fmt.Sprintf("div #%d", a.Val) + case ALUOpMod: + return fmt.Sprintf("mod #%d", a.Val) + case ALUOpAnd: + return fmt.Sprintf("and #%d", a.Val) + case ALUOpOr: + return fmt.Sprintf("or #%d", a.Val) + case ALUOpXor: + return fmt.Sprintf("xor #%d", a.Val) + case ALUOpShiftLeft: + return fmt.Sprintf("lsh #%d", a.Val) + case ALUOpShiftRight: + return fmt.Sprintf("rsh #%d", a.Val) + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// ALUOpX executes A = A <Op> X +type ALUOpX struct { + Op ALUOp +} + +// Assemble implements the Instruction Assemble method. +func (a ALUOpX) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsALU | uint16(opOperandX) | uint16(a.Op), + }, nil +} + +// String returns the instruction in assembler notation. +func (a ALUOpX) String() string { + switch a.Op { + case ALUOpAdd: + return "add x" + case ALUOpSub: + return "sub x" + case ALUOpMul: + return "mul x" + case ALUOpDiv: + return "div x" + case ALUOpMod: + return "mod x" + case ALUOpAnd: + return "and x" + case ALUOpOr: + return "or x" + case ALUOpXor: + return "xor x" + case ALUOpShiftLeft: + return "lsh x" + case ALUOpShiftRight: + return "rsh x" + default: + return fmt.Sprintf("unknown instruction: %#v", a) + } +} + +// NegateA executes A = -A. +type NegateA struct{} + +// Assemble implements the Instruction Assemble method.
+func (a NegateA) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsALU | uint16(aluOpNeg), + }, nil +} + +// String returns the instruction in assembler notation. +func (a NegateA) String() string { + return fmt.Sprintf("neg") +} + +// Jump skips the following Skip instructions in the program. +type Jump struct { + Skip uint32 +} + +// Assemble implements the Instruction Assemble method. +func (a Jump) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsJump | uint16(opJumpAlways), + K: a.Skip, + }, nil +} + +// String returns the instruction in assembler notation. +func (a Jump) String() string { + return fmt.Sprintf("ja %d", a.Skip) +} + +// JumpIf skips the following Skip instructions in the program if A +// <Cond> Val is true. +type JumpIf struct { + Cond JumpTest + Val uint32 + SkipTrue uint8 + SkipFalse uint8 +} + +// Assemble implements the Instruction Assemble method. +func (a JumpIf) Assemble() (RawInstruction, error) { + return jumpToRaw(a.Cond, opOperandConstant, a.Val, a.SkipTrue, a.SkipFalse) +} + +// String returns the instruction in assembler notation. +func (a JumpIf) String() string { + return jumpToString(a.Cond, fmt.Sprintf("#%d", a.Val), a.SkipTrue, a.SkipFalse) +} + +// JumpIfX skips the following Skip instructions in the program if A +// <Cond> X is true. +type JumpIfX struct { + Cond JumpTest + SkipTrue uint8 + SkipFalse uint8 +} + +// Assemble implements the Instruction Assemble method. +func (a JumpIfX) Assemble() (RawInstruction, error) { + return jumpToRaw(a.Cond, opOperandX, 0, a.SkipTrue, a.SkipFalse) +} + +// String returns the instruction in assembler notation.
+func (a JumpIfX) String() string { + return jumpToString(a.Cond, "x", a.SkipTrue, a.SkipFalse) +} + +// jumpToRaw assembles a jump instruction into a RawInstruction +func jumpToRaw(test JumpTest, operand opOperand, k uint32, skipTrue, skipFalse uint8) (RawInstruction, error) { + var ( + cond jumpOp + flip bool + ) + switch test { + case JumpEqual: + cond = opJumpEqual + case JumpNotEqual: + cond, flip = opJumpEqual, true + case JumpGreaterThan: + cond = opJumpGT + case JumpLessThan: + cond, flip = opJumpGE, true + case JumpGreaterOrEqual: + cond = opJumpGE + case JumpLessOrEqual: + cond, flip = opJumpGT, true + case JumpBitsSet: + cond = opJumpSet + case JumpBitsNotSet: + cond, flip = opJumpSet, true + default: + return RawInstruction{}, fmt.Errorf("unknown JumpTest %v", test) + } + jt, jf := skipTrue, skipFalse + if flip { + jt, jf = jf, jt + } + return RawInstruction{ + Op: opClsJump | uint16(cond) | uint16(operand), + Jt: jt, + Jf: jf, + K: k, + }, nil +} + +// jumpToString converts a jump instruction to assembler notation +func jumpToString(cond JumpTest, operand string, skipTrue, skipFalse uint8) string { + switch cond { + // A == K + case JumpEqual: + return conditionalJump(operand, skipTrue, skipFalse, "jeq", "jneq") + // A != K + case JumpNotEqual: + return fmt.Sprintf("jneq %s,%d", operand, skipTrue) + // A > K + case JumpGreaterThan: + return conditionalJump(operand, skipTrue, skipFalse, "jgt", "jle") + // A < K + case JumpLessThan: + return fmt.Sprintf("jlt %s,%d", operand, skipTrue) + // A >= K + case JumpGreaterOrEqual: + return conditionalJump(operand, skipTrue, skipFalse, "jge", "jlt") + // A <= K + case JumpLessOrEqual: + return fmt.Sprintf("jle %s,%d", operand, skipTrue) + // A & K != 0 + case JumpBitsSet: + if skipFalse > 0 { + return fmt.Sprintf("jset %s,%d,%d", operand, skipTrue, skipFalse) + } + return fmt.Sprintf("jset %s,%d", operand, skipTrue) + // A & K == 0, there is no assembler instruction for JumpBitsNotSet, use JumpBitsSet and invert
skips + case JumpBitsNotSet: + return jumpToString(JumpBitsSet, operand, skipFalse, skipTrue) + default: + return fmt.Sprintf("unknown JumpTest %#v", cond) + } +} + +func conditionalJump(operand string, skipTrue, skipFalse uint8, positiveJump, negativeJump string) string { + if skipTrue > 0 { + if skipFalse > 0 { + return fmt.Sprintf("%s %s,%d,%d", positiveJump, operand, skipTrue, skipFalse) + } + return fmt.Sprintf("%s %s,%d", positiveJump, operand, skipTrue) + } + return fmt.Sprintf("%s %s,%d", negativeJump, operand, skipFalse) +} + +// RetA exits the BPF program, returning the value of register A. +type RetA struct{} + +// Assemble implements the Instruction Assemble method. +func (a RetA) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsReturn | opRetSrcA, + }, nil +} + +// String returns the instruction in assembler notation. +func (a RetA) String() string { + return fmt.Sprintf("ret a") +} + +// RetConstant exits the BPF program, returning a constant value. +type RetConstant struct { + Val uint32 +} + +// Assemble implements the Instruction Assemble method. +func (a RetConstant) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsReturn | opRetSrcConstant, + K: a.Val, + }, nil +} + +// String returns the instruction in assembler notation. +func (a RetConstant) String() string { + return fmt.Sprintf("ret #%d", a.Val) +} + +// TXA copies the value of register X to register A. +type TXA struct{} + +// Assemble implements the Instruction Assemble method. +func (a TXA) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsMisc | opMiscTXA, + }, nil +} + +// String returns the instruction in assembler notation. +func (a TXA) String() string { + return fmt.Sprintf("txa") +} + +// TAX copies the value of register A to register X. +type TAX struct{} + +// Assemble implements the Instruction Assemble method. 
+func (a TAX) Assemble() (RawInstruction, error) { + return RawInstruction{ + Op: opClsMisc | opMiscTAX, + }, nil +} + +// String returns the instruction in assembler notation. +func (a TAX) String() string { + return fmt.Sprintf("tax") +} + +func assembleLoad(dst Register, loadSize int, mode uint16, k uint32) (RawInstruction, error) { + var ( + cls uint16 + sz uint16 + ) + switch dst { + case RegA: + cls = opClsLoadA + case RegX: + cls = opClsLoadX + default: + return RawInstruction{}, fmt.Errorf("invalid target register %v", dst) + } + switch loadSize { + case 1: + sz = opLoadWidth1 + case 2: + sz = opLoadWidth2 + case 4: + sz = opLoadWidth4 + default: + return RawInstruction{}, fmt.Errorf("invalid load byte length %d", loadSize) + } + return RawInstruction{ + Op: cls | sz | mode, + K: k, + }, nil +} diff --git a/src/runtime/vendor/golang.org/x/net/bpf/setter.go b/src/runtime/vendor/golang.org/x/net/bpf/setter.go new file mode 100644 index 000000000..43e35f0ac --- /dev/null +++ b/src/runtime/vendor/golang.org/x/net/bpf/setter.go @@ -0,0 +1,10 @@ +// Copyright 2017 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package bpf + +// A Setter is a type which can attach a compiled BPF filter to itself. +type Setter interface { + SetBPF(filter []RawInstruction) error +} diff --git a/src/runtime/vendor/golang.org/x/net/bpf/vm.go b/src/runtime/vendor/golang.org/x/net/bpf/vm.go new file mode 100644 index 000000000..73f57f1f7 --- /dev/null +++ b/src/runtime/vendor/golang.org/x/net/bpf/vm.go @@ -0,0 +1,150 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package bpf + +import ( + "errors" + "fmt" +) + +// A VM is an emulated BPF virtual machine. +type VM struct { + filter []Instruction +} + +// NewVM returns a new VM using the input BPF program.
+func NewVM(filter []Instruction) (*VM, error) { + if len(filter) == 0 { + return nil, errors.New("one or more Instructions must be specified") + } + + for i, ins := range filter { + check := len(filter) - (i + 1) + switch ins := ins.(type) { + // Check for out-of-bounds jumps in instructions + case Jump: + if check <= int(ins.Skip) { + return nil, fmt.Errorf("cannot jump %d instructions; jumping past program bounds", ins.Skip) + } + case JumpIf: + if check <= int(ins.SkipTrue) { + return nil, fmt.Errorf("cannot jump %d instructions in true case; jumping past program bounds", ins.SkipTrue) + } + if check <= int(ins.SkipFalse) { + return nil, fmt.Errorf("cannot jump %d instructions in false case; jumping past program bounds", ins.SkipFalse) + } + case JumpIfX: + if check <= int(ins.SkipTrue) { + return nil, fmt.Errorf("cannot jump %d instructions in true case; jumping past program bounds", ins.SkipTrue) + } + if check <= int(ins.SkipFalse) { + return nil, fmt.Errorf("cannot jump %d instructions in false case; jumping past program bounds", ins.SkipFalse) + } + // Check for division or modulus by zero + case ALUOpConstant: + if ins.Val != 0 { + break + } + + switch ins.Op { + case ALUOpDiv, ALUOpMod: + return nil, errors.New("cannot divide by zero using ALUOpConstant") + } + // Check for unknown extensions + case LoadExtension: + switch ins.Num { + case ExtLen: + default: + return nil, fmt.Errorf("extension %d not implemented", ins.Num) + } + } + } + + // Make sure last instruction is a return instruction + switch filter[len(filter)-1].(type) { + case RetA, RetConstant: + default: + return nil, errors.New("BPF program must end with RetA or RetConstant") + } + + // Though our VM works using disassembled instructions, we + // attempt to assemble the input filter anyway to ensure it is compatible + // with an operating system VM. + _, err := Assemble(filter) + + return &VM{ + filter: filter, + }, err +} + +// Run runs the VM's BPF program against the input bytes. 
+// Run returns the number of bytes accepted by the BPF program, and any errors +// which occurred while processing the program. +func (v *VM) Run(in []byte) (int, error) { + var ( + // Registers of the virtual machine + regA uint32 + regX uint32 + regScratch [16]uint32 + + // OK is true if the program should continue processing the next + // instruction, or false if not, causing the loop to break + ok = true + ) + + // TODO(mdlayher): implement: + // - NegateA: + // - would require a change from uint32 registers to int32 + // registers + + // TODO(mdlayher): add interop tests that check signedness of ALU + // operations against kernel implementation, and make sure Go + // implementation matches behavior + + for i := 0; i < len(v.filter) && ok; i++ { + ins := v.filter[i] + + switch ins := ins.(type) { + case ALUOpConstant: + regA = aluOpConstant(ins, regA) + case ALUOpX: + regA, ok = aluOpX(ins, regA, regX) + case Jump: + i += int(ins.Skip) + case JumpIf: + jump := jumpIf(ins, regA) + i += jump + case JumpIfX: + jump := jumpIfX(ins, regA, regX) + i += jump + case LoadAbsolute: + regA, ok = loadAbsolute(ins, in) + case LoadConstant: + regA, regX = loadConstant(ins, regA, regX) + case LoadExtension: + regA = loadExtension(ins, in) + case LoadIndirect: + regA, ok = loadIndirect(ins, in, regX) + case LoadMemShift: + regX, ok = loadMemShift(ins, in) + case LoadScratch: + regA, regX = loadScratch(ins, regScratch, regA, regX) + case RetA: + return int(regA), nil + case RetConstant: + return int(ins.Val), nil + case StoreScratch: + regScratch = storeScratch(ins, regScratch, regA, regX) + case TAX: + regX = regA + case TXA: + regA = regX + default: + return 0, fmt.Errorf("unknown Instruction at index %d: %T", i, ins) + } + } + + return 0, nil +} diff --git a/src/runtime/vendor/golang.org/x/net/bpf/vm_instructions.go b/src/runtime/vendor/golang.org/x/net/bpf/vm_instructions.go new file mode 100644 index 000000000..cf8947c33 --- /dev/null +++ 
b/src/runtime/vendor/golang.org/x/net/bpf/vm_instructions.go @@ -0,0 +1,182 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package bpf + +import ( + "encoding/binary" + "fmt" +) + +func aluOpConstant(ins ALUOpConstant, regA uint32) uint32 { + return aluOpCommon(ins.Op, regA, ins.Val) +} + +func aluOpX(ins ALUOpX, regA uint32, regX uint32) (uint32, bool) { + // Guard against division or modulus by zero by terminating + // the program, as the OS BPF VM does + if regX == 0 { + switch ins.Op { + case ALUOpDiv, ALUOpMod: + return 0, false + } + } + + return aluOpCommon(ins.Op, regA, regX), true +} + +func aluOpCommon(op ALUOp, regA uint32, value uint32) uint32 { + switch op { + case ALUOpAdd: + return regA + value + case ALUOpSub: + return regA - value + case ALUOpMul: + return regA * value + case ALUOpDiv: + // Division by zero not permitted by NewVM and aluOpX checks + return regA / value + case ALUOpOr: + return regA | value + case ALUOpAnd: + return regA & value + case ALUOpShiftLeft: + return regA << value + case ALUOpShiftRight: + return regA >> value + case ALUOpMod: + // Modulus by zero not permitted by NewVM and aluOpX checks + return regA % value + case ALUOpXor: + return regA ^ value + default: + return regA + } +} + +func jumpIf(ins JumpIf, regA uint32) int { + return jumpIfCommon(ins.Cond, ins.SkipTrue, ins.SkipFalse, regA, ins.Val) +} + +func jumpIfX(ins JumpIfX, regA uint32, regX uint32) int { + return jumpIfCommon(ins.Cond, ins.SkipTrue, ins.SkipFalse, regA, regX) +} + +func jumpIfCommon(cond JumpTest, skipTrue, skipFalse uint8, regA uint32, value uint32) int { + var ok bool + + switch cond { + case JumpEqual: + ok = regA == value + case JumpNotEqual: + ok = regA != value + case JumpGreaterThan: + ok = regA > value + case JumpLessThan: + ok = regA < value + case JumpGreaterOrEqual: + ok = regA >= value + case JumpLessOrEqual: + ok = 
regA <= value + case JumpBitsSet: + ok = (regA & value) != 0 + case JumpBitsNotSet: + ok = (regA & value) == 0 + } + + if ok { + return int(skipTrue) + } + + return int(skipFalse) +} + +func loadAbsolute(ins LoadAbsolute, in []byte) (uint32, bool) { + offset := int(ins.Off) + size := int(ins.Size) + + return loadCommon(in, offset, size) +} + +func loadConstant(ins LoadConstant, regA uint32, regX uint32) (uint32, uint32) { + switch ins.Dst { + case RegA: + regA = ins.Val + case RegX: + regX = ins.Val + } + + return regA, regX +} + +func loadExtension(ins LoadExtension, in []byte) uint32 { + switch ins.Num { + case ExtLen: + return uint32(len(in)) + default: + panic(fmt.Sprintf("unimplemented extension: %d", ins.Num)) + } +} + +func loadIndirect(ins LoadIndirect, in []byte, regX uint32) (uint32, bool) { + offset := int(ins.Off) + int(regX) + size := int(ins.Size) + + return loadCommon(in, offset, size) +} + +func loadMemShift(ins LoadMemShift, in []byte) (uint32, bool) { + offset := int(ins.Off) + + // Size of LoadMemShift is always 1 byte + if !inBounds(len(in), offset, 1) { + return 0, false + } + + // Mask off high 4 bits and multiply low 4 bits by 4 + return uint32(in[offset]&0x0f) * 4, true +} + +func inBounds(inLen int, offset int, size int) bool { + return offset+size <= inLen +} + +func loadCommon(in []byte, offset int, size int) (uint32, bool) { + if !inBounds(len(in), offset, size) { + return 0, false + } + + switch size { + case 1: + return uint32(in[offset]), true + case 2: + return uint32(binary.BigEndian.Uint16(in[offset : offset+size])), true + case 4: + return uint32(binary.BigEndian.Uint32(in[offset : offset+size])), true + default: + panic(fmt.Sprintf("invalid load size: %d", size)) + } +} + +func loadScratch(ins LoadScratch, regScratch [16]uint32, regA uint32, regX uint32) (uint32, uint32) { + switch ins.Dst { + case RegA: + regA = regScratch[ins.N] + case RegX: + regX = regScratch[ins.N] + } + + return regA, regX +} + +func storeScratch(ins 
StoreScratch, regScratch [16]uint32, regA uint32, regX uint32) [16]uint32 { + switch ins.Src { + case RegA: + regScratch[ins.N] = regA + case RegX: + regScratch[ins.N] = regX + } + + return regScratch +} diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux.go index bcc45d108..6bce65803 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux.go @@ -38,7 +38,8 @@ const ( AF_KEY = 0xf AF_LLC = 0x1a AF_LOCAL = 0x1 - AF_MAX = 0x2d + AF_MAX = 0x2e + AF_MCTP = 0x2d AF_MPLS = 0x1c AF_NETBEUI = 0xd AF_NETLINK = 0x10 @@ -741,6 +742,7 @@ const ( ETH_P_QINQ2 = 0x9200 ETH_P_QINQ3 = 0x9300 ETH_P_RARP = 0x8035 + ETH_P_REALTEK = 0x8899 ETH_P_SCA = 0x6007 ETH_P_SLOW = 0x8809 ETH_P_SNAP = 0x5 @@ -810,10 +812,12 @@ const ( FAN_EPIDFD = -0x2 FAN_EVENT_INFO_TYPE_DFID = 0x3 FAN_EVENT_INFO_TYPE_DFID_NAME = 0x2 + FAN_EVENT_INFO_TYPE_ERROR = 0x5 FAN_EVENT_INFO_TYPE_FID = 0x1 FAN_EVENT_INFO_TYPE_PIDFD = 0x4 FAN_EVENT_METADATA_LEN = 0x18 FAN_EVENT_ON_CHILD = 0x8000000 + FAN_FS_ERROR = 0x8000 FAN_MARK_ADD = 0x1 FAN_MARK_DONT_FOLLOW = 0x4 FAN_MARK_FILESYSTEM = 0x100 @@ -1827,6 +1831,8 @@ const ( PERF_MEM_BLK_DATA = 0x2 PERF_MEM_BLK_NA = 0x1 PERF_MEM_BLK_SHIFT = 0x28 + PERF_MEM_HOPS_0 = 0x1 + PERF_MEM_HOPS_SHIFT = 0x2b PERF_MEM_LOCK_LOCKED = 0x2 PERF_MEM_LOCK_NA = 0x1 PERF_MEM_LOCK_SHIFT = 0x18 @@ -1986,6 +1992,9 @@ const ( PR_SCHED_CORE_CREATE = 0x1 PR_SCHED_CORE_GET = 0x0 PR_SCHED_CORE_MAX = 0x4 + PR_SCHED_CORE_SCOPE_PROCESS_GROUP = 0x2 + PR_SCHED_CORE_SCOPE_THREAD = 0x0 + PR_SCHED_CORE_SCOPE_THREAD_GROUP = 0x1 PR_SCHED_CORE_SHARE_FROM = 0x3 PR_SCHED_CORE_SHARE_TO = 0x2 PR_SET_CHILD_SUBREAPER = 0x24 @@ -2167,12 +2176,23 @@ const ( RTCF_NAT = 0x800000 RTCF_VALVE = 0x200000 RTC_AF = 0x20 + RTC_BSM_DIRECT = 0x1 + RTC_BSM_DISABLED = 0x0 + RTC_BSM_LEVEL = 0x2 + RTC_BSM_STANDBY = 0x3 RTC_FEATURE_ALARM = 0x0 + RTC_FEATURE_ALARM_RES_2S = 0x3 
RTC_FEATURE_ALARM_RES_MINUTE = 0x1 - RTC_FEATURE_CNT = 0x3 + RTC_FEATURE_BACKUP_SWITCH_MODE = 0x6 + RTC_FEATURE_CNT = 0x7 + RTC_FEATURE_CORRECTION = 0x5 RTC_FEATURE_NEED_WEEK_DAY = 0x2 + RTC_FEATURE_UPDATE_INTERRUPT = 0x4 RTC_IRQF = 0x80 RTC_MAX_FREQ = 0x2000 + RTC_PARAM_BACKUP_SWITCH_MODE = 0x2 + RTC_PARAM_CORRECTION = 0x1 + RTC_PARAM_FEATURES = 0x0 RTC_PF = 0x40 RTC_UF = 0x10 RTF_ADDRCLASSMASK = 0xf8000000 @@ -2532,6 +2552,8 @@ const ( SO_VM_SOCKETS_BUFFER_MIN_SIZE = 0x1 SO_VM_SOCKETS_BUFFER_SIZE = 0x0 SO_VM_SOCKETS_CONNECT_TIMEOUT = 0x6 + SO_VM_SOCKETS_CONNECT_TIMEOUT_NEW = 0x8 + SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD = 0x6 SO_VM_SOCKETS_NONBLOCK_TXRX = 0x7 SO_VM_SOCKETS_PEER_HOST_VM_ID = 0x3 SO_VM_SOCKETS_TRUSTED = 0x5 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_386.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_386.go index 3ca40ca7f..234fd4a5d 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_386.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_386.go @@ -250,6 +250,8 @@ const ( RTC_EPOCH_SET = 0x4004700e RTC_IRQP_READ = 0x8004700b RTC_IRQP_SET = 0x4004700c + RTC_PARAM_GET = 0x40187013 + RTC_PARAM_SET = 0x40187014 RTC_PIE_OFF = 0x7006 RTC_PIE_ON = 0x7005 RTC_PLL_GET = 0x801c7011 @@ -327,6 +329,7 @@ const ( SO_RCVTIMEO = 0x14 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x14 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go index ead332091..58619b758 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go @@ -251,6 +251,8 @@ const ( RTC_EPOCH_SET = 0x4008700e RTC_IRQP_READ = 0x8008700b RTC_IRQP_SET = 0x4008700c + RTC_PARAM_GET = 0x40187013 + RTC_PARAM_SET = 0x40187014 RTC_PIE_OFF = 0x7006 RTC_PIE_ON = 0x7005 RTC_PLL_GET = 0x80207011 @@ 
-328,6 +330,7 @@ const ( SO_RCVTIMEO = 0x14 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x14 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm.go index 39bdc9455..3a64ff59d 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm.go @@ -257,6 +257,8 @@ const ( RTC_EPOCH_SET = 0x4004700e RTC_IRQP_READ = 0x8004700b RTC_IRQP_SET = 0x4004700c + RTC_PARAM_GET = 0x40187013 + RTC_PARAM_SET = 0x40187014 RTC_PIE_OFF = 0x7006 RTC_PIE_ON = 0x7005 RTC_PLL_GET = 0x801c7011 @@ -334,6 +336,7 @@ const ( SO_RCVTIMEO = 0x14 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x14 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm64.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm64.go index 9aec987db..abe0b9257 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_arm64.go @@ -247,6 +247,8 @@ const ( RTC_EPOCH_SET = 0x4008700e RTC_IRQP_READ = 0x8008700b RTC_IRQP_SET = 0x4008700c + RTC_PARAM_GET = 0x40187013 + RTC_PARAM_SET = 0x40187014 RTC_PIE_OFF = 0x7006 RTC_PIE_ON = 0x7005 RTC_PLL_GET = 0x80207011 @@ -324,6 +326,7 @@ const ( SO_RCVTIMEO = 0x14 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x14 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips.go index a8bba9491..14d7a8439 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips.go @@ -250,6 +250,8 @@ const ( RTC_EPOCH_SET = 0x8004700e RTC_IRQP_READ = 
0x4004700b RTC_IRQP_SET = 0x8004700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x401c7011 @@ -327,6 +329,7 @@ const ( SO_RCVTIMEO = 0x1006 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x1006 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x4 SO_REUSEPORT = 0x200 SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64.go index ee9e7e202..99e7c4ac0 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64.go @@ -250,6 +250,8 @@ const ( RTC_EPOCH_SET = 0x8008700e RTC_IRQP_READ = 0x4008700b RTC_IRQP_SET = 0x8008700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x40207011 @@ -327,6 +329,7 @@ const ( SO_RCVTIMEO = 0x1006 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x1006 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x4 SO_REUSEPORT = 0x200 SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64le.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64le.go index ba4b288a3..496364c33 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64le.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mips64le.go @@ -250,6 +250,8 @@ const ( RTC_EPOCH_SET = 0x8008700e RTC_IRQP_READ = 0x4008700b RTC_IRQP_SET = 0x8008700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x40207011 @@ -327,6 +329,7 @@ const ( SO_RCVTIMEO = 0x1006 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x1006 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x4 SO_REUSEPORT = 0x200 SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mipsle.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mipsle.go index 
bc93afc36..3e4083085 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mipsle.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_mipsle.go @@ -250,6 +250,8 @@ const ( RTC_EPOCH_SET = 0x8004700e RTC_IRQP_READ = 0x4004700b RTC_IRQP_SET = 0x8004700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x401c7011 @@ -327,6 +329,7 @@ const ( SO_RCVTIMEO = 0x1006 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x1006 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x4 SO_REUSEPORT = 0x200 SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc.go index 9295e6947..1151a7dfa 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc.go @@ -305,6 +305,8 @@ const ( RTC_EPOCH_SET = 0x8004700e RTC_IRQP_READ = 0x4004700b RTC_IRQP_SET = 0x8004700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x401c7011 @@ -382,6 +384,7 @@ const ( SO_RCVTIMEO = 0x12 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x12 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go index 1fa081c9a..ed17f249e 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go @@ -309,6 +309,8 @@ const ( RTC_EPOCH_SET = 0x8008700e RTC_IRQP_READ = 0x4008700b RTC_IRQP_SET = 0x8008700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x40207011 @@ -386,6 +388,7 @@ const ( SO_RCVTIMEO = 0x12 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x12 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR 
= 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go index 74b321149..d84a37c1a 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go @@ -309,6 +309,8 @@ const ( RTC_EPOCH_SET = 0x8008700e RTC_IRQP_READ = 0x4008700b RTC_IRQP_SET = 0x8008700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x40207011 @@ -386,6 +388,7 @@ const ( SO_RCVTIMEO = 0x12 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x12 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_riscv64.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_riscv64.go index c91c8ac5b..5cafba83f 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_riscv64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_riscv64.go @@ -238,6 +238,8 @@ const ( RTC_EPOCH_SET = 0x4008700e RTC_IRQP_READ = 0x8008700b RTC_IRQP_SET = 0x4008700c + RTC_PARAM_GET = 0x40187013 + RTC_PARAM_SET = 0x40187014 RTC_PIE_OFF = 0x7006 RTC_PIE_ON = 0x7005 RTC_PLL_GET = 0x80207011 @@ -315,6 +317,7 @@ const ( SO_RCVTIMEO = 0x14 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x14 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_s390x.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_s390x.go index b66bf2228..6d122da41 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_s390x.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_s390x.go @@ -313,6 +313,8 @@ const ( RTC_EPOCH_SET = 0x4008700e RTC_IRQP_READ = 0x8008700b RTC_IRQP_SET = 0x4008700c + RTC_PARAM_GET = 0x40187013 + RTC_PARAM_SET = 0x40187014 
RTC_PIE_OFF = 0x7006 RTC_PIE_ON = 0x7005 RTC_PLL_GET = 0x80207011 @@ -390,6 +392,7 @@ const ( SO_RCVTIMEO = 0x14 SO_RCVTIMEO_NEW = 0x42 SO_RCVTIMEO_OLD = 0x14 + SO_RESERVE_MEM = 0x49 SO_REUSEADDR = 0x2 SO_REUSEPORT = 0xf SO_RXQ_OVFL = 0x28 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_sparc64.go b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_sparc64.go index f7fb149b0..6bd19e51d 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_sparc64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zerrors_linux_sparc64.go @@ -304,6 +304,8 @@ const ( RTC_EPOCH_SET = 0x8008700e RTC_IRQP_READ = 0x4008700b RTC_IRQP_SET = 0x8008700c + RTC_PARAM_GET = 0x80187013 + RTC_PARAM_SET = 0x80187014 RTC_PIE_OFF = 0x20007006 RTC_PIE_ON = 0x20007005 RTC_PLL_GET = 0x40207011 @@ -381,6 +383,7 @@ const ( SO_RCVTIMEO = 0x2000 SO_RCVTIMEO_NEW = 0x44 SO_RCVTIMEO_OLD = 0x2000 + SO_RESERVE_MEM = 0x52 SO_REUSEADDR = 0x4 SO_REUSEPORT = 0x200 SO_RXQ_OVFL = 0x24 diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_386.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_386.go index 31847d230..cac1f758b 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_386.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_386.go @@ -445,4 +445,5 @@ const ( SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_MEMFD_SECRET = 447 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_amd64.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_amd64.go index 3503cbbde..f327e4a0b 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_amd64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_amd64.go @@ -367,4 +367,5 @@ const ( SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_MEMFD_SECRET = 447 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm.go 
b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm.go index 5ecd24bf6..fb06a08d4 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm.go @@ -409,4 +409,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm64.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm64.go index 7e5c94cc7..58285646e 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_arm64.go @@ -312,4 +312,5 @@ const ( SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_MEMFD_SECRET = 447 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips.go index e1e2a2bf5..3b0418e68 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips.go @@ -429,4 +429,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 4445 SYS_LANDLOCK_RESTRICT_SELF = 4446 SYS_PROCESS_MRELEASE = 4448 + SYS_FUTEX_WAITV = 4449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64.go index 7651915a3..314ebf166 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64.go @@ -359,4 +359,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 5445 SYS_LANDLOCK_RESTRICT_SELF = 5446 SYS_PROCESS_MRELEASE = 5448 + SYS_FUTEX_WAITV = 5449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64le.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64le.go index a26a2c050..b8fbb937a 100644 --- 
a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64le.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64le.go @@ -359,4 +359,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 5445 SYS_LANDLOCK_RESTRICT_SELF = 5446 SYS_PROCESS_MRELEASE = 5448 + SYS_FUTEX_WAITV = 5449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mipsle.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mipsle.go index fda9a6a99..ee309b2ba 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mipsle.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_mipsle.go @@ -429,4 +429,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 4445 SYS_LANDLOCK_RESTRICT_SELF = 4446 SYS_PROCESS_MRELEASE = 4448 + SYS_FUTEX_WAITV = 4449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc.go index e8496150d..ac3748104 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc.go @@ -436,4 +436,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64.go index 5ee0678a3..5aa472111 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64.go @@ -408,4 +408,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64le.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64le.go index 29c0f9a39..0793ac1a6 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64le.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64le.go @@ -408,4 
+408,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_riscv64.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_riscv64.go index 5c9a9a3b6..a520962e3 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_riscv64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_riscv64.go @@ -310,4 +310,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_s390x.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_s390x.go index 913f50f98..d1738586b 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_s390x.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_s390x.go @@ -373,4 +373,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_sparc64.go b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_sparc64.go index 0de03a722..dfd5660f9 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_sparc64.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/zsysnum_linux_sparc64.go @@ -387,4 +387,5 @@ const ( SYS_LANDLOCK_ADD_RULE = 445 SYS_LANDLOCK_RESTRICT_SELF = 446 SYS_PROCESS_MRELEASE = 448 + SYS_FUTEX_WAITV = 449 ) diff --git a/src/runtime/vendor/golang.org/x/sys/unix/ztypes_linux.go b/src/runtime/vendor/golang.org/x/sys/unix/ztypes_linux.go index f6f0d79c4..66788f156 100644 --- a/src/runtime/vendor/golang.org/x/sys/unix/ztypes_linux.go +++ b/src/runtime/vendor/golang.org/x/sys/unix/ztypes_linux.go @@ -1144,7 +1144,8 @@ const ( PERF_RECORD_BPF_EVENT = 0x12 PERF_RECORD_CGROUP = 0x13 PERF_RECORD_TEXT_POKE = 0x14 - PERF_RECORD_MAX = 0x15 + PERF_RECORD_AUX_OUTPUT_HW_ID = 0x15 + 
PERF_RECORD_MAX = 0x16 PERF_RECORD_KSYMBOL_TYPE_UNKNOWN = 0x0 PERF_RECORD_KSYMBOL_TYPE_BPF = 0x1 PERF_RECORD_KSYMBOL_TYPE_OOL = 0x2 @@ -1784,7 +1785,8 @@ const ( const ( NF_NETDEV_INGRESS = 0x0 - NF_NETDEV_NUMHOOKS = 0x1 + NF_NETDEV_EGRESS = 0x1 + NF_NETDEV_NUMHOOKS = 0x2 ) const ( @@ -3166,7 +3168,13 @@ const ( DEVLINK_ATTR_RELOAD_ACTION_INFO = 0xa2 DEVLINK_ATTR_RELOAD_ACTION_STATS = 0xa3 DEVLINK_ATTR_PORT_PCI_SF_NUMBER = 0xa4 - DEVLINK_ATTR_MAX = 0xa9 + DEVLINK_ATTR_RATE_TYPE = 0xa5 + DEVLINK_ATTR_RATE_TX_SHARE = 0xa6 + DEVLINK_ATTR_RATE_TX_MAX = 0xa7 + DEVLINK_ATTR_RATE_NODE_NAME = 0xa8 + DEVLINK_ATTR_RATE_PARENT_NODE_NAME = 0xa9 + DEVLINK_ATTR_REGION_MAX_SNAPSHOTS = 0xaa + DEVLINK_ATTR_MAX = 0xaa DEVLINK_DPIPE_FIELD_MAPPING_TYPE_NONE = 0x0 DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX = 0x1 DEVLINK_DPIPE_MATCH_TYPE_FIELD_EXACT = 0x0 @@ -3463,7 +3471,14 @@ const ( ETHTOOL_MSG_CABLE_TEST_ACT = 0x1a ETHTOOL_MSG_CABLE_TEST_TDR_ACT = 0x1b ETHTOOL_MSG_TUNNEL_INFO_GET = 0x1c - ETHTOOL_MSG_USER_MAX = 0x21 + ETHTOOL_MSG_FEC_GET = 0x1d + ETHTOOL_MSG_FEC_SET = 0x1e + ETHTOOL_MSG_MODULE_EEPROM_GET = 0x1f + ETHTOOL_MSG_STATS_GET = 0x20 + ETHTOOL_MSG_PHC_VCLOCKS_GET = 0x21 + ETHTOOL_MSG_MODULE_GET = 0x22 + ETHTOOL_MSG_MODULE_SET = 0x23 + ETHTOOL_MSG_USER_MAX = 0x23 ETHTOOL_MSG_KERNEL_NONE = 0x0 ETHTOOL_MSG_STRSET_GET_REPLY = 0x1 ETHTOOL_MSG_LINKINFO_GET_REPLY = 0x2 @@ -3494,7 +3509,14 @@ const ( ETHTOOL_MSG_CABLE_TEST_NTF = 0x1b ETHTOOL_MSG_CABLE_TEST_TDR_NTF = 0x1c ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY = 0x1d - ETHTOOL_MSG_KERNEL_MAX = 0x22 + ETHTOOL_MSG_FEC_GET_REPLY = 0x1e + ETHTOOL_MSG_FEC_NTF = 0x1f + ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY = 0x20 + ETHTOOL_MSG_STATS_GET_REPLY = 0x21 + ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY = 0x22 + ETHTOOL_MSG_MODULE_GET_REPLY = 0x23 + ETHTOOL_MSG_MODULE_NTF = 0x24 + ETHTOOL_MSG_KERNEL_MAX = 0x24 ETHTOOL_A_HEADER_UNSPEC = 0x0 ETHTOOL_A_HEADER_DEV_INDEX = 0x1 ETHTOOL_A_HEADER_DEV_NAME = 0x2 diff --git 
a/src/runtime/vendor/golang.org/x/sys/windows/syscall_windows.go b/src/runtime/vendor/golang.org/x/sys/windows/syscall_windows.go index 200b62a00..cf44e6933 100644 --- a/src/runtime/vendor/golang.org/x/sys/windows/syscall_windows.go +++ b/src/runtime/vendor/golang.org/x/sys/windows/syscall_windows.go @@ -363,6 +363,8 @@ func NewCallbackCDecl(fn interface{}) uintptr { //sys SetProcessWorkingSetSizeEx(hProcess Handle, dwMinimumWorkingSetSize uintptr, dwMaximumWorkingSetSize uintptr, flags uint32) (err error) //sys GetCommTimeouts(handle Handle, timeouts *CommTimeouts) (err error) //sys SetCommTimeouts(handle Handle, timeouts *CommTimeouts) (err error) +//sys GetActiveProcessorCount(groupNumber uint16) (ret uint32) +//sys GetMaximumProcessorCount(groupNumber uint16) (ret uint32) // Volume Management Functions //sys DefineDosDevice(flags uint32, deviceName *uint16, targetPath *uint16) (err error) = DefineDosDeviceW diff --git a/src/runtime/vendor/golang.org/x/sys/windows/types_windows.go b/src/runtime/vendor/golang.org/x/sys/windows/types_windows.go index bb31abda4..e19471c6a 100644 --- a/src/runtime/vendor/golang.org/x/sys/windows/types_windows.go +++ b/src/runtime/vendor/golang.org/x/sys/windows/types_windows.go @@ -3172,3 +3172,5 @@ type ModuleInfo struct { SizeOfImage uint32 EntryPoint uintptr } + +const ALL_PROCESSOR_GROUPS = 0xFFFF diff --git a/src/runtime/vendor/golang.org/x/sys/windows/zsyscall_windows.go b/src/runtime/vendor/golang.org/x/sys/windows/zsyscall_windows.go index 1055d47ed..9ea1a44f0 100644 --- a/src/runtime/vendor/golang.org/x/sys/windows/zsyscall_windows.go +++ b/src/runtime/vendor/golang.org/x/sys/windows/zsyscall_windows.go @@ -226,6 +226,7 @@ var ( procFreeLibrary = modkernel32.NewProc("FreeLibrary") procGenerateConsoleCtrlEvent = modkernel32.NewProc("GenerateConsoleCtrlEvent") procGetACP = modkernel32.NewProc("GetACP") + procGetActiveProcessorCount = modkernel32.NewProc("GetActiveProcessorCount") procGetCommTimeouts = 
modkernel32.NewProc("GetCommTimeouts") procGetCommandLineW = modkernel32.NewProc("GetCommandLineW") procGetComputerNameExW = modkernel32.NewProc("GetComputerNameExW") @@ -251,6 +252,7 @@ var ( procGetLogicalDriveStringsW = modkernel32.NewProc("GetLogicalDriveStringsW") procGetLogicalDrives = modkernel32.NewProc("GetLogicalDrives") procGetLongPathNameW = modkernel32.NewProc("GetLongPathNameW") + procGetMaximumProcessorCount = modkernel32.NewProc("GetMaximumProcessorCount") procGetModuleFileNameW = modkernel32.NewProc("GetModuleFileNameW") procGetModuleHandleExW = modkernel32.NewProc("GetModuleHandleExW") procGetNamedPipeHandleStateW = modkernel32.NewProc("GetNamedPipeHandleStateW") @@ -1967,6 +1969,12 @@ func GetACP() (acp uint32) { return } +func GetActiveProcessorCount(groupNumber uint16) (ret uint32) { + r0, _, _ := syscall.Syscall(procGetActiveProcessorCount.Addr(), 1, uintptr(groupNumber), 0, 0) + ret = uint32(r0) + return +} + func GetCommTimeouts(handle Handle, timeouts *CommTimeouts) (err error) { r1, _, e1 := syscall.Syscall(procGetCommTimeouts.Addr(), 2, uintptr(handle), uintptr(unsafe.Pointer(timeouts)), 0) if r1 == 0 { @@ -2169,6 +2177,12 @@ func GetLongPathName(path *uint16, buf *uint16, buflen uint32) (n uint32, err er return } +func GetMaximumProcessorCount(groupNumber uint16) (ret uint32) { + r0, _, _ := syscall.Syscall(procGetMaximumProcessorCount.Addr(), 1, uintptr(groupNumber), 0, 0) + ret = uint32(r0) + return +} + func GetModuleFileName(module Handle, filename *uint16, size uint32) (n uint32, err error) { r0, _, e1 := syscall.Syscall(procGetModuleFileNameW.Addr(), 3, uintptr(module), uintptr(unsafe.Pointer(filename)), uintptr(size)) n = uint32(r0) diff --git a/src/runtime/vendor/modules.txt b/src/runtime/vendor/modules.txt index be2682371..e62ed4774 100644 --- a/src/runtime/vendor/modules.txt +++ b/src/runtime/vendor/modules.txt @@ -132,6 +132,7 @@ github.com/davecgh/go-spew/spew # github.com/docker/go-events v0.0.0-20190806004212-e31b211e4f1c 
github.com/docker/go-events # github.com/docker/go-units v0.4.0 +## explicit github.com/docker/go-units # github.com/fsnotify/fsnotify v1.4.9 ## explicit @@ -215,7 +216,9 @@ github.com/mailru/easyjson/jlexer github.com/mailru/easyjson/jwriter # github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 github.com/matttproud/golang_protobuf_extensions/pbutil -# github.com/mdlayher/vsock v0.0.0-20191108225356-d9c65923cb8f +# github.com/mdlayher/socket v0.2.0 +github.com/mdlayher/socket +# github.com/mdlayher/vsock v1.1.0 ## explicit github.com/mdlayher/vsock # github.com/mitchellh/mapstructure v1.1.2 @@ -332,8 +335,9 @@ go.opentelemetry.io/otel/sdk/trace # go.opentelemetry.io/otel/trace v1.3.0 ## explicit go.opentelemetry.io/otel/trace -# golang.org/x/net v0.0.0-20211216030914-fe4d6282115f +# golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd ## explicit +golang.org/x/net/bpf golang.org/x/net/context golang.org/x/net/context/ctxhttp golang.org/x/net/http/httpguts @@ -348,7 +352,7 @@ golang.org/x/oauth2 golang.org/x/oauth2/internal # golang.org/x/sync v0.0.0-20210220032951-036812b2e83c golang.org/x/sync/errgroup -# golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e +# golang.org/x/sys v0.0.0-20220204135822-1c1b9b1eba6a ## explicit golang.org/x/sys/execabs golang.org/x/sys/internal/unsafeheader diff --git a/src/runtime/virtcontainers/clh.go b/src/runtime/virtcontainers/clh.go index f08eba67f..4ab1400da 100644 --- a/src/runtime/virtcontainers/clh.go +++ b/src/runtime/virtcontainers/clh.go @@ -144,27 +144,27 @@ func (c *clhClientApi) VmRemoveDevicePut(ctx context.Context, vmRemoveDevice chc // Cloud hypervisor state // type CloudHypervisorState struct { - apiSocket string - PID int - VirtiofsdPID int - state clhState + apiSocket string + PID int + VirtiofsDaemonPid int + state clhState } func (s *CloudHypervisorState) reset() { s.PID = 0 - s.VirtiofsdPID = 0 + s.VirtiofsDaemonPid = 0 s.state = clhNotReady } type cloudHypervisor struct { - 
console console.Console - virtiofsd VirtiofsDaemon - APIClient clhClient - ctx context.Context - id string - vmconfig chclient.VmConfig - state CloudHypervisorState - config HypervisorConfig + console console.Console + virtiofsDaemon VirtiofsDaemon + APIClient clhClient + ctx context.Context + id string + vmconfig chclient.VmConfig + state CloudHypervisorState + config HypervisorConfig } var clhKernelParams = []Param{ @@ -198,6 +198,10 @@ func (clh *cloudHypervisor) setConfig(config *HypervisorConfig) error { return nil } +func (clh *cloudHypervisor) nydusdAPISocketPath(id string) (string, error) { + return utils.BuildSocketPath(clh.config.VMStorePath, id, nydusdAPISock) +} + // For cloudHypervisor this call only sets the internal structure up. // The VM will be created and started through StartVM(). func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Network, hypervisorConfig *HypervisorConfig) error { @@ -223,8 +227,8 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net if clh.state.PID > 0 { clh.Logger().WithField("function", "CreateVM").Info("Sandbox already exist, loading from state") - clh.virtiofsd = &virtiofsd{ - PID: clh.state.VirtiofsdPID, + clh.virtiofsDaemon = &virtiofsd{ + PID: clh.state.VirtiofsDaemonPid, sourcePath: hypervisorConfig.SharedPath, debug: clh.config.Debug, socketPath: virtiofsdSocketPath, @@ -349,7 +353,7 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net ApiInternal: chclient.NewAPIClient(cfg).DefaultApi, } - clh.virtiofsd = &virtiofsd{ + clh.virtiofsDaemon = &virtiofsd{ path: clh.config.VirtioFSDaemon, sourcePath: filepath.Join(GetSharePath(clh.id)), socketPath: virtiofsdSocketPath, @@ -358,6 +362,25 @@ func (clh *cloudHypervisor) CreateVM(ctx context.Context, id string, network Net cache: clh.config.VirtioFSCache, } + if clh.config.SharedFS == config.VirtioFSNydus { + apiSockPath, err := clh.nydusdAPISocketPath(clh.id) + if err != nil { + 
clh.Logger().WithError(err).Error("Invalid api socket path for nydusd") + return err + } + nd := &nydusd{ + path: clh.config.VirtioFSDaemon, + sockPath: virtiofsdSocketPath, + apiSockPath: apiSockPath, + sourcePath: filepath.Join(GetSharePath(clh.id)), + debug: clh.config.Debug, + extraArgs: clh.config.VirtioFSExtraArgs, + startFn: startInShimNS, + } + nd.setupShareDirFn = nd.setupPassthroughFS + clh.virtiofsDaemon = nd + } + if clh.config.SGXEPCSize > 0 { epcSection := chclient.NewSgxEpcConfig("kata-epc", clh.config.SGXEPCSize) epcSection.Prefault = func(b bool) *bool { return &b }(true) @@ -389,8 +412,8 @@ func (clh *cloudHypervisor) StartVM(ctx context.Context, timeout int) error { return err } - if clh.virtiofsd == nil { - return errors.New("Missing virtiofsd configuration") + if clh.virtiofsDaemon == nil { + return errors.New("Missing virtiofsDaemon configuration") } // This needs to be done as late as possible, just before launching @@ -402,23 +425,23 @@ func (clh *cloudHypervisor) StartVM(ctx context.Context, timeout int) error { } defer label.SetProcessLabel("") - if clh.config.SharedFS == config.VirtioFS { - clh.Logger().WithField("function", "StartVM").Info("Starting virtiofsd") - pid, err := clh.virtiofsd.Start(ctx, func() { + if clh.config.SharedFS == config.VirtioFS || clh.config.SharedFS == config.VirtioFSNydus { + clh.Logger().WithField("function", "StartVM").Info("Starting virtiofsDaemon") + pid, err := clh.virtiofsDaemon.Start(ctx, func() { clh.StopVM(ctx, false) }) if err != nil { return err } - clh.state.VirtiofsdPID = pid + clh.state.VirtiofsDaemonPid = pid } else { return errors.New("cloud-hypervisor only supports virtio based file sharing") } pid, err := clh.launchClh() if err != nil { - if shutdownErr := clh.virtiofsd.Stop(ctx); shutdownErr != nil { - clh.Logger().WithError(shutdownErr).Warn("error shutting down Virtiofsd") + if shutdownErr := clh.virtiofsDaemon.Stop(ctx); shutdownErr != nil { + clh.Logger().WithError(shutdownErr).Warn("error 
shutting down VirtiofsDaemon") } return fmt.Errorf("failed to launch cloud-hypervisor: %q", err) } @@ -759,14 +782,14 @@ func (clh *cloudHypervisor) toGrpc(ctx context.Context) ([]byte, error) { func (clh *cloudHypervisor) Save() (s hv.HypervisorState) { s.Pid = clh.state.PID s.Type = string(ClhHypervisor) - s.VirtiofsDaemonPid = clh.state.VirtiofsdPID + s.VirtiofsDaemonPid = clh.state.VirtiofsDaemonPid s.APISocket = clh.state.apiSocket return } func (clh *cloudHypervisor) Load(s hv.HypervisorState) { clh.state.PID = s.Pid - clh.state.VirtiofsdPID = s.VirtiofsDaemonPid + clh.state.VirtiofsDaemonPid = s.VirtiofsDaemonPid clh.state.apiSocket = s.APISocket } @@ -790,7 +813,7 @@ func (clh *cloudHypervisor) GetPids() []int { } func (clh *cloudHypervisor) GetVirtioFsPid() *int { - return &clh.state.VirtiofsdPID + return &clh.state.VirtiofsDaemonPid } func (clh *cloudHypervisor) AddDevice(ctx context.Context, devInfo interface{}, devType DeviceType) error { @@ -872,13 +895,13 @@ func (clh *cloudHypervisor) terminate(ctx context.Context, waitOnly bool) (err e return err } - if clh.virtiofsd == nil { - return errors.New("virtiofsd config is nil, failed to stop it") + if clh.virtiofsDaemon == nil { + return errors.New("virtiofsDaemon config is nil, failed to stop it") } - clh.Logger().Debug("stop virtiofsd") - if err = clh.virtiofsd.Stop(ctx); err != nil { - clh.Logger().WithError(err).Error("failed to stop virtiofsd") + clh.Logger().Debug("stop virtiofsDaemon") + if err = clh.virtiofsDaemon.Stop(ctx); err != nil { + clh.Logger().WithError(err).Error("failed to stop virtiofsDaemon") } return @@ -1181,7 +1204,7 @@ func (clh *cloudHypervisor) addNet(e Endpoint) error { // Add shared Volume using virtiofs func (clh *cloudHypervisor) addVolume(volume types.Volume) error { - if clh.config.SharedFS != config.VirtioFS { + if clh.config.SharedFS != config.VirtioFS && clh.config.SharedFS != config.VirtioFSNydus { return fmt.Errorf("shared fs method not supported %s", 
clh.config.SharedFS) } diff --git a/src/runtime/virtcontainers/clh_test.go b/src/runtime/virtcontainers/clh_test.go index d350dd9e9..7fbdd17fa 100644 --- a/src/runtime/virtcontainers/clh_test.go +++ b/src/runtime/virtcontainers/clh_test.go @@ -296,7 +296,7 @@ func TestClhCreateVM(t *testing.T) { assert.Exactly(clhConfig, clh.config) } -func TestClooudHypervisorStartSandbox(t *testing.T) { +func TestCloudHypervisorStartSandbox(t *testing.T) { assert := assert.New(t) clhConfig, err := newClhConfig() assert.NoError(err) @@ -308,9 +308,9 @@ func TestClooudHypervisorStartSandbox(t *testing.T) { clhConfig.RunStorePath = store.RunStoragePath() clh := &cloudHypervisor{ - config: clhConfig, - APIClient: &clhClientMock{}, - virtiofsd: &virtiofsdMock{}, + config: clhConfig, + APIClient: &clhClientMock{}, + virtiofsDaemon: &virtiofsdMock{}, } err = clh.StartVM(context.Background(), 10) diff --git a/src/runtime/virtcontainers/container.go b/src/runtime/virtcontainers/container.go index 11b7587a6..0ee4b8c2c 100644 --- a/src/runtime/virtcontainers/container.go +++ b/src/runtime/virtcontainers/container.go @@ -904,38 +904,11 @@ func (c *Container) create(ctx context.Context) (err error) { } } - var ( - machineType = c.sandbox.config.HypervisorConfig.HypervisorMachineType - normalAttachedDevs []ContainerDevice //for q35: normally attached devices - delayAttachedDevs []ContainerDevice //for q35: delay attached devices, for example, large bar space device - ) - // Fix: https://github.com/kata-containers/runtime/issues/2460 - if machineType == QemuQ35 { - // add Large Bar space device to delayAttachedDevs - for _, device := range c.devices { - var isLargeBarSpace bool - isLargeBarSpace, err = manager.IsVFIOLargeBarSpaceDevice(device.ContainerPath) - if err != nil { - return - } - if isLargeBarSpace { - delayAttachedDevs = append(delayAttachedDevs, device) - } else { - normalAttachedDevs = append(normalAttachedDevs, device) - } - } - } else { - normalAttachedDevs = c.devices - } - 
c.Logger().WithFields(logrus.Fields{ - "machine_type": machineType, - "devices": normalAttachedDevs, - }).Info("normal attach devices") - if len(normalAttachedDevs) > 0 { - if err = c.attachDevices(ctx, normalAttachedDevs); err != nil { - return - } + "devices": c.devices, + }).Info("Attach devices") + if err = c.attachDevices(ctx); err != nil { + return } // Deduce additional system mount info that should be handled by the agent @@ -948,17 +921,6 @@ func (c *Container) create(ctx context.Context) (err error) { } c.process = *process - // lazy attach device after createContainer for q35 - if machineType == QemuQ35 && len(delayAttachedDevs) > 0 { - c.Logger().WithFields(logrus.Fields{ - "machine_type": machineType, - "devices": delayAttachedDevs, - }).Info("lazy attach devices") - if err = c.attachDevices(ctx, delayAttachedDevs); err != nil { - return - } - } - if err = c.setContainerState(types.StateReady); err != nil { return } @@ -1320,7 +1282,7 @@ func (c *Container) hotplugDrive(ctx context.Context) error { c.rootfsSuffix = "" } // If device mapper device, then fetch the full path of the device - devicePath, fsType, err = utils.GetDevicePathAndFsType(dev.mountPoint) + devicePath, fsType, _, err = utils.GetDevicePathAndFsTypeOptions(dev.mountPoint) if err != nil { return err } @@ -1403,7 +1365,7 @@ func (c *Container) removeDrive(ctx context.Context) (err error) { return nil } -func (c *Container) attachDevices(ctx context.Context, devices []ContainerDevice) error { +func (c *Container) attachDevices(ctx context.Context) error { // there's no need to do rollback when error happens, // because if attachDevices fails, container creation will fail too, // and rollbackFailingContainerCreation could do all the rollbacks @@ -1411,7 +1373,7 @@ func (c *Container) attachDevices(ctx context.Context, devices []ContainerDevice // since devices with large bar space require delayed attachment, // the devices need to be split into two lists, normalAttachedDevs and 
delayAttachedDevs. // so c.device is not used here. See issue https://github.com/kata-containers/runtime/issues/2460. - for _, dev := range devices { + for _, dev := range c.devices { if err := c.sandbox.devManager.AttachDevice(ctx, dev.ID, c.sandbox); err != nil { return err } diff --git a/src/runtime/virtcontainers/device/config/pmem.go b/src/runtime/virtcontainers/device/config/pmem.go index 44ea63f72..33ce4fdff 100644 --- a/src/runtime/virtcontainers/device/config/pmem.go +++ b/src/runtime/virtcontainers/device/config/pmem.go @@ -75,7 +75,7 @@ func PmemDeviceInfo(source, destination string) (*DeviceInfo, error) { return nil, fmt.Errorf("backing file %v has not PFN signature", device.HostPath) } - _, fstype, err := utils.GetDevicePathAndFsType(source) + _, fstype, _, err := utils.GetDevicePathAndFsTypeOptions(source) if err != nil { pmemLog.WithError(err).WithField("mount-point", source).Warn("failed to get fstype: using ext4") fstype = "ext4" diff --git a/src/runtime/virtcontainers/device/manager/utils.go b/src/runtime/virtcontainers/device/manager/utils.go index ade0b6c9d..61488ef9f 100644 --- a/src/runtime/virtcontainers/device/manager/utils.go +++ b/src/runtime/virtcontainers/device/manager/utils.go @@ -7,16 +7,10 @@ package manager import ( - "fmt" - "os" "path/filepath" - "strconv" "strings" - "github.com/sirupsen/logrus" - "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/config" - "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/drivers" ) const ( @@ -42,101 +36,6 @@ func isBlock(devInfo config.DeviceInfo) bool { return devInfo.DevType == "b" } -// IsVFIOLargeBarSpaceDevice checks if the device is a large bar space device. 
-func IsVFIOLargeBarSpaceDevice(hostPath string) (bool, error) { - if !isVFIO(hostPath) { - return false, nil - } - - iommuDevicesPath := filepath.Join(config.SysIOMMUPath, filepath.Base(hostPath), "devices") - deviceFiles, err := os.ReadDir(iommuDevicesPath) - if err != nil { - return false, err - } - - // Pass all devices in iommu group - for _, deviceFile := range deviceFiles { - vfioDeviceType := drivers.GetVFIODeviceType(deviceFile.Name()) - var isLarge bool - switch vfioDeviceType { - case config.VFIODeviceNormalType: - sysfsResource := filepath.Join(iommuDevicesPath, deviceFile.Name(), "resource") - if isLarge, err = isLargeBarSpace(sysfsResource); err != nil { - return false, err - } - deviceLogger().WithFields(logrus.Fields{ - "device-file": deviceFile.Name(), - "device-type": vfioDeviceType, - "resource": sysfsResource, - "large-bar-space": isLarge, - }).Info("Detect large bar space device") - return isLarge, nil - case config.VFIODeviceMediatedType: - //TODO: support VFIODeviceMediatedType - deviceLogger().WithFields(logrus.Fields{ - "device-file": deviceFile.Name(), - "device-type": vfioDeviceType, - }).Warn("Detect large bar space device is not yet supported for VFIODeviceMediatedType") - default: - deviceLogger().WithFields(logrus.Fields{ - "device-file": deviceFile.Name(), - "device-type": vfioDeviceType, - }).Warn("Incorrect token found when detecting large bar space devices") - } - } - - return false, nil -} - -func isLargeBarSpace(resourcePath string) (bool, error) { - buf, err := os.ReadFile(resourcePath) - if err != nil { - return false, fmt.Errorf("failed to read sysfs resource: %v", err) - } - - // The resource file contains host addresses of PCI resources: - // For example: - // $ cat /sys/bus/pci/devices/0000:04:00.0/resource - // 0x00000000c6000000 0x00000000c6ffffff 0x0000000000040200 - // 0x0000383800000000 0x0000383bffffffff 0x000000000014220c - // Refer: - // resource format: 
https://github.com/torvalds/linux/blob/63623fd44972d1ed2bfb6e0fb631dfcf547fd1e7/drivers/pci/pci-sysfs.c#L145 - // calculate size : https://github.com/pciutils/pciutils/blob/61ecc14a327de030336f1ff3fea9c7e7e55a90ca/lspci.c#L388 - for rIdx, line := range strings.Split(string(buf), "\n") { - cols := strings.Fields(line) - // start and end columns are required to calculate the size - if len(cols) < 2 { - deviceLogger().WithField("resource-line", line).Debug("not enough columns to calculate PCI size") - continue - } - start, _ := strconv.ParseUint(cols[0], 0, 64) - end, _ := strconv.ParseUint(cols[1], 0, 64) - if start > end { - deviceLogger().WithFields(logrus.Fields{ - "start": start, - "end": end, - }).Debug("start is greater than end") - continue - } - // Use right shift to convert Bytes to GBytes - // This is equivalent to ((end - start + 1) / 1024 / 1024 / 1024) - gbSize := (end - start + 1) >> 30 - deviceLogger().WithFields(logrus.Fields{ - "resource": resourcePath, - "region": rIdx, - "start": cols[0], - "end": cols[1], - "gb-size": gbSize, - }).Debug("Check large bar space device") - //size is large than 4G - if gbSize > 4 { - return true, nil - } - } - - return false, nil -} - // isVhostUserBlk checks if the device is a VhostUserBlk device. 
 func isVhostUserBlk(devInfo config.DeviceInfo) bool {
 	return devInfo.DevType == "b" && devInfo.Major == config.VhostUserBlkMajor
diff --git a/src/runtime/virtcontainers/device/manager/utils_test.go b/src/runtime/virtcontainers/device/manager/utils_test.go
index 76239340f..ec518ce7a 100644
--- a/src/runtime/virtcontainers/device/manager/utils_test.go
+++ b/src/runtime/virtcontainers/device/manager/utils_test.go
@@ -7,7 +7,6 @@
 package manager

 import (
-	"os"
 	"testing"

 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/config"
@@ -104,46 +103,3 @@ func TestIsVhostUserSCSI(t *testing.T) {
 		assert.Equal(t, d.expected, isVhostUserSCSI)
 	}
 }
-
-func TestIsLargeBarSpace(t *testing.T) {
-	assert := assert.New(t)
-
-	// File not exist
-	bs, err := isLargeBarSpace("/abc/xyz/123/rgb")
-	assert.Error(err)
-	assert.False(bs)
-
-	f, err := os.CreateTemp("", "pci")
-	assert.NoError(err)
-	defer f.Close()
-	defer os.RemoveAll(f.Name())
-
-	type testData struct {
-		resourceInfo string
-		error        bool
-		result       bool
-	}
-
-	for _, d := range []testData{
-		{"", false, false},
-		{"\t\n\t ", false, false},
-		{"abc zyx", false, false},
-		{"abc zyx rgb", false, false},
-		{"abc\t zyx \trgb", false, false},
-		{"0x00015\n0x0013", false, false},
-		{"0x00000000c6000000 0x00000000c6ffffff 0x0000000000040200", false, false},
-		{"0x0000383bffffffff 0x0000383800000000", false, false}, // start greater than end
-		{"0x0000383800000000 0x0000383bffffffff", false, true},
-		{"0x0000383800000000 0x0000383bffffffff 0x000000000014220c", false, true},
-	} {
-		f.WriteAt([]byte(d.resourceInfo), 0)
-		bs, err = isLargeBarSpace(f.Name())
-		assert.NoError(f.Truncate(0))
-		if d.error {
-			assert.Error(err, d.resourceInfo)
-		} else {
-			assert.NoError(err, d.resourceInfo)
-		}
-		assert.Equal(d.result, bs, d.resourceInfo)
-	}
-}
diff --git a/src/runtime/virtcontainers/kata_agent.go b/src/runtime/virtcontainers/kata_agent.go
index 7f74b5d46..208b192c0 100644
--- a/src/runtime/virtcontainers/kata_agent.go
+++ b/src/runtime/virtcontainers/kata_agent.go
@@ -17,6 +17,7 @@ import (
 	"syscall"
 	"time"

+	"github.com/docker/go-units"
 	"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils/katatrace"
 	"github.com/kata-containers/kata-containers/src/runtime/pkg/uuid"
 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/device/api"
@@ -31,6 +32,7 @@ import (
 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/rootless"
 	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
 	vcTypes "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
+	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"

 	"github.com/gogo/protobuf/proto"
 	"github.com/opencontainers/runtime-spec/specs-go"
@@ -158,6 +160,16 @@ var kataHostSharedDir = func() string {
 	return defaultKataHostSharedDir
 }

+func getPagesizeFromOpt(fsOpts []string) string {
+	// example options array: "rw", "relatime", "seclabel", "pagesize=2M"
+	for _, opt := range fsOpts {
+		if strings.HasPrefix(opt, "pagesize=") {
+			return strings.TrimPrefix(opt, "pagesize=")
+		}
+	}
+	return ""
+}
+
 // Shared path handling:
 // 1. create three directories for each sandbox:
 // -. /run/kata-containers/shared/sandboxes/$sbx_id/mounts/, a directory to hold all host/guest shared mounts
@@ -1274,13 +1286,24 @@ func (k *kataAgent) rollbackFailingContainerCreation(ctx context.Context, c *Con
 	}
 }

-func (k *kataAgent) buildContainerRootfsWithNydus(sandbox *Sandbox, c *Container, rootPathParent string) (*grpc.Storage, error) {
-	if sandbox.GetHypervisorType() != string(QemuHypervisor) {
-		// qemu is supported first, other hypervisors will next
-		// https://github.com/kata-containers/kata-containers/issues/2724
+func getVirtiofsDaemonForNydus(sandbox *Sandbox) (VirtiofsDaemon, error) {
+	var virtiofsDaemon VirtiofsDaemon
+	switch sandbox.GetHypervisorType() {
+	case string(QemuHypervisor):
+		virtiofsDaemon = sandbox.hypervisor.(*qemu).virtiofsDaemon
+	case string(ClhHypervisor):
+		virtiofsDaemon = sandbox.hypervisor.(*cloudHypervisor).virtiofsDaemon
+	default:
 		return nil, errNydusdNotSupport
 	}
-	q, _ := sandbox.hypervisor.(*qemu)
+	return virtiofsDaemon, nil
+}
+
+func (k *kataAgent) buildContainerRootfsWithNydus(sandbox *Sandbox, c *Container, rootPathParent string) (*grpc.Storage, error) {
+	virtiofsDaemon, err := getVirtiofsDaemonForNydus(sandbox)
+	if err != nil {
+		return nil, err
+	}
 	extraOption, err := parseExtraOption(c.rootFs.Options)
 	if err != nil {
 		return nil, err
@@ -1292,7 +1315,7 @@ func (k *kataAgent) buildContainerRootfsWithNydus(sandbox *Sandbox, c *Container
 	}
 	k.Logger().Infof("nydus option: %v", extraOption)
 	// mount lowerdir to guest /run/kata-containers/shared/images//lowerdir
-	if err := q.virtiofsDaemon.Mount(*mountOpt); err != nil {
+	if err := virtiofsDaemon.Mount(*mountOpt); err != nil {
 		return nil, err
 	}
 	rootfs := &grpc.Storage{}
@@ -1457,6 +1480,13 @@ func (k *kataAgent) createContainer(ctx context.Context, sandbox *Sandbox, c *Co

 	ctrStorages = append(ctrStorages, epheStorages...)

+	k.Logger().WithField("ociSpec Hugepage Resources", ociSpec.Linux.Resources.HugepageLimits).Debug("ociSpec HugepageLimit")
+	hugepages, err := k.handleHugepages(ociSpec.Mounts, ociSpec.Linux.Resources.HugepageLimits)
+	if err != nil {
+		return nil, err
+	}
+	ctrStorages = append(ctrStorages, hugepages...)
+
 	localStorages, err := k.handleLocalStorage(ociSpec.Mounts, sandbox.id, c.rootfsSuffix)
 	if err != nil {
 		return nil, err
@@ -1537,6 +1567,71 @@ func buildProcessFromExecID(token string) (*Process, error) {
 	}, nil
 }

+// handleHugePages handles hugepages storage by
+// creating a Storage from corresponding source of the mount point
+func (k *kataAgent) handleHugepages(mounts []specs.Mount, hugepageLimits []specs.LinuxHugepageLimit) ([]*grpc.Storage, error) {
+	//Map to hold the total memory of each type of hugepages
+	optionsMap := make(map[int64]string)
+
+	for _, hp := range hugepageLimits {
+		if hp.Limit != 0 {
+			k.Logger().WithFields(logrus.Fields{
+				"Pagesize": hp.Pagesize,
+				"Limit":    hp.Limit,
+			}).Info("hugepage request")
+			//example Pagesize 2MB, 1GB etc. The Limit are in Bytes
+			pageSize, err := units.RAMInBytes(hp.Pagesize)
+			if err != nil {
+				k.Logger().Error("Unable to convert pagesize to bytes")
+				return nil, err
+			}
+			totalHpSizeStr := strconv.FormatUint(hp.Limit, 10)
+			optionsMap[pageSize] = totalHpSizeStr
+		}
+	}
+
+	var hugepages []*grpc.Storage
+	for idx, mnt := range mounts {
+		if mnt.Type != KataLocalDevType {
+			continue
+		}
+		//HugePages mount Type is Local
+		if _, fsType, fsOptions, _ := utils.GetDevicePathAndFsTypeOptions(mnt.Source); fsType == "hugetlbfs" {
+			k.Logger().WithField("fsOptions", fsOptions).Debug("hugepage mount options")
+			//Find the pagesize from the mountpoint options
+			pagesizeOpt := getPagesizeFromOpt(fsOptions)
+			if pagesizeOpt == "" {
+				return nil, fmt.Errorf("No pagesize option found in filesystem mount options")
+			}
+			pageSize, err := units.RAMInBytes(pagesizeOpt)
+			if err != nil {
+				k.Logger().Error("Unable to convert pagesize from fs mount options to bytes")
+				return nil, err
+			}
+			//Create mount option string
+			options := fmt.Sprintf("pagesize=%s,size=%s", strconv.FormatInt(pageSize, 10), optionsMap[pageSize])
+			k.Logger().WithField("Hugepage options string", options).Debug("hugepage mount options")
+			// Set the mount source path to a path that resides inside the VM
+			mounts[idx].Source = filepath.Join(ephemeralPath(), filepath.Base(mnt.Source))
+			// Set the mount type to "bind"
+			mounts[idx].Type = "bind"
+
+			// Create a storage struct so that kata agent is able to create
+			// hugetlbfs backed volume inside the VM
+			hugepage := &grpc.Storage{
+				Driver:     KataEphemeralDevType,
+				Source:     "nodev",
+				Fstype:     "hugetlbfs",
+				MountPoint: mounts[idx].Source,
+				Options:    []string{options},
+			}
+			hugepages = append(hugepages, hugepage)
+		}
+
+	}
+	return hugepages, nil
+}
+
 // handleEphemeralStorage handles ephemeral storages by
 // creating a Storage from corresponding source of the mount point
 func (k *kataAgent) handleEphemeralStorage(mounts []specs.Mount) ([]*grpc.Storage, error) {
diff --git a/src/runtime/virtcontainers/kata_agent_test.go b/src/runtime/virtcontainers/kata_agent_test.go index fb596da12..6475e8b06 100644 --- a/src/runtime/virtcontainers/kata_agent_test.go +++ b/src/runtime/virtcontainers/kata_agent_test.go @@ -9,6 +9,7 @@ import ( "bufio" "context" "fmt" + "io/ioutil" "os" "path" "path/filepath" @@ -1230,3 +1231,57 @@ func TestSandboxBindMount(t *testing.T) { assert.True(os.IsNotExist(err)) } + +func TestHandleHugepages(t *testing.T) { + if os.Getuid() != 0 { + t.Skip("Test disabled as requires root user") + } + + assert := assert.New(t) + + dir, err := ioutil.TempDir("", "hugepages-test") + assert.Nil(err) + defer os.RemoveAll(dir) + + k := kataAgent{} + var mounts []specs.Mount + var hugepageLimits []specs.LinuxHugepageLimit + + hugepageDirs := [2]string{"hugepages-1Gi", "hugepages-2Mi"} + options := [2]string{"pagesize=1024M", "pagesize=2M"} + + for i := 0; i < 2; i++ { + target := path.Join(dir, hugepageDirs[i]) + err := os.MkdirAll(target, 0777) + assert.NoError(err, "Unable to create dir %s", target) + + err = syscall.Mount("nodev", target, "hugetlbfs", uintptr(0), options[i]) + assert.NoError(err, "Unable to mount %s", target) + + defer syscall.Unmount(target, 0) + defer os.RemoveAll(target) + mount := specs.Mount{ + Type: KataLocalDevType, + Source: target, + } + mounts = append(mounts, mount) + } + + hugepageLimits = []specs.LinuxHugepageLimit{ + { + Pagesize: "1GB", + Limit: 1073741824, + }, + { + Pagesize: "2MB", + Limit: 134217728, + }, + } + + hugepages, err := k.handleHugepages(mounts, hugepageLimits) + + assert.NoError(err, "Unable to handle hugepages %v", hugepageLimits) + assert.NotNil(hugepages) + assert.Equal(len(hugepages), 2) + +} diff --git a/src/runtime/virtcontainers/mount.go b/src/runtime/virtcontainers/mount.go index b23bd0879..f782c5d09 100644 --- a/src/runtime/virtcontainers/mount.go +++ b/src/runtime/virtcontainers/mount.go @@ -390,13 +390,11 @@ func bindUnmountContainerSnapshotDir(ctx 
context.Context, sharedDir, cID string) func nydusContainerCleanup(ctx context.Context, sharedDir string, c *Container) error { sandbox := c.sandbox - if sandbox.GetHypervisorType() != string(QemuHypervisor) { - // qemu is supported first, other hypervisors will next - // https://github.com/kata-containers/kata-containers/issues/2724 - return errNydusdNotSupport + virtiofsDaemon, err := getVirtiofsDaemonForNydus(sandbox) + if err != nil { + return err } - q, _ := sandbox.hypervisor.(*qemu) - if err := q.virtiofsDaemon.Umount(rafsMountPath(c.id)); err != nil { + if err := virtiofsDaemon.Umount(rafsMountPath(c.id)); err != nil { return errors.Wrap(err, "umount rafs failed") } if err := bindUnmountContainerSnapshotDir(ctx, sharedDir, c.id); err != nil { @@ -470,7 +468,7 @@ func IsEphemeralStorage(path string) bool { return false } - if _, fsType, _ := utils.GetDevicePathAndFsType(path); fsType == "tmpfs" { + if _, fsType, _, _ := utils.GetDevicePathAndFsTypeOptions(path); fsType == "tmpfs" { return true } @@ -485,7 +483,7 @@ func Isk8sHostEmptyDir(path string) bool { return false } - if _, fsType, _ := utils.GetDevicePathAndFsType(path); fsType != "tmpfs" { + if _, fsType, _, _ := utils.GetDevicePathAndFsTypeOptions(path); fsType != "tmpfs" { return true } return false diff --git a/src/runtime/virtcontainers/network_linux.go b/src/runtime/virtcontainers/network_linux.go index 378cd02e0..c4f2380e5 100644 --- a/src/runtime/virtcontainers/network_linux.go +++ b/src/runtime/virtcontainers/network_linux.go @@ -178,38 +178,32 @@ func (n *LinuxNetwork) addSingleEndpoint(ctx context.Context, s *Sandbox, netInf endpoint.SetProperties(netInfo) - if err := doNetNS(n.netNSPath, func(_ ns.NetNS) error { - networkLogger().WithField("endpoint-type", endpoint.Type()).WithField("hotplug", hotplug).Info("Attaching endpoint") - if hotplug { - if err := endpoint.HotAttach(ctx, s.hypervisor); err != nil { - return err - } - } else { - if err := endpoint.Attach(ctx, s); err != nil { - 
return err + networkLogger().WithField("endpoint-type", endpoint.Type()).WithField("hotplug", hotplug).Info("Attaching endpoint") + if hotplug { + if err := endpoint.HotAttach(ctx, s.hypervisor); err != nil { + return nil, err + } + } else { + if err := endpoint.Attach(ctx, s); err != nil { + return nil, err + } + } + + if !s.hypervisor.IsRateLimiterBuiltin() { + rxRateLimiterMaxRate := s.hypervisor.HypervisorConfig().RxRateLimiterMaxRate + if rxRateLimiterMaxRate > 0 { + networkLogger().Info("Add Rx Rate Limiter") + if err := addRxRateLimiter(endpoint, rxRateLimiterMaxRate); err != nil { + return nil, err } } - - if !s.hypervisor.IsRateLimiterBuiltin() { - rxRateLimiterMaxRate := s.hypervisor.HypervisorConfig().RxRateLimiterMaxRate - if rxRateLimiterMaxRate > 0 { - networkLogger().Info("Add Rx Rate Limiter") - if err := addRxRateLimiter(endpoint, rxRateLimiterMaxRate); err != nil { - return err - } - } - txRateLimiterMaxRate := s.hypervisor.HypervisorConfig().TxRateLimiterMaxRate - if txRateLimiterMaxRate > 0 { - networkLogger().Info("Add Tx Rate Limiter") - if err := addTxRateLimiter(endpoint, txRateLimiterMaxRate); err != nil { - return err - } + txRateLimiterMaxRate := s.hypervisor.HypervisorConfig().TxRateLimiterMaxRate + if txRateLimiterMaxRate > 0 { + networkLogger().Info("Add Tx Rate Limiter") + if err := addTxRateLimiter(endpoint, txRateLimiterMaxRate); err != nil { + return nil, err } } - - return nil - }); err != nil { - return nil, err } n.eps = append(n.eps, endpoint) @@ -298,10 +292,13 @@ func (n *LinuxNetwork) addAllEndpoints(ctx context.Context, s *Sandbox, hotplug continue } - _, err = n.addSingleEndpoint(ctx, s, netInfo, hotplug) - if err != nil { + if err := doNetNS(n.netNSPath, func(_ ns.NetNS) error { + _, err = n.addSingleEndpoint(ctx, s, netInfo, hotplug) + return err + }); err != nil { return err } + } sort.Slice(n.eps, func(i, j int) bool { @@ -335,8 +332,14 @@ func (n *LinuxNetwork) AddEndpoints(ctx context.Context, s *Sandbox, endpointsIn 
} } else { for _, ep := range endpointsInfo { - if _, err := n.addSingleEndpoint(ctx, s, ep, hotplug); err != nil { - n.eps = nil + if err := doNetNS(n.netNSPath, func(_ ns.NetNS) error { + if _, err := n.addSingleEndpoint(ctx, s, ep, hotplug); err != nil { + n.eps = nil + return err + } + + return nil + }); err != nil { return nil, err } } diff --git a/src/runtime/virtcontainers/nydusd.go b/src/runtime/virtcontainers/nydusd.go index 1a09b24b1..c9315ee37 100644 --- a/src/runtime/virtcontainers/nydusd.go +++ b/src/runtime/virtcontainers/nydusd.go @@ -68,7 +68,7 @@ var ( errNydusdSockPathInvalid = errors.New("nydusd sock path is invalid") errNydusdAPISockPathInvalid = errors.New("nydusd api sock path is invalid") errNydusdSourcePathInvalid = errors.New("nydusd resource path is invalid") - errNydusdNotSupport = errors.New("nydusd only supports the QEMU hypervisor currently (see https://github.com/kata-containers/kata-containers/issues/2724)") + errNydusdNotSupport = errors.New("nydusd only supports the QEMU/CLH hypervisor currently (see https://github.com/kata-containers/kata-containers/issues/3654)") ) type nydusd struct { diff --git a/src/runtime/virtcontainers/pkg/agent/protocols/client/client.go b/src/runtime/virtcontainers/pkg/agent/protocols/client/client.go index 227388d54..5728c8933 100644 --- a/src/runtime/virtcontainers/pkg/agent/protocols/client/client.go +++ b/src/runtime/virtcontainers/pkg/agent/protocols/client/client.go @@ -361,7 +361,7 @@ func VsockDialer(sock string, timeout time.Duration) (net.Conn, error) { } dialFunc := func() (net.Conn, error) { - return vsock.Dial(cid, port) + return vsock.Dial(cid, port, nil) } timeoutErr := grpcStatus.Errorf(codes.DeadlineExceeded, "timed out connecting to vsock %d:%d", cid, port) diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/.openapi-generator/FILES b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/.openapi-generator/FILES index 22dad792f..ed5dc5764 100644 --- 
a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/.openapi-generator/FILES +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/.openapi-generator/FILES @@ -25,6 +25,7 @@ docs/NetConfig.md docs/NumaConfig.md docs/NumaDistance.md docs/PciDeviceInfo.md +docs/PlatformConfig.md docs/PmemConfig.md docs/RateLimiterConfig.md docs/ReceiveMigrationData.md @@ -32,6 +33,7 @@ docs/RestoreConfig.md docs/RngConfig.md docs/SendMigrationData.md docs/SgxEpcConfig.md +docs/TdxConfig.md docs/TokenBucket.md docs/VmAddDevice.md docs/VmConfig.md @@ -63,6 +65,7 @@ model_net_config.go model_numa_config.go model_numa_distance.go model_pci_device_info.go +model_platform_config.go model_pmem_config.go model_rate_limiter_config.go model_receive_migration_data.go @@ -70,6 +73,7 @@ model_restore_config.go model_rng_config.go model_send_migration_data.go model_sgx_epc_config.go +model_tdx_config.go model_token_bucket.go model_vm_add_device.go model_vm_config.go diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/README.md b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/README.md index b9829e33e..3b7967c4b 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/README.md +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/README.md @@ -125,6 +125,7 @@ Class | Method | HTTP request | Description - [NumaConfig](docs/NumaConfig.md) - [NumaDistance](docs/NumaDistance.md) - [PciDeviceInfo](docs/PciDeviceInfo.md) + - [PlatformConfig](docs/PlatformConfig.md) - [PmemConfig](docs/PmemConfig.md) - [RateLimiterConfig](docs/RateLimiterConfig.md) - [ReceiveMigrationData](docs/ReceiveMigrationData.md) @@ -132,6 +133,7 @@ Class | Method | HTTP request | Description - [RngConfig](docs/RngConfig.md) - [SendMigrationData](docs/SendMigrationData.md) - [SgxEpcConfig](docs/SgxEpcConfig.md) + - [TdxConfig](docs/TdxConfig.md) - [TokenBucket](docs/TokenBucket.md) - [VmAddDevice](docs/VmAddDevice.md) - [VmConfig](docs/VmConfig.md) diff --git 
a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/api/openapi.yaml b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/api/openapi.yaml index a00924ac5..b820e01cc 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/api/openapi.yaml +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/api/openapi.yaml @@ -374,7 +374,7 @@ components: VmInfo: description: Virtual Machine information example: - memory_actual_size: 7 + memory_actual_size: 3 state: Created config: console: @@ -382,8 +382,9 @@ components: file: file iommu: false balloon: - size: 1 + size: 6 deflate_on_oom: false + free_page_reporting: false memory: hugepages: false shared: false @@ -489,80 +490,87 @@ components: path: path numa: - distances: - - distance: 8 - destination: 4 - - distance: 8 - destination: 4 + - distance: 4 + destination: 0 + - distance: 4 + destination: 0 cpus: - - 0 - - 0 + - 6 + - 6 sgx_epc_sections: - sgx_epc_sections - sgx_epc_sections memory_zones: - memory_zones - memory_zones - guest_numa_id: 6 + guest_numa_id: 7 - distances: - - distance: 8 - destination: 4 - - distance: 8 - destination: 4 + - distance: 4 + destination: 0 + - distance: 4 + destination: 0 cpus: - - 0 - - 0 + - 6 + - 6 sgx_epc_sections: - sgx_epc_sections - sgx_epc_sections memory_zones: - memory_zones - memory_zones - guest_numa_id: 6 + guest_numa_id: 7 + tdx: + firmware: firmware rng: iommu: false src: /dev/urandom sgx_epc: - prefault: false - size: 7 + size: 0 id: id - prefault: false - size: 7 + size: 0 id: id fs: - - pci_segment: 5 - num_queues: 2 - queue_size: 6 + - pci_segment: 6 + num_queues: 1 + queue_size: 2 cache_size: 6 dax: true tag: tag socket: socket id: id - - pci_segment: 5 - num_queues: 2 - queue_size: 6 + - pci_segment: 6 + num_queues: 1 + queue_size: 2 cache_size: 6 dax: true tag: tag socket: socket id: id vsock: - pci_segment: 0 + pci_segment: 7 iommu: false socket: socket id: id cid: 3 + platform: + iommu_segments: + - 7 + - 7 + num_pci_segments: 8 pmem: - - 
pci_segment: 3 + - pci_segment: 6 mergeable: false file: file - size: 6 + size: 5 iommu: false id: id discard_writes: false - - pci_segment: 3 + - pci_segment: 6 mergeable: false file: file - size: 6 + size: 5 iommu: false id: id discard_writes: false @@ -591,12 +599,9 @@ components: one_time_burst: 0 refill_time: 0 mac: mac - pci_segment: 6 + pci_segment: 3 vhost_mode: Client iommu: false - fds: - - 3 - - 3 vhost_socket: vhost_socket vhost_user: false id: id @@ -615,12 +620,9 @@ components: one_time_burst: 0 refill_time: 0 mac: mac - pci_segment: 6 + pci_segment: 3 vhost_mode: Client iommu: false - fds: - - 3 - - 3 vhost_socket: vhost_socket vhost_user: false id: id @@ -710,8 +712,9 @@ components: file: file iommu: false balloon: - size: 1 + size: 6 deflate_on_oom: false + free_page_reporting: false memory: hugepages: false shared: false @@ -817,80 +820,87 @@ components: path: path numa: - distances: - - distance: 8 - destination: 4 - - distance: 8 - destination: 4 + - distance: 4 + destination: 0 + - distance: 4 + destination: 0 cpus: - - 0 - - 0 + - 6 + - 6 sgx_epc_sections: - sgx_epc_sections - sgx_epc_sections memory_zones: - memory_zones - memory_zones - guest_numa_id: 6 + guest_numa_id: 7 - distances: - - distance: 8 - destination: 4 - - distance: 8 - destination: 4 + - distance: 4 + destination: 0 + - distance: 4 + destination: 0 cpus: - - 0 - - 0 + - 6 + - 6 sgx_epc_sections: - sgx_epc_sections - sgx_epc_sections memory_zones: - memory_zones - memory_zones - guest_numa_id: 6 + guest_numa_id: 7 + tdx: + firmware: firmware rng: iommu: false src: /dev/urandom sgx_epc: - prefault: false - size: 7 + size: 0 id: id - prefault: false - size: 7 + size: 0 id: id fs: - - pci_segment: 5 - num_queues: 2 - queue_size: 6 + - pci_segment: 6 + num_queues: 1 + queue_size: 2 cache_size: 6 dax: true tag: tag socket: socket id: id - - pci_segment: 5 - num_queues: 2 - queue_size: 6 + - pci_segment: 6 + num_queues: 1 + queue_size: 2 cache_size: 6 dax: true tag: tag socket: 
socket id: id vsock: - pci_segment: 0 + pci_segment: 7 iommu: false socket: socket id: id cid: 3 + platform: + iommu_segments: + - 7 + - 7 + num_pci_segments: 8 pmem: - - pci_segment: 3 + - pci_segment: 6 mergeable: false file: file - size: 6 + size: 5 iommu: false id: id discard_writes: false - - pci_segment: 3 + - pci_segment: 6 mergeable: false file: file - size: 6 + size: 5 iommu: false id: id discard_writes: false @@ -919,12 +929,9 @@ components: one_time_burst: 0 refill_time: 0 mac: mac - pci_segment: 6 + pci_segment: 3 vhost_mode: Client iommu: false - fds: - - 3 - - 3 vhost_socket: vhost_socket vhost_user: false id: id @@ -943,12 +950,9 @@ components: one_time_burst: 0 refill_time: 0 mac: mac - pci_segment: 6 + pci_segment: 3 vhost_mode: Client iommu: false - fds: - - 3 - - 3 vhost_socket: vhost_socket vhost_user: false id: id @@ -998,6 +1002,8 @@ components: items: $ref: '#/components/schemas/SgxEpcConfig' type: array + tdx: + $ref: '#/components/schemas/TdxConfig' numa: items: $ref: '#/components/schemas/NumaConfig' @@ -1008,6 +1014,8 @@ components: watchdog: default: false type: boolean + platform: + $ref: '#/components/schemas/PlatformConfig' required: - kernel type: object @@ -1081,6 +1089,22 @@ components: - boot_vcpus - max_vcpus type: object + PlatformConfig: + example: + iommu_segments: + - 7 + - 7 + num_pci_segments: 8 + properties: + num_pci_segments: + format: int16 + type: integer + iommu_segments: + items: + format: int16 + type: integer + type: array + type: object MemoryZoneConfig: example: hugepages: false @@ -1353,12 +1377,9 @@ components: one_time_burst: 0 refill_time: 0 mac: mac - pci_segment: 6 + pci_segment: 3 vhost_mode: Client iommu: false - fds: - - 3 - - 3 vhost_socket: vhost_socket vhost_user: false id: id @@ -1393,11 +1414,6 @@ components: type: string id: type: string - fds: - items: - format: int32 - type: integer - type: array pci_segment: format: int16 type: integer @@ -1420,25 +1436,29 @@ components: type: object 
BalloonConfig: example: - size: 1 + size: 6 deflate_on_oom: false + free_page_reporting: false properties: size: format: int64 type: integer deflate_on_oom: default: false - description: Whether the balloon should deflate when the guest is under - memory pressure. + description: Deflate balloon when the guest is under memory pressure. + type: boolean + free_page_reporting: + default: false + description: Enable guest to report free pages. type: boolean required: - size type: object FsConfig: example: - pci_segment: 5 - num_queues: 2 - queue_size: 6 + pci_segment: 6 + num_queues: 1 + queue_size: 2 cache_size: 6 dax: true tag: tag @@ -1476,10 +1496,10 @@ components: type: object PmemConfig: example: - pci_segment: 3 + pci_segment: 6 mergeable: false file: file - size: 6 + size: 5 iommu: false id: id discard_writes: false @@ -1550,7 +1570,7 @@ components: type: object VsockConfig: example: - pci_segment: 0 + pci_segment: 7 iommu: false socket: socket id: id @@ -1579,7 +1599,7 @@ components: SgxEpcConfig: example: prefault: false - size: 7 + size: 0 id: id properties: id: @@ -1594,10 +1614,21 @@ components: - id - size type: object + TdxConfig: + example: + firmware: firmware + properties: + firmware: + description: Path to the firmware that will be used to boot the TDx guest + up. 
+          type: string
+      required:
+      - firmware
+      type: object
     NumaDistance:
       example:
-        distance: 8
-        destination: 4
+        distance: 4
+        destination: 0
       properties:
         destination:
           format: int32
@@ -1612,20 +1643,20 @@ components:
     NumaConfig:
       example:
         distances:
-        - distance: 8
-          destination: 4
-        - distance: 8
-          destination: 4
+        - distance: 4
+          destination: 0
+        - distance: 4
+          destination: 0
         cpus:
-        - 0
-        - 0
+        - 6
+        - 6
         sgx_epc_sections:
         - sgx_epc_sections
         - sgx_epc_sections
         memory_zones:
         - memory_zones
         - memory_zones
-        guest_numa_id: 6
+        guest_numa_id: 7
       properties:
         guest_numa_id:
           format: int32
diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/BalloonConfig.md b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/BalloonConfig.md
index e6f222847..196e9c1da 100644
--- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/BalloonConfig.md
+++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/BalloonConfig.md
@@ -5,7 +5,8 @@
 Name | Type | Description | Notes
 ------------ | ------------- | ------------- | -------------
 **Size** | **int64** | |
-**DeflateOnOom** | Pointer to **bool** | Whether the balloon should deflate when the guest is under memory pressure. | [optional] [default to false]
+**DeflateOnOom** | Pointer to **bool** | Deflate balloon when the guest is under memory pressure. | [optional] [default to false]
+**FreePageReporting** | Pointer to **bool** | Enable guest to report free pages. | [optional] [default to false]

 ## Methods

@@ -71,6 +72,31 @@ SetDeflateOnOom sets DeflateOnOom field to given value.

 HasDeflateOnOom returns a boolean if a field has been set.

+### GetFreePageReporting
+
+`func (o *BalloonConfig) GetFreePageReporting() bool`
+
+GetFreePageReporting returns the FreePageReporting field if non-nil, zero value otherwise.
+
+### GetFreePageReportingOk
+
+`func (o *BalloonConfig) GetFreePageReportingOk() (*bool, bool)`
+
+GetFreePageReportingOk returns a tuple with the FreePageReporting field if it's non-nil, zero value otherwise
+and a boolean to check if the value has been set.
+
+### SetFreePageReporting
+
+`func (o *BalloonConfig) SetFreePageReporting(v bool)`
+
+SetFreePageReporting sets FreePageReporting field to given value.
+
+### HasFreePageReporting
+
+`func (o *BalloonConfig) HasFreePageReporting() bool`
+
+HasFreePageReporting returns a boolean if a field has been set.
+
 [[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/NetConfig.md b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/NetConfig.md
index ff4f2dcfb..073401d19 100644
--- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/NetConfig.md
+++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/NetConfig.md
@@ -15,7 +15,6 @@ Name | Type | Description | Notes
 **VhostSocket** | Pointer to **string** | | [optional]
 **VhostMode** | Pointer to **string** | | [optional] [default to "Client"]
 **Id** | Pointer to **string** | | [optional]
-**Fds** | Pointer to **[]int32** | | [optional]
 **PciSegment** | Pointer to **int32** | | [optional]
 **RateLimiterConfig** | Pointer to [**RateLimiterConfig**](RateLimiterConfig.md) | | [optional]

@@ -313,31 +312,6 @@ SetId sets Id field to given value.

 HasId returns a boolean if a field has been set.

-### GetFds
-
-`func (o *NetConfig) GetFds() []int32`
-
-GetFds returns the Fds field if non-nil, zero value otherwise.
-
-### GetFdsOk
-
-`func (o *NetConfig) GetFdsOk() (*[]int32, bool)`
-
-GetFdsOk returns a tuple with the Fds field if it's non-nil, zero value otherwise
-and a boolean to check if the value has been set.
-
-### SetFds
-
-`func (o *NetConfig) SetFds(v []int32)`
-
-SetFds sets Fds field to given value.
-
-### HasFds
-
-`func (o *NetConfig) HasFds() bool`
-
-HasFds returns a boolean if a field has been set.
-
 ### GetPciSegment

 `func (o *NetConfig) GetPciSegment() int32`
diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/PlatformConfig.md b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/PlatformConfig.md
new file mode 100644
index 000000000..91adf0d99
--- /dev/null
+++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/PlatformConfig.md
@@ -0,0 +1,82 @@
+# PlatformConfig
+
+## Properties
+
+Name | Type | Description | Notes
+------------ | ------------- | ------------- | -------------
+**NumPciSegments** | Pointer to **int32** | | [optional]
+**IommuSegments** | Pointer to **[]int32** | | [optional]
+
+## Methods
+
+### NewPlatformConfig
+
+`func NewPlatformConfig() *PlatformConfig`
+
+NewPlatformConfig instantiates a new PlatformConfig object
+This constructor will assign default values to properties that have it defined,
+and makes sure properties required by API are set, but the set of arguments
+will change when the set of required properties is changed
+
+### NewPlatformConfigWithDefaults
+
+`func NewPlatformConfigWithDefaults() *PlatformConfig`
+
+NewPlatformConfigWithDefaults instantiates a new PlatformConfig object
+This constructor will only assign default values to properties that have it defined,
+but it doesn't guarantee that properties required by API are set
+
+### GetNumPciSegments
+
+`func (o *PlatformConfig) GetNumPciSegments() int32`
+
+GetNumPciSegments returns the NumPciSegments field if non-nil, zero value otherwise.
+
+### GetNumPciSegmentsOk
+
+`func (o *PlatformConfig) GetNumPciSegmentsOk() (*int32, bool)`
+
+GetNumPciSegmentsOk returns a tuple with the NumPciSegments field if it's non-nil, zero value otherwise
+and a boolean to check if the value has been set.
+
+### SetNumPciSegments
+
+`func (o *PlatformConfig) SetNumPciSegments(v int32)`
+
+SetNumPciSegments sets NumPciSegments field to given value.
+
+### HasNumPciSegments
+
+`func (o *PlatformConfig) HasNumPciSegments() bool`
+
+HasNumPciSegments returns a boolean if a field has been set.
+
+### GetIommuSegments
+
+`func (o *PlatformConfig) GetIommuSegments() []int32`
+
+GetIommuSegments returns the IommuSegments field if non-nil, zero value otherwise.
+
+### GetIommuSegmentsOk
+
+`func (o *PlatformConfig) GetIommuSegmentsOk() (*[]int32, bool)`
+
+GetIommuSegmentsOk returns a tuple with the IommuSegments field if it's non-nil, zero value otherwise
+and a boolean to check if the value has been set.
+
+### SetIommuSegments
+
+`func (o *PlatformConfig) SetIommuSegments(v []int32)`
+
+SetIommuSegments sets IommuSegments field to given value.
+
+### HasIommuSegments
+
+`func (o *PlatformConfig) HasIommuSegments() bool`
+
+HasIommuSegments returns a boolean if a field has been set.
+
+
+[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
+
+
diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/TdxConfig.md b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/TdxConfig.md
new file mode 100644
index 000000000..8577bcf5b
--- /dev/null
+++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/TdxConfig.md
@@ -0,0 +1,51 @@
+# TdxConfig
+
+## Properties
+
+Name | Type | Description | Notes
+------------ | ------------- | ------------- | -------------
+**Firmware** | **string** | Path to the firmware that will be used to boot the TDx guest up. |
+
+## Methods
+
+### NewTdxConfig
+
+`func NewTdxConfig(firmware string, ) *TdxConfig`
+
+NewTdxConfig instantiates a new TdxConfig object
+This constructor will assign default values to properties that have it defined,
+and makes sure properties required by API are set, but the set of arguments
+will change when the set of required properties is changed
+
+### NewTdxConfigWithDefaults
+
+`func NewTdxConfigWithDefaults() *TdxConfig`
+
+NewTdxConfigWithDefaults instantiates a new TdxConfig object
+This constructor will only assign default values to properties that have it defined,
+but it doesn't guarantee that properties required by API are set
+
+### GetFirmware
+
+`func (o *TdxConfig) GetFirmware() string`
+
+GetFirmware returns the Firmware field if non-nil, zero value otherwise.
+
+### GetFirmwareOk
+
+`func (o *TdxConfig) GetFirmwareOk() (*string, bool)`
+
+GetFirmwareOk returns a tuple with the Firmware field if it's non-nil, zero value otherwise
+and a boolean to check if the value has been set.
+
+### SetFirmware
+
+`func (o *TdxConfig) SetFirmware(v string)`
+
+SetFirmware sets Firmware field to given value.
+ + + +[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md) + + diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/VmConfig.md b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/VmConfig.md index b65a194b2..b01b81db5 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/VmConfig.md +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/docs/VmConfig.md @@ -20,9 +20,11 @@ Name | Type | Description | Notes **Devices** | Pointer to [**[]DeviceConfig**](DeviceConfig.md) | | [optional] **Vsock** | Pointer to [**VsockConfig**](VsockConfig.md) | | [optional] **SgxEpc** | Pointer to [**[]SgxEpcConfig**](SgxEpcConfig.md) | | [optional] +**Tdx** | Pointer to [**TdxConfig**](TdxConfig.md) | | [optional] **Numa** | Pointer to [**[]NumaConfig**](NumaConfig.md) | | [optional] **Iommu** | Pointer to **bool** | | [optional] [default to false] **Watchdog** | Pointer to **bool** | | [optional] [default to false] +**Platform** | Pointer to [**PlatformConfig**](PlatformConfig.md) | | [optional] ## Methods @@ -448,6 +450,31 @@ SetSgxEpc sets SgxEpc field to given value. HasSgxEpc returns a boolean if a field has been set. +### GetTdx + +`func (o *VmConfig) GetTdx() TdxConfig` + +GetTdx returns the Tdx field if non-nil, zero value otherwise. + +### GetTdxOk + +`func (o *VmConfig) GetTdxOk() (*TdxConfig, bool)` + +GetTdxOk returns a tuple with the Tdx field if it's non-nil, zero value otherwise +and a boolean to check if the value has been set. + +### SetTdx + +`func (o *VmConfig) SetTdx(v TdxConfig)` + +SetTdx sets Tdx field to given value. + +### HasTdx + +`func (o *VmConfig) HasTdx() bool` + +HasTdx returns a boolean if a field has been set. + ### GetNuma `func (o *VmConfig) GetNuma() []NumaConfig` @@ -523,6 +550,31 @@ SetWatchdog sets Watchdog field to given value. HasWatchdog returns a boolean if a field has been set. 
+### GetPlatform + +`func (o *VmConfig) GetPlatform() PlatformConfig` + +GetPlatform returns the Platform field if non-nil, zero value otherwise. + +### GetPlatformOk + +`func (o *VmConfig) GetPlatformOk() (*PlatformConfig, bool)` + +GetPlatformOk returns a tuple with the Platform field if it's non-nil, zero value otherwise +and a boolean to check if the value has been set. + +### SetPlatform + +`func (o *VmConfig) SetPlatform(v PlatformConfig)` + +SetPlatform sets Platform field to given value. + +### HasPlatform + +`func (o *VmConfig) HasPlatform() bool` + +HasPlatform returns a boolean if a field has been set. + [[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md) diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_balloon_config.go b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_balloon_config.go index 704bb5b8d..765da2a8a 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_balloon_config.go +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_balloon_config.go @@ -17,8 +17,10 @@ import ( // BalloonConfig struct for BalloonConfig type BalloonConfig struct { Size int64 `json:"size"` - // Whether the balloon should deflate when the guest is under memory pressure. + // Deflate balloon when the guest is under memory pressure. DeflateOnOom *bool `json:"deflate_on_oom,omitempty"` + // Enable guest to report free pages. 
+ FreePageReporting *bool `json:"free_page_reporting,omitempty"` } // NewBalloonConfig instantiates a new BalloonConfig object @@ -30,6 +32,8 @@ func NewBalloonConfig(size int64) *BalloonConfig { this.Size = size var deflateOnOom bool = false this.DeflateOnOom = &deflateOnOom + var freePageReporting bool = false + this.FreePageReporting = &freePageReporting return &this } @@ -40,6 +44,8 @@ func NewBalloonConfigWithDefaults() *BalloonConfig { this := BalloonConfig{} var deflateOnOom bool = false this.DeflateOnOom = &deflateOnOom + var freePageReporting bool = false + this.FreePageReporting = &freePageReporting return &this } @@ -99,6 +105,38 @@ func (o *BalloonConfig) SetDeflateOnOom(v bool) { o.DeflateOnOom = &v } +// GetFreePageReporting returns the FreePageReporting field value if set, zero value otherwise. +func (o *BalloonConfig) GetFreePageReporting() bool { + if o == nil || o.FreePageReporting == nil { + var ret bool + return ret + } + return *o.FreePageReporting +} + +// GetFreePageReportingOk returns a tuple with the FreePageReporting field value if set, nil otherwise +// and a boolean to check if the value has been set. +func (o *BalloonConfig) GetFreePageReportingOk() (*bool, bool) { + if o == nil || o.FreePageReporting == nil { + return nil, false + } + return o.FreePageReporting, true +} + +// HasFreePageReporting returns a boolean if a field has been set. +func (o *BalloonConfig) HasFreePageReporting() bool { + if o != nil && o.FreePageReporting != nil { + return true + } + + return false +} + +// SetFreePageReporting gets a reference to the given bool and assigns it to the FreePageReporting field. 
+func (o *BalloonConfig) SetFreePageReporting(v bool) { + o.FreePageReporting = &v +} + func (o BalloonConfig) MarshalJSON() ([]byte, error) { toSerialize := map[string]interface{}{} if true { @@ -107,6 +145,9 @@ func (o BalloonConfig) MarshalJSON() ([]byte, error) { if o.DeflateOnOom != nil { toSerialize["deflate_on_oom"] = o.DeflateOnOom } + if o.FreePageReporting != nil { + toSerialize["free_page_reporting"] = o.FreePageReporting + } return json.Marshal(toSerialize) } diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_net_config.go b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_net_config.go index 72bdb5422..976d8cd33 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_net_config.go +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_net_config.go @@ -27,7 +27,6 @@ type NetConfig struct { VhostSocket *string `json:"vhost_socket,omitempty"` VhostMode *string `json:"vhost_mode,omitempty"` Id *string `json:"id,omitempty"` - Fds *[]int32 `json:"fds,omitempty"` PciSegment *int32 `json:"pci_segment,omitempty"` RateLimiterConfig *RateLimiterConfig `json:"rate_limiter_config,omitempty"` } @@ -429,38 +428,6 @@ func (o *NetConfig) SetId(v string) { o.Id = &v } -// GetFds returns the Fds field value if set, zero value otherwise. -func (o *NetConfig) GetFds() []int32 { - if o == nil || o.Fds == nil { - var ret []int32 - return ret - } - return *o.Fds -} - -// GetFdsOk returns a tuple with the Fds field value if set, nil otherwise -// and a boolean to check if the value has been set. -func (o *NetConfig) GetFdsOk() (*[]int32, bool) { - if o == nil || o.Fds == nil { - return nil, false - } - return o.Fds, true -} - -// HasFds returns a boolean if a field has been set. -func (o *NetConfig) HasFds() bool { - if o != nil && o.Fds != nil { - return true - } - - return false -} - -// SetFds gets a reference to the given []int32 and assigns it to the Fds field. 
-func (o *NetConfig) SetFds(v []int32) { - o.Fds = &v -} - // GetPciSegment returns the PciSegment field value if set, zero value otherwise. func (o *NetConfig) GetPciSegment() int32 { if o == nil || o.PciSegment == nil { @@ -560,9 +527,6 @@ func (o NetConfig) MarshalJSON() ([]byte, error) { if o.Id != nil { toSerialize["id"] = o.Id } - if o.Fds != nil { - toSerialize["fds"] = o.Fds - } if o.PciSegment != nil { toSerialize["pci_segment"] = o.PciSegment } diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_platform_config.go b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_platform_config.go new file mode 100644 index 000000000..e480c8a91 --- /dev/null +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_platform_config.go @@ -0,0 +1,149 @@ +/* +Cloud Hypervisor API + +Local HTTP based API for managing and inspecting a cloud-hypervisor virtual machine. + +API version: 0.3.0 +*/ + +// Code generated by OpenAPI Generator (https://openapi-generator.tech); DO NOT EDIT. 
+ +package openapi + +import ( + "encoding/json" +) + +// PlatformConfig struct for PlatformConfig +type PlatformConfig struct { + NumPciSegments *int32 `json:"num_pci_segments,omitempty"` + IommuSegments *[]int32 `json:"iommu_segments,omitempty"` +} + +// NewPlatformConfig instantiates a new PlatformConfig object +// This constructor will assign default values to properties that have it defined, +// and makes sure properties required by API are set, but the set of arguments +// will change when the set of required properties is changed +func NewPlatformConfig() *PlatformConfig { + this := PlatformConfig{} + return &this +} + +// NewPlatformConfigWithDefaults instantiates a new PlatformConfig object +// This constructor will only assign default values to properties that have it defined, +// but it doesn't guarantee that properties required by API are set +func NewPlatformConfigWithDefaults() *PlatformConfig { + this := PlatformConfig{} + return &this +} + +// GetNumPciSegments returns the NumPciSegments field value if set, zero value otherwise. +func (o *PlatformConfig) GetNumPciSegments() int32 { + if o == nil || o.NumPciSegments == nil { + var ret int32 + return ret + } + return *o.NumPciSegments +} + +// GetNumPciSegmentsOk returns a tuple with the NumPciSegments field value if set, nil otherwise +// and a boolean to check if the value has been set. +func (o *PlatformConfig) GetNumPciSegmentsOk() (*int32, bool) { + if o == nil || o.NumPciSegments == nil { + return nil, false + } + return o.NumPciSegments, true +} + +// HasNumPciSegments returns a boolean if a field has been set. +func (o *PlatformConfig) HasNumPciSegments() bool { + if o != nil && o.NumPciSegments != nil { + return true + } + + return false +} + +// SetNumPciSegments gets a reference to the given int32 and assigns it to the NumPciSegments field. 
+func (o *PlatformConfig) SetNumPciSegments(v int32) { + o.NumPciSegments = &v +} + +// GetIommuSegments returns the IommuSegments field value if set, zero value otherwise. +func (o *PlatformConfig) GetIommuSegments() []int32 { + if o == nil || o.IommuSegments == nil { + var ret []int32 + return ret + } + return *o.IommuSegments +} + +// GetIommuSegmentsOk returns a tuple with the IommuSegments field value if set, nil otherwise +// and a boolean to check if the value has been set. +func (o *PlatformConfig) GetIommuSegmentsOk() (*[]int32, bool) { + if o == nil || o.IommuSegments == nil { + return nil, false + } + return o.IommuSegments, true +} + +// HasIommuSegments returns a boolean if a field has been set. +func (o *PlatformConfig) HasIommuSegments() bool { + if o != nil && o.IommuSegments != nil { + return true + } + + return false +} + +// SetIommuSegments gets a reference to the given []int32 and assigns it to the IommuSegments field. +func (o *PlatformConfig) SetIommuSegments(v []int32) { + o.IommuSegments = &v +} + +func (o PlatformConfig) MarshalJSON() ([]byte, error) { + toSerialize := map[string]interface{}{} + if o.NumPciSegments != nil { + toSerialize["num_pci_segments"] = o.NumPciSegments + } + if o.IommuSegments != nil { + toSerialize["iommu_segments"] = o.IommuSegments + } + return json.Marshal(toSerialize) +} + +type NullablePlatformConfig struct { + value *PlatformConfig + isSet bool +} + +func (v NullablePlatformConfig) Get() *PlatformConfig { + return v.value +} + +func (v *NullablePlatformConfig) Set(val *PlatformConfig) { + v.value = val + v.isSet = true +} + +func (v NullablePlatformConfig) IsSet() bool { + return v.isSet +} + +func (v *NullablePlatformConfig) Unset() { + v.value = nil + v.isSet = false +} + +func NewNullablePlatformConfig(val *PlatformConfig) *NullablePlatformConfig { + return &NullablePlatformConfig{value: val, isSet: true} +} + +func (v NullablePlatformConfig) MarshalJSON() ([]byte, error) { + return json.Marshal(v.value) 
+} + +func (v *NullablePlatformConfig) UnmarshalJSON(src []byte) error { + v.isSet = true + return json.Unmarshal(src, &v.value) +} diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_tdx_config.go b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_tdx_config.go new file mode 100644 index 000000000..3fe9057da --- /dev/null +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_tdx_config.go @@ -0,0 +1,107 @@ +/* +Cloud Hypervisor API + +Local HTTP based API for managing and inspecting a cloud-hypervisor virtual machine. + +API version: 0.3.0 +*/ + +// Code generated by OpenAPI Generator (https://openapi-generator.tech); DO NOT EDIT. + +package openapi + +import ( + "encoding/json" +) + +// TdxConfig struct for TdxConfig +type TdxConfig struct { + // Path to the firmware that will be used to boot the TDx guest up. + Firmware string `json:"firmware"` +} + +// NewTdxConfig instantiates a new TdxConfig object +// This constructor will assign default values to properties that have it defined, +// and makes sure properties required by API are set, but the set of arguments +// will change when the set of required properties is changed +func NewTdxConfig(firmware string) *TdxConfig { + this := TdxConfig{} + this.Firmware = firmware + return &this +} + +// NewTdxConfigWithDefaults instantiates a new TdxConfig object +// This constructor will only assign default values to properties that have it defined, +// but it doesn't guarantee that properties required by API are set +func NewTdxConfigWithDefaults() *TdxConfig { + this := TdxConfig{} + return &this +} + +// GetFirmware returns the Firmware field value +func (o *TdxConfig) GetFirmware() string { + if o == nil { + var ret string + return ret + } + + return o.Firmware +} + +// GetFirmwareOk returns a tuple with the Firmware field value +// and a boolean to check if the value has been set. 
+func (o *TdxConfig) GetFirmwareOk() (*string, bool) { + if o == nil { + return nil, false + } + return &o.Firmware, true +} + +// SetFirmware sets field value +func (o *TdxConfig) SetFirmware(v string) { + o.Firmware = v +} + +func (o TdxConfig) MarshalJSON() ([]byte, error) { + toSerialize := map[string]interface{}{} + if true { + toSerialize["firmware"] = o.Firmware + } + return json.Marshal(toSerialize) +} + +type NullableTdxConfig struct { + value *TdxConfig + isSet bool +} + +func (v NullableTdxConfig) Get() *TdxConfig { + return v.value +} + +func (v *NullableTdxConfig) Set(val *TdxConfig) { + v.value = val + v.isSet = true +} + +func (v NullableTdxConfig) IsSet() bool { + return v.isSet +} + +func (v *NullableTdxConfig) Unset() { + v.value = nil + v.isSet = false +} + +func NewNullableTdxConfig(val *TdxConfig) *NullableTdxConfig { + return &NullableTdxConfig{value: val, isSet: true} +} + +func (v NullableTdxConfig) MarshalJSON() ([]byte, error) { + return json.Marshal(v.value) +} + +func (v *NullableTdxConfig) UnmarshalJSON(src []byte) error { + v.isSet = true + return json.Unmarshal(src, &v.value) +} diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_vm_config.go b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_vm_config.go index cf4e7e0d5..24c2e6289 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_vm_config.go +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_vm_config.go @@ -32,9 +32,11 @@ type VmConfig struct { Devices *[]DeviceConfig `json:"devices,omitempty"` Vsock *VsockConfig `json:"vsock,omitempty"` SgxEpc *[]SgxEpcConfig `json:"sgx_epc,omitempty"` + Tdx *TdxConfig `json:"tdx,omitempty"` Numa *[]NumaConfig `json:"numa,omitempty"` Iommu *bool `json:"iommu,omitempty"` Watchdog *bool `json:"watchdog,omitempty"` + Platform *PlatformConfig `json:"platform,omitempty"` } // NewVmConfig instantiates a new VmConfig object @@ -578,6 +580,38 @@ func (o *VmConfig) SetSgxEpc(v 
[]SgxEpcConfig) { o.SgxEpc = &v } +// GetTdx returns the Tdx field value if set, zero value otherwise. +func (o *VmConfig) GetTdx() TdxConfig { + if o == nil || o.Tdx == nil { + var ret TdxConfig + return ret + } + return *o.Tdx +} + +// GetTdxOk returns a tuple with the Tdx field value if set, nil otherwise +// and a boolean to check if the value has been set. +func (o *VmConfig) GetTdxOk() (*TdxConfig, bool) { + if o == nil || o.Tdx == nil { + return nil, false + } + return o.Tdx, true +} + +// HasTdx returns a boolean if a field has been set. +func (o *VmConfig) HasTdx() bool { + if o != nil && o.Tdx != nil { + return true + } + + return false +} + +// SetTdx gets a reference to the given TdxConfig and assigns it to the Tdx field. +func (o *VmConfig) SetTdx(v TdxConfig) { + o.Tdx = &v +} + // GetNuma returns the Numa field value if set, zero value otherwise. func (o *VmConfig) GetNuma() []NumaConfig { if o == nil || o.Numa == nil { @@ -674,6 +708,38 @@ func (o *VmConfig) SetWatchdog(v bool) { o.Watchdog = &v } +// GetPlatform returns the Platform field value if set, zero value otherwise. +func (o *VmConfig) GetPlatform() PlatformConfig { + if o == nil || o.Platform == nil { + var ret PlatformConfig + return ret + } + return *o.Platform +} + +// GetPlatformOk returns a tuple with the Platform field value if set, nil otherwise +// and a boolean to check if the value has been set. +func (o *VmConfig) GetPlatformOk() (*PlatformConfig, bool) { + if o == nil || o.Platform == nil { + return nil, false + } + return o.Platform, true +} + +// HasPlatform returns a boolean if a field has been set. +func (o *VmConfig) HasPlatform() bool { + if o != nil && o.Platform != nil { + return true + } + + return false +} + +// SetPlatform gets a reference to the given PlatformConfig and assigns it to the Platform field. 
+func (o *VmConfig) SetPlatform(v PlatformConfig) { + o.Platform = &v +} + func (o VmConfig) MarshalJSON() ([]byte, error) { toSerialize := map[string]interface{}{} if o.Cpus != nil { @@ -724,6 +790,9 @@ func (o VmConfig) MarshalJSON() ([]byte, error) { if o.SgxEpc != nil { toSerialize["sgx_epc"] = o.SgxEpc } + if o.Tdx != nil { + toSerialize["tdx"] = o.Tdx + } if o.Numa != nil { toSerialize["numa"] = o.Numa } @@ -733,6 +802,9 @@ func (o VmConfig) MarshalJSON() ([]byte, error) { if o.Watchdog != nil { toSerialize["watchdog"] = o.Watchdog } + if o.Platform != nil { + toSerialize["platform"] = o.Platform + } return json.Marshal(toSerialize) } diff --git a/src/runtime/virtcontainers/pkg/cloud-hypervisor/cloud-hypervisor.yaml b/src/runtime/virtcontainers/pkg/cloud-hypervisor/cloud-hypervisor.yaml index 476c179fe..c4dcae04c 100644 --- a/src/runtime/virtcontainers/pkg/cloud-hypervisor/cloud-hypervisor.yaml +++ b/src/runtime/virtcontainers/pkg/cloud-hypervisor/cloud-hypervisor.yaml @@ -500,6 +500,8 @@ components: type: array items: $ref: '#/components/schemas/SgxEpcConfig' + tdx: + $ref: '#/components/schemas/TdxConfig' numa: type: array items: @@ -510,6 +512,8 @@ components: watchdog: type: boolean default: false + platform: + $ref: '#/components/schemas/PlatformConfig' description: Virtual machine configuration CpuAffinity: @@ -557,6 +561,18 @@ components: items: $ref: '#/components/schemas/CpuAffinity' + PlatformConfig: + type: object + properties: + num_pci_segments: + type: integer + format: int16 + iommu_segments: + type: array + items: + type: integer + format: int16 + MemoryZoneConfig: required: - id @@ -771,11 +787,6 @@ components: default: "Client" id: type: string - fds: - type: array - items: - type: integer - format: int32 pci_segment: type: integer format: int16 @@ -805,7 +816,11 @@ components: deflate_on_oom: type: boolean default: false - description: Whether the balloon should deflate when the guest is under memory pressure. 
+ description: Deflate balloon when the guest is under memory pressure. + free_page_reporting: + type: boolean + default: false + description: Enable guest to report free pages. FsConfig: required: @@ -933,6 +948,15 @@ components: type: boolean default: false + TdxConfig: + required: + - firmware + type: object + properties: + firmware: + type: string + description: Path to the firmware that will be used to boot the TDx guest up. + NumaDistance: required: - destination diff --git a/src/runtime/virtcontainers/sandbox.go b/src/runtime/virtcontainers/sandbox.go index 2c9ba4264..2254cd294 100644 --- a/src/runtime/virtcontainers/sandbox.go +++ b/src/runtime/virtcontainers/sandbox.go @@ -1946,11 +1946,13 @@ func (s *Sandbox) updateResources(ctx context.Context) error { sandboxVCPUs += s.hypervisor.HypervisorConfig().NumVCPUs sandboxMemoryByte, sandboxneedPodSwap, sandboxSwapByte := s.calculateSandboxMemory() + // Add default / rsvd memory for sandbox. - hypervisorMemoryByte := int64(s.hypervisor.HypervisorConfig().MemorySize) << utils.MibToBytesShift + hypervisorMemoryByteI64 := int64(s.hypervisor.HypervisorConfig().MemorySize) << utils.MibToBytesShift + hypervisorMemoryByte := uint64(hypervisorMemoryByteI64) sandboxMemoryByte += hypervisorMemoryByte if sandboxneedPodSwap { - sandboxSwapByte += hypervisorMemoryByte + sandboxSwapByte += hypervisorMemoryByteI64 } s.Logger().WithField("sandboxMemoryByte", sandboxMemoryByte).WithField("sandboxneedPodSwap", sandboxneedPodSwap).WithField("sandboxSwapByte", sandboxSwapByte).Debugf("updateResources: after calculateSandboxMemory") @@ -1982,7 +1984,8 @@ func (s *Sandbox) updateResources(ctx context.Context) error { // Update Memory s.Logger().WithField("memory-sandbox-size-byte", sandboxMemoryByte).Debugf("Request to hypervisor to update memory") - newMemory, updatedMemoryDevice, err := s.hypervisor.ResizeMemory(ctx, uint32(sandboxMemoryByte>>utils.MibToBytesShift), s.state.GuestMemoryBlockSizeMB, s.state.GuestMemoryHotplugProbe) 
+ newMemoryMB := uint32(sandboxMemoryByte >> utils.MibToBytesShift) + newMemory, updatedMemoryDevice, err := s.hypervisor.ResizeMemory(ctx, newMemoryMB, s.state.GuestMemoryBlockSizeMB, s.state.GuestMemoryHotplugProbe) if err != nil { if err == noGuestMemHotplugErr { s.Logger().Warnf("%s, memory specifications cannot be guaranteed", err) @@ -2005,8 +2008,8 @@ func (s *Sandbox) updateResources(ctx context.Context) error { return nil } -func (s *Sandbox) calculateSandboxMemory() (int64, bool, int64) { - memorySandbox := int64(0) +func (s *Sandbox) calculateSandboxMemory() (uint64, bool, int64) { + memorySandbox := uint64(0) needPodSwap := false swapSandbox := int64(0) for _, c := range s.config.Containers { @@ -2020,8 +2023,17 @@ func (s *Sandbox) calculateSandboxMemory() (int64, bool, int64) { currentLimit := int64(0) if m.Limit != nil && *m.Limit > 0 { currentLimit = *m.Limit - memorySandbox += currentLimit + memorySandbox += uint64(currentLimit) + s.Logger().WithField("memory limit", memorySandbox).Info("Memory Sandbox + Memory Limit ") } + + // Add hugepages memory + // HugepageLimit is uint64 - https://github.com/opencontainers/runtime-spec/blob/master/specs-go/config.go#L242 + for _, l := range c.Resources.HugepageLimits { + memorySandbox += l.Limit + } + + // Add swap if s.config.HypervisorConfig.GuestSwap && m.Swappiness != nil && *m.Swappiness > 0 { currentSwap := int64(0) if m.Swap != nil { @@ -2039,6 +2051,7 @@ func (s *Sandbox) calculateSandboxMemory() (int64, bool, int64) { } } } + return memorySandbox, needPodSwap, swapSandbox } diff --git a/src/runtime/virtcontainers/sandbox_test.go b/src/runtime/virtcontainers/sandbox_test.go index ca02210cd..9c64c8b22 100644 --- a/src/runtime/virtcontainers/sandbox_test.go +++ b/src/runtime/virtcontainers/sandbox_test.go @@ -152,13 +152,14 @@ func TestCalculateSandboxMem(t *testing.T) { sandbox.config = &SandboxConfig{} unconstrained := newTestContainerConfigNoop("cont-00001") constrained := 
newTestContainerConfigNoop("cont-00001") - limit := int64(4000) - constrained.Resources.Memory = &specs.LinuxMemory{Limit: &limit} + mlimit := int64(4000) + limit := uint64(4000) + constrained.Resources.Memory = &specs.LinuxMemory{Limit: &mlimit} tests := []struct { name string containers []ContainerConfig - want int64 + want uint64 }{ {"1-unconstrained", []ContainerConfig{unconstrained}, 0}, {"2-unconstrained", []ContainerConfig{unconstrained, unconstrained}, 0}, @@ -187,7 +188,7 @@ func TestCalculateSandboxMemHandlesNegativeLimits(t *testing.T) { sandbox.config.Containers = []ContainerConfig{container} mem, needSwap, swap := sandbox.calculateSandboxMemory() - assert.Equal(t, mem, int64(0)) + assert.Equal(t, mem, uint64(0)) assert.Equal(t, needSwap, false) assert.Equal(t, swap, int64(0)) } @@ -581,7 +582,7 @@ func TestSandboxAttachDevicesVFIO(t *testing.T) { containers[c.id].sandbox = &sandbox - err = containers[c.id].attachDevices(context.Background(), c.devices) + err = containers[c.id].attachDevices(context.Background()) assert.Nil(t, err, "Error while attaching devices %s", err) err = containers[c.id].detachDevices(context.Background()) @@ -676,7 +677,7 @@ func TestSandboxAttachDevicesVhostUserBlk(t *testing.T) { containers[c.id].sandbox = &sandbox - err = containers[c.id].attachDevices(context.Background(), c.devices) + err = containers[c.id].attachDevices(context.Background()) assert.Nil(t, err, "Error while attaching vhost-user-blk devices %s", err) err = containers[c.id].detachDevices(context.Background()) @@ -1639,3 +1640,41 @@ func TestGetSandboxCpuSet(t *testing.T) { }) } } + +func TestSandboxHugepageLimit(t *testing.T) { + contConfig1 := newTestContainerConfigNoop("cont-00001") + contConfig2 := newTestContainerConfigNoop("cont-00002") + limit := int64(4000) + contConfig1.Resources.Memory = &specs.LinuxMemory{Limit: &limit} + contConfig2.Resources.Memory = &specs.LinuxMemory{Limit: &limit} + hConfig := newHypervisorConfig(nil, nil) + + defer cleanUp() + 
// create a sandbox + s, err := testCreateSandbox(t, + testSandboxID, + MockHypervisor, + hConfig, + NetworkConfig{}, + []ContainerConfig{contConfig1, contConfig2}, + nil) + + assert.NoError(t, err) + + hugepageLimits := []specs.LinuxHugepageLimit{ + { + Pagesize: "1GB", + Limit: 322122547, + }, + { + Pagesize: "2MB", + Limit: 134217728, + }, + } + + for i := range s.config.Containers { + s.config.Containers[i].Resources.HugepageLimits = hugepageLimits + } + err = s.updateResources(context.Background()) + assert.NoError(t, err) +} diff --git a/src/runtime/virtcontainers/utils/utils_linux.go b/src/runtime/virtcontainers/utils/utils_linux.go index 90e0d631b..265f10d11 100644 --- a/src/runtime/virtcontainers/utils/utils_linux.go +++ b/src/runtime/virtcontainers/utils/utils_linux.go @@ -96,11 +96,12 @@ const ( procDeviceIndex = iota procPathIndex procTypeIndex + procOptionIndex ) -// GetDevicePathAndFsType gets the device for the mount point and the file system type -// of the mount. -func GetDevicePathAndFsType(mountPoint string) (devicePath, fsType string, err error) { +// GetDevicePathAndFsTypeOptions gets the device for the mount point, the file system type +// and mount options +func GetDevicePathAndFsTypeOptions(mountPoint string) (devicePath, fsType string, fsOptions []string, err error) { if mountPoint == "" { err = fmt.Errorf("Mount point cannot be empty") return @@ -134,6 +135,7 @@ func GetDevicePathAndFsType(mountPoint string) (devicePath, fsType string, err e if mountPoint == fields[procPathIndex] { devicePath = fields[procDeviceIndex] fsType = fields[procTypeIndex] + fsOptions = strings.Split(fields[procOptionIndex], ",") return } } diff --git a/src/runtime/virtcontainers/utils/utils_linux_test.go b/src/runtime/virtcontainers/utils/utils_linux_test.go index c7b2b8793..dbf9fde38 100644 --- a/src/runtime/virtcontainers/utils/utils_linux_test.go +++ b/src/runtime/virtcontainers/utils/utils_linux_test.go @@ -6,7 +6,10 @@ package utils import ( + "bytes" 
"errors" + "os/exec" + "strings" "testing" "github.com/stretchr/testify/assert" @@ -34,20 +37,31 @@ func TestFindContextID(t *testing.T) { assert.Error(err) } -func TestGetDevicePathAndFsTypeEmptyMount(t *testing.T) { +func TestGetDevicePathAndFsTypeOptionsEmptyMount(t *testing.T) { assert := assert.New(t) - _, _, err := GetDevicePathAndFsType("") + _, _, _, err := GetDevicePathAndFsTypeOptions("") assert.Error(err) } -func TestGetDevicePathAndFsTypeSuccessful(t *testing.T) { +func TestGetDevicePathAndFsTypeOptionsSuccessful(t *testing.T) { assert := assert.New(t) - path, fstype, err := GetDevicePathAndFsType("/proc") + cmdStr := "grep ^proc /proc/mounts" + cmd := exec.Command("sh", "-c", cmdStr) + output, err := cmd.Output() + assert.NoError(err) + + data := bytes.Split(output, []byte(" ")) + fstypeOut := string(data[2]) + optsOut := strings.Split(string(data[3]), ",") + + path, fstype, fsOptions, err := GetDevicePathAndFsTypeOptions("/proc") assert.NoError(err) assert.Equal(path, "proc") assert.Equal(fstype, "proc") + assert.Equal(fstype, fstypeOut) + assert.Equal(fsOptions, optsOut) } func TestIsAPVFIOMediatedDeviceFalse(t *testing.T) { diff --git a/src/runtime/virtcontainers/utils/utils_test.go b/src/runtime/virtcontainers/utils/utils_test.go index f37e0f222..838e72ef4 100644 --- a/src/runtime/virtcontainers/utils/utils_test.go +++ b/src/runtime/virtcontainers/utils/utils_test.go @@ -297,9 +297,9 @@ func TestBuildSocketPath(t *testing.T) { msg := fmt.Sprintf("test[%d]: %+v", i, d) if d.valid { - assert.NoErrorf(err, "test %d, data %+v", i, d, msg) + assert.NoError(err, msg) } else { - assert.Errorf(err, "test %d, data %+v", i, d, msg) + assert.Error(err, msg) } assert.NotNil(result, msg) diff --git a/src/tools/agent-ctl/Makefile b/src/tools/agent-ctl/Makefile index 1cb20e1d7..df3eacf24 100644 --- a/src/tools/agent-ctl/Makefile +++ b/src/tools/agent-ctl/Makefile @@ -5,6 +5,7 @@ include ../../../utils.mk +.DEFAULT_GOAL := default default: build build: 
logging-crate-tests @@ -24,7 +25,7 @@ test: install: @RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo install --target $(TRIPLE) --path . -check: +check: standard_rust_check .PHONY: \ build \ diff --git a/src/tools/agent-ctl/src/client.rs b/src/tools/agent-ctl/src/client.rs index 6456c8ec3..e7f4dcb96 100644 --- a/src/tools/agent-ctl/src/client.rs +++ b/src/tools/agent-ctl/src/client.rs @@ -24,7 +24,6 @@ use std::os::unix::io::{IntoRawFd, RawFd}; use std::os::unix::net::UnixStream; use std::thread::sleep; use std::time::Duration; -use ttrpc; use ttrpc::context::Context; // Run the specified closure to set an automatic value if the ttRPC Context @@ -93,13 +92,13 @@ struct BuiltinCmd { } // Command that causes the agent to exit (iff tracing is enabled) -const SHUTDOWN_CMD: &'static str = "DestroySandbox"; +const SHUTDOWN_CMD: &str = "DestroySandbox"; // Command that requests this program ends -const CMD_QUIT: &'static str = "quit"; -const CMD_REPEAT: &'static str = "repeat"; +const CMD_QUIT: &str = "quit"; +const CMD_REPEAT: &str = "repeat"; -const DEFAULT_PROC_SIGNAL: &'static str = "SIGKILL"; +const DEFAULT_PROC_SIGNAL: &str = "SIGKILL"; const ERR_API_FAILED: &str = "API failed"; @@ -110,7 +109,7 @@ const METADATA_CFG_NS: &str = "agent-ctl-cfg"; // automatically. 
const NO_AUTO_VALUES_CFG_NAME: &str = "no-auto-values"; -static AGENT_CMDS: &'static [AgentCmd] = &[ +static AGENT_CMDS: &[AgentCmd] = &[ AgentCmd { name: "AddARPNeighbors", st: ServiceType::Agent, @@ -283,7 +282,7 @@ static AGENT_CMDS: &'static [AgentCmd] = &[ }, ]; -static BUILTIN_CMDS: &'static [BuiltinCmd] = &[ +static BUILTIN_CMDS: &[BuiltinCmd] = &[ BuiltinCmd { name: "echo", descr: "Display the arguments", @@ -663,7 +662,7 @@ pub fn client(cfg: &Config, commands: Vec<&str>) -> Result<()> { "server-address" => cfg.server_address.to_string()); if cfg.interactive { - return interactive_client_loop(&cfg, &mut options, &client, &health, &image, &ttrpc_ctx); + return interactive_client_loop(cfg, &mut options, &client, &health, &image, &ttrpc_ctx); } let mut repeat_count = 1; @@ -675,18 +674,17 @@ pub fn client(cfg: &Config, commands: Vec<&str>) -> Result<()> { } let (result, shutdown) = handle_cmd( - &cfg, + cfg, &client, &health, &image, &ttrpc_ctx, repeat_count, &mut options, - &cmd, + cmd, ); - if result.is_err() { - return result; - } + + result.map_err(|e| anyhow!(e))?; if shutdown { break; @@ -719,7 +717,7 @@ fn handle_cmd( return (Ok(()), false); } - let first = match cmd.chars().nth(0) { + let first = match cmd.chars().next() { Some(c) => c, None => return (Err(anyhow!("failed to check command name")), false), }; @@ -786,12 +784,12 @@ fn handle_cmd( } fn handle_builtin_cmd(cmd: &str, args: &str) -> (Result<()>, bool) { - let f = match get_builtin_cmd_func(&cmd) { + let f = match get_builtin_cmd_func(cmd) { Ok(fp) => fp, Err(e) => return (Err(e), false), }; - f(&args) + f(args) } // Execute the ttRPC specified by the first field of "line".
Return a result @@ -805,12 +803,12 @@ fn handle_agent_cmd( cmd: &str, args: &str, ) -> (Result<()>, bool) { - let f = match get_agent_cmd_func(&cmd) { + let f = match get_agent_cmd_func(cmd) { Ok(fp) => fp, Err(e) => return (Err(e), false), }; - let result = f(ctx, client, health, image, options, &args); + let result = f(ctx, client, health, image, options, args); if result.is_err() { return (result, false); } @@ -858,9 +856,7 @@ fn interactive_client_loop( options, &cmdline, ); - if result.is_err() { - return result; - } + result.map_err(|e| anyhow!(e))?; if shutdown { break; @@ -1468,7 +1464,7 @@ fn agent_cmd_container_tty_win_resize( let rows_str = utils::get_option("row", options, args)?; - if rows_str != "" { + if !rows_str.is_empty() { let rows = rows_str .parse::() .map_err(|e| anyhow!(e).context("invalid row size"))?; @@ -1477,7 +1473,7 @@ fn agent_cmd_container_tty_win_resize( let cols_str = utils::get_option("column", options, args)?; - if cols_str != "" { + if !cols_str.is_empty() { let cols = cols_str .parse::() .map_err(|e| anyhow!(e).context("invalid column size"))?; @@ -1555,7 +1551,7 @@ fn agent_cmd_container_read_stdout( let length_str = utils::get_option("len", options, args)?; - if length_str != "" { + if !length_str.is_empty() { let length = length_str .parse::() .map_err(|e| anyhow!(e).context("invalid length"))?; @@ -1598,7 +1594,7 @@ fn agent_cmd_container_read_stderr( let length_str = utils::get_option("len", options, args)?; - if length_str != "" { + if !length_str.is_empty() { let length = length_str .parse::() .map_err(|e| anyhow!(e).context("invalid length"))?; @@ -1720,13 +1716,13 @@ fn agent_cmd_sandbox_copy_file( run_if_auto_values!(ctx, || -> Result<()> { let path = utils::get_option("path", options, args)?; - if path != "" { + if !path.is_empty() { req.set_path(path); } let file_size_str = utils::get_option("file_size", options, args)?; - if file_size_str != "" { + if !file_size_str.is_empty() { let file_size = file_size_str 
.parse::() .map_err(|e| anyhow!(e).context("invalid file_size"))?; @@ -1736,7 +1732,7 @@ fn agent_cmd_sandbox_copy_file( let file_mode_str = utils::get_option("file_mode", options, args)?; - if file_mode_str != "" { + if !file_mode_str.is_empty() { let file_mode = file_mode_str .parse::() .map_err(|e| anyhow!(e).context("invalid file_mode"))?; @@ -1746,7 +1742,7 @@ fn agent_cmd_sandbox_copy_file( let dir_mode_str = utils::get_option("dir_mode", options, args)?; - if dir_mode_str != "" { + if !dir_mode_str.is_empty() { let dir_mode = dir_mode_str .parse::() .map_err(|e| anyhow!(e).context("invalid dir_mode"))?; @@ -1756,7 +1752,7 @@ fn agent_cmd_sandbox_copy_file( let uid_str = utils::get_option("uid", options, args)?; - if uid_str != "" { + if !uid_str.is_empty() { let uid = uid_str .parse::() .map_err(|e| anyhow!(e).context("invalid uid"))?; @@ -1766,7 +1762,7 @@ fn agent_cmd_sandbox_copy_file( let gid_str = utils::get_option("gid", options, args)?; - if gid_str != "" { + if !gid_str.is_empty() { let gid = gid_str .parse::() .map_err(|e| anyhow!(e).context("invalid gid"))?; @@ -1775,7 +1771,7 @@ fn agent_cmd_sandbox_copy_file( let offset_str = utils::get_option("offset", options, args)?; - if offset_str != "" { + if !offset_str.is_empty() { let offset = offset_str .parse::() .map_err(|e| anyhow!(e).context("invalid offset"))?; @@ -1783,7 +1779,7 @@ fn agent_cmd_sandbox_copy_file( } let data_str = utils::get_option("data", options, args)?; - if data_str != "" { + if !data_str.is_empty() { let data = utils::str_to_bytes(&data_str)?; req.set_data(data.to_vec()); } @@ -1851,7 +1847,7 @@ fn agent_cmd_sandbox_online_cpu_mem( run_if_auto_values!(ctx, || -> Result<()> { let wait_str = utils::get_option("wait", options, args)?; - if wait_str != "" { + if !wait_str.is_empty() { let wait = wait_str .parse::() .map_err(|e| anyhow!(e).context("invalid wait bool"))?; @@ -1861,7 +1857,7 @@ fn agent_cmd_sandbox_online_cpu_mem( let nb_cpus_str = utils::get_option("nb_cpus", 
options, args)?; - if nb_cpus_str != "" { + if !nb_cpus_str.is_empty() { let nb_cpus = nb_cpus_str .parse::() .map_err(|e| anyhow!(e).context("invalid nb_cpus value"))?; @@ -1871,7 +1867,7 @@ fn agent_cmd_sandbox_online_cpu_mem( let cpu_only_str = utils::get_option("cpu_only", options, args)?; - if cpu_only_str != "" { + if !cpu_only_str.is_empty() { let cpu_only = cpu_only_str .parse::() .map_err(|e| anyhow!(e).context("invalid cpu_only bool"))?; @@ -1909,7 +1905,7 @@ fn agent_cmd_sandbox_set_guest_date_time( run_if_auto_values!(ctx, || -> Result<()> { let secs_str = utils::get_option("sec", options, args)?; - if secs_str != "" { + if !secs_str.is_empty() { let secs = secs_str .parse::() .map_err(|e| anyhow!(e).context("invalid seconds"))?; @@ -1919,7 +1915,7 @@ fn agent_cmd_sandbox_set_guest_date_time( let usecs_str = utils::get_option("usec", options, args)?; - if usecs_str != "" { + if !usecs_str.is_empty() { let usecs = usecs_str .parse::() .map_err(|e| anyhow!(e).context("invalid useconds"))?; @@ -2023,7 +2019,7 @@ fn agent_cmd_sandbox_mem_hotplug_by_probe( if !addr_list.is_empty() { let addrs: Vec = addr_list // Convert into a list of string values. - .split(",") + .split(',') // Convert each string element into a u8 array of bytes, ignoring // those elements that fail the conversion. 
.filter_map(|s| hex::decode(s.trim_start_matches("0x")).ok()) @@ -2126,7 +2122,7 @@ fn builtin_cmd_list(_args: &str) -> (Result<()>, bool) { cmds.iter().for_each(|n| println!(" - {}", n)); - println!(""); + println!(); (Ok(()), false) } @@ -2147,8 +2143,8 @@ fn get_repeat_count(cmdline: &str) -> i64 { let count = fields[1]; match count.parse::() { - Ok(n) => return n, - Err(_) => return default_repeat_count, + Ok(n) => n, + Err(_) => default_repeat_count, } } diff --git a/src/tools/agent-ctl/src/main.rs b/src/tools/agent-ctl/src/main.rs index 88c12e984..61701a664 100644 --- a/src/tools/agent-ctl/src/main.rs +++ b/src/tools/agent-ctl/src/main.rs @@ -191,11 +191,7 @@ fn connect(name: &str, global_args: clap::ArgMatches) -> Result<()> { let result = rpc::run(&logger, &cfg, commands); - if result.is_err() { - return result; - } - - Ok(()) + result.map_err(|e| anyhow!(e)) } fn real_main() -> Result<()> { diff --git a/src/tools/agent-ctl/src/utils.rs b/src/tools/agent-ctl/src/utils.rs index 14359c6d9..8dbfa53f4 100644 --- a/src/tools/agent-ctl/src/utils.rs +++ b/src/tools/agent-ctl/src/utils.rs @@ -92,12 +92,10 @@ pub fn signame_to_signum(name: &str) -> Result { return Err(anyhow!("invalid signal")); } - match name.parse::() { - Ok(n) => return Ok(n), - - // "fall through" on error as we assume the name is not a number, but - // a signal name. - Err(_) => (), + // "fall through" on error as we assume the name is not a number, but + // a signal name. + if let Ok(n) = name.parse::() { + return Ok(n); } let mut search_term: String; @@ -129,8 +127,7 @@ pub fn human_time_to_ns(human_time: &str) -> Result { let d: humantime::Duration = human_time .parse::() - .map_err(|e| anyhow!(e))? 
- .into(); + .map_err(|e| anyhow!(e))?; Ok(d.as_nanos() as i64) } @@ -262,8 +259,8 @@ fn config_file_from_bundle_dir(bundle_dir: &str) -> Result { fn root_oci_to_ttrpc(bundle_dir: &str, root: &ociRoot) -> Result { let root_dir = root.path.clone(); - let path = if root_dir.starts_with("/") { - root_dir.clone() + let path = if root_dir.starts_with('/') { + root_dir } else { // Expand the root directory into an absolute value let abs_root_dir = PathBuf::from(&bundle_dir).join(&root_dir); @@ -685,13 +682,13 @@ fn linux_oci_to_ttrpc(l: &ociLinux) -> ttrpcLinux { fn oci_to_ttrpc(bundle_dir: &str, cid: &str, oci: &ociSpec) -> Result { let process = match &oci.process { - Some(p) => protobuf::SingularPtrField::some(process_oci_to_ttrpc(&p)), + Some(p) => protobuf::SingularPtrField::some(process_oci_to_ttrpc(p)), None => protobuf::SingularPtrField::none(), }; let root = match &oci.root { Some(r) => { - let ttrpc_root = root_oci_to_ttrpc(bundle_dir, &r).map_err(|e| e)?; + let ttrpc_root = root_oci_to_ttrpc(bundle_dir, r).map_err(|e| e)?; protobuf::SingularPtrField::some(ttrpc_root) } @@ -700,7 +697,7 @@ fn oci_to_ttrpc(bundle_dir: &str, cid: &str, oci: &ociSpec) -> Result let mut mounts = protobuf::RepeatedField::new(); for m in &oci.mounts { - mounts.push(mount_oci_to_ttrpc(&m)); + mounts.push(mount_oci_to_ttrpc(m)); } let linux = match &oci.linux { @@ -770,7 +767,7 @@ pub fn get_ttrpc_spec(options: &mut Options, cid: &str) -> Result { let oci_spec: ociSpec = serde_json::from_str(&json_spec).map_err(|e| anyhow!(e))?; - Ok(oci_to_ttrpc(&bundle_dir, cid, &oci_spec)?) 
+ oci_to_ttrpc(&bundle_dir, cid, &oci_spec) } pub fn str_to_bytes(s: &str) -> Result> { diff --git a/src/tools/trace-forwarder/Makefile b/src/tools/trace-forwarder/Makefile index 06530face..5b1c53849 100644 --- a/src/tools/trace-forwarder/Makefile +++ b/src/tools/trace-forwarder/Makefile @@ -5,6 +5,7 @@ include ../../../utils.mk +.DEFAULT_GOAL := default default: build build: logging-crate-tests @@ -24,7 +25,7 @@ test: install: -check: +check: standard_rust_check .PHONY: \ build \ diff --git a/src/tools/trace-forwarder/src/handler.rs b/src/tools/trace-forwarder/src/handler.rs index 1ac76ec2d..b669f0613 100644 --- a/src/tools/trace-forwarder/src/handler.rs +++ b/src/tools/trace-forwarder/src/handler.rs @@ -73,8 +73,7 @@ async fn handle_trace_data<'a>( let payload_len: u64 = NetworkEndian::read_u64(&header); - let mut encoded_payload = Vec::with_capacity(payload_len as usize); - encoded_payload.resize(payload_len as usize, 0); + let mut encoded_payload = vec![0; payload_len as usize]; reader .read_exact(&mut encoded_payload) diff --git a/tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh b/tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh index b5b1717a9..1864140e1 100755 --- a/tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh +++ b/tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh @@ -125,11 +125,7 @@ install_firecracker() { # Install static cloud-hypervisor asset install_clh() { - local cloud_hypervisor_repo - local cloud_hypervisor_version - - cloud_hypervisor_repo="$(yq r $versions_yaml assets.hypervisor.cloud_hypervisor.url)" - cloud_hypervisor_version="$(yq r $versions_yaml assets.hypervisor.cloud_hypervisor.version)" + export extra_build_args="--features tdx" info "build static cloud-hypervisor" "${clh_builder}" diff --git a/tools/packaging/kata-deploy/scripts/kata-deploy.sh b/tools/packaging/kata-deploy/scripts/kata-deploy.sh index 50361379e..425a65802 100755 --- 
a/tools/packaging/kata-deploy/scripts/kata-deploy.sh +++ b/tools/packaging/kata-deploy/scripts/kata-deploy.sh @@ -20,6 +20,8 @@ shims=( ) [ "${CONFIGURE_CC:-}" == "yes" ] && shims+=("cc") +default_shim="qemu" + # If we fail for any reason a message will be displayed die() { msg="$*" @@ -98,6 +100,11 @@ function configure_different_shims_base() { KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-${shim}.toml /opt/kata/bin/containerd-shim-kata-v2 "\$@" EOT chmod +x "$shim_file" + + if [ "${shim}" == "${default_shim}" ]; then + echo "Creating the default shim-v2 binary" + ln -sf "${shim_file}" /usr/local/bin/containerd-shim-kata-v2 + fi done } @@ -113,6 +120,8 @@ function cleanup_different_shims_base() { mv "$shim_backup" "$shim_file" fi done + + rm -f /usr/local/bin/containerd-shim-kata-v2 } function configure_crio_runtime() { diff --git a/tools/packaging/kernel/configs/fragments/powerpc/base.conf b/tools/packaging/kernel/configs/fragments/powerpc/base.conf index f5d7abee6..080172c43 100644 --- a/tools/packaging/kernel/configs/fragments/powerpc/base.conf +++ b/tools/packaging/kernel/configs/fragments/powerpc/base.conf @@ -4,6 +4,5 @@ CONFIG_64BIT=y CONFIG_HW_RANDOM_PSERIES=y CONFIG_HAS_IOMEM=y -CONFIG_SYS_SUPPORTS_HUGETLBFS=y CONFIG_VIRTUALIZATION=y CONFIG_PPC_OF_BOOT_TRAMPOLINE=y diff --git a/tools/packaging/kernel/configs/fragments/whitelist.conf b/tools/packaging/kernel/configs/fragments/whitelist.conf index 78c41613e..74d6a2ce4 100644 --- a/tools/packaging/kernel/configs/fragments/whitelist.conf +++ b/tools/packaging/kernel/configs/fragments/whitelist.conf @@ -12,3 +12,6 @@ CONFIG_CRYPTO_DEV_SP_PSP CONFIG_CRYPTO_DEV_CCP CONFIG_HAVE_NET_DSA CONFIG_NF_LOG_COMMON +CONFIG_MANDATORY_FILE_LOCKING +CONFIG_ARM64_UAO +CONFIG_VFIO_MDEV_DEVICE diff --git a/tools/packaging/kernel/configs/fragments/build-type/experimental/sgx.conf b/tools/packaging/kernel/configs/fragments/x86_64/sgx.conf similarity index 100% rename from
tools/packaging/kernel/configs/fragments/build-type/experimental/sgx.conf rename to tools/packaging/kernel/configs/fragments/x86_64/sgx.conf diff --git a/tools/packaging/kernel/configs/fragments/x86_64/tdx/tdx.conf b/tools/packaging/kernel/configs/fragments/x86_64/tdx/tdx.conf new file mode 100644 index 000000000..214c469b5 --- /dev/null +++ b/tools/packaging/kernel/configs/fragments/x86_64/tdx/tdx.conf @@ -0,0 +1,12 @@ +# Intel Trust Domain Extensions (Intel TDX) + +CONFIG_EFI_STUB=y +CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y +CONFIG_INTEL_TDX_GUEST=y +CONFIG_INTEL_TDX_FIXES=y +CONFIG_X86_MEM_ENCRYPT_COMMON=y +CONFIG_X86_5LEVEL=y +CONFIG_OF=y +CONFIG_CLK_LGM_CGU=y +CONFIG_OF_RESERVED_MEM=y +CONFIG_DMA_RESTRICTED_POOL=y diff --git a/tools/packaging/kernel/kata_config_version b/tools/packaging/kernel/kata_config_version index d22307c42..8643cf6de 100644 --- a/tools/packaging/kernel/kata_config_version +++ b/tools/packaging/kernel/kata_config_version @@ -1 +1 @@ -88 +89 diff --git a/tools/packaging/kernel/patches/5.15.x/no_patches.txt b/tools/packaging/kernel/patches/5.15.x/no_patches.txt new file mode 100644 index 000000000..8b1378917 --- /dev/null +++ b/tools/packaging/kernel/patches/5.15.x/no_patches.txt @@ -0,0 +1 @@ + diff --git a/tools/packaging/qemu/patches/tag_patches/tdx-qemu-2021.11.29-v6.0.0-rc1-mvp/no_patches.txt b/tools/packaging/qemu/patches/tag_patches/tdx-qemu-2021.11.29-v6.0.0-rc1-mvp/no_patches.txt new file mode 100644 index 000000000..e69de29bb diff --git a/tools/packaging/scripts/apply_patches.sh b/tools/packaging/scripts/apply_patches.sh index 68d3d4260..de8580845 100755 --- a/tools/packaging/scripts/apply_patches.sh +++ b/tools/packaging/scripts/apply_patches.sh @@ -40,7 +40,7 @@ if [ -d "$patches_dir" ]; then echo "INFO: Found ${#patches[@]} patches" for patch in ${patches[@]}; do echo "INFO: Apply $patch" - git apply "$patch" || \ + patch -p1 < "$patch" || \ { echo >&2 "ERROR: Not applied. 
Exiting..."; exit 1; } done else diff --git a/tools/packaging/static-build/cloud-hypervisor/build-static-clh.sh b/tools/packaging/static-build/cloud-hypervisor/build-static-clh.sh index f174daac0..61a0824eb 100755 --- a/tools/packaging/static-build/cloud-hypervisor/build-static-clh.sh +++ b/tools/packaging/static-build/cloud-hypervisor/build-static-clh.sh @@ -15,6 +15,8 @@ ARCH=$(uname -m) script_dir=$(dirname $(readlink -f "$0")) kata_version="${kata_version:-}" +force_build_from_source="${force_build_from_source:-false}" +extra_build_args="${extra_build_args:-}" source "${script_dir}/../../scripts/lib.sh" @@ -39,6 +41,7 @@ pull_clh_released_binary() { curl --fail -L ${cloud_hypervisor_binary} -o cloud-hypervisor-static || return 1 mkdir -p cloud-hypervisor mv -f cloud-hypervisor-static cloud-hypervisor/cloud-hypervisor + chmod +x cloud-hypervisor/cloud-hypervisor } build_clh_from_source() { @@ -49,13 +52,31 @@ build_clh_from_source() { pushd "${repo_dir}" git fetch || true git checkout "${cloud_hypervisor_version}" - ./scripts/dev_cli.sh build --release --libc musl + if [ -n "${extra_build_args}" ]; then + info "Build cloud-hypervisor with extra args: ${extra_build_args}" + ./scripts/dev_cli.sh build --release --libc musl -- ${extra_build_args} + else + ./scripts/dev_cli.sh build --release --libc musl + fi rm -f cloud-hypervisor cp build/cargo_target/$(uname -m)-unknown-linux-musl/release/cloud-hypervisor . popd } -if [ ${ARCH} == "aarch64" ] || !
pull_clh_released_binary; then - info "arch is aarch64 or failed to pull cloud-hypervisor released binary on x86_64, trying to build from source" - build_clh_from_source +if [ "${ARCH}" == "aarch64" ]; then + info "aarch64 binaries are not distributed as part of the Cloud Hypervisor releases, forcing a build from source" + force_build_from_source="true" +fi + +if [ -n "${extra_build_args}" ]; then + info "As an extra build argument has been passed to the script, forcing a build from source" + force_build_from_source="true" +fi + +if [ "${force_build_from_source}" == "true" ]; then + info "Build cloud-hypervisor from source as it has been requested via the force_build_from_source flag" + build_clh_from_source +else + pull_clh_released_binary || + (info "Failed to pull cloud-hypervisor released binary, trying to build from source" && build_clh_from_source) fi diff --git a/tools/packaging/static-build/kernel/Dockerfile b/tools/packaging/static-build/kernel/Dockerfile index cd1a59f2d..2595a08e7 100644 --- a/tools/packaging/static-build/kernel/Dockerfile +++ b/tools/packaging/static-build/kernel/Dockerfile @@ -16,6 +16,7 @@ RUN apt-get update && \ flex \ git \ iptables \ - libelf-dev && \ + libelf-dev \ + patch && \ if [ "$(uname -m)" = "s390x" ]; then apt-get install -y --no-install-recommends libssl-dev; fi && \ apt-get clean && rm -rf /var/lib/lists/ diff --git a/tools/packaging/static-build/qemu/Dockerfile b/tools/packaging/static-build/qemu/Dockerfile index f32644fec..61cc6ce95 100644 --- a/tools/packaging/static-build/qemu/Dockerfile +++ b/tools/packaging/static-build/qemu/Dockerfile @@ -43,6 +43,7 @@ RUN apt-get update && apt-get upgrade -y && \ pkg-config \ libseccomp-dev \ libseccomp2 \ + patch \ python \ python-dev \ rsync \ diff --git a/utils.mk b/utils.mk index dbfefb5a2..e833b40d7 100644 --- a/utils.mk +++ b/utils.mk @@ -145,3 +145,9 @@ endif TRIPLE = $(ARCH)-unknown-linux-$(LIBC) CWD := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST)))) +
+standard_rust_check: + cargo fmt -- --check + cargo clippy --all-targets --all-features --release \ + -- \ + -D warnings diff --git a/utils/README.md b/utils/README.md index 57e9a879b..255568a71 100644 --- a/utils/README.md +++ b/utils/README.md @@ -38,18 +38,36 @@ If you still wish to continue, but prefer a manual installation, see ## Install a minimal Kata Containers system +By default, the script will attempt to install Kata Containers and +containerd, and then configure containerd to use Kata Containers. However, +the script provides a number of options to allow you to change its +behaviour. + +> **Note:** +> +> Before running the script to install Kata Containers, we recommend +> that you [review the available options](#show-available-options). + +### Show available options + +To show the available options without installing anything, run: + +```sh +$ bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh) -h" +``` + +### To install Kata Containers only + +If your system already has containerd installed and you only want to install Kata Containers, run: + +```sh +$ bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh) -o" +``` + +### To install Kata Containers and containerd + To install and configure a system with Kata Containers and containerd, run: ```bash $ bash -c "$(curl -fsSL https://raw.githubusercontent.com/kata-containers/kata-containers/main/utils/kata-manager.sh)" ``` - -> **Notes:** -> -> - The script must be run on a system that does not have Kata Containers or -> containerd already installed on it. -> -> - The script accepts up to two parameters which can be used to test -> pre-release versions (a Kata Containers version, and a containerd -> version). If either version is unspecified or specified as `""`, the -> latest official version will be installed.
diff --git a/utils/kata-manager.sh b/utils/kata-manager.sh index 52416b920..38260c865 100755 --- a/utils/kata-manager.sh +++ b/utils/kata-manager.sh @@ -136,16 +136,31 @@ github_get_release_file_url() local url="${1:-}" local version="${2:-}" - download_urls=$(curl -sL "$url" |\ + local arch=$(uname -m) + + local regex="" + + case "$url" in + *kata*) + regex="kata-static-.*-${arch}.tar.xz" + ;; + + *containerd*) + [ "$arch" = "x86_64" ] && arch="amd64" + regex="containerd-.*-linux-${arch}.tar.gz" + ;; + + *) die "invalid url: '$url'" ;; + esac + + local download_url + + download_url=$(curl -sL "$url" |\ jq --arg version "$version" \ -r '.[] | select(.tag_name == $version) | .assets[].browser_download_url' |\ - grep static) + grep "/${regex}$") - [ -z "$download_urls" ] && die "Cannot determine download URL for version $version ($url)" - - local arch=$(uname -m) - local download_url=$(grep "$arch" <<< "$download_urls") - [ -z "$download_url" ] && die "No release for architecture '$arch' ($url)" + [ -z "$download_url" ] && die "Cannot determine download URL for version $version ($url)" echo "$download_url" } @@ -187,11 +202,17 @@ usage() cat < []] -Description: Install $kata_project [1] and $containerd_project [2] from GitHub release binaries. +Description: Install $kata_project [1] (and optionally $containerd_project [2]) + from GitHub release binaries. Options: - -h : Show this help statement. + -c : Specify containerd version. + -f : Force installation (use with care). + -h : Show this help statement. + -k : Specify Kata Containers version. + -o : Only install Kata Containers. + -r : Don't cleanup on failure (retain files). Notes: @@ -231,6 +252,18 @@ only_supports_cgroups_v2() return 0 } +# Return 0 if containerd is already installed, else return 1. 
+containerd_installed() +{ + command -v containerd &>/dev/null && return 0 + + systemctl list-unit-files --type service |\ + egrep -q "^${containerd_service_name}\>" \ + && return 0 + + return 1 +} + pre_checks() { info "Running pre-checks" @@ -238,12 +271,11 @@ pre_checks() command -v "${kata_shim_v2}" &>/dev/null \ && die "Please remove existing $kata_project installation" - command -v containerd &>/dev/null \ - && die "$containerd_project already installed" + local ret - systemctl list-unit-files --type service |\ - egrep -q "^${containerd_service_name}\>" \ - && die "$containerd_project already installed" + { containerd_installed; ret=$?; } || true + + [ "$ret" -eq 0 ] && die "$containerd_project already installed" local cgroups_v2_only=$(only_supports_cgroups_v2 || true) @@ -298,9 +330,18 @@ check_deps() setup() { - trap cleanup EXIT + local cleanup="${1:-}" + [ -z "$cleanup" ] && die "no cleanup value" + + local force="${2:-}" + [ -z "$force" ] && die "no force value" + + [ "$cleanup" = "true" ] && trap cleanup EXIT + source /etc/os-release || source /usr/lib/os-release + [ "$force" = "true" ] && return 0 + pre_checks check_deps } @@ -313,6 +354,8 @@ github_download_package() { local releases_url="${1:-}" local requested_version="${2:-}" + + # Only used for error message local project="${3:-}" [ -z "$releases_url" ] && die "need releases URL" @@ -320,7 +363,7 @@ github_download_package() local version=$(github_resolve_version_to_download \ "$releases_url" \ - "$version" || true) + "$requested_version" || true) [ -z "$version" ] && die "Unable to determine $project version to download" @@ -359,7 +402,7 @@ install_containerd() sudo tar -C /usr/local -xvf "${file}" - sudo ln -s /usr/local/bin/ctr "${link_dir}" + sudo ln -sf /usr/local/bin/ctr "${link_dir}" info "$project installed\n" } @@ -372,31 +415,35 @@ configure_containerd() local cfg="/etc/containerd/config.toml" - pushd "$tmpdir" >/dev/null - - local service_url=$(printf "%s/%s/%s/%s" \ - 
"https://raw.githubusercontent.com" \ - "${containerd_slug}" \ - "master" \ - "${containerd_service_name}") - - curl -LO "$service_url" - - printf "# %s: Service installed for Kata Containers\n" \ - "$(date -Iseconds)" |\ - tee -a "$containerd_service_name" - local systemd_unit_dir="/etc/systemd/system" sudo mkdir -p "$systemd_unit_dir" local dest="${systemd_unit_dir}/${containerd_service_name}" - sudo cp "${containerd_service_name}" "${dest}" - sudo systemctl daemon-reload + if [ ! -f "$dest" ] + then + pushd "$tmpdir" >/dev/null - info "Installed ${dest}" + local service_url=$(printf "%s/%s/%s/%s" \ + "https://raw.githubusercontent.com" \ + "${containerd_slug}" \ + "main" \ + "${containerd_service_name}") - popd >/dev/null + curl -LO "$service_url" + + printf "# %s: Service installed for Kata Containers\n" \ + "$(date -Iseconds)" |\ + tee -a "$containerd_service_name" + + + sudo cp "${containerd_service_name}" "${dest}" + sudo systemctl daemon-reload + + info "Installed ${dest}" + + popd >/dev/null + fi # Backup the original containerd configuration: sudo mkdir -p "$(dirname $cfg)" @@ -429,6 +476,7 @@ EOT info "Modified $cfg" } + sudo systemctl enable containerd sudo systemctl start containerd info "Configured $project\n" @@ -471,7 +519,7 @@ install_kata() # Since we're unpacking to the root directory, perform a sanity check # on the archive first. 
local unexpected=$(tar -tf "${file}" |\ - egrep -v "^(\./opt/$|\.${kata_install_dir}/)" || true) + egrep -v "^(\./$|\./opt/$|\.${kata_install_dir}/)" || true) [ -n "$unexpected" ] && die "File '$file' contains unexpected paths: '$unexpected'" @@ -505,7 +553,24 @@ handle_containerd() { local version="${1:-}" - install_containerd "$version" + local force="${2:-}" + [ -z "$force" ] && die "need force value" + + local ret + + if [ "$force" = "true" ] + then + install_containerd "$version" + else + { containerd_installed; ret=$?; } || true + + if [ "$ret" -eq 0 ] + then + info "Using existing containerd installation" + else + install_containerd "$version" + fi + fi configure_containerd @@ -543,31 +608,72 @@ test_installation() handle_installation() { - local kata_version="${1:-}" - local containerd_version="${2:-}" + local cleanup="${1:-}" + [ -z "$cleanup" ] && die "no cleanup value" - setup + local force="${2:-}" + [ -z "$force" ] && die "no force value" + + local only_kata="${3:-}" + [ -z "$only_kata" ] && die "no only Kata value" + + # These params can be blank + local kata_version="${4:-}" + local containerd_version="${5:-}" + + setup "$cleanup" "$force" handle_kata "$kata_version" - handle_containerd "$containerd_version" + + [ "$only_kata" = "false" ] && \ + handle_containerd \ + "$containerd_version" \ + "$force" test_installation - info "$kata_project and $containerd_project are now installed" + if [ "$only_kata" = "true" ] + then + info "$kata_project is now installed" + else + info "$kata_project and $containerd_project are now installed" + fi echo -e "\n${warnings}\n" } handle_args() { - case "${1:-}" in - -h|--help|help) usage; exit 0;; - esac + local cleanup="true" + local force="false" + local only_kata="false" - local kata_version="${1:-}" - local containerd_version="${2:-}" + local opt + + local kata_version="" + local containerd_version="" + + while getopts "c:fhk:or" opt "$@" + do + case "$opt" in + c) containerd_version="$OPTARG" ;; + f) force="true" 
;; + h) usage; exit 0 ;; + k) kata_version="$OPTARG" ;; + o) only_kata="true" ;; + r) cleanup="false" ;; + esac + done + + shift $((OPTIND - 1)) + + [ -z "$kata_version" ] && kata_version="${1:-}" || true + [ -z "$containerd_version" ] && containerd_version="${2:-}" || true handle_installation \ + "$cleanup" \ + "$force" \ + "$only_kata" \ "$kata_version" \ "$containerd_version" } diff --git a/versions.yaml b/versions.yaml index a923d6c4d..40a529bdb 100644 --- a/versions.yaml +++ b/versions.yaml @@ -75,7 +75,7 @@ assets: url: "https://github.com/cloud-hypervisor/cloud-hypervisor" uscan-url: >- https://github.com/cloud-hypervisor/cloud-hypervisor/tags.*/v?(\d\S+)\.tar\.gz - version: "v21.0" + version: "55479a64d237d4c757dba19a696abefd27ec74fd" firecracker: description: "Firecracker micro-VMM" @@ -98,6 +98,10 @@ assets: uscan-url: >- https://github.com/qemu/qemu/tags .*/v?(\d\S+)\.tar\.gz + tdx: + description: "VMM that uses KVM and supports TDX" + url: "https://github.com/intel/qemu-tdx" + tag: "tdx-qemu-2021.11.29-v6.0.0-rc1-mvp" qemu-experimental: description: "QEMU with virtiofs support" @@ -147,7 +151,7 @@ assets: kernel: description: "Linux kernel optimised for virtual machines" url: "https://cdn.kernel.org/pub/linux/kernel/v5.x/" - version: "v5.10.25" + version: "v5.15.23" tdx: description: "Linux kernel that supports TDX" url: "https://github.com/intel/tdx/archive/refs/tags" @@ -258,6 +262,11 @@ externals: url: "https://github.com/dragonflyoss/image-service" version: "v1.1.2" + nydus-snapshotter: + description: "Snapshotter for Nydus image acceleration service" + url: "https://github.com/containerd/nydus-snapshotter" + version: "v0.1.0" languages: description: | Details of programming languages required to build system