Today in agent watchers, when we copy files/symlinks
or create directories, the ownership of the source path
is not preserved which can lead to permission issues.
In copy, ensure that we do a chown of the source path
uid/gid to the destination file/symlink after copy to
ensure that ownership matches the source ownership.
fs::copy() takes care of setting the permissions.
For directory creation, ensure that we set the
permissions of the created directory to the source
directory permissions and also perform a chown of the
source path uid/gid to ensure directory ownership
and permissions matches to the source.
Fixes: #4188
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 70eda2fa6c)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The type of MemorySwappiness in runtime is uint64, and the type of swappiness in agent is int64,
if we set max uint64 in runtime and pass it to agent, the value will be equal to -1. We should
modify the type of swappiness to u64
Fixes: #4123
Signed-off-by: holyfei <yangfeiyu20092010@163.com>
(cherry picked from commit 0239502781)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Adding two functions set_ownership and
recursive_ownership_change to support changing group id
ownership for a mounted volume.
The set_ownership will be called in common_storage_handler
after mount_storage performs the mount for the volume.
set_ownership will be a noop if the FSGroup field in the
Storage struct is not set which indicates no chown will be
performed. If FSGroup field is specified, then it will
perform the recursive walk of the mounted volume path to
change ownership of all files and directories to the
desired group id. It will also configure the SetGid bit
so that files created the directory will have group
following parent directory group.
If the fsGroupChangePolicy is on root mismatch,
then the group ownership will be skipped if the root
directory group id alreasy matches the desired group
id and if the SetGid bit is also set on the root directory.
This is the same behavior as what
Kubelet does today when performing the recursive walk
to change ownership.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 92c00c7e84)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
During container exit, the agent tries to remove all the mount point directories,
which can fail if it's a readonly filesytem (e.g. device mapper). This commit ignores
the removal failure and logs a warning message.
Fixes: #4043
Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aabcebbf58)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
allocate_hugepages() writes to the kernel sysfs file to allocate hugepages
in the Kata VM. However, even if the write succeeds, it's not certain that
the kernel will actually be able to allocate as many hugepages as we
requested.
This patch reads back the file after writing it to check if we were able to
allocate all the required hugepages.
fixes#3816
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 42e35505b0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
allocate_hugepages() constructs the path for the sysfs directory containing
hugepage configuration, then attempts to create this directory if it does
not exist.
This doesn't make sense: sysfs is a view into kernel configuration, if the
kernel has support for the hugepage size, the directory will already be
there, if it doesn't, trying to create it won't help.
For the same reason, attempting to create the "nr_hugepages" file
itself is pointless, so there's no reason to call
OpenOptions::create(true).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 608e003abc)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The nix::sys::signal::Signal package api cannot deal with SIGRTMIN+3,
directly use libc function to send the signal.
Fixes: #3990
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(cherry picked from commit 0d765bd082)
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
The content about systemd in "/proc/self/cgroup" is as:
1:name=systemd:/kubepods/pod1815643d-3789-4e4e-aaf4-00de024912e1/0e15a65bd5f7b30a0b818d90706212354d8b3f0998a1495473c3be9a24706ccf
and in "/prol/self/mountinfo" is as:
30 29 0:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
The keys extracted from the two files are the same as "name=systemd". So no need to rename the key to "systemd".
Fixes: #3385
Signed-off-by: sailorvii <challengingway@hotmail.com>
Mount hugepage directories and configure the requested number of hugepages
dynamically by writing to sysfs files
Port from:
78b307b5bdFixes: #3342
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
Current hook process is handled by just calling
unwrap() on it, sometime it will cause panic.
By handling all Result type and check the error can
avoid panic.
Fixes: #3649
Signed-off-by: bin <bin@hyper.sh>
Envs contain null-byte will cause running hooks to panic,
this commit will filter envs and only pass valid envs to hooks.
Fixes: #3667
Signed-off-by: bin <bin@hyper.sh>
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.
Fixes#2724
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
In commit 78dff468bf1 we introduced logic to rewrite PCIDEVICE_ environment
variables for the container so that they contain correct addresses for the
Kata VM rather than for the host. Unfortunately, we never actually invoked
the function to do this.
It turns out we need to do this not only at container creation time, but
also for environment variables supplied to processes exec-ed into the
container after creation (e.g. with crictl exec). Add calls to make both
those updates.
fixes#3634
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
add_devices() generates a mapping of host to guest PCI addresses which is
used to update some environment variables for the workload. Currently it
just does this locally, but it turns out we're going to need the same map
again in order to correct environment variables for processes exec-ed into
the existing container.
Move the map to the sandbox structure so we can keep it around for those
later uses.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function updates PCIDEVICE_ environment variables (such as those
supplied by the Kubernetes SR-IOV plugin) in the OCI spec to be correct
for the Kata VM, rather than for the host.
We neglected to actually call this function, however, and it turns out that
when we do, we need to do things slightly different. We actually need to
adjust envionment variables both in the OCI spec when creating a container
and also in the variables supplied for exec-ing a new process within an
existing container.
Adjust the function so that it can be used for both these cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
1. The hook.args[0] is the hook binary name which shouldn't be included
in the Command.args.
2. Add new unit tests
Fixes: #2610
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
After the protocols are moved to upper libs (PR3355),
the runtime protocol generation is broken. This fixes it.
Fixes: #3414
Signed-off-by: Feng Wang <feng.wang@databricks.com>
move the protocols to upper libs thus it can
be shared between agent and other rust runtime.
Depends-on: github.com/kata-containers/tests#4306
Fixes: #3348
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Move the oci crate to upper libs thus it can be
shared between agent and other rust runtimes.
Fixes: #3348
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The tokio's spawn will only create an future async task
instead of a new real thread, thus executing unshare to
create a new namespace in tokio's async task would make
the agent process to join in the new created namespace,
which isn't expected.
Thus, we'd better to to the unshare in a real thread to
prevent moving the agent process into a new namespace.
Fixes: #3369
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For calls from shim to agent, the return error will be processed like this:
match self.do_start_container(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
}
The e.to_string() return only a part of the error(for example set by context()),
this may lead lack of information.
The `format!("{:?}", err)` will return more info.
Fixes: #3353
Signed-off-by: bin <bin@hyper.sh>
We already have a `mount_path` local Path variable which holds the mount
point.
Use it instead of creating a new `mount_point` variable with identical
type and content.
Fixes: #3332
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Make it clear when reading the table in the agent's "Change the agent
API" documentation that the commands in the "Generation method" column
should be run in the agent repo.
Fixes: #3317.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the architecture document into a new `docs/design/architecture/` directory
in preparation for splitting it into more manageable pieces.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>