The documentation of the bufio package explicitly says
"Err returns the first non-EOF error that was encountered by the
Scanner."
When io.EOF happens, `Err()` will return `nil` and `Scan()` will return
`false`.
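
A minimal sketch of the behaviour described above, using only the standard library. The `countLines` helper is hypothetical and exists only for illustration; it is not code from this change.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLines (hypothetical helper) scans input to the end and returns the
// number of lines read together with whatever Scanner.Err reports afterwards.
func countLines(input string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	n := 0
	for scanner.Scan() {
		n++
	}
	// Hitting io.EOF ends the loop (Scan returns false), but Err returns nil
	// in that case, so io.EOF never needs to be special-cased by the caller.
	return n, scanner.Err()
}

func main() {
	n, err := countLines("first\nsecond\n")
	fmt.Println(n, err) // prints "2 <nil>"
}
```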
Fixes: #4079
Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
In the is_signal_handled function, when parsing the hex string returned
from `/proc/<pid>/status` the space/tab character after the colon
is not removed.
This patch trims the result of SigCgt so that
all whitespace characters are removed. It also extends the existing
test cases to check for this scenario.
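
The parsing problem can be illustrated with a short sketch. Note that the agent code is Rust; this is a Go rendering of the same idea, and the `isSignalHandled` function and its signature are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isSignalHandled is a hypothetical Go rendering of the check, not the agent's
// Rust code. statusLine is one line of /proc/<pid>/status, for example
// "SigCgt:\t0000000000004a02".
func isSignalHandled(statusLine string, signum uint) bool {
	const prefix = "SigCgt:"
	if signum == 0 || signum > 64 || !strings.HasPrefix(statusLine, prefix) {
		return false
	}
	// Trim the tab or spaces that follow the colon. Without this step the
	// hex parse below fails on the leading whitespace.
	mask := strings.TrimSpace(strings.TrimPrefix(statusLine, prefix))
	bits, err := strconv.ParseUint(mask, 16, 64)
	if err != nil {
		return false
	}
	// Signal N is represented by bit N-1 of the mask.
	return bits&(uint64(1)<<(signum-1)) != 0
}

func main() {
	line := "SigCgt:\t0000000000004a02"
	fmt.Println(isSignalHandled(line, 15)) // bit 14 is set in 0x4a02
	fmt.Println(isSignalHandled(line, 1))  // bit 0 is not set in 0x4a02
}
```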
Fixes: #4250
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Now that we build and ship the Rust version of virtiofsd, which is not
tied to QEMU, we need to update its default location to match where
we're installing this binary.
Fixes: #4249
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
go-test.sh by default adds the -v option to 'go test' meaning that output
will be printed from all the passing tests as well as any failing ones.
This results in a lot of output in which it's often difficult to locate the
failing tests you're interested in.
So, remove -v from the default flags.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the responsibilities of the go-test.sh script is setting up the
default flags for 'go test'. However, these are constructed across several
different places in the script using several unneeded intermediate variables.
Consolidate all the flag construction into one place.
Fixes: #4190
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh changes behaviour based on both the $CI and $KATA_DEV_MODE
variables, but not in a way that makes a lot of sense.
If either one is set it uses the test_coverage path, instead of the
test_local path. That collects coverage information, as the name
suggests, but it also means it runs the tests twice, once as root and
once as non-root, which is very non-obvious.
It's not clear what use case the test_local path is for at all.
Developer local builds will typically have $KATA_DEV_MODE set and CI
builds will have $CI set. There's essentially no downside to running
coverage all the time - it has little impact on the test runtime.
In addition, if *both* $CI and $KATA_DEV_MODE are set, the script
refuses to run things as root, considering it "unsafe". While having
both set might be unwise in a general sense, there's not really any
way running sudo can be any more unsafe than it is with either one
set.
So, simplify everything by just always running the test_coverage path.
This leaves the test_local path unused, so we can remove it entirely.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh accepts subcommands, however invoking it in the usual way via
the Makefile doesn't use them. In fact the only remaining subcommand is
"help" and we already have another way of getting the usage information
(-h or --help). We don't need a second way, so just drop subcommand
handling.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh defaults to testing all the packages listed by go list, except
for a number filtered out. It turns out that none of those filters are
necessary any more:
* We've long required a Go newer than 1.9 which means the vendor filter
isn't needed
* The agent filter doesn't do anything now that we've moved to the Kata
2.x unified repo
* The tests filters don't hit anything on the list of modules in
src/runtime (which is the only user of the script)
But since we don't need to filter anything out any more, we don't even need
to iterate through a list ourselves. We can simply pass "./..." directly
to go test and it will iterate through all the sub-packages itself.
Interestingly this more than doubles the speed of "make test" for me - I
suspect because go test's internal parallelism works better over a larger
pool of tests.
This also lets us remove handling of non-existent coverage files from
test_go_package(), since we will no longer test packages without tests
by default. If the user explicitly requests testing of a package with no
tests, then failing makes sense.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The go-test.sh script has an explicit chmod command, run as root, to
set the mode of the temporary coverage files to 0644. AFAICT the
point of this is specifically the 004 bit allowing world read access,
so that we can then merge the temporary coverage file into the main
coverage file.
That's a convoluted way of doing things. Instead we can just run the tail
command which reads the temporary file as the same user that generated it.
In addition, go-test.sh became root to remove that temporary coverage
file. This is not necessary, since deleting a regular file just requires
write access to the directory, not the file itself.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The html-coverage option to this script doesn't really alter behaviour;
it just does the same thing as normal coverage, then converts the
report to HTML. That conversion is a single command, plus a chmod to
make the final output mode 0644. That overrides any umask the user
has set, which doesn't seem like a policy decision this script should
be making.
Nothing in the kata-containers or tests repository uses this, so it doesn't
really make sense to keep this logic inside this script.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In addition to coverage.txt, the go-test.sh script creates
coverage.txt.tmp files while running. These are temporary and
certainly shouldn't be committed, so add them to the gitignore file.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Go unit tests for the runtime are invoked by the helper script
ci/go-test.sh, which calls the run_go_test() function in ci/lib.sh, which
in turn calls into .ci/go-test.sh from the tests repository.
But the runtime is the only user of this script, and generally stuff for
unit tests (rather than functional or integration tests) lives in the main
repository, not the tests repository.
So, just move the actual script into src/runtime. A change to remove it
from the tests repo will follow.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
to the kata-containers repo under the src/tools/log-parser folder
and vendor the modules
Fixes: #4100
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.
This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed. Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.
This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.
The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts. With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.
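
The bookkeeping described above could look roughly like the sketch below. The `deviceIDMap` type, its method names, and the sample IDs are hypothetical and are not taken from the driver; the sketch only shows the idea of mapping the sandbox's device ID to the ID generated by the hypervisor so that removal can use the latter.

```go
package main

import (
	"fmt"
	"sync"
)

// deviceIDMap is a hypothetical helper, not the actual driver type. It maps
// the device ID known to the sandbox to the ID generated by the hypervisor.
type deviceIDMap struct {
	mu  sync.Mutex
	ids map[string]string
}

func newDeviceIDMap() *deviceIDMap {
	return &deviceIDMap{ids: make(map[string]string)}
}

// recordAdd stores the generated ID returned when a device is hotplugged.
func (m *deviceIDMap) recordAdd(sandboxID, generatedID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.ids[sandboxID] = generatedID
}

// takeForRemoval returns the generated ID to use in the removal request and
// forgets the mapping. The boolean is false if the device is unknown.
func (m *deviceIDMap) takeForRemoval(sandboxID string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	generatedID, ok := m.ids[sandboxID]
	if ok {
		delete(m.ids, sandboxID)
	}
	return generatedID, ok
}

func main() {
	m := newDeviceIDMap()
	// "drive-1" and "_disk0" are made-up IDs used only for this example.
	m.recordAdd("drive-1", "_disk0")
	id, ok := m.takeForRemoval("drive-1")
	fmt.Println(id, ok)
}
```

Because the sandbox may reuse "drive-1" right away, the removal request is keyed on the generated ID, which the hypervisor does not reuse while the old device is still being torn down.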
This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that. Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.
Fixes: #4176
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add test coverage for the functions find_process and online_resources in src/sandbox.rs.
Fixes: #4085
Fixes: #4136
Signed-off-by: Jack Hance <jack.hance@ndsu.edu>
Today in agent watchers, when we copy files/symlinks
or create directories, the ownership of the source path
is not preserved which can lead to permission issues.
In copy, ensure that we do a chown of the source path
uid/gid to the destination file/symlink after copy to
ensure that ownership matches the source ownership.
fs::copy() takes care of setting the permissions.
For directory creation, ensure that we set the
permissions of the created directory to the source
directory permissions and also perform a chown of the
source path uid/gid to ensure that directory ownership
and permissions match the source.
Fixes: #4188
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
runk uses liboci-cli crate to parse command line options,
but liboci-cli does not support --all option for kill command,
though this is the runtime spec behavior.
However, crictl issues a kill --all command when stopping containers, so
as a workaround we use a custom kill command instead of the one
provided by liboci-cli.
Fixes: #4182
Signed-off-by: Bin Liu <bin@hyper.sh>
The default runtime for io.containerd.runc.v2 is runc.
To use runk, the containerd configuration should set the
default runtime to runk or add BinaryName options for the
runtime.
Fixes: #4177
Signed-off-by: Bin Liu <bin@hyper.sh>
Update the `agent-ctl` docs to show how to use a VSOCK local address
when running the agent and the tool in the same environment. This is an
alternative to using a Unix socket.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Unbreak the `agent-ctl` tool connecting to the agent with a Unix domain
socket.
It appears that [1] changed the behaviour of connecting to the agent
using a local Unix socket (which is not used by Kata under normal
operation).
The change can be seen by reverting to commit
72b8144b56 (the one before [1]) and
running the agent manually as:
```bash
$ sudo KATA_AGENT_SERVER_ADDR=unix:///tmp/foo.socket target/x86_64-unknown-linux-musl/release/kata-agent
```
Before [1], in another terminal we see this:
```bash
$ sudo lsof -U 2>/dev/null |grep foo|awk '{print $9}'
@/tmp/foo.socket@
```
But now, we see the following:
```bash
$ sudo lsof -U 2>/dev/null |grep foo|awk '{print $9}'
@/tmp/foo.socket
```
Note the trailing `@` in the first output, which represents a nul (`\0`)
byte and is absent from the second.
The `agent-ctl` tool used to add that trailing nul, but it no longer seems to
be needed, so this change removes it, restoring functionality. No
external changes are necessary, so the `agent-ctl` tool can connect to
the agent as follows:
```bash
$ cargo run -- -l debug connect --server-address "unix://@/tmp/foo.socket" --bundle-dir "$bundle_dir" -c Check -c GetGuestDetails
```
[1] - https://github.com/kata-containers/kata-containers/issues/3124
Fixes: #4164
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4139
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4017
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).
Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecracker to do I/O throttling.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the disk device configuration.
The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the disk counterpart of what was introduced for the network
as part of the previous commits in this series.
The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
control the disk I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the DiskRateLimiterBwOneTimeBurst.
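
As a rough sketch, the four fields listed above could be grouped as follows. The field names come from the list; the `DiskRateLimiterConfig` struct name, the `int64` type, the helper methods, and the convention that zero means "no limit" are all assumptions of this example.

```go
package main

import "fmt"

// DiskRateLimiterConfig is a hypothetical grouping of the four fields. The
// int64 type and the "zero means no limit" convention are assumptions.
type DiskRateLimiterConfig struct {
	DiskRateLimiterBwMaxRate       int64 // bits per second
	DiskRateLimiterBwOneTimeBurst  int64 // bits per second, initial burst that does not replenish
	DiskRateLimiterOpsMaxRate      int64 // operations per second
	DiskRateLimiterOpsOneTimeBurst int64 // operations per second, initial burst that does not replenish
}

// bandwidthLimited reports whether a bandwidth limit was requested.
func (c DiskRateLimiterConfig) bandwidthLimited() bool {
	return c.DiskRateLimiterBwMaxRate > 0
}

// opsLimited reports whether an operations limit was requested.
func (c DiskRateLimiterConfig) opsLimited() bool {
	return c.DiskRateLimiterOpsMaxRate > 0
}

func main() {
	// Example values only: a bandwidth limit with a burst, no operations limit.
	cfg := DiskRateLimiterConfig{
		DiskRateLimiterBwMaxRate:      10000000,
		DiskRateLimiterBwOneTimeBurst: 20000000,
	}
	fmt.Println(cfg.bandwidthLimited(), cfg.opsLimited())
}
```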
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.
The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>