Yield is a completion that does not allocate any inner state. By design
it is completed from the start and has no errors. This allows lightly
yield without allocating any locks nor heap allocate inner state.
We use relaxed ordering in a lot of places where we really need to
ensure all CPUs see the write. Switch to sequential consistency, unless
acquire/release is explicitly used. If there are places that can be
optimized, we can switch to relaxed case-by-case, but have a comment
explaning *why* it is safe.
Closes#3193
We use relaxed ordering in a lot of places where we really need to
ensure all CPUs see the write. Switch to sequential consistency, unless
acquire/release is explicitly used. If there are places that can be
optimized, we can switch to relaxed case-by-case, but have a comment
explaning *why* it is safe.
Because `io_uring` may have many other I/O submission events queued
(that are relevant to the operation) when we experience an error,
marking our `Completion` objects as aborted is not sufficient, the
kernel will still execute queued I/O, which can mutate WAL or DB state
after we’ve declared failure and keep references (iovec arrays, buffers)
alive and stall reuse. We need to stop those in-flight SQEs at the
kernel and then drain the ring to a known-empty state before reusing any
resources.
The following methods were added to the `IO` trait:
`cancel`: which takes a slice of `Completion` objects and has a default
implementation that simply marks them as `aborted`.
`drain`: which has a default noop implementation, but the `io_uring`
backend implements this method to drain the ring.
CC @sivukhin
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2787
Add handling malformed inputs to function `read_varint` and test cases.
```
# 9 byte truncated to 8
read_varint(&[0x81, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80])
before -> panic index out of bounds: the len is 8 but the index is 8
after -> LimboError
# bits set without end
read_varint(&[0x80; 9])
before -> Ok((128, 9))
after -> LimboError
```
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2904
closes#1419
When submitting a `pwritev` for flushing dirty pages, in the case that
it's a commit frame, we use a new completion type which tells io_uring
to add a flag, which ensures the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with -ECANCELED
2. All operations in the chain complete in order
If there is an ongoing chain of `IO_LINK`, it ends at the `fsync`
barrier, and ensures everything submitted before it has completed.
for 99% of the cases, the syscall that immediately proceeds the
`pwritev` is going to be the fsync, but just in case, this
implementation links everything that comes between the final commit
`pwritev` and the next `fsync`
In the event that we get a partial write, if it was linked, then we
submit an additional fsync after the partial write completes, with an
`IO_DRAIN` flag after forcing a `submit`, which will mean durability is
maintained, as that fsync will flush/drain everything in the squeue
before submission.
The other option in the event of partial writes on commit frames/linked
writes is to error.. not sure which is the right move here. I guess it's
possible that since the fsync completion fired, than the commit could be
over without us being durable ondisk. So maybe it's an assertion
instead? Thoughts?
Closes#2909
Using `usize` to compute file offsets caps us at ~16GB on 32-bit
systems. For example, with 4 KiB pages we can only address up to 1048576
pages; attempting the next page overflows a 32-bit usize and can wrap
the write offset, corrupting data. Switching our I/O APIs and offset
math to u64 avoids this overflow on 32-bit targets
Closes#2791
Previously, we just hardcoded the reserved space with encryption flag.
This patch removes that and sets the reserved space if a key was
specified during a creation of db
This patch adds support for per page encryption. The code is of alpha
quality, was to test my hypothesis. All the encryption code is gated
behind a `encryption` flag. To play with it, you can do:
```sh
cargo run --features encryption -- database.db
turso> PRAGMA key='turso_test_encryption_key_123456';
turso> CREATE TABLE t(v);
```
Right now, most stuff is hard coded. We use AES GCM 256. This
information is not stored anywhere, but in future versions we will start
saving this info in the file. When writing to disk, we will generate a
cryptographically secure random salt, use that to encrypt the page. Then
we will store the authentication tag and the salt in the page itself. To
accommodate this encryption hardcodes reserved space of 28 bytes.
Once the key is set in the connection, we propagate that information to
pager and the WAL, to encrypt / decrypt when reading from disk.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#2567