This PR adds a proper program abort in case of unfinished statement reset
and interruption.
Also, this PR makes rollback methods non-failing, because otherwise their
callers are usually left unclear about the outcome (if rollback failed,
what is the state of the statement/connection/transaction?).
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3591
We use relaxed ordering in a lot of places where we really need to
ensure all CPUs see the write. Switch to sequential consistency, unless
acquire/release is explicitly used. If there are places that can be
optimized, we can switch back to relaxed case by case, with a comment
explaining *why* it is safe.
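As a minimal illustration of why relaxed ordering can break this expectation (the flag and value here are hypothetical, not codebase types):

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

// Hypothetical example, not actual codebase state.
static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn writer() {
    DATA.store(42, Ordering::SeqCst);
    // With Relaxed on both stores, another CPU could observe
    // READY == true while still seeing the old value of DATA.
    // SeqCst (or Release/Acquire) forbids that reordering.
    READY.store(true, Ordering::SeqCst);
}

fn reader() -> Option<u64> {
    if READY.load(Ordering::SeqCst) {
        Some(DATA.load(Ordering::SeqCst))
    } else {
        None
    }
}
```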
Currently header changes are tracked through the pager by reading page 1.
MVCC has its own layer to track changes during a txn, so this commit makes
it so that headers are tracked by each txn separately.
On commit we update the _global_ header, which is used to update
`database_size`, because pager commits require it to be up to date. This
also makes it _simpler_ to keep track of header updates and update the
pager's header accordingly.
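A rough sketch of the shape of this change (type and field names are hypothetical, not the actual codebase types):

```rust
// Hypothetical sketch of per-txn header tracking.
#[derive(Clone, Copy)]
struct DatabaseHeader {
    database_size: u32,
    // ... other header fields
}

struct MvTransaction {
    // Each transaction tracks its own view of the header instead of
    // going through the pager and re-reading page 1.
    header: DatabaseHeader,
}

impl MvTransaction {
    fn commit(&self, global: &mut DatabaseHeader) {
        // On commit, publish the transaction-local header to the global
        // one so the pager sees an up-to-date `database_size`.
        *global = self.header;
    }
}
```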
This PR is needed to make the logical log work, because we want to rely
on the pager as little as possible!
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3156
This is only used for returning `LimboResult::Busy`, and we already
have `LimboError::Busy`, so it only adds confusion.
Moreover, the current busy handler was not handling `LimboError::Busy`,
because it's returned as an error, not as `Ok`. So this may fix the
"busy handler not working" issue in the perf throughput benchmark.
This was causing checkpoint_seq to be 0 when we had already successfully
run a passive checkpoint, and causing us to use improper pages from the
cache.
When multiple write transactions happen concurrently in MVCC, the max
frame will be updated. From the point of view of another transaction,
this new max_frame causes a Busy return, because its current WAL
snapshot is outdated.
closes#3024
Also, we snapshot the page when we determine that it's eligible, paying a
memcpy instead of a read from disk; this further prevents any in-memory
changes to the page and TOCTOU issues.
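A sketch of the snapshot step (hypothetical function, not the actual checkpoint code):

```rust
// Hypothetical sketch: copy the page bytes the moment it is judged
// eligible, so later in-memory mutations (TOCTOU) cannot reach the
// bytes we are about to write out.
fn snapshot_eligible_page(page: &[u8]) -> Vec<u8> {
    page.to_vec() // one memcpy, instead of re-reading from disk
}
```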
Our simulator is currently limited to a concurrency of one. This
introduces a much less sophisticated DST with a focus on finding
concurrency bugs.
Closes#2985
If both of the following are true:
1. All read locks are already held
2. The highest readmark of any read lock is less than the committed max frame
Then we must return Busy to the reader, because otherwise they would begin a
transaction with a stale local max frame, and thus not see some committed
changes.
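Roughly, the check amounts to the following (slot layout and names are illustrative, not the real WAL index):

```rust
// Illustrative sketch; the slot layout is hypothetical.
struct Busy;

struct ReadLockSlot {
    held: bool,     // someone already holds this read lock
    read_mark: u64, // WAL frame number this slot is pinned to
}

fn try_begin_read(slots: &[ReadLockSlot], committed_max_frame: u64) -> Result<(), Busy> {
    let all_held = slots.iter().all(|s| s.held);
    let highest_mark = slots.iter().map(|s| s.read_mark).max().unwrap_or(0);
    // Both conditions together mean every readmark is stale: starting a
    // read now would pin us to a max frame that misses committed changes.
    if all_held && highest_mark < committed_max_frame {
        return Err(Busy);
    }
    Ok(())
}
```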
closes#1419
When submitting a `pwritev` for flushing dirty pages, in the case that
it's a commit frame, we use a new completion type which tells io_uring
to add a flag, which ensures the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with -ECANCELED
2. All operations in the chain complete in order
If there is an ongoing chain of `IO_LINK`, it ends at the `fsync`
barrier, which ensures everything submitted before it has completed.
For 99% of cases, the syscall that immediately follows the `pwritev` is
going to be the `fsync`, but just in case, this implementation links
everything that comes between the final commit `pwritev` and the next
`fsync`.
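For illustration, a minimal sketch with the `io-uring` crate of linking the commit `pwritev` to the following `fsync` (not the actual implementation; buffer/offset bookkeeping and error handling are elided):

```rust
// A sketch, assuming the io-uring crate; not the real code paths.
use io_uring::{opcode, squeue::Flags, types, IoUring};
use std::os::unix::io::AsRawFd;

fn commit_and_sync(
    ring: &mut IoUring,
    file: &std::fs::File,
    iovecs: &[libc::iovec],
    offset: u64,
) -> std::io::Result<()> {
    let fd = types::Fd(file.as_raw_fd());

    // IOSQE_IO_LINK: if this writev fails, linked ops are cancelled
    // with -ECANCELED, and the chain completes in order.
    let write = opcode::Writev::new(fd, iovecs.as_ptr(), iovecs.len() as u32)
        .offset(offset)
        .build()
        .flags(Flags::IO_LINK);

    // The fsync terminates the link chain: it runs only after
    // everything linked before it has completed.
    let fsync = opcode::Fsync::new(fd).build();

    unsafe {
        let mut sq = ring.submission();
        sq.push(&write).expect("submission queue full");
        sq.push(&fsync).expect("submission queue full");
    }
    ring.submit_and_wait(2)?;
    Ok(())
}
```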
In the event that we get a partial write, if it was linked, then we
submit an additional `fsync` after the partial write completes, with an
`IO_DRAIN` flag after forcing a `submit`. This means durability is
maintained, as that `fsync` will flush/drain everything in the
submission queue before submission.
The other option in the event of partial writes on commit frames/linked
writes is to error... not sure which is the right move here. I guess
it's possible that, since the `fsync` completion fired, the commit could
be over without us being durable on disk. So maybe it's an assertion
instead? Thoughts?
Closes#2909
Because we can abort a `read_page` completion, a page can be in the
cache but be unloaded and unlocked. However, if we do not evict that
page from the page cache, we will later return an unloaded page, which
will trigger assertions. This is worsened by the fact that the page
cache is not per `Statement`, so you can abort a completion in one
Statement and trigger an error in the next one if we don't evict the
page in these circumstances.
Also, to propagate IO errors, we need to return the Error from
IOCompletions on step.
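A sketch of the eviction rule (the cache types here are hypothetical, not the real page cache):

```rust
use std::collections::HashMap;

// Hypothetical sketch; the real page cache is shared across Statements.
struct Page {
    loaded: bool,
}

struct PageCache {
    pages: HashMap<usize, Page>,
}

impl PageCache {
    // If a read_page completion was aborted, the page may still sit in
    // the cache in an unloaded state; evict it so a later Statement is
    // never handed an unloaded page.
    fn on_read_aborted(&mut self, page_id: usize) {
        if self.pages.get(&page_id).is_some_and(|p| !p.loaded) {
            self.pages.remove(&page_id);
        }
    }
}
```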
Closes#2785
Using `usize` to compute file offsets caps us at ~4 GiB on 32-bit
systems. For example, with 4 KiB pages we can only address up to 1048576
pages; attempting the next page overflows a 32-bit usize and can wrap
the write offset, corrupting data. Switching our I/O APIs and offset
math to u64 avoids this overflow on 32-bit targets.
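The wrap is easy to see in the arithmetic (illustrative only, not the actual I/O API):

```rust
// Illustrative arithmetic, not the actual I/O API.
const PAGE_SIZE: u64 = 4096;

// With a 32-bit usize this multiplication wraps at the page after
// 1048576: 1048576 * 4096 == 2^32, which truncates to 0 and would
// silently redirect the write to the start of the file.
fn page_offset(page_no: u64) -> u64 {
    (page_no - 1) * PAGE_SIZE
}

fn main() {
    assert_eq!(page_offset(1_048_577), 1u64 << 32); // fine in u64
}
```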
Closes#2791