Commit Graph

263 Commits

Author SHA1 Message Date
TcMits
01da48fde9 introduce instruction virtual table 2025-09-13 16:35:17 +07:00
PThorpe92
b04c364981 Fix clippy error 2025-09-12 11:43:38 -04:00
PThorpe92
7a14c7394f Remove the header copy stored on the WalFile, fix fast_path 2025-09-12 11:29:43 -04:00
PThorpe92
25e7c719f1 Update checkpoint_seq on each checkpoint, not just when log restarts
This was causing checkpoint_seq to be 0 when we had already successfully
ran a passive checkpoint, and causing us to use improper pages from the
cache.
2025-09-12 11:29:42 -04:00
Pere Diaz Bou
9b6d181be4 wal: add hacky update max frame for mvcc use
When multiple transactions write concurrently in MVCC, the max frame will be
updated. From the point of view of the other transaction, this new max_frame
makes it return busy, because its current WAL snapshot is outdated.
2025-09-12 13:49:14 +00:00
PThorpe92
f60ca3970f Remove old comment from wal 2025-09-12 06:39:59 -04:00
PThorpe92
faf3531a4e Fix checkpoint fast-path, don't use cached pages w/o write lock
closes #3024
We also snapshot the page once we determine that it's eligible, paying for a
memcpy instead of a read from disk; this further guards against in-memory
changes to the page and TOCTOU issues.
2025-09-12 06:38:02 -04:00
Pekka Enberg
7d8a1a0d5f Merge 'whopper: A new DST with concurrency' from Pekka Enberg
Our simulator is currently limited to concurrency of one. This
introduces a much less sophisticated DST with focus on finding
concurrency bugs.

Closes #2985
2025-09-11 18:42:45 +03:00
Jussi Saurio
c30d320cab Fix: read transaction cannot be allowed to start with a stale max frame
If both of the following are true:

1. All read locks are already held
2. The highest readmark of any read lock is less than the committed max frame

Then we must return Busy to the reader, because otherwise they would begin a
transaction with a stale local max frame, and thus not see some committed
changes.
2025-09-11 15:58:13 +03:00
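The Busy condition described in c30d320cab can be sketched as a pure function. This is a minimal, simplified model: `begin_read`, `BeginRead`, and representing the read locks as a slice of optional readmarks are hypothetical, not the actual WAL types, and other lock-acquisition details are ignored.

```rust
/// Hypothetical outcome of trying to start a read transaction.
#[derive(Debug, PartialEq)]
enum BeginRead {
    /// The reader gets a snapshot at the committed max frame.
    Ok { max_frame: u64 },
    /// No slot can give a current snapshot; the caller should retry.
    Busy,
}

/// `readmarks[i]` is `Some(frame)` if read lock `i` is held with that
/// readmark, or `None` if the slot is free (hypothetical representation).
fn begin_read(readmarks: &[Option<u64>], committed_max_frame: u64) -> BeginRead {
    // Condition 1: every read lock is already held.
    let all_held = readmarks.iter().all(|m| m.is_some());
    // Condition 2: the highest readmark lags the committed max frame.
    let highest = readmarks.iter().flatten().copied().max().unwrap_or(0);
    if all_held && highest < committed_max_frame {
        // Starting now would pin a stale local max frame and hide
        // committed changes, so report Busy instead.
        return BeginRead::Busy;
    }
    // Otherwise the reader can claim a free slot (or share one whose
    // readmark already equals the committed max frame).
    BeginRead::Ok { max_frame: committed_max_frame }
}
```

Returning Busy pushes the retry to the caller rather than letting the reader silently miss committed frames.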
Pekka Enberg
ca51a60b3c core/storage: Demote restart_log() logging to debug 2025-09-11 08:35:18 +03:00
PThorpe92
2f4f67efa8 Remove some unused attributes 2025-09-09 16:17:49 -04:00
PThorpe92
02bebf02a5 Remove read_entire_wal_dumb in favor of reading chunks 2025-09-09 16:06:27 -04:00
PThorpe92
37ec77eec2 Fix read_entire_wal_dumb to prefer streaming read if over 32 MB WAL file 2025-09-09 13:12:58 -04:00
Pekka Enberg
6d80d862ee Merge 'io_uring: prevent out of order operations that could interfere with durability' from Preston Thorpe
closes #1419
When submitting a `pwritev` to flush dirty pages, if it is a commit
frame, we use a new completion type that tells io_uring to add a link
flag, which ensures the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with -ECANCELED
2. All operations in the chain complete in order
If there is an ongoing chain of `IO_LINK`, it ends at the `fsync`
barrier, and ensures everything submitted before it has completed.
For 99% of cases, the syscall that immediately follows the `pwritev`
is going to be the fsync, but just in case, this implementation links
everything that comes between the final commit `pwritev` and the next
`fsync`.
In the event of a partial write on a linked operation, we force a
`submit` and then, after the partial write completes, submit an
additional fsync with an `IO_DRAIN` flag. This maintains durability,
because that fsync is not started until everything submitted before
it has completed.
The other option for partial writes on commit frames/linked writes is
to return an error; I'm not sure which is the right move here. It may
be possible that, because the fsync completion fired, the commit is
considered finished without the data being durable on disk. So maybe
it should be an assertion instead? Thoughts?

Closes #2909
2025-09-05 08:34:35 +03:00
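The two guarantees listed for a linked chain can be modelled without io_uring itself. This is a simplified model only: `Completion` and `run_linked_chain` are hypothetical, and the real kernel flags are `IOSQE_IO_LINK` and `IOSQE_IO_DRAIN`. It shows operations completing in submission order and everything after the first failure being cancelled.

```rust
/// Result of one operation in a simulated linked chain (not real io_uring).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Completion {
    Done(i32),   // e.g. bytes written
    Failed(i32), // negative errno
    Canceled,    // what io_uring would report as -ECANCELED
}

/// `attempts[i]` is what operation `i` would return if it actually ran.
fn run_linked_chain(attempts: &[Result<i32, i32>]) -> Vec<Completion> {
    let mut out = Vec::with_capacity(attempts.len());
    let mut broken = false;
    // Linked operations execute strictly in submission order.
    for attempt in attempts {
        if broken {
            // Once one link fails, every later link is cancelled.
            out.push(Completion::Canceled);
            continue;
        }
        match *attempt {
            Ok(n) => out.push(Completion::Done(n)),
            Err(e) => {
                out.push(Completion::Failed(e));
                broken = true;
            }
        }
    }
    out
}
```

For example, a failed commit `pwritev` in the middle of a chain means the trailing `fsync` is reported as cancelled rather than succeeding on incomplete data.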
Pekka Enberg
5950003eaf core: Simplify WalFileShared life cycle
Create one WalFileShared for a Database and update its state
accordingly. Also support case where the WAL is disabled.
2025-09-04 21:09:12 +03:00
PThorpe92
e3f366963d Compute the final db page or make the commit frame submit a linked pwritev completion 2025-09-03 16:01:16 -04:00
Pekka Enberg
87d3f74e6e Merge 'Evict page from cache if page is unlocked and unloaded' from Pedro Muniz
Because we can abort a read_page completion, a page can be in the cache
but be unloaded and unlocked. If we do not evict that page from the page
cache, we will later return an unloaded page, which triggers assertions.
This is worsened by the fact that the page cache is not per `Statement`,
so you can abort a completion in one Statement and trigger an error in
the next one if we don't evict the page in these circumstances.
Also, to propagate IO errors we need to return the Error from
IOCompletions on step.

Closes #2785
2025-09-02 09:08:12 +03:00
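The eviction rule from 87d3f74e6e can be sketched as a cache lookup that drops entries whose read was aborted. `PageState` and `PageCache` are hypothetical stand-ins for the real page and cache types; the sketch assumes "locked" means a read is still in flight.

```rust
use std::collections::HashMap;

/// Hypothetical page state; the real Page type carries much more.
#[derive(Debug, Clone, Copy)]
struct PageState {
    loaded: bool,
    locked: bool,
}

/// Hypothetical page cache keyed by page number.
struct PageCache {
    pages: HashMap<u64, PageState>,
}

impl PageCache {
    /// Returns the cached page, unless its read was aborted.
    fn get(&mut self, id: u64) -> Option<PageState> {
        let state = *self.pages.get(&id)?;
        if !state.loaded && !state.locked {
            // Unloaded and unlocked: no in-flight read will ever fill
            // this page, so evict it and force the caller to re-read.
            self.pages.remove(&id);
            return None;
        }
        // Loaded, or still locked by an in-flight read the caller can wait on.
        Some(state)
    }
}
```

Because the cache is shared across statements, doing this check on lookup means a later statement never receives the stale entry.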
Pekka Enberg
d959319b42 Merge 'Use u64 for file offsets in I/O and calculate such offsets in u64' from Preston Thorpe
Using `usize` to compute file offsets caps us at ~4 GiB on 32-bit
systems. For example, with 4 KiB pages we can only address up to 1048576
pages; attempting the next page overflows a 32-bit usize and can wrap
the write offset, corrupting data. Switching our I/O APIs and offset
math to u64 avoids this overflow on 32-bit targets.

Closes #2791
2025-09-02 09:06:49 +03:00
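The overflow arithmetic can be checked directly. This sketch uses `u32` to stand in for a 32-bit `usize` and assumes 1-based page numbers with offset `(page_no - 1) * page_size`; the two helper functions are illustrative, not the real I/O API.

```rust
/// Offset of a 1-based page computed in u32 (like a 32-bit usize).
/// Returns None on overflow instead of silently wrapping.
/// Assumes page_no >= 1.
fn page_offset_u32(page_no: u32, page_size: u32) -> Option<u32> {
    (page_no - 1).checked_mul(page_size)
}

/// The same computation widened to u64, as the change proposes.
/// Assumes page_no >= 1.
fn page_offset_u64(page_no: u64, page_size: u64) -> u64 {
    (page_no - 1) * page_size
}
```

Page 1048576 starts at byte 4294963200, which still fits in 32 bits, but page 1048577 would start at 2^32 (4 GiB). A wrapping multiply there yields offset 0, i.e. the write would land on top of the first page.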
pedrocarlo
4618df9d1a because we can abort a read_page completion, this means that the page can be in the cache but be unloaded and unlocked. However, if we do not evict that page from the page cache, we will return an unloaded page later 2025-09-01 11:10:39 -03:00
Gaurav Sarma
453cbd3201 Decrypt WAL page while reading raw frames 2025-09-01 15:29:01 +08:00
Pekka Enberg
0c16ca9ce9 Merge 'core/wal: cache file size' from Pere Diaz Bou
Closes #2829
2025-08-30 08:41:58 +03:00
Avinash Sajjanshetty
bb591ab7e1 Propagate decryption error when reading from WAL 2025-08-29 18:07:38 +05:30
Pere Diaz Bou
db5e2883ee core/wal: cache wal is initialized 2025-08-29 13:15:09 +02:00
PThorpe92
a0e5536360 Fix clippy warnings and remove self casts 2025-08-28 09:45:19 -04:00
PThorpe92
0a56d23402 Use u64 for file offsets in IO and calculate such offsets in u64 2025-08-28 09:44:00 -04:00
Avinash Sajjanshetty
2c0842ff52 Set and propagate IOContext as required 2025-08-27 22:05:01 +05:30
PThorpe92
37a7ec7477 Update append_frames_vectored to use new encryption_ctx and apply review 2025-08-25 09:50:57 -04:00
PThorpe92
daea841b47 Minor adjustments/comments to wal append_frames_vectored method 2025-08-25 09:47:06 -04:00
PThorpe92
46e288ac26 Add append_frames_vectored to WAL api
In addition to the existing `append_frame` which will write an individual frame
to the WAL, we add a method `append_frames_vectored` that takes N frames and the
db size which will need to be set for the last (commit) frame, and it
calculates the checksums and submits them as a single `pwritev` call,
reducing the number of syscalls needed for each write operation.
2025-08-25 09:47:01 -04:00
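The idea behind `append_frames_vectored` (chain checksums across N frames, put the db size only in the last frame, then issue one vectored write) can be sketched with the standard library. This assumes a SQLite-style 24-byte frame header and big-endian checksum words; the helper names are hypothetical, salts are left as zero for brevity, and `Write::write_vectored` stands in for `pwritev`.

```rust
use std::io::{IoSlice, Write};

const FRAME_HEADER_SIZE: usize = 24;

/// SQLite-style cumulative checksum over big-endian 32-bit word pairs,
/// continuing from the running values (s0, s1).
fn wal_checksum(data: &[u8], mut s0: u32, mut s1: u32) -> (u32, u32) {
    assert!(data.len() % 8 == 0, "checksummed data must be a multiple of 8 bytes");
    for chunk in data.chunks_exact(8) {
        let x0 = u32::from_be_bytes(chunk[0..4].try_into().unwrap());
        let x1 = u32::from_be_bytes(chunk[4..8].try_into().unwrap());
        s0 = s0.wrapping_add(x0).wrapping_add(s1);
        s1 = s1.wrapping_add(x1).wrapping_add(s0);
    }
    (s0, s1)
}

/// Builds one header per (page_no, page) frame. Only the last (commit)
/// frame carries db_size; `seed` is the running checksum so far.
fn build_headers(frames: &[(u32, Vec<u8>)], db_size: u32, seed: (u32, u32)) -> Vec<[u8; FRAME_HEADER_SIZE]> {
    let mut ck = seed;
    let mut headers = Vec::with_capacity(frames.len());
    for (i, (page_no, page)) in frames.iter().enumerate() {
        let mut h = [0u8; FRAME_HEADER_SIZE];
        h[0..4].copy_from_slice(&page_no.to_be_bytes());
        let commit = if i + 1 == frames.len() { db_size } else { 0 };
        h[4..8].copy_from_slice(&commit.to_be_bytes());
        // Bytes 8..16 (salts) stay zero in this sketch.
        // Checksum covers header bytes 0..8 then the page, chained frame to frame.
        ck = wal_checksum(&h[0..8], ck.0, ck.1);
        ck = wal_checksum(page, ck.0, ck.1);
        h[16..20].copy_from_slice(&ck.0.to_be_bytes());
        h[20..24].copy_from_slice(&ck.1.to_be_bytes());
        headers.push(h);
    }
    headers
}

/// Writes every header and page with a single vectored call.
fn append_vectored<W: Write>(out: &mut W, frames: &[(u32, Vec<u8>)], headers: &[[u8; FRAME_HEADER_SIZE]]) -> std::io::Result<usize> {
    let mut bufs = Vec::with_capacity(frames.len() * 2);
    for (h, (_, page)) in headers.iter().zip(frames) {
        bufs.push(IoSlice::new(h));
        bufs.push(IoSlice::new(page));
    }
    out.write_vectored(&bufs)
}
```

`write_vectored` on a `Vec<u8>` consumes every slice, but on a real file it may write fewer bytes than requested, so a real implementation has to handle short writes.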
Preston Thorpe
040ceba2d6 Merge 'WAL txn: fix reads from DB file' from Nikita Sivukhin
- A transaction that was started with max_frame = 0 and
max_frame_read_lock_index = 0 can write to the WAL, and in this case it
needs to read data back from the WAL and not the DB file.
- Without cache spilling it's hard to reproduce this issue for turso-db
now, but I found it with sync-engine, which does weird stuff with the
WAL that "simulates" cache-spilling behaviour to some extent.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2735
2025-08-25 08:34:17 -04:00
Nikita Sivukhin
c62b87d9b6 read from database file only if max_frame_read_lock_index is 0 and max_frame > min_frame
- a transaction that was started with max_frame = 0 and max_frame_read_lock_index = 0
  can write to the WAL, and in this case it needs to read data back from the WAL
- without cache spilling it's hard to reproduce this issue for turso-db now,
  but I stumbled into this issue with sync-engine, which does weird stuff with the WAL
  that "simulates" cache-spilling behaviour to some extent
2025-08-25 11:36:58 +04:00
bit-aloo
37cebb0669 fix(clippy): remove duplicate arc_with_non_send_sync attribute in wal.rs 2025-08-24 22:59:47 +05:30
Pekka Enberg
22c9cb6618 s/PerConnEncryptionContext/EncryptionContext/ 2025-08-24 08:17:20 +03:00
Avinash Sajjanshetty
3090545167 use encryption ctx instead of encryption key 2025-08-21 22:36:32 +05:30
Nikita Sivukhin
d7e47c1268 fix bug - continue checkpoint as usual even if frames range is degenerate 2025-08-21 17:37:19 +04:00
Nikita Sivukhin
69c39d5d8c replace wal_frames_count with wal_state method which returns both frames count and checkpoint sequence 2025-08-21 15:13:23 +04:00
Nikita Sivukhin
25cb28da67 add method to get checkpoint_seq from WAL 2025-08-21 15:13:23 +04:00
Nikita Sivukhin
38eb5232c8 do not check page size if it's not initialized yet 2025-08-21 15:12:22 +04:00
Nikita Sivukhin
10a164e399 extend checkpoint result with information about last checkpointed frame 2025-08-21 15:12:15 +04:00
Nikita Sivukhin
05931f70ce add optional upper_bound_inclusive parameter to some checkpoint modes
- will be used in sync-engine protocol
2025-08-21 14:12:11 +04:00
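One way to picture the optional `upper_bound_inclusive` parameter, together with the "degenerate frames range" fix in d7e47c1268, is as a clamp on the range of frames to backfill. This is a hypothetical sketch: `checkpoint_range` and the `nbackfills` name are assumptions, not the real checkpoint code. An empty range is simply represented as empty so the checkpoint can proceed as usual.

```rust
use std::ops::RangeInclusive;

/// Hypothetical: WAL frames to backfill in this checkpoint.
/// `nbackfills` is the last frame already copied to the DB file.
/// The range is empty (start > end) when there is nothing to copy.
fn checkpoint_range(nbackfills: u64, max_frame: u64, upper_bound_inclusive: Option<u64>) -> RangeInclusive<u64> {
    // Never checkpoint past the caller's bound or past the WAL's max frame.
    let end = match upper_bound_inclusive {
        Some(ub) => ub.min(max_frame),
        None => max_frame,
    };
    (nbackfills + 1)..=end
}
```

Iterating over an empty range copies nothing, so the rest of the checkpoint bookkeeping can run unchanged instead of being special-cased.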
PThorpe92
4a2da6c262 Remove assertion for checkpoint seq in favor of selectively using cached pages 2025-08-20 18:26:55 -04:00
PThorpe92
7082086061 Remove ENV var and enable cache by default, track which pages were cached 2025-08-20 17:42:17 -04:00
PThorpe92
345b80d14c Change env var to ENABLE instead of DISABLE so it's disabled by default 2025-08-20 17:36:00 -04:00
PThorpe92
51e4cd0f1d Add debug assertion for cached pages used during checkpoint 2025-08-20 17:35:59 -04:00
PThorpe92
e28a38abc5 Fix wal tag safety issues, and add debug assertion that we are reading the proper frames 2025-08-20 17:28:48 -04:00
PThorpe92
4100737358 remove page entries without frames in frame cache in WAL rollback method 2025-08-20 17:28:19 -04:00
PThorpe92
d2c3ba14c8 Remove inefficient vec in WAL for tracking pages present in frame cache 2025-08-20 17:28:18 -04:00
PThorpe92
d6d72d2966 Update Page to carry epoch of frame + checkpoint seq to ensure proper cached page for chkpt 2025-08-20 17:28:17 -04:00
PThorpe92
00f2a0f216 Performance improvements to checkpointing; prevent serializing I/O 2025-08-20 17:26:54 -04:00
PThorpe92
fe7a5e98b8 Track frame_ids on PageInner and use the page cache for reading pages to checkpoint 2025-08-20 17:24:10 -04:00