Commit Graph

175 Commits

Author SHA1 Message Date
Nikita Sivukhin
b612259a3a more friendly copmletely runtime agnostic turso-sync-engine crate 2025-08-06 19:26:55 +04:00
PThorpe92
f6a68cffc2 Remove RefCell from IO and Page apis 2025-08-05 16:24:49 -04:00
Jussi Saurio
13219dbf87 Merge 'extend raw WAL API with few more methods' from Nikita Sivukhin
This PR extends raw WAL API with few methods which will be helpful for
offline-sync:
1. `try_wal_watermark_read_page` - try to read page from the DB with
given WAL watermark value\
    * Usually, WAL max_frame is set automatically to the latest value
(`shared.max_frame`) when transaction is started and then this
"watermark" is preserved throughout whole transaction
    * New method allows to simulate "read from the past" by controlling
frame watermark explicitly
    * There is an alternative to implement some API like
`start_read_session(frame_watermark: u64)` - but I decided to expose
just single method to simplify the logic and reduce "surface" of actions
which can be executed in this "controllable" manner
    * Also, for simplicity, now `try_wal_watermark_read_page` always
read data from disk and bypass any cached values (and also do not
populate the cache)
2. `wal_changed_pages_after` - return set of unique pages changed after
watermark WAL position in the current WAL session
With these 2 methods we can implement `REVERT frame_watermark` logic
which will just fetch all changed pages first, and then revert them to
the previous value by using `try_wal_watermark_read_page` and
`wal_insert_frame` methods (see `test_wal_api_revert_pages` test).
Note, that if there were schema changes - than `REVERT` logic described
above can bring connection to the inconsistent state, as it will
preserve schema information in memory and will still think that table
exist (while it can be reverted). This should be considered by any
consumer of this new methods.

Closes #2433
2025-08-04 23:53:46 +03:00
Nikita Sivukhin
9694366645 add one more assert 2025-08-04 17:23:34 +04:00
Nikita Sivukhin
76bdf0c1ab small fixes 2025-08-04 17:02:53 +04:00
Nikita Sivukhin
2e23230e79 extend raw WAL API with few more methods
- try_wal_watermark_read_page - try to read page from the DB with given WAL watermark value
- wal_changed_pages_after - return set of unique pages changed after watermark WAL position
2025-08-04 16:55:50 +04:00
Jussi Saurio
4f3f66d55e fix/wal: remove start_pages_in_frames_hack to prevent checkpoint data loss
We have some kind of transaction-local hack (`start_pages_in_frames`) for bookkeeping
how many pages are currently in the in-memory WAL frame cache,
I assume for performance reasons or whatever.

`wal.rollback()` clears all the frames from `shared.frame_cache` that the rollbacking tx is
allowed to clear, and then truncates `shared.pages_in_frames` to however much its local
`start_pages_in_frames` value was.

In `complete_append_frame`, we check if `frame_cache` has that key (page) already, and if not,
we add it to `pages_in_frames`.

However, `wal.rollback()` never _removes_ the key (page) if its value is empty, so we can end
up in a scenario where the `frame_cache` key for `page P` exists but has no frames, and so `page P`
does not get added to `pages_in_frames` in `complete_append_frame`.

This leads to a checkpoint data loss scenario:

- transaction rolls back, has start_pages_in_frames=0, so truncates
  shared pages_in_frames to an empty vec. let's say `page P` key in `frame_cache` still remains
  but it has no frames.
- The next time someone commits a frame for `page P`, it does NOT get added to `pages_in_frames`
  because `frame_cache` has that key
- At some point, a PASSIVE checkpoint checkpoints `n` frames, but since `pages_in_frames` does not have
  `page P`, it doesn't actually checkpoint it and all the "checkpointed" frames are simply thrown away
- very similar to the scenario in #2366

Remove the `start_pages_in_frames` hack entirely and just make `pages_in_frames` effectively
the same as `frame_cache.keys`. I think we could also just get rid of `pages_in_frames` and just use
`frame_cache.contains_key(p)` but maybe Pere can chime in here
2025-08-04 10:35:12 +03:00
Pere Diaz Bou
c807b035c5 core/mvcc: fix tests again
had to create connections for every different txn
2025-08-01 10:44:19 +02:00
Jussi Saurio
2233bb41c3 Merge 'fix/wal: reset ongoing checkpoint state when checkpoint fails' from Jussi Saurio
## What
The following sequence of actions is possible:
```sql
-- TRUNCATE checkpoint fails during WAL restart,
-- but OngoingCheckpoint.state is still left at Done for conn 0
Connection 0(op=23): PRAGMA wal_checkpoint(TRUNCATE)
Connection 0(op=23) Checkpoint TRUNCATE: OK: false, wal_page_count: NULL, checkpointed_count: NULL

-- TRUNCATE checkpoint succeeds for conn 1
Connection 1(op=26): PRAGMA wal_checkpoint(TRUNCATE)
Connection 1(op=26) Checkpoint TRUNCATE: OK: true, wal_page_count: 0, checkpointed_count: 0

-- Conn 0 now does a PASSIVE checkpoint, and immediately thinks
-- it's in the Done state, and thinks it checkpointed 17 frames.
-- since mode is PASSIVE, it now thinks both the WAL and the DB have those 17 frames
-- so the first 17 frames of the WAL can be ignored from now on.
Connection 0(op=27): PRAGMA wal_checkpoint(PASSIVE)
Connection 0(op=27) Checkpoint PASSIVE: OK: true, wal_page_count: 0, checkpointed_count: 17

-- Connection 0 starts a txn with min=18 (ignore first 17 frames in WAL),
-- and deletes rowid=690, which becomes WAL frame number 1
Connection 0(op=28): DELETE FROM test_table WHERE id = 690
begin_read_tx(min=18, max=0, slot=1, max_frame_in_wal=0)

-- Connection 1 starts a txn with min=18 (ignore first 17 frames in WAL),
-- and inserts rowid=1128, which becomes WAL frame number 2
Connection 1(op=28): INSERT INTO test_table (id, text) VALUES (1128, text_560)
begin_read_tx(min=18, max=1, slot=1, max_frame_in_wal=1)

-- Connection 0 again starts tx with min=18, and performs a read, and two wrong things happen:
-- 1. it doesn't see row 690 as deleted, because it's in WAL frame 1, which it ignores
-- 2. it doesn't see the new row 1128, because it's in WAL frame 2, which it ignores
Connection 0(op=29): SELECT * FROM test_table
begin_read_tx(min=18, max=2, slot=1, max_frame_in_wal=2)
```
## Fix
Reset `ongoing_checkpoint.state` to `Start` when checkpoint fails.
Issue found in #2364 .

Reviewed-by: bit-aloo (@Shourya742)

Closes #2380
2025-08-01 11:28:04 +03:00
Jussi Saurio
e147494642 pager: make WAL optional again and remove DummyWAL 2025-08-01 10:14:35 +03:00
Jussi Saurio
e6528f2664 fix/wal: reset ongoing checkpoint state when checkpoint fails 2025-08-01 08:39:34 +03:00
Jussi Saurio
eeceefe49d Merge 'fix/wal: only rollback WAL if txn was write + fix start state for WalFile' from Jussi Saurio
Closes #2363
## What
The following sequence of actions is possible:
```
Some committed frames already exist in the WAL. shared.pages_in_frames.len() > 0.

Brand new connection does this:
BEGIN
^-- deferred, no read tx started yet, so its `self.start_pages_in_frames` is `0`
       because it's a brand new WalFile instance

ROLLBACK   <-- calls `wal.rollback()` and truncates `shared.pages_in_frames` to length `0`

PRAGMA wal_checkpoint();
^-- because `pages_in_frames` is empty, it doesnt actually
checkpoint anything but still sets shared.max_frame to 0, causing effectively data loss
```
## Fix
- Only call `wal.rollback()` for write transactions
- Set `start_pages_in_frames` correctly so that this doesn't happen even
if a regression starts calling `wal.rollback()` again

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2366
2025-07-31 16:16:20 +03:00
Jussi Saurio
62e804480e fix/wal: make db_changed check detect cases where max frame happens to be the same 2025-07-31 14:37:33 +03:00
Jussi Saurio
e88707c6fd fix/wal: only rollback WAL if txn was write 2025-07-31 14:18:43 +03:00
Jussi Saurio
39dec647a7 fix/wal: reset page cache when another connection checkpointed in between 2025-07-31 12:44:22 +03:00
PThorpe92
2e741641e6 Add test to assert we are backfilling all the rows properly with vectored writes 2025-07-30 19:42:54 -04:00
PThorpe92
ade1c182de Add is_full method to checkpoint batch 2025-07-30 19:42:54 -04:00
PThorpe92
693b71449e Clean up writev batching and apply suggestions 2025-07-30 19:42:53 -04:00
PThorpe92
ef69df7258 Apply review suggestions 2025-07-30 19:42:53 -04:00
PThorpe92
73882b97d6 Remove unnecessary collecting CQEs into an array in run_once, comments 2025-07-30 19:42:53 -04:00
PThorpe92
efcffd380d Clean up io_uring writev implementation, add iovec and cqe cache 2025-07-30 19:42:52 -04:00
PThorpe92
b8e6cd5ae2 Fix taking page content from cached pages in checkpoint loop 2025-07-30 19:42:51 -04:00
PThorpe92
0f94cdef03 Fix io_uring pwritev to properly handle partial writes 2025-07-30 19:42:50 -04:00
PThorpe92
88445328a5 Handle partial writes for pwritev calls in io_uring and fix JS bindings 2025-07-30 19:42:50 -04:00
PThorpe92
62f004c898 Fix write counter for writev batching in checkpoint 2025-07-30 19:42:49 -04:00
PThorpe92
7b2163208b batch backfilling pages when checkpointing 2025-07-30 19:42:48 -04:00
Jussi Saurio
975b7b5434 wal: fix test incorrect expectation 2025-07-30 15:53:13 +03:00
Jussi Saurio
af660326d8 finish_append_frames_commit: revert bumping readmark incorrectly 2025-07-30 15:53:01 +03:00
Jussi Saurio
43d1321033 ignore completion result of self.read_frame 2025-07-30 14:58:03 +03:00
Jussi Saurio
9a63425b43 clippy 2025-07-30 14:58:03 +03:00
Jussi Saurio
772b71963e finish_append_frames_commit: properly increase readmark on commit 2025-07-30 14:58:03 +03:00
Jussi Saurio
1562c1df10 begin_read_tx: better assertion failure message 2025-07-30 14:58:03 +03:00
PThorpe92
4dc15492d8 Integrate changes from tx isolation commits from @jussisaurio 2025-07-30 14:10:12 +03:00
PThorpe92
2c3a9fe5ef Finish wal transaction handling and add more wal and chkpt testing 2025-07-30 14:10:10 +03:00
PThorpe92
8806b77d26 Clear snapshot and readmark/lock index flags on failure 2025-07-30 14:09:18 +03:00
PThorpe92
d702e6a80c Polish checkpointing and fix tests, add documentation 2025-07-30 14:08:53 +03:00
PThorpe92
8ec99a9143 Remove assert for !NO_LOCK_HELD, properly handle writing header if reset 2025-07-30 14:08:51 +03:00
PThorpe92
529cc14e29 Fix wal tests remove unwrap from previous Result return val 2025-07-30 14:08:33 +03:00
PThorpe92
7640535ba4 Fix transaction read0 shortcut in WAL and track whether we have snapshot 2025-07-30 14:08:33 +03:00
PThorpe92
ff1987a45c Temporarily remove optimization for new read tx to grab read mark 0 and skip db file 2025-07-30 14:08:33 +03:00
PThorpe92
318bfa9590 Change incorrect comments and rename guard 2025-07-30 14:08:33 +03:00
PThorpe92
1490a586b1 Apply suggestions/fixes and add extensive comments to wal chkpt 2025-07-30 14:08:33 +03:00
PThorpe92
3db72cf111 Just forget Full checkpoint mode for now, comment out compat test 2025-07-30 14:08:33 +03:00
PThorpe92
49f90980d4 Create new header after truncation chkpt 2025-07-30 14:08:33 +03:00
PThorpe92
b214c3dfc8 Add diff chkpt modes to sqlite3 api, finish checkpoint logic and add tests 2025-07-30 14:08:33 +03:00
PThorpe92
eaa6f99fa8 Hold and ensure release of proper locks if we trunc the db file post-checkpoint 2025-07-30 14:08:33 +03:00
PThorpe92
8ca37b71b6 Ensure we properly hold and release read locks in log restart method and fix tests 2025-07-30 14:08:33 +03:00
PThorpe92
9b7e5ed292 Trunc db file after backfilling everything in calling method 2025-07-30 14:08:33 +03:00
PThorpe92
1a9b7ef76e Add support for truncate, restart and full checkpointing methods 2025-07-30 14:08:31 +03:00
PThorpe92
ad286bb873 Use new wait_for_completion for sync IO 2025-07-30 14:07:04 +03:00