Commit Graph

1110 Commits

Author SHA1 Message Date
Jussi Saurio
2ed41bbb35 btree/insert: avoid calling self.usable_space() in a loop 2025-08-07 10:09:35 +03:00
Jussi Saurio
4b27cc0d46 btree: add fast path version of cell_get_raw_region 2025-08-07 09:57:56 +03:00
Jussi Saurio
c98136c8c4 btree: use new cell start helper method in cell_get_raw_region 2025-08-07 09:37:33 +03:00
Jussi Saurio
3db25cf84c perf/btree: add method for getting raw offset of cell payload start 2025-08-07 09:34:05 +03:00
Pekka Enberg
be8e8ff7c0 Merge 'turso-sync: rewrite' from Nikita Sivukhin
This PR rewrites the `turso-sync` package introduced in #2334 and
renames it to `turso-sync-engine` (the diff will be unreadable
anyway).
The purpose of the rewrite is to get rid of Tokio, because it makes things
harder when we want to export bindings to WASM.
To get a runtime-agnostic sync core while still being able to
use async/await machinery, this PR introduces the `genawaiter` crate,
which allows turning async/await Rust state machines into
generators. Sync operations are then just generators that can yield an `IO`
command whenever one is needed.
This PR also introduces a separate `ProtocolIo` in `turso-sync-engine`,
which defines extra IO methods:
1. HTTP interaction
2. Atomic reads/writes to a file. This is not strictly necessary, and
`turso_core::IO` could be extended to support a few more things
(like `delete`/`rename`), but I decided it would be simpler to just
expose 2 more methods for the sync protocol for the sake of atomic metadata
updates (the metadata is very small: dozens of bytes).
    * As a bonus, we can store metadata for the browser in
`LocalStorage`, which may be a more natural thing to do(?) (the user can reset
everything by just clearing local storage)
`ProtocolIo` works similarly to `IO` in the sense that it gives
the caller a `Completion` that it can check periodically for new data.

Closes #2457
2025-08-07 07:58:02 +03:00
Preston Thorpe
0777fd9082 Merge 'implement the MaxPgCount opcode' from Glauber Costa
It is used by the pragma max_page_count, which is also implemented.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2472
2025-08-06 23:44:05 -04:00
Jussi Saurio
64c8587f27 Merge 'IO More State Machine' from Pedro Muniz
I swear we just need one more state machine. Just more state machine
until we achieve IO tracking. (PS: this is a meme)

Closes #2462
2025-08-06 21:26:19 +03:00
Glauber Costa
f36974f086 implement the MaxPgCount opcode
It is used by the pragma max_page_count, which is also implemented.
2025-08-06 13:20:15 -05:00
Nikita Sivukhin
b612259a3a more friendly completely runtime agnostic turso-sync-engine crate 2025-08-06 19:26:55 +04:00
pedrocarlo
7b746ccc65 adjust state machine for ptrmap_get 2025-08-06 11:32:21 -03:00
pedrocarlo
b529305b82 state machine for ptrmap_put 2025-08-06 11:32:21 -03:00
pedrocarlo
931384afb6 state machine fix for btree create for AutoVacuum::Full 2025-08-06 11:32:21 -03:00
pedrocarlo
f656d0bc20 header ref state machine 2025-08-06 11:32:21 -03:00
Jussi Saurio
c8d2a1a480 btree: add a few more assertions about balance state 2025-08-06 13:39:20 +03:00
Jussi Saurio
a86a0e194d refactor/btree: cleanup write/delete/balancing states
Problem:

Currently `WriteState` "owns" the balancing state machine, even
though a separate `DeleteState` can also trigger balancing, which
results in awkward back-and-forth switching between `CursorState::Write`
and `CursorState::Delete` during balancing.

Fix:

1. Extract `balance_state` as a separate state machine, since its
state transitions are exactly the same regardless of whether an
insert or a delete triggered the balancing.
2. This allows us to remove the different 'Balance-xxx' variants from
`WriteState`, as well as removing `WriteInfo` and `DeleteInfo`, as
those states become just simple enums now. Each of them now has a state
called `Balancing` which just delegates work to the balancing state
machine.
3. This further allows us to remove the awkward switching between
`CursorState::Delete` and `CursorState::Write` during a balance that
happens as a result of a deletion.
2025-08-06 13:37:35 +03:00
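
The refactor described in a86a0e194d can be illustrated with a minimal sketch. All type and method names below (`BalanceState`, `Cursor`, `step_write`, `step_delete`, and the individual variants) are simplified assumptions for illustration and are not the actual definitions in the btree module. The point is only that both the write and the delete state machine have a `Balancing` state that delegates to one shared balancing machine.

```rust
// Hypothetical, simplified stand-ins; the real states carry far more data.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BalanceState { Start, NonRoot, Done }

#[derive(Debug, Clone, Copy, PartialEq)]
enum WriteState { Insert, Balancing, Finish }

#[derive(Debug, Clone, Copy, PartialEq)]
enum DeleteState { RemoveCell, Balancing, Finish }

/// One step of balancing. The transitions do not depend on whether an
/// insert or a delete triggered the balance.
fn step_balance(state: BalanceState) -> BalanceState {
    match state {
        BalanceState::Start => BalanceState::NonRoot,
        BalanceState::NonRoot | BalanceState::Done => BalanceState::Done,
    }
}

struct Cursor {
    balance: BalanceState, // shared by both the write and the delete path
}

impl Cursor {
    fn step_write(&mut self, state: WriteState) -> WriteState {
        match state {
            WriteState::Insert => {
                self.balance = BalanceState::Start;
                WriteState::Balancing
            }
            // Delegate to the shared balancing machine.
            WriteState::Balancing => {
                self.balance = step_balance(self.balance);
                if self.balance == BalanceState::Done { WriteState::Finish } else { WriteState::Balancing }
            }
            WriteState::Finish => WriteState::Finish,
        }
    }

    fn step_delete(&mut self, state: DeleteState) -> DeleteState {
        match state {
            DeleteState::RemoveCell => {
                self.balance = BalanceState::Start;
                DeleteState::Balancing
            }
            // Same delegation; no detour through the write state is needed.
            DeleteState::Balancing => {
                self.balance = step_balance(self.balance);
                if self.balance == BalanceState::Done { DeleteState::Finish } else { DeleteState::Balancing }
            }
            DeleteState::Finish => DeleteState::Finish,
        }
    }
}
```

In this sketch a delete never has to switch the cursor into a write state to balance, which is the awkwardness the commit message says it removes.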
Jussi Saurio
5f3cfaac60 refactor/btree: don't clone WriteState in balance_non_root() 2025-08-06 11:30:09 +03:00
Jussi Saurio
a15d7dd2e7 refactor/btree: don't clone WriteState in balance() 2025-08-06 11:30:09 +03:00
Jussi Saurio
1c1f55fdfb refactor/btree: remove cloning of WriteState in insert_into_page() 2025-08-06 08:50:56 +03:00
Jussi Saurio
c3a32b63bf refactor/btree: remove unnecessary ref of self in overwrite_content() 2025-08-06 08:45:34 +03:00
Jussi Saurio
6dd08c21e4 refactor/btree: remove unnecessary mut ref of self in rowid() 2025-08-06 08:44:52 +03:00
Jussi Saurio
839d428e36 core/btree: fix re-entrancy bug in insert_into_page()
We currently clone WriteState on every iteration of `insert_into_page()`,
presumably for Borrow Checker Reasons (tm).

There was a bug in `WriteState::Insert` handling where if `fill_cell_payload()`
returned IO, the `fill_cell_payload_state` was not updated in
`write_info.state`, leading to an infinite loop of allocating new pages.

This bug was surfaced by, but not caused by, #2400.
2025-08-06 08:01:49 +03:00
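
The class of bug described in 839d428e36 can be reproduced in a small model. This is a sketch under stated assumptions, not the actual `insert_into_page()` code: `Writer`, `FillState`, `Step`, and the `persist` flag are hypothetical names. A re-entrant function works on a copy of its state, and if it returns IO without writing the updated state back, the next call repeats the same work.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FillState { Start, AllocatedOverflow }

#[derive(Debug, PartialEq)]
enum Step { Io, Done }

struct Writer {
    saved: FillState,     // state that survives across re-entrant calls
    pages_allocated: u32, // counts how often the "allocate a page" work ran
}

impl Writer {
    /// One re-entrant call. `persist` controls whether progress is written
    /// back to `saved` before yielding IO (false models the bug).
    fn fill_cell_payload(&mut self, persist: bool) -> Step {
        let mut local = self.saved; // working copy, like a cloned state
        match local {
            FillState::Start => {
                self.pages_allocated += 1;
                local = FillState::AllocatedOverflow;
                if persist {
                    self.saved = local;
                }
                Step::Io // the caller will call us again once IO completes
            }
            FillState::AllocatedOverflow => Step::Done,
        }
    }
}

/// Drive the writer for at most `max_calls` calls.
/// Returns (finished, pages_allocated).
fn run(persist: bool, max_calls: u32) -> (bool, u32) {
    let mut w = Writer { saved: FillState::Start, pages_allocated: 0 };
    for _ in 0..max_calls {
        if w.fill_cell_payload(persist) == Step::Done {
            return (true, w.pages_allocated);
        }
    }
    (false, w.pages_allocated)
}
```

With `persist = true` the model finishes after allocating one page. With `persist = false` it never finishes and allocates one page per call, which mirrors the infinite allocation loop the commit message describes.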
PThorpe92
f6a68cffc2 Remove RefCell from IO and Page apis 2025-08-05 16:24:49 -04:00
PThorpe92
914c10e095 Remove Clone impl for Buffer and PageContent 2025-08-05 14:26:53 -04:00
Jussi Saurio
cde8567b1d Merge 'More state machine + Return IO in places where completions are created' from Pedro Muniz
In preparation for tracking IO Completions, we need to start to return
IO in places where completions are created. Doing some more plumbing now
to avoid bigger PRs for the future

Closes #2438
2025-08-05 15:47:51 +03:00
Jussi Saurio
a28e64bfdd cleanup: remove unused page uptodate flag 2025-08-05 14:25:42 +03:00
Pekka Enberg
d2fea25fef Merge 'perf/btree: implement fast algorithm for defragment_page' from Jussi Saurio
Implement sqlite's fast path defragment algorithm. This path is taken
when:
1. There are 1-2 freeblocks
2. There are at most `max_frag_bytes` fragmented free bytes (-1..=4)
Instead of reconstructing the entire page, it merges the two freeblocks
and then moves the merged freeblock to the left, effectively turning it
into free space in the unallocated region, instead of a freeblock.
`max_frag_bytes` is particularly important when inserting a new cell,
because if the page contains (in total) ~just enough space for the new
cell, then there can be hardly any fragmented free space because
otherwise, merging the 1-2 freeblocks won't produce enough contiguous
free space to fit the cell.
## Benchmark
```text
Insert rows in batches/limbo_insert_1_rows
                        time:   [26.692 µs 27.153 µs 27.695 µs]
                        change: [-9.9033% -2.9097% +1.6336%] (p = 0.55 > 0.05)
                        No change in performance detected.
Insert rows in batches/limbo_insert_10_rows
                        time:   [38.618 µs 40.022 µs 42.201 µs]
                        change: [-8.9137% -6.6405% -4.2299%] (p = 0.00 < 0.05)
                        Performance has improved.
Insert rows in batches/limbo_insert_100_rows
                        time:   [168.94 µs 169.58 µs 170.31 µs]
                        change: [-22.520% -17.669% -12.790%] (p = 0.00 < 0.05)
                        Performance has improved.
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2411
2025-08-05 12:44:48 +03:00
Pekka Enberg
e355fc4c65 Merge 'core/mvcc: implement seeking operations with rowid' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2429
2025-08-05 12:40:48 +03:00
Jussi Saurio
ad35cf07eb Add extra illustrative doodle for pere 2025-08-05 11:24:15 +03:00
Jussi Saurio
a5330aa6fb perf/btree: implement fast algorithm for defragment_page 2025-08-05 11:24:14 +03:00
Jussi Saurio
5b84ad6b0f Merge 'Update defragment page to defragment in-place' from João Severo
Change original code from doing a full copy of the original buffer to
modify the buffer in-place using a temporary vector with offsets.

Closes #2258
2025-08-05 11:22:22 +03:00
pedrocarlo
aa8d17cbf1 state machine for ptrmap_get 2025-08-05 01:38:42 -03:00
pedrocarlo
0ac040cc87 return IO in some other functions in Pager 2025-08-04 23:28:57 -03:00
pedrocarlo
a4a2425ffd return IO in places where completions are created 2025-08-04 23:28:57 -03:00
Jussi Saurio
a66b56678d Merge 'Reprepare Statements when Schema changes' from Pedro Muniz
Closes #1967
To support this I had to change how we handle `epilogue`, similarly to how
SQLite does it. SQLite first declares a `beginWriteOperation` when a
statement is going to require a write transaction. Since we now
need to pass the current schema cookie to `epilogue`, it was easier to
call epilogue in only one location (like we do with prologue) and
have each statement declare its intentions separately. This means we
do not have to pass the Schema around just to do the epilogue. I believe
this is something that @jussisaurio would be interested in.
~Also had to disable the MVCC test, as it was extremely buggy for me.~
Just disabled reprepare statements for MVCC

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2214
2025-08-05 00:01:14 +03:00
Jussi Saurio
1e59165ea6 Merge 'More State Machines in preparation for tracking IO Completions' from Pedro Muniz
More changes. I want to avoid big PRs, so I am doing these changes in small
increments. I think that about 2 PRs after this one, I will be able to make
the change effectively.

Closes #2400
2025-08-05 00:00:09 +03:00
Jussi Saurio
13219dbf87 Merge 'extend raw WAL API with few more methods' from Nikita Sivukhin
This PR extends the raw WAL API with a few methods which will be helpful for
offline-sync:
1. `try_wal_watermark_read_page` - try to read a page from the DB with
a given WAL watermark value
    * Usually, WAL max_frame is set automatically to the latest value
(`shared.max_frame`) when a transaction is started, and then this
"watermark" is preserved throughout the whole transaction
    * The new method allows simulating a "read from the past" by controlling
the frame watermark explicitly
    * An alternative would be to implement an API like
`start_read_session(frame_watermark: u64)`, but I decided to expose
just a single method to simplify the logic and reduce the "surface" of actions
which can be executed in this "controllable" manner
    * Also, for simplicity, `try_wal_watermark_read_page` currently always
reads data from disk and bypasses any cached values (and also does not
populate the cache)
2. `wal_changed_pages_after` - return the set of unique pages changed after
the watermark WAL position in the current WAL session
With these 2 methods we can implement `REVERT frame_watermark` logic
which will just fetch all changed pages first, and then revert them to
the previous value by using the `try_wal_watermark_read_page` and
`wal_insert_frame` methods (see the `test_wal_api_revert_pages` test).
Note that if there were schema changes, then the `REVERT` logic described
above can bring the connection into an inconsistent state, as it will
preserve schema information in memory and will still think that a table
exists (while it may have been reverted). This should be considered by any
consumer of these new methods.

Closes #2433
2025-08-04 23:53:46 +03:00
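
The `REVERT frame_watermark` idea from the PR description above can be sketched on a toy model. Everything here is an assumption for illustration: `ToyWal`, its fields, and its method names are hypothetical and only loosely mirror the roles of `try_wal_watermark_read_page`, `wal_changed_pages_after`, and `wal_insert_frame`. A page is represented by a single byte and frame numbers are 1-based.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy model: the DB file plus an append-only list of WAL frames.
struct ToyWal {
    db_file: HashMap<u32, u8>, // page number -> page content on disk
    frames: Vec<(u32, u8)>,    // frame i (1-based) is frames[i - 1]
}

impl ToyWal {
    /// Read `page` as it looked when only frames 1..=watermark existed.
    fn read_page_at(&self, page: u32, watermark: usize) -> Option<u8> {
        let visible = &self.frames[..watermark.min(self.frames.len())];
        visible
            .iter()
            .rev() // the latest visible frame for the page wins
            .find(|(p, _)| *p == page)
            .map(|(_, data)| *data)
            .or_else(|| self.db_file.get(&page).copied())
    }

    /// Unique pages written by frames after the watermark.
    fn changed_pages_after(&self, watermark: usize) -> BTreeSet<u32> {
        self.frames.iter().skip(watermark).map(|(p, _)| *p).collect()
    }

    fn insert_frame(&mut self, page: u32, data: u8) {
        self.frames.push((page, data));
    }

    /// Revert: re-append the old content of every page changed after the
    /// watermark. Pages that did not exist at the watermark are skipped
    /// in this sketch.
    fn revert_to(&mut self, watermark: usize) {
        for page in self.changed_pages_after(watermark) {
            if let Some(old) = self.read_page_at(page, watermark) {
                self.insert_frame(page, old);
            }
        }
    }
}
```

The revert is itself expressed as new frames, so history is never rewritten. As the PR notes, nothing in such a scheme updates in-memory schema information, so a consumer has to handle schema changes separately.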
PThorpe92
6cbc8ff868 Replace values with constants 2025-08-04 15:14:06 -04:00
PThorpe92
73d1fdef14 Fix and change bitmap, apply suggestions and add some optimizations 2025-08-04 14:58:58 -04:00
PThorpe92
f4197f1eb5 change debug assertions to turso asserts 2025-08-04 14:55:48 -04:00
PThorpe92
54696d2f0d Add additional test for edge cases 2025-08-04 14:55:48 -04:00
PThorpe92
5378195ad6 Add page bitmap to storage mod.rs 2025-08-04 14:55:48 -04:00
PThorpe92
3e30335ea5 Add tests for PageBitmap 2025-08-04 14:55:48 -04:00
PThorpe92
7b1f908c00 Add PageBitmap for use with arena page allocator 2025-08-04 14:55:48 -04:00
pedrocarlo
f2d84a534c adjust clear_overflow_pages 2025-08-04 15:28:06 -03:00
pedrocarlo
718ad5e7fd btree_destroy return IO 2025-08-04 14:12:51 -03:00
pedrocarlo
e0978844e6 adjust integrity_check 2025-08-04 14:12:50 -03:00
Jussi Saurio
7045d44fdc Merge 'fix/wal: remove start_pages_in_frames_hack to prevent checkpoint data loss' from Jussi Saurio
Closes #2421
## Background
We have some kind of transaction-local hack (`start_pages_in_frames`)
for bookkeeping how many pages are currently in the in-memory WAL frame
cache, I assume for performance reasons or whatever.
`wal.rollback()` clears all the frames from `shared.frame_cache` that
the rollbacking tx is allowed to clear, and then truncates
`shared.pages_in_frames` to however much its local
`start_pages_in_frames` value was.
## Problem
In `complete_append_frame`, we check if `frame_cache` has that key
(page) already, and if not, we add it to `pages_in_frames`.
However, `wal.rollback()` never _removes_ the key (page) if its value is
empty, so we can end up in a scenario where the `frame_cache` key for
`page P` exists but has no frames, and so `page P` does not get added to
`pages_in_frames` in `complete_append_frame`.
This leads to a checkpoint data loss scenario:
- transaction rolls back, has start_pages_in_frames=0, so truncates
shared pages_in_frames to an empty vec. let's say `page P` key in
`frame_cache` still remains but it has no frames.
- The next time someone commits a frame for `page P`, it does NOT get
added to `pages_in_frames` because `frame_cache` has that key (although
the value vector is empty)
- At some point, a checkpoint checkpoints `n` frames, but since
`pages_in_frames` does not have `page P`, it doesn't actually checkpoint
it and all the "checkpointed" frames are simply thrown away
- very similar to the scenario in #2366
## Fix
Remove the `start_pages_in_frames` hack entirely and just make
`pages_in_frames` effectively the same as `frame_cache.keys`. I think we
could also just get rid of `pages_in_frames` and just use
`frame_cache.contains_key(p)` but maybe Pere can chime in here

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2422
2025-08-04 19:49:55 +03:00
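
The data loss scenario from the PR description above can be reproduced in a small model. This is a sketch, assuming a simplified `SharedWal` with hypothetical method names. Only the field names `frame_cache` and `pages_in_frames` come from the description, and `rollback_fixed` is just one possible way to keep the two in sync, not necessarily what the actual fix does.

```rust
use std::collections::HashMap;

struct SharedWal {
    frame_cache: HashMap<u64, Vec<u64>>, // page -> frame numbers holding it
    pages_in_frames: Vec<u64>,           // pages a checkpoint will visit
}

impl SharedWal {
    fn new() -> Self {
        SharedWal { frame_cache: HashMap::new(), pages_in_frames: Vec::new() }
    }

    /// Track the page only if its key is not in `frame_cache` yet.
    fn append_frame(&mut self, page: u64, frame: u64) {
        if !self.frame_cache.contains_key(&page) {
            self.pages_in_frames.push(page);
        }
        self.frame_cache.entry(page).or_default().push(frame);
    }

    /// Buggy rollback: drops frames but leaves empty keys behind, then
    /// truncates `pages_in_frames` to the transaction-local start value.
    fn rollback_buggy(&mut self, max_frame: u64, start_pages_in_frames: usize) {
        for frames in self.frame_cache.values_mut() {
            frames.retain(|f| *f <= max_frame);
        }
        self.pages_in_frames.truncate(start_pages_in_frames);
    }

    /// Sketch of a fix: remove keys that end up with no frames, so that
    /// `pages_in_frames` always matches the keys of `frame_cache`.
    fn rollback_fixed(&mut self, max_frame: u64) {
        self.frame_cache.retain(|_, frames| {
            frames.retain(|f| *f <= max_frame);
            !frames.is_empty()
        });
        let cache = &self.frame_cache;
        self.pages_in_frames.retain(|p| cache.contains_key(p));
    }
}

/// Write page 7, roll back, then commit page 7 again.
/// Returns the pages a checkpoint would visit afterwards.
fn scenario(fixed: bool) -> Vec<u64> {
    let mut wal = SharedWal::new();
    wal.append_frame(7, 1);
    if fixed { wal.rollback_fixed(0) } else { wal.rollback_buggy(0, 0) }
    wal.append_frame(7, 2);
    wal.pages_in_frames.clone()
}
```

In the buggy variant the second write to page 7 finds a stale, empty key in `frame_cache`, so the page is never re-added to `pages_in_frames` and a checkpoint would skip it.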
pedrocarlo
aa05616845 fix tests 2025-08-04 13:08:30 -03:00
pedrocarlo
5f52d9b6b4 state machine for count 2025-08-04 13:00:43 -03:00
pedrocarlo
1585d5cbee state machine for 'next' and prev 2025-08-04 13:00:43 -03:00