Commit Graph

926 Commits

Author SHA1 Message Date
Jussi Saurio
0b627ed331 Merge 'btree/balance: support case where immediate parent page of unbalanced child page also overflows' from Jussi Saurio
Closes #2241
## What
When an index interior cell is deleted, it steals the leaf cell with the
largest key in its left subtree, deletes the old interior cell and then
replaces it with the stolen cell. This ensures the binary-search-tree
aspect of the btree remains correct. However, this can cause a situation
where both are true:
1. The leaf page is now UNDERFULL and must be rebalanced
2. The leaf's IMMEDIATE parent page is now OVERFULL and must be
rebalanced
## Why is this a problem
We simply didn't support the case where:
- Leaf page P is unbalanced and rebalancing starts on it
- Its immediate parent is ALSO unbalanced and _overflows_.
We had an assertion against this happening (see #2241)
## The fix
Allow exactly 1 overflow cell in the parent under very particular
conditions:
1. The parent page must be an index interior page
2. The parent must be positioned exactly at the divider cell whose left
child page underflows
This is the _only_ case where the immediate parent of a page about to
undergo rebalancing can have overflow cells.
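
The two conditions above can be read as a single predicate. The following is a minimal sketch, not the actual code from this change: `PageKind`, `ParentState`, its fields, and `parent_overflow_allowed` are all hypothetical names introduced only for illustration.

```rust
/// Hypothetical page kinds; the real types in the codebase may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PageKind {
    IndexInterior,
    TableInterior,
}

/// Hypothetical snapshot of the parent page when balancing of a child starts.
struct ParentState {
    kind: PageKind,
    /// Number of overflow cells currently held by the parent.
    overflow_cell_count: usize,
    /// Divider cell index the parent is currently positioned at.
    positioned_at_divider: usize,
    /// Divider cell index whose left child page underflows.
    underflowing_child_divider: usize,
}

/// Returns true if balancing the child may proceed given the parent's state.
fn parent_overflow_allowed(p: &ParentState) -> bool {
    match p.overflow_cell_count {
        // The usual case: the parent has no overflow cells.
        0 => true,
        // Exactly one overflow cell is tolerated only for an index interior
        // parent positioned at the divider of the underflowing child.
        1 => {
            p.kind == PageKind::IndexInterior
                && p.positioned_at_divider == p.underflowing_child_divider
        }
        // More than one overflow cell is never allowed.
        _ => false,
    }
}
```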
## Implementation details
The parent overflow cell is folded into `cell_array` fairly early on and
`parent.overflow_cells` is cleared. However we need to be careful with
`cell_idx` for dividers other than the overflow cell because they get
shifted left on the page in `drop_cell()`. I've added a long comment
about this.
## Testing
Adds a fuzz test that does inserts and deletes on an index btree and
asserts that all the expected keys are found at the end in the right
order. The test hits this case quite frequently, so I was able to
verify the fix.
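
A rough sketch of the shape of such a test, under loud assumptions: `ToyIndex` is a sorted `Vec` standing in for the real index btree, `XorShift64` is a small hand-rolled PRNG used to avoid external crates, and `fuzz_insert_delete` is a hypothetical name. The actual fuzz test in the repository drives the real btree and is not reproduced here.

```rust
use std::collections::BTreeSet;

/// Tiny deterministic xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Stand-in for the index under test (NOT the real btree): a sorted Vec.
#[derive(Default)]
struct ToyIndex {
    keys: Vec<u64>,
}

impl ToyIndex {
    fn insert(&mut self, k: u64) {
        if let Err(pos) = self.keys.binary_search(&k) {
            self.keys.insert(pos, k);
        }
    }

    fn delete(&mut self, k: u64) {
        if let Ok(pos) = self.keys.binary_search(&k) {
            self.keys.remove(pos);
        }
    }

    /// Full scan in index order.
    fn scan(&self) -> Vec<u64> {
        self.keys.clone()
    }
}

/// Applies random inserts and deletes to both the index and a BTreeSet
/// oracle, then checks that a full scan yields the expected keys in order.
fn fuzz_insert_delete(seed: u64, ops: usize) -> bool {
    let mut rng = XorShift64(seed);
    let mut index = ToyIndex::default();
    let mut expected = BTreeSet::new();
    for _ in 0..ops {
        let key = rng.next() % 512;
        if rng.next() % 3 == 0 {
            index.delete(key);
            expected.remove(&key);
        } else {
            index.insert(key);
            expected.insert(key);
        }
    }
    index.scan() == expected.into_iter().collect::<Vec<_>>()
}
```

Keeping the key space small (512 here) makes inserts and deletes collide often, which is what pushes a tree into frequent rebalancing.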

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2243
2025-07-24 18:48:36 +03:00
Pere Diaz Bou
46f5609fce Merge 'Append WAL frames one by one' from Pere Diaz Bou
Let's make sure we don't end up in a weird situation by appending frames
one by one and we can later think of optimizations.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2034
2025-07-24 16:44:51 +02:00
Pere Diaz Bou
b07e57d9d1 review fixes 2025-07-24 15:29:21 +02:00
Pere Diaz Bou
674d88e140 do not clear dirty pages on cacheflush::start 2025-07-24 15:29:21 +02:00
Pere Diaz Bou
d77c899fa6 clippy 2025-07-24 15:29:21 +02:00
Pere Diaz Bou
5a1773edf1 clippy 2025-07-24 15:29:21 +02:00
Pere Diaz Bou
14de7c55af set connection state to None in vdbe rollback 2025-07-24 15:29:21 +02:00
Pere Diaz Bou
5f8e386b48 reset internal states on rollback 2025-07-24 15:29:06 +02:00
Jussi Saurio
37955e9a04 Pager/WAL: fix not clearing stale page cache
SQLite's behavior is: if another connection has modified the DB by the
time a read tx starts, the reader must clear its page cache, because the
cache may contain stale versions of pages.

In the future, we may want to do either:
1. a more granular invalidation logic for per-conn cache, or
2. a shared versioned page cache

But right now we must follow SQLite so that our current behavior does
not corrupt data
2025-07-24 16:23:12 +03:00
Pere Diaz Bou
066ffcc940 append frame one by one
Let's make sure we don't end up in a weird situation by appending frames
one by one and we can later think of optimizations.
2025-07-24 15:12:13 +02:00
Jussi Saurio
d1b1617231 btree: add index insert-delete fuzz test 2025-07-24 13:18:33 +03:00
Jussi Saurio
d773a7924d fix/btree/balance: allow exactly 1 parent overflow cell for index balancing 2025-07-24 13:18:33 +03:00
Nikita Sivukhin
6daa6d07f1 re-parse schema if necessary after WAL sync end 2025-07-24 11:52:07 +04:00
Nikita Sivukhin
edd6ef2d21 fix after rebase 2025-07-24 11:51:33 +04:00
Nikita Sivukhin
3d2a38eb88 add simple helper 2025-07-24 11:49:39 +04:00
Nikita Sivukhin
4a80306705 fix wal insert frame raw API
- we need to properly mark pages as dirty after insertion
2025-07-24 11:49:39 +04:00
Nikita Sivukhin
d618463906 simplify add_dirty API 2025-07-24 11:29:01 +04:00
Nikita Sivukhin
f4a40c43cd fix clippy 2025-07-23 20:19:00 +04:00
Nikita Sivukhin
30c7bef27b make add dirty to change flag and also add page to the dirty list 2025-07-23 20:06:49 +04:00
Jussi Saurio
1e38202084 Merge 'WAL insert API' from Nikita Sivukhin
This PR implements the missing raw WAL API from LibSQL, for future use
by the offline-sync feature:
1. `wal_insert_begin` - begin WAL session by opening WAL read/write
transaction
2. `wal_insert_end` - finish WAL session by closing WAL transaction
opened by `wal_insert_begin` call
3. `wal_insert_frame` - insert frame `frame_no` with raw content `frame`
(WAL frame included)
For now, schema changes will not be reflected after
`wal_insert_frame` because `turso-db` does not re-parse the schema
unless needed. I will fix this in a follow-up PR.
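
To illustrate the begin/insert/end session ordering described above, here is a toy in-memory model. Everything except the three function names is an assumption: `ToyWal`, `WalError`, the parameter types, and the return types are hypothetical and do not reflect the real `turso-db` or LibSQL signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type for the toy model.
#[derive(Debug, PartialEq)]
enum WalError {
    NoSession,
    SessionAlreadyOpen,
}

/// Toy in-memory WAL; NOT the real implementation.
#[derive(Default)]
struct ToyWal {
    session_open: bool,
    frames: BTreeMap<u64, Vec<u8>>,
}

impl ToyWal {
    /// Begin a WAL session (stands in for opening a read/write transaction).
    fn wal_insert_begin(&mut self) -> Result<(), WalError> {
        if self.session_open {
            return Err(WalError::SessionAlreadyOpen);
        }
        self.session_open = true;
        Ok(())
    }

    /// Insert raw frame content at `frame_no`; only valid inside a session.
    fn wal_insert_frame(&mut self, frame_no: u64, frame: &[u8]) -> Result<(), WalError> {
        if !self.session_open {
            return Err(WalError::NoSession);
        }
        self.frames.insert(frame_no, frame.to_vec());
        Ok(())
    }

    /// Finish the session opened by `wal_insert_begin`.
    fn wal_insert_end(&mut self) -> Result<(), WalError> {
        if !self.session_open {
            return Err(WalError::NoSession);
        }
        self.session_open = false;
        Ok(())
    }
}
```

In this model a caller would invoke `wal_insert_begin`, then `wal_insert_frame` once per frame, then `wal_insert_end`; inserting outside a session is rejected.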

Reviewed-by: Pekka Enberg <penberg@iki.fi>

Closes #2231
2025-07-23 14:08:15 +03:00
Jussi Saurio
f98a9e8939 Pager: don't assume page is necessarily in memory anymore 2025-07-23 11:08:34 +03:00
Jussi Saurio
ecb5fce1bd Pager: clear overflow cells when freeing page 2025-07-23 10:58:10 +03:00
Nikita Sivukhin
3c0af3e389 small adjustments 2025-07-23 11:31:00 +04:00
Nikita Sivukhin
bf2bfbe978 fix clippy 2025-07-23 11:31:00 +04:00
Nikita Sivukhin
16763e1500 implement raw WAL write api 2025-07-23 11:30:59 +04:00
Nikita Sivukhin
bc09ea6e98 make end_write_txn/end_read_txn function non-failing 2025-07-23 11:30:29 +04:00
Levy A.
e6ad88cc18 refactor: constified enum -> regular enum 2025-07-22 17:20:30 -03:00
Levy A.
203239ff30 refactor: safer db_state 2025-07-22 17:20:29 -03:00
Jussi Saurio
72b4318fa1 Merge 'fix raw read frame WAL API' from Nikita Sivukhin
This PR fixes `wal_read_frame_raw` API
Before, the implementation of the raw read API read only the page
content, which is not enough because we also need the page_no and
size_after fields from the header. This PR fixes that and also makes a
few adjustments to the signatures.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2229
2025-07-22 16:10:55 +03:00
Nikita Sivukhin
b34d081d35 cargo fmt 2025-07-22 16:23:04 +04:00
Nikita Sivukhin
d617d1d21e fix raw read frame WAL API 2025-07-22 16:21:04 +04:00
Nikita Sivukhin
a730136564 use default hasher for the sake of determinism 2025-07-22 16:18:42 +04:00
Pere Diaz Bou
1933815233 wal: write txn fail in case max_frame change midway
A write txn can only start if the current snapshot held by writer is
consistent with the one in shared state
2025-07-21 13:08:56 +02:00
Jussi Saurio
d6bd9fc26e Merge 'fix/btree/balance: interior cell insertion can leave page overfull' from Jussi Saurio
- When an interior index cell is replaced, it can cause the page where the
replacement happens to overflow OR underflow. On `main` we did not check
this case, because the interior cell replacement always moves the cursor
to a leaf, and if the leaf doesn't underflow, then no further balancing
happens.
- The solution is to ALWAYS check whether the interior page where the
replacement happens is underflowing OR overflowing, and balance that page
regardless of whether the leaf page where the replacement was taken
underflows or not.
So summary:
- InteriorCellReplacement: cell deleted from Interior page I,
replacement cell taken from Leaf L
  and inserted back to Interior page I.
- If Leaf L underflows:
  * balance it first
  * then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
  * balance I anyway if it overflows OR underflows
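
The summary above amounts to a small decision table. As a sketch only (`BalanceStep` and `balance_plan` are hypothetical names, not part of the change):

```rust
/// Hypothetical labels for which page gets balanced.
#[derive(Debug, PartialEq)]
enum BalanceStep {
    Leaf,
    Interior,
}

/// Returns the balancing steps, in order, after an interior cell replacement.
/// `interior_unbalanced` means interior page I overflows OR underflows.
fn balance_plan(leaf_underflows: bool, interior_unbalanced: bool) -> Vec<BalanceStep> {
    let mut plan = Vec::new();
    if leaf_underflows {
        // Leaf L is balanced first.
        plan.push(BalanceStep::Leaf);
    }
    if interior_unbalanced {
        // Interior page I is checked regardless of what happened to L.
        plan.push(BalanceStep::Interior);
    }
    plan
}
```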
Closes https://github.com/tursodatabase/turso/issues/1701
Closes https://github.com/tursodatabase/turso/issues/2167

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2168
2025-07-21 11:03:26 +03:00
Jussi Saurio
2967fafe73 Merge 'Usable space unwrap' from Pedro Muniz
Using `unwrap_or_default` can make `page_size` become 0 in this case,
which can lead to subtracting with overflow in `payload_threshold_max`
in case we have some sort of error. Better to unwrap the error here, as
in release mode we may not have overflow checks enabled to catch this.
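
A small illustration of the hazard, with made-up numbers: `RESERVED` and the three helper functions are hypothetical and are not the real `payload_threshold_max` formula. `wrapping_sub` is used to mimic what an unchecked subtraction does when overflow checks are disabled.

```rust
/// Hypothetical constant subtracted from the usable space.
const RESERVED: usize = 35;

/// Mimics the old behavior: an error silently becomes 0.
fn usable_space_lenient(r: Result<usize, String>) -> usize {
    r.unwrap_or_default()
}

/// Overflow-aware subtraction: returns None instead of wrapping.
fn threshold_checked(usable: usize) -> Option<usize> {
    usable.checked_sub(RESERVED)
}

/// What an unchecked subtraction yields when overflow checks are off.
fn threshold_release_like(usable: usize) -> usize {
    usable.wrapping_sub(RESERVED)
}
```

With a failed read, `usable_space_lenient` yields 0, `threshold_checked(0)` is `None`, and `threshold_release_like(0)` wraps around to a value near `usize::MAX`, which is why surfacing the error early is the safer choice.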

Closes #2145
2025-07-21 00:23:06 +03:00
Jussi Saurio
9936748132 Merge 'Avoid redundant decoding of record headers when reading sorted chunk files' from Iaroslav Zeigerman
Currently, each record header is decoded at least twice: once to
determine the record size within the read buffer (in order to construct
the `ImmutableRecord` instance), and again later when decoding the
record for comparison. This redundant decoding can have a noticeable
negative impact on performance when records are wide (eg. contain
multiple columns).
This update modifies the (de)serialization format for sorted chunk files
by prepending a record size varint to each record payload. As a result,
only a single varint needs to be decoded to determine the record size,
eliminating the need to decode the full record header during reads.
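
A sketch of the size-prefix idea. Assumptions: a LEB128-style varint is swapped in here for brevity (the project's own varint encoding differs and is not reproduced), and `write_varint`, `read_varint`, `append_record`, and `next_record` are hypothetical helper names.

```rust
/// Encode `v` as a LEB128-style varint (an illustrative stand-in encoding).
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint; returns (value, bytes consumed), or None if truncated.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Append one record to a chunk buffer, prefixed by its payload size.
fn append_record(payload: &[u8], out: &mut Vec<u8>) {
    write_varint(payload.len() as u64, out);
    out.extend_from_slice(payload);
}

/// Split off the next record using only the size prefix; the record's own
/// header is never decoded here. Returns (payload, remaining bytes).
fn next_record(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let (len, n) = read_varint(buf)?;
    let end = n.checked_add(usize::try_from(len).ok()?)?;
    if end > buf.len() {
        return None;
    }
    Some((&buf[n..end], &buf[end..]))
}
```

The reader pays for one varint decode per record to find its boundaries, and the full header decode happens only once, at comparison time.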

Closes #2176
2025-07-20 23:54:54 +03:00
Jussi Saurio
0987618d6b fix/btree/balance: interior cell insertion can leave page unbalanced
- When an interior index cell is replaced, it can cause the page where the
replacement happens to overflow. On `main` we did not check this case, because
the interior cell replacement always moves the cursor to a leaf, and if the leaf
doesn't underflow, then no further balancing happens.

- The solution is to ALWAYS check whether the interior page where the replacement
happens is underflowing OR overflowing, and balance that page regardless of whether
the leaf page where the replacement was taken underflows or not.

So summary:

- InteriorCellReplacement: cell deleted from Interior page I, replacement cell taken from Leaf L
  and inserted back to Interior page I.
- If Leaf L underflows:
  * balance it first
  * then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
  * balance I anyway

Closes #1701
Closes #2167
2025-07-20 23:38:47 +03:00
Jussi Saurio
010fb1c12a fix/pager/cacheflush: cacheflush shouldn't commit 2025-07-20 21:18:45 +03:00
Pekka Enberg
4be6772e8e Merge 'implement Debug for Database' from Glauber Costa
Very useful in printing data structures containing databases, like maps
Example output:
Connecting to Database { path: "sq.db", open_flags: OpenFlags(1),
db_state: "initialized", mv_store: "none", init_lock: "unlocked",
wal_state: "present", page_cache: "( capacity 100000, used: 0 )" }

Reviewed-by: Pedro Muniz (@pedrocarlo)
Reviewed-by: bit-aloo (@Shourya742)

Closes #2175
2025-07-20 09:46:09 +03:00
Glauber Costa
6506b3147d implement pragma application_id
Just for completeness, because it is easy.
2025-07-19 20:44:06 -05:00
Glauber Costa
4749ce95c1 implement Debug for Database
Very useful in printing data structures containing databases, like maps

Example output:

Connecting to Database { path: "sq.db", open_flags: OpenFlags(1), db_state: "initialized", mv_store: "none", init_lock: "unlocked", wal_state: "present", page_cache: "( capacity 100000, used: 0 )" }
2025-07-19 09:29:46 -05:00
Levy A.
0ea7849dca feat: IOExt utility trait 2025-07-19 01:40:42 -03:00
Iaroslav Zeigerman
5d47502e3a Avoid redundant decoding of record headers when reading sorted chunk files 2025-07-19 06:08:27 +02:00
pedrocarlo
97d2306e26 unwrap on failed usable_space 2025-07-18 11:36:50 -03:00
pedrocarlo
28ae96f49f remove confusing casting from usize -> u16 -> usize for usable space 2025-07-18 11:36:50 -03:00
Jussi Saurio
40df1725c5 Fix restore_context() not advancing when required 2025-07-18 13:48:23 +03:00
Jussi Saurio
2a2ab16c52 fix moved_before handling in cursor.insert 2025-07-18 13:48:23 +03:00
Jussi Saurio
28c050dd27 seek before insert to ensure correct location in fuzz test 2025-07-18 13:48:23 +03:00
Jussi Saurio
fdeb15bb9d btree/delete: rightmost_cell_was_dropped logic is not needed since a) if we balance, we seek anyway, and b) if we don't balance, we retreat anyway 2025-07-18 13:48:23 +03:00
Jussi Saurio
4f0ef663e2 btree: add target cell tracking for EQ seeks 2025-07-18 13:48:23 +03:00