Commit Graph

643 Commits

Author SHA1 Message Date
Jussi Saurio
69fc1ea238 Merge 'perf/btree: improve performance of rowid() function' from Jussi Saurio
if the table is an intkey table, we can read the rowid directly without
deserializing the full cell, and we also don't need to start
deserializing the record if only the rowid is requested.
```text
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Collecting 100 samples in estimated 5.0007 s (11M i
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1
                        time:   [469.38 ns 470.77 ns 472.40 ns]
                        change: [-5.8959% -5.5232% -5.1840%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Collecting 100 samples in estimated 5.0088 s (1.9M
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10
                        time:   [2.6523 µs 2.6596 µs 2.6685 µs]
                        change: [-8.7117% -8.4083% -8.0949%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Collecting 100 samples in estimated 5.0197 s (399k
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50
                        time:   [12.514 µs 12.545 µs 12.578 µs]
                        change: [-9.5243% -9.0562% -8.6227%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Collecting 100 samples in estimated 5.0600 s (202
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
                        time:   [25.135 µs 25.291 µs 25.470 µs]
                        change: [-8.8822% -8.3943% -7.8854%] (p = 0.00 < 0.05)
                        Performance has improved.
```
"only" 4x slower than sqlite on `SELECT * FROM users LIMIT 100` after
this!
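
Why reading the rowid directly is cheap: in the SQLite file format, a table leaf cell begins with a payload-size varint followed by the rowid varint, so the rowid can be decoded without looking at the record. The following is a minimal sketch under that assumption. `read_varint` and `rowid_of_table_leaf_cell` are illustrative names, not the functions changed in this commit.

```rust
/// Decode a SQLite varint (1 to 9 bytes, big-endian, 7 bits per byte for the
/// first 8 bytes, all 8 bits of the 9th). Returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().take(8).enumerate() {
        value = (value << 7) | u64::from(byte & 0x7f);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    // Either the buffer ended mid-varint (-> None) or a 9th byte follows.
    let ninth = *buf.get(8)?;
    Some(((value << 8) | u64::from(ninth), 9))
}

/// Hypothetical helper: extract only the rowid from a table leaf cell,
/// skipping the payload-size varint and never touching the record itself.
fn rowid_of_table_leaf_cell(cell: &[u8]) -> Option<i64> {
    let (_payload_size, n) = read_varint(cell)?;
    let (rowid, _) = read_varint(&cell[n..])?;
    Some(rowid as i64)
}
```

The record payload that follows the two varints is never read here, which is the point of the optimization described above.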

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2382
2025-08-01 13:35:02 +03:00
Jussi Saurio
c9a3a65942 perf/btree: don't waste time reading contents twice 2025-08-01 11:49:41 +03:00
Jussi Saurio
111c1e64c4 perf/btree: improve performance of rowid() function
if the table is an intkey table, we can read the rowid directly
without deserializing the full cell, and we also don't need to start
deserializing the record if only the rowid is requested.

```text
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Collecting 100 samples in estimated 5.0007 s (11M i
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1
                        time:   [469.38 ns 470.77 ns 472.40 ns]
                        change: [-5.8959% -5.5232% -5.1840%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Collecting 100 samples in estimated 5.0088 s (1.9M
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10
                        time:   [2.6523 µs 2.6596 µs 2.6685 µs]
                        change: [-8.7117% -8.4083% -8.0949%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Collecting 100 samples in estimated 5.0197 s (399k
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50
                        time:   [12.514 µs 12.545 µs 12.578 µs]
                        change: [-9.5243% -9.0562% -8.6227%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Collecting 100 samples in estimated 5.0600 s (202
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
                        time:   [25.135 µs 25.291 µs 25.470 µs]
                        change: [-8.8822% -8.3943% -7.8854%] (p = 0.00 < 0.05)
                        Performance has improved.
```
2025-08-01 11:44:53 +03:00
Pere Diaz Bou
c807b035c5 core/mvcc: fix tests again
had to create connections for every different txn
2025-08-01 10:44:19 +02:00
Pere Diaz Bou
49a00ff338 core/mvcc: load table's rowid on initialization
We need to load rowids into the MVCC store, in order, before doing any
read, in case the table already contains rows.

This has a performance penalty for now as expected because we should,
ideally, scan for row ids lazily instead.
2025-08-01 10:38:41 +02:00
Pere Diaz Bou
b4ac38cd25 core/mvcc: persist writes on mvcc commit
On MVCC `commit_txn` we need to persist changes to the database. For this we reuse the pager's transaction semantics:
1. If there are no conflicts, we start `pager.begin_write_txn`
2. `pager.end_txn`: We flush changes to WAL
3. We finish Mvcc transaction by marking rows with new timestamp.
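
The three steps above can be sketched roughly as follows. This is a toy model, not the code in this commit: only the names `commit_txn`, `begin_write_txn` and `end_txn` come from the message, while `Pager`'s fields, `RowVersion`, `write` and the `has_conflict` flag are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for the pager: buffers writes, then "flushes" to a WAL.
#[derive(Default)]
struct Pager {
    in_write_txn: bool,
    pending: Vec<(u64, String)>,
    wal: Vec<(u64, String)>,
}

impl Pager {
    fn begin_write_txn(&mut self) {
        self.in_write_txn = true;
    }
    fn write(&mut self, rowid: u64, value: String) {
        self.pending.push((rowid, value));
    }
    /// Ending the transaction moves buffered changes into the WAL.
    fn end_txn(&mut self) {
        self.wal.append(&mut self.pending);
        self.in_write_txn = false;
    }
}

/// Hypothetical row version tracked by the MVCC store.
struct RowVersion {
    rowid: u64,
    value: String,
    commit_ts: Option<u64>,
}

fn commit_txn(
    pager: &mut Pager,
    write_set: &mut [RowVersion],
    has_conflict: bool, // assumed to be computed by conflict detection elsewhere
    commit_ts: u64,
) -> Result<(), &'static str> {
    if has_conflict {
        return Err("conflict detected, nothing persisted");
    }
    // 1. No conflicts: start a pager write transaction.
    pager.begin_write_txn();
    for row in write_set.iter() {
        pager.write(row.rowid, row.value.clone());
    }
    // 2. Ending the pager transaction flushes the changes to the WAL.
    pager.end_txn();
    // 3. Finish the MVCC transaction by stamping rows with the new timestamp.
    for row in write_set.iter_mut() {
        row.commit_ts = Some(commit_ts);
    }
    Ok(())
}
```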
2025-08-01 10:38:41 +02:00
Jussi Saurio
e147494642 pager: make WAL optional again and remove DummyWAL 2025-08-01 10:14:35 +03:00
pedrocarlo
1abe8fd70c state machine seek_to_last 2025-07-31 11:51:17 -03:00
pedrocarlo
543cdb3e2c underscoring completions and IOResult to avoid warning messages 2025-07-31 11:51:17 -03:00
pedrocarlo
6bfba2518e state machine for move_to_rightmost 2025-07-31 11:49:12 -03:00
pedrocarlo
966b96882e move_to_root should return completion 2025-07-31 11:49:12 -03:00
pedrocarlo
cf951e24cd add state machine for is_empty_table in preparation for IO Completion refactor 2025-07-31 11:49:12 -03:00
Jussi Saurio
f619556344 Merge 'Direct DatabaseHeader reads and writes – with_header and with_header_mut' from Levy A.
This PR introduces two methods to the pager, very much inspired by
`with_schema` and `with_schema_mut`. `Pager::with_header` and
`Pager::with_header_mut` give the closure a shared and a unique
reference, respectively; both are transmuted references into the
`PageRef` buffer.
This PR also adds type-safe wrappers for `Version`, `PageSize`,
`CacheSize` and `TextEncoding`, as they have special in-memory
representations.
Writing the `DatabaseHeader` is just a single `memcpy` now.
```rs
pub fn write_database_header(&self, header: &DatabaseHeader) {
    let buf = self.as_ptr();
    buf[0..DatabaseHeader::SIZE].copy_from_slice(bytemuck::bytes_of(header));
}
```
`HeaderRef` and `HeaderRefMut` are used in the `with_header*` methods,
but can also be used on their own when there are multiple reads and
writes to the header, where putting everything in a closure would add
too much nesting.
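
For illustration, the closure-based access pattern looks roughly like this. It is a simplified sketch: the real methods hand out references transmuted from the page buffer, whereas this version assumes a plain `RefCell<Header>`, and the `Header` fields shown are placeholders.

```rust
use std::cell::RefCell;

/// Placeholder header with two illustrative fields.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    page_size: u16,
    database_size: u32,
}

/// Simplified pager: the header lives in a RefCell instead of a page buffer.
struct Pager {
    header: RefCell<Header>,
}

impl Pager {
    /// Run `f` with a shared reference to the header.
    fn with_header<R>(&self, f: impl FnOnce(&Header) -> R) -> R {
        let header = self.header.borrow();
        f(&*header)
    }

    /// Run `f` with a unique reference to the header.
    fn with_header_mut<R>(&self, f: impl FnOnce(&mut Header) -> R) -> R {
        let mut header = self.header.borrow_mut();
        f(&mut *header)
    }
}
```

A caller would write something like `pager.with_header(|h| h.page_size)` for a read and `pager.with_header_mut(|h| h.database_size += 1)` for a write.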

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2234
2025-07-31 10:02:47 +03:00
Levy A.
e35fdb8263 feat: zero-copy DatabaseHeader 2025-07-30 17:33:59 -03:00
Jussi Saurio
ac8a123e38 refactor/btree: simplify get_next_record()/get_prev_record()
When traversing, we are only interested in the following things:

- Is the page a leaf or not
- Is the page an index or table page
- If not a leaf, what is the left child page

This means we don't have to read the entire cell, just the left child
page.
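
As a rough illustration of how little needs to be read: in the SQLite file format the page type is a single flag byte, and every interior cell begins with the 4-byte big-endian left child page number. The names below (`PageType`, `from_flag`, `left_child_page`) are illustrative and may not match the identifiers in the codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageType {
    IndexInterior = 2,
    TableInterior = 5,
    IndexLeaf = 10,
    TableLeaf = 13,
}

impl PageType {
    /// Map the page header's flag byte to a page type.
    fn from_flag(flag: u8) -> Option<Self> {
        match flag {
            2 => Some(PageType::IndexInterior),
            5 => Some(PageType::TableInterior),
            10 => Some(PageType::IndexLeaf),
            13 => Some(PageType::TableLeaf),
            _ => None,
        }
    }

    fn is_leaf(self) -> bool {
        matches!(self, PageType::IndexLeaf | PageType::TableLeaf)
    }

    fn is_index(self) -> bool {
        matches!(self, PageType::IndexInterior | PageType::IndexLeaf)
    }
}

/// Read only the left child pointer of an interior cell; the rest of the
/// cell (key, payload) is left untouched.
fn left_child_page(cell: &[u8]) -> Option<u32> {
    let b = cell.get(..4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}
```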
2025-07-30 21:29:14 +03:00
pedrocarlo
58b51e036d read_page should return a Completion 2025-07-29 12:42:36 -03:00
pedrocarlo
3831e0db39 convert must_use compile warnings to unused_variables to track locations where we need to refactor in the future 2025-07-28 16:09:26 -03:00
Pekka Enberg
e2d4cbbe48 Merge 'core: Enforce shared database object per database file' from Pekka Enberg
We need to ensure that there is a single, shared `Database` object per
database file. This is needed because it is not safe to have multiple
independent WAL files open, since coordination happens via process-level
POSIX advisory file locks.
Fixes #2267
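
One common way to enforce this is a process-wide registry keyed by path. The sketch below is an assumption about the general shape, not the implementation in this PR: `DatabaseRegistry` and its `open` method are hypothetical, and treating `":memory:"` as never shared is inferred from the related test fix.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical database handle.
struct Database {
    path: PathBuf,
}

/// Hypothetical registry: at most one live `Database` per file path.
#[derive(Default)]
struct DatabaseRegistry {
    by_path: Mutex<HashMap<PathBuf, Weak<Database>>>,
}

impl DatabaseRegistry {
    fn open(&self, path: &Path) -> Arc<Database> {
        // Assumption: in-memory databases are always independent.
        if path == Path::new(":memory:") {
            return Arc::new(Database { path: path.to_path_buf() });
        }
        let mut by_path = self.by_path.lock().unwrap();
        // Reuse the existing object if some caller still holds it.
        if let Some(db) = by_path.get(path).and_then(Weak::upgrade) {
            return db;
        }
        let db = Arc::new(Database { path: path.to_path_buf() });
        by_path.insert(path.to_path_buf(), Arc::downgrade(&db));
        db
    }
}
```

Storing `Weak` references lets the `Database` be dropped once the last user closes it. A real implementation would also need to canonicalize paths so that two spellings of the same file map to one entry.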

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2299
2025-07-28 19:34:35 +03:00
Pekka Enberg
5b6a30c1df core/storage: Fix B-Tree test cases to use ":memory:"
...otherwise they all share the same `Database` object.
2025-07-28 19:13:53 +03:00
Jussi Saurio
c349a9d689 Ensure underlying payload vec cannot be copied so that raw pointers remain valid 2025-07-28 10:11:57 +03:00
Jussi Saurio
08d5b3b4bc btree: make fill_cell_payload() re-entrant (overflow pages may require IO) 2025-07-28 09:00:59 +03:00
Jussi Saurio
5ce65bf8e7 btree/pager: reuse freelist pages in allocate_page() to fix UPDATE perf 2025-07-28 09:00:59 +03:00
Pekka Enberg
2141293017 Merge 'Fix page_count pragma' from meteorgan
Closes: #1415
### What this PR does
1. Removes database initialization from the `read_tx` function.
2. Adds checks for database initialization when executing `.schema`,
`.indexes`, `.tables` and `.import` commands, as they rely on
`sqlite_schema` table.
### About the second issue
I think we have another solution for the second issue: create the
`sqlite_schema` table in `Schema` only during page1 initialization,
rather than during `Schema` initialization.
#### Pros
This approach has the advantage of unifying the logic for the
`sqlite_schema` table with other user tables when running `select`
statements
#### Cons
- We still need to check error codes for commands like `.schema`.
- This approach may increase the complexity of the `pager`
implementation.
I'd like to hear your thoughts and feedback.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2099
2025-07-24 19:21:35 +03:00
Jussi Saurio
b33527c3c4 Merge 'btree: clear overflow pages when insert overwrites a cell (= UPDATE)' from Jussi Saurio
Closes #2227 , enables fixing #2225
## What
Although we cleared overflow pages on DELETE, we never did it for
INSERT/UPDATE, which means any overflow pages were left dangling and not
added to freelist.
## Why is this a problem
This means that we are not able to reuse these pages to solve #2225,
causing massive bloat in the DB when UPDATEs are executed.
## Fix
Clear overflow pages when `BTreeCursor::insert()` overwrites a cell.
Needed a new state machine for `overwrite_cell` + new `WriteState`
variants
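
Conceptually, clearing means walking the overflow chain of the overwritten cell and handing each page to the freelist. The sketch below models this with a map instead of real pages; in the SQLite file format, the first 4 bytes of an overflow page hold the next page number, with 0 ending the chain. `clear_overflow_chain` and its parameters are hypothetical, and the real code is a state machine because each page read may require IO.

```rust
use std::collections::HashMap;

/// Walk an overflow chain starting at `first_overflow` and push every page
/// onto `freelist`. `next_page` maps a page number to its successor
/// (0 or a missing entry terminates the chain).
fn clear_overflow_chain(
    next_page: &HashMap<u32, u32>,
    first_overflow: u32,
    freelist: &mut Vec<u32>,
) {
    let mut page = first_overflow;
    while page != 0 {
        freelist.push(page);
        page = next_page.get(&page).copied().unwrap_or(0);
    }
}
```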

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2230
2025-07-24 18:59:15 +03:00
Jussi Saurio
7968be9d71 btree/insert: cell can also underflow after overwrite 2025-07-24 18:43:02 +03:00
Jussi Saurio
a4535684b3 btree: WriteState: remove CheckNeedsBalancing variant 2025-07-24 18:40:49 +03:00
Jussi Saurio
b0edd3b716 btree: WriteState: add comments 2025-07-24 18:36:07 +03:00
meteorgan
c48a5ef538 we don't need read_tx return IOResult anymore 2025-07-24 23:19:33 +08:00
Jussi Saurio
2b045ccfd8 btree: clear overflow pages when insert overwrites a cell 2025-07-24 13:44:11 +03:00
Jussi Saurio
d1b1617231 btree: add index insert-delete fuzz test 2025-07-24 13:18:33 +03:00
Jussi Saurio
d773a7924d fix/btree/balance: allow exactly 1 parent overflow cell for index balancing 2025-07-24 13:18:33 +03:00
Nikita Sivukhin
d618463906 simplify add_dirty API 2025-07-24 11:29:01 +04:00
Nikita Sivukhin
30c7bef27b make add dirty to change flag and also add page to the dirty list 2025-07-23 20:06:49 +04:00
Levy A.
203239ff30 refactor: safer db_state 2025-07-22 17:20:29 -03:00
Jussi Saurio
d6bd9fc26e Merge 'fix/btree/balance: interior cell insertion can leave page overfull' from Jussi Saurio
- When an interior index cell is replaced, it can cause the page where
the replacement happens to overflow OR underflow. On `main` we did not
check this case, because the interior cell replacement always moves the
cursor to a leaf, and if the leaf doesn't underflow, then no further
balancing happens.
- The solution is to ALWAYS check whether the interior page where the
replacement happens is underflowing OR overflowing, and balance that
page regardless of whether the leaf page where the replacement was taken
underflows or not.
So summary:
- InteriorCellReplacement: cell deleted from Interior page I,
replacement cell taken from Leaf L
  and inserted back to Interior page I.
- If Leaf L underflows:
  * balance it first
  * then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
  * balance I anyway if it overflows OR underflows
Closes https://github.com/tursodatabase/turso/issues/1701
Closes https://github.com/tursodatabase/turso/issues/2167
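
The decision table in the summary can be written as a small pure function. This is only an illustration of the ordering; `BalanceTarget` and `pages_to_balance` are hypothetical names, and the real logic lives inside the btree balancing state machine.

```rust
#[derive(Debug, PartialEq)]
enum BalanceTarget {
    Leaf,
    Interior,
}

/// Return the pages to balance, in order, after an interior cell replacement.
fn pages_to_balance(
    leaf_underflows: bool,
    interior_overflows: bool,
    interior_underflows: bool,
) -> Vec<BalanceTarget> {
    let mut order = Vec::new();
    // If the leaf underflows, it is balanced first.
    if leaf_underflows {
        order.push(BalanceTarget::Leaf);
    }
    // The interior page is checked in every case, not only after a leaf balance.
    if interior_overflows || interior_underflows {
        order.push(BalanceTarget::Interior);
    }
    order
}
```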

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2168
2025-07-21 11:03:26 +03:00
Jussi Saurio
2967fafe73 Merge 'Usable space unwrap' from Pedro Muniz
Using `unwrap_or_default` can make `page_size` become 0 in this case,
which can lead to subtracting with overflow in `payload_threshold_max`
in case we have some sort of error. Better to unwrap the error here, as
in release mode we may not have overflow checks enabled to catch this.
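
To make the failure mode concrete: for table leaf pages the SQLite file format defines the maximum local payload as the usable space minus 35. The sketch below uses that formula only as an example and is not the body of `payload_threshold_max`; both function names are hypothetical.

```rust
/// Checked version: a usable space that is too small (for example 0 after
/// `unwrap_or_default`) is reported instead of silently wrapping.
fn max_local_table_leaf(usable_space: usize) -> Option<usize> {
    usable_space.checked_sub(35)
}

/// What an unchecked `usable_space - 35` effectively computes in a release
/// build without overflow checks: the subtraction wraps around.
fn max_local_wrapping(usable_space: usize) -> usize {
    usable_space.wrapping_sub(35)
}
```

With a usable space of 0, the wrapping version yields a value close to `usize::MAX`, which would then be treated as a valid threshold.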

Closes #2145
2025-07-21 00:23:06 +03:00
Jussi Saurio
0987618d6b fix/btree/balance: interior cell insertion can leave page unbalanced
- When an interior index cell is replaced, it can cause the page where the
replacement happens to overflow. On `main` we did not check this case, because
the interior cell replacement always moves the cursor to a leaf, and if the leaf
doesn't underflow, then no further balancing happens.

- The solution is to ALWAYS check whether the interior page where the replacement
happens is underflowing OR overflowing, and balance that page regardless of whether
the leaf page where the replacement was taken underflows or not.

So summary:

- InteriorCellReplacement: cell deleted from Interior page I, replacement cell taken from Leaf L
  and inserted back to Interior page I.
- If Leaf L underflows:
  * balance it first
  * then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
  * balance I anyway

Closes #1701
Closes #2167
2025-07-20 23:38:47 +03:00
Levy A.
0ea7849dca feat: IOExt utility trait 2025-07-19 01:40:42 -03:00
pedrocarlo
97d2306e26 unwrap on failed usable_space 2025-07-18 11:36:50 -03:00
pedrocarlo
28ae96f49f remove confusing casting from usize -> u16 -> usize for usable space 2025-07-18 11:36:50 -03:00
Jussi Saurio
40df1725c5 Fix restore_context() not advancing when required 2025-07-18 13:48:23 +03:00
Jussi Saurio
2a2ab16c52 fix moved_before handling in cursor.insert 2025-07-18 13:48:23 +03:00
Jussi Saurio
28c050dd27 seek before insert to ensure correct location in fuzz test 2025-07-18 13:48:23 +03:00
Jussi Saurio
fdeb15bb9d btree/delete: rightmost_cell_was_dropped logic is not needed since a) if we balance, we seek anyway, and b) if we dont balance, we retreat anyway 2025-07-18 13:48:23 +03:00
Jussi Saurio
4f0ef663e2 btree: add target cell tracking for EQ seeks 2025-07-18 13:48:23 +03:00
Jussi Saurio
2b23495943 btree: allow overwriting index interior cell 2025-07-18 13:48:23 +03:00
Jussi Saurio
e33ff667dc btree: use seek() when inserting -- replaces find_cell() 2025-07-18 13:48:23 +03:00
Jussi Saurio
aeab89bd75 Fix parent page stack location after interior node replacement
Another fix extracted from running simulations on the #1988 branch.

When interior cell replacement happens as described in #2108,
we use the `cursor.prev()` method to locate the largest key in the
left subtree.

There was an error during backwards traversal in the `get_prev_record()`
method where the parent's cell index was set as `i32::MAX` but not properly
set to `cell_count + 1` (indicating that rightmost pointer has been visited).

The reason `i32::MAX` is used is that the cell count of the page is not
necessarily known at the time it is pushed to the stack.

This PR fixes the issue by setting the cell index of the parent properly
when visiting the rightmost child.
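
A minimal sketch of the sentinel handling described above, assuming the stack stores plain `i32` cell indices. `CELL_IDX_UNKNOWN` and `resolve_parent_cell_idx` are hypothetical names, not the identifiers used in the fix.

```rust
/// Sentinel pushed to the stack when the page's cell count is not yet known.
const CELL_IDX_UNKNOWN: i32 = i32::MAX;

/// Once the parent's cell count is known, replace the sentinel with
/// `cell_count + 1`, which marks the rightmost pointer as visited.
fn resolve_parent_cell_idx(cell_idx: i32, cell_count: i32) -> i32 {
    if cell_idx == CELL_IDX_UNKNOWN {
        cell_count + 1
    } else {
        cell_idx
    }
}
```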
2025-07-18 13:30:01 +03:00
Pekka Enberg
02f4bc39b3 Merge 'Reanimate MVCC' from Pekka Enberg
Bit-rot happened. Bring MVCC back from the dead.

Closes #2136
2025-07-18 11:22:49 +03:00
Jussi Saurio
347a9152a6 Merge 'Replace verbose IO Completion methods with helpers' from Preston Thorpe
one of the last remnants of some original verbosity

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2156
2025-07-18 10:52:17 +03:00