If the table is an intkey table, we can read the rowid directly without
deserializing the full cell, and we also don't need to start
deserializing the record if only the rowid is requested.
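To illustrate the fast path, here is a minimal standalone sketch, assuming SQLite's table-leaf cell layout of [payload-size varint][rowid varint][record payload]; `read_varint` and `rowid_of_intkey_cell` are hypothetical helpers for illustration, not turso's actual API:
```rs
// Sketch only: decode the rowid of an intkey leaf cell without
// touching the record payload. Assumes the SQLite varint encoding
// (big-endian, 7 bits per byte, 9th byte uses all 8 bits).
fn read_varint(buf: &[u8]) -> (u64, usize) {
    let mut result: u64 = 0;
    for (i, &byte) in buf.iter().take(9).enumerate() {
        if i == 8 {
            return ((result << 8) | byte as u64, 9);
        }
        result = (result << 7) | (byte & 0x7f) as u64;
        if byte & 0x80 == 0 {
            return (result, i + 1);
        }
    }
    (result, buf.len().min(9))
}

fn rowid_of_intkey_cell(cell: &[u8]) -> i64 {
    // Skip the payload-size varint, then decode the rowid varint;
    // the record payload that follows is never deserialized.
    let (_payload_size, n) = read_varint(cell);
    let (rowid, _) = read_varint(&cell[n..]);
    rowid as i64
}
```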
```text
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Collecting 100 samples in estimated 5.0007 s (11M i
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1
time: [469.38 ns 470.77 ns 472.40 ns]
change: [-5.8959% -5.5232% -5.1840%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Collecting 100 samples in estimated 5.0088 s (1.9M
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10
time: [2.6523 µs 2.6596 µs 2.6685 µs]
change: [-8.7117% -8.4083% -8.0949%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Collecting 100 samples in estimated 5.0197 s (399k
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50
time: [12.514 µs 12.545 µs 12.578 µs]
change: [-9.5243% -9.0562% -8.6227%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Collecting 100 samples in estimated 5.0600 s (202
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
time: [25.135 µs 25.291 µs 25.470 µs]
change: [-8.8822% -8.3943% -7.8854%] (p = 0.00 < 0.05)
Performance has improved.
```
"only" 4x slower than sqlite on `SELECT * FROM users LIMIT 100` after
this!
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2382
We need to load rowids into MVCC's store, in order, before doing any
read, in case there are rows.
For now this has an expected performance penalty, because ideally we
should scan for rowids lazily instead.
On MVCC `commit_txn` we need to persist changes to the database; for this we reuse the pager's transaction semantics:
1. If there are no conflicts, we start `pager.begin_write_txn`
2. `pager.end_txn`: we flush changes to the WAL
3. We finish the MVCC transaction by marking rows with the new timestamp.
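Roughly, that flow could look like the hedged sketch below; `begin_write_txn` and `end_txn` are the pager methods named above, while every other type and method is a hypothetical stand-in:
```rs
struct Pager;
impl Pager {
    fn begin_write_txn(&mut self) {}
    fn end_txn(&mut self) { /* flushes buffered changes to the WAL */ }
}

// Hypothetical stand-in for the MVCC store.
struct MvccStore {
    clock: u64,
}

impl MvccStore {
    fn has_conflicts(&self, _tx: u64) -> bool { false }
    fn write_changes(&mut self, _tx: u64, _pager: &mut Pager) {}
    fn mark_rows_committed(&mut self, _tx: u64, _commit_ts: u64) {}

    fn commit_txn(&mut self, tx: u64, pager: &mut Pager) -> Result<(), &'static str> {
        // 1. If there are conflicts, abort before touching the pager.
        if self.has_conflicts(tx) {
            return Err("write-write conflict");
        }
        pager.begin_write_txn();
        // 2. Flush this transaction's changes through the pager to the WAL.
        self.write_changes(tx, pager);
        pager.end_txn();
        // 3. Finish the MVCC transaction by stamping its rows with the
        //    new commit timestamp.
        self.clock += 1;
        self.mark_rows_committed(tx, self.clock);
        Ok(())
    }
}
```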
This PR introduces two methods to the pager, very much inspired by
`with_schema` and `with_schema_mut`. `Pager::with_header` and
`Pager::with_header_mut` give the closure a shared and a unique
reference respectively; these are transmuted references into the
`PageRef` buffer.
This PR also adds type-safe wrappers for `Version`, `PageSize`,
`CacheSize` and `TextEncoding`, as they have special in-memory
representations.
Writing the `DatabaseHeader` is just a single `memcpy` now.
```rs
pub fn write_database_header(&self, header: &DatabaseHeader) {
    let buf = self.as_ptr();
    // The header is plain old data, so it can be copied byte-for-byte.
    buf[0..DatabaseHeader::SIZE].copy_from_slice(bytemuck::bytes_of(header));
}
```
`HeaderRef` and `HeaderRefMut` are used in the `with_header*` methods,
but can also be used on their own when there are multiple reads and
writes to the header, where putting everything in a closure would add
too much nesting.
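As a usage sketch (the exact signatures, constructor names, and header fields below are assumptions for illustration, not the real API):
```rs
// Hypothetical usage; `with_header`/`with_header_mut`/`HeaderRefMut`
// come from the text above, but signatures and fields are guesses.
fn tweak_header(pager: &Pager) -> Result<(), Error> {
    // Closure style: fine for a single read or write.
    let _page_size = pager.with_header(|h| h.page_size)?;
    pager.with_header_mut(|h| h.cache_size = CacheSize::default())?;

    // Standalone ref style: several accesses without extra nesting.
    let mut h = HeaderRefMut::from_pager(pager)?;
    h.cache_size = CacheSize::default();
    h.version = Version::default();
    Ok(())
}
```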
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #2234
When traversing, we are only interested in the following things:
- Is the page a leaf or not
- Is the page an index or table page
- If not a leaf, what is the left child page
This means we don't have to read the entire cell, just the left child
page.
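A standalone sketch of that shortcut, following the SQLite file format (byte 0 of the page header is the page-type flag; an interior cell begins with a 4-byte big-endian left-child page number):
```rs
// Everything traversal needs, without parsing the rest of the cell.
fn is_leaf(page: &[u8]) -> bool {
    matches!(page[0], 0x0a | 0x0d) // index leaf / table leaf
}

fn is_index(page: &[u8]) -> bool {
    matches!(page[0], 0x02 | 0x0a) // index interior / index leaf
}

fn left_child(cell: &[u8]) -> u32 {
    // The first 4 bytes of an interior cell are the left-child page number.
    u32::from_be_bytes([cell[0], cell[1], cell[2], cell[3]])
}
```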
We need to ensure that there is a single, shared `Database` object per
database file. This is necessary because it is not safe to have
multiple independent WAL instances open for the same file, since
coordination happens via process-level POSIX advisory file locks.
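One common way to enforce this invariant is sketched below: a process-wide registry keyed by canonical path, holding weak references so closed databases can still be dropped. This is an illustration under stated assumptions, not necessarily how turso implements it:
```rs
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex, OnceLock, Weak};

struct Database; // stand-in for the real type

static REGISTRY: OnceLock<Mutex<HashMap<PathBuf, Weak<Database>>>> = OnceLock::new();

fn open_shared(path: &str) -> std::io::Result<Arc<Database>> {
    let key = std::fs::canonicalize(path)?;
    let mut map = REGISTRY
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .unwrap();
    // Reuse the existing `Database` for this file if one is still alive.
    if let Some(db) = map.get(&key).and_then(Weak::upgrade) {
        return Ok(db);
    }
    let db = Arc::new(Database);
    map.insert(key, Arc::downgrade(&db));
    Ok(db)
}
```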
Fixes #2267
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2299
Closes #1415
### What this PR does
1. Removes database initialization from the `read_tx` function.
2. Adds checks for database initialization when executing the
`.schema`, `.indexes`, `.tables` and `.import` commands, as they rely
on the `sqlite_schema` table.
### About the second issue
I think we have another solution for the second issue: create the
`sqlite_schema` table in `Schema` only during page1 initialization,
rather than during `Schema` initialization.
#### Pros
This approach has the advantage of unifying the logic for the
`sqlite_schema` table with that of other user tables when running
`SELECT` statements.
#### Cons
- we still need to check error codes for commands like `.schema`.
- this approach may increase the complexity of the `pager`
implementation.
I'd like to hear your thoughts and feedback.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2099
Closes #2227, enables fixing #2225
## What
Although we cleared overflow pages on DELETE, we never did so for
INSERT/UPDATE, which means any overflow pages were left dangling and
never added to the freelist.
## Why is this a problem
This means that we are not able to reuse these pages to solve #2225,
causing massive bloat in the DB when UPDATEs are executed.
## Fix
Clear overflow pages when `BTreeCursor::insert()` overwrites a cell.
This needed a new state machine for `overwrite_cell` plus new
`WriteState` variants.
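For reference, clearing a chain boils down to walking the linked list of overflow pages; per the SQLite format, the first 4 bytes of each overflow page hold the next overflow page number (0 terminates the chain). A hedged sketch with hypothetical `read_page`/`free_page` callbacks:
```rs
fn clear_overflow_chain(
    mut page_no: u32,
    read_page: &mut dyn FnMut(u32) -> Vec<u8>,
    free_page: &mut dyn FnMut(u32),
) {
    while page_no != 0 {
        let page = read_page(page_no);
        // The next overflow page number lives in the first 4 bytes.
        let next = u32::from_be_bytes([page[0], page[1], page[2], page[3]]);
        free_page(page_no); // return the page to the freelist
        page_no = next;
    }
}
```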
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2230
- When an interior index cell is replaced, it can cause the page where the replacement happens to overflow OR underflow. On `main` we did not check this case, because the interior cell replacement always moves the cursor to a leaf, and if the leaf doesn't underflow, then no further balancing happens.
- The solution is to ALWAYS check whether the interior page where the replacement happens is underflowing OR overflowing, and balance that page regardless of whether the leaf page where the replacement was taken underflows or not.
In summary:
- InteriorCellReplacement: cell deleted from Interior page I, replacement cell taken from Leaf L and inserted back to Interior page I.
- If Leaf L underflows:
  * balance it first
  * then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
  * balance I anyway if it overflows OR underflows
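A hedged control-flow sketch of that rule; `BTree` and its methods are illustrative stand-ins, not the real cursor code:
```rs
struct BTree;
impl BTree {
    fn underflows(&self, _page: u32) -> bool { false }
    fn overflows(&self, _page: u32) -> bool { false }
    fn balance(&mut self, _page: u32) {}

    fn after_interior_replacement(&mut self, leaf: u32, interior: u32) {
        if self.underflows(leaf) {
            self.balance(leaf); // balance the leaf first
        }
        // Always check the interior page, regardless of the leaf.
        if self.overflows(interior) || self.underflows(interior) {
            self.balance(interior);
        }
    }
}
```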
Closes https://github.com/tursodatabase/turso/issues/1701
Closes https://github.com/tursodatabase/turso/issues/2167
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2168
Using `unwrap_or_default` can make `page_size` become 0 in this case,
which can lead to a subtract-with-overflow in `payload_threshold_max`
if we hit some sort of error. Better to unwrap the error here, as
overflow checks may not be enabled in release mode to catch this.
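A minimal standalone reproduction of the hazard (the threshold arithmetic is illustrative, not the real formula):
```rs
fn payload_threshold_max(page_size: u32) -> u32 {
    page_size - 35 // underflows when page_size == 0
}

fn main() {
    let page_size: Result<u32, ()> = Err(()); // pretend the header read failed
    let size = page_size.unwrap_or_default(); // silently becomes 0
    // A debug build panics here; a release build wraps to a huge value
    // unless overflow-checks are enabled.
    let _ = payload_threshold_max(size);
}
```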
Closes #2145
Another fix extracted from running simulations on the #1988 branch.
When interior cell replacement happens as described in #2108,
we use the `cursor.prev()` method to locate the largest key in the
left subtree.
There was an error during backwards traversal in the `get_prev_record()`
method where the parent's cell index was set to `i32::MAX` but never
properly updated to `cell_count + 1` (indicating that the rightmost
pointer has been visited).
The reason `i32::MAX` is used is that the cell count of the page is not
necessarily known at the time the page is pushed onto the stack.
This PR fixes the issue by setting the cell index of the parent properly
when visiting the rightmost child.
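A hedged sketch of the fix; `StackEntry` and its fields are illustrative:
```rs
struct StackEntry {
    page: u32,
    // i32::MAX was the placeholder used while the cell count of the
    // page was not yet known.
    cell_idx: i32,
}

fn descend_rightmost(stack: &mut Vec<StackEntry>, page: u32, cell_count: i32) {
    // The fix: once the page is loaded and its cell count is known,
    // record `cell_count + 1` so `get_prev_record()` knows the
    // rightmost pointer has already been visited.
    stack.push(StackEntry { page, cell_idx: cell_count + 1 });
}
```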