- When an interior index cell is replaced, it can cause the page where
the
replacement happens to overflow OR underflow. On `main` we did not check
this case, because
the interior cell replacement always moves the cursor to a leaf, and if
the leaf
doesn't underflow, then no further balancing happens.
- The solution is to ALWAYS check whether the interior page where the
replacement
happens is underflowing OR overflowing, and balance that page regardless
of whether
the leaf page where the replacement was taken underflows or not.
So summary:
- InteriorCellReplacement: cell deleted from Interior page I,
replacement cell taken from Leaf L
and inserted back to Interior page I.
- If Leaf L underflows:
* balance it first
* then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
* balance I anyway if it overflows OR underflows
Closes https://github.com/tursodatabase/turso/issues/1701
Closes https://github.com/tursodatabase/turso/issues/2167
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2168
Using `unwrap_or_default` can make `page_size` become 0 in this case,
which can lead to subtracting with overflow in `payload_threshold_max`
in case we have some sort of error. Better to unwrap the error here, as
in release mode we may not have overflow checks enabled to catch this.
Closes#2145
Currently, each record header is decoded at least twice: once to
determine the record size within the read buffer (in order to construct
the `ImmutableRecord` instance), and again later when decoding the
record for comparison. This redundant decoding can have a noticeable
negative impact on performance when records are wide (eg. contain
multiple columns).
This update modifies the (de)serialization format for sorted chunk files
by prepending a record size varint to each record payload. As a result,
only a single varint needs to be decoded to determine the record size,
eliminating the need to decode the full record header during reads.
Closes#2176
- When an interior index cell is replaced, it can cause the page where the
replacement happens to overflow. On `main` we did not check this case, because
the interior cell replacement always moves the cursor to a leaf, and if the leaf
doesn't underflow, then no further balancing happens.
- The solution is to ALWAYS check whether the interior page where the replacement
happens is underflowing OR overflowing, and balance that page regardless of whether
the leaf page where the replacement was taken underflows or not.
So summary:
- InteriorCellReplacement: cell deleted from Interior page I, replacement cell taken from Leaf L
and inserted back to Interior page I.
- If Leaf L underflows:
* balance it first
* then balance I if it overflows OR underflows
- If Leaf L does NOT underflow:
* balance I anyway
Closes#1701Closes#2167
Another fix extracted from running simulations on the #1988 branch.
When interior cell replacement happens as described in #2108,
we use the `cursor.prev()` method to locate the largest key in the
left subtree.
There was an error during backwards traversal in the `get_prev_record()`
method where the parent's cell index was set as `i32::MAX` but not properly
set to `cell_count + 1` (indicating that rightmost pointer has been visited).
The reason `i32::MAX` is used is that the cell count of the page is not
necessarily known at the time it is pushed to the stack.
This PR fixes the issue by setting the cell index of the parent properly
when visiting the rightmost child.
Span creation in debug mode is very slow and impacts our ability to run
the Simulator faster.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#2146
Closes#1998. Now I am queuing IO to be run at some later point in time.
Also Latency for some reason is slowing the simulator a looot for some
runs.
This PR also adds a StateMachine variant in Balance as now `free_pages`
is correctly an asynchronous function. With this change, we now need a
state machine in the `Pager` so that `free_pages` can be reentrant.
Lastly, I removed a timeout in `checkpoint_shutdown` as it was
triggering constantly due to the slightly increased latency.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1943
During running simulations for #1988 I ran into a post-balance validation
error where the correct divider cell could not be found from the parent.
This was caused by divider cell insertion happening this way:
- First divider cell caused overflow
- Second technically had space to fit, so we didn't add it to overflow cells
I looked at SQLite source, and it seems SQLite always adds the cell to overflow
cells if there are existing overflow cells:
```c
if( pPage->nOverflow || sz+2>pPage->nFree ){
...add to overflow cells...
}
```
So, I changed our implementation to do the same, which fixed the balance validation
issue.
However, then I ran into another issue:
A cell inserted during balancing in the `edit_page()` stage was added to overflow cells,
which should not happen. The reason for this was the changed logic in `insert_into_page()`,
outlined above.
It looks like SQLite doesn't use `insert_into_cell()´ in its implementation of `page_insert_array()`
which explains this.
For simplicity, I made a second version of `insert_into_cell()` called `insert_into_cell_during_balance()`
which allows regular cell insertion despite existing overflow cells, since the existing overflow cells are
what caused the balance to happen in the first place and will be cleared as soon as `edit_page()` is done.
Add an explicit rewind() to move to the beginning. Change forward()
semantics so that *after* first forward() call, you are pointing to the
first row, which matches the get_next_record() semantics in B-tree
cursor.
These are nearly always used together in some form, so it makes sense to
colocate them, and it also makes many code paths simpler, as we don't
separately pass `collations` and `key_sort_order` around
As a side effect, as the bitfield-based `IndexKeySortOrder` is removed,
we now remove the arbitrary 64 column restriction for indexes, see e.g.
this sim failure which fails to 64+ index columns (not sure why it uses
an index if they are disabled):
https://github.com/tursodatabase/turso/actions/runs/16339391964/job/4615
8045158
Closes#2131
<img height="400" alt="image" src="https://github.com/user-
attachments/assets/bdd5c0a8-1bbb-4199-9026-57f0e5202d73" />
<img height="400" alt="image" src="https://github.com/user-
attachments/assets/7ea63e58-2ab7-4132-b29e-b20597c7093f" />
We were copying the schema preemptively on each `Database::connect`, now
the schema is shared until a change needs to be made by sharing a single
`Arc` and mutating it via `Arc::make_mut`. This is faster as reduces
memory usage.
Closes#2022
Aftermath of seek-related refactor in #2065, which you can read for
background. The change in this PR is documented pretty well inline - if
we receive a `TryAdvance` seek result when seeking after balancing, we
need to - well - try to advance.
Closes#2116Closes#2115
## What does this fix
This PR fixes an issue with BTree upwards traversal logic where we would
try to go up to a parent node in `next()` even though we are at the very
end of the btree. This behavior can leave the cursor incorrectly
positioned at an interior node when it should be at the right edge of
the rightmost leaf.
## Why doesn't it cause problems on main
This bug is masked on `main` by every table `insert()` (wastefully)
calling `find_cell()`:
- `op_new_rowid` called, let's say the current max rowid is `666`.
Cursor is left pointing at `666`.
- `insert()` is called with rowid `667`, cursor is currently pointing at
`666`, which is incorrect.
- `find_cell()` does a binary search every time, and hence somewhat
accidentally positions the cursor correctly _after_ `666` so that the
insert goes to the correct place
## Why was this issue found
in #1988, I am removing `find_cell()` entirely in favor of always
performing a seek to the correct location - and skipping `seek` when it
is not required, saving us from wasting a binary search on every insert
- but this change means that we need to call `next()` after
`op_new_rowid` to have the cursor positioned correctly at the new
insertion slot. Doing this surfaces this upwards traversal bug in that
PR branch.
## Details of solution
- Store `cell_count` together with `cell_idx` in pagestack, so that
chlidren can know whether their parents have reached their end without
doing IO
- To make this foolproof, pin pages on `PageStack` so the page cache
cannot evict them during tree traversal
- `cell_indices` renamed to `node_states` since it now carries more
information (cell index AND count, instead of just index)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2005