Commit Graph

585 Commits

Author SHA1 Message Date
Jussi Saurio
e56325bf05 Merge 'Implement IO latency correctly in simulator' from Pedro Muniz
Closes #1998. Now I am queuing IO to be run at some later point in time.
Also Latency for some reason is slowing the simulator a looot for some
runs.
This PR also adds a StateMachine variant in Balance as now `free_pages`
is correctly an asynchronous function. With this change, we now need a
state machine in the `Pager` so that `free_pages` can be reentrant.
Lastly, I removed a timeout in `checkpoint_shutdown` as it was
triggering constantly due to the slightly increased latency.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1943
2025-07-17 21:05:17 +03:00
Jussi Saurio
49b9a69c40 fix/btree: fix insert_into_cell() logic
During running simulations for #1988 I ran into a post-balance validation
error where the correct divider cell could not be found from the parent.

This was caused by divider cell insertion happening this way:
- First divider cell caused overflow
- Second technically had space to fit, so we didn't add it to overflow cells

I looked at SQLite source, and it seems SQLite always adds the cell to overflow
cells if there are existing overflow cells:

```c
if( pPage->nOverflow || sz+2>pPage->nFree ){
  ...add to overflow cells...
}
```

So, I changed our implementation to do the same, which fixed the balance validation
issue.

However, then I ran into another issue:

A cell inserted during balancing in the `edit_page()` stage was added to overflow cells,
which should not happen. The reason for this was the changed logic in `insert_into_page()`,
outlined above.

It looks like SQLite doesn't use `insert_into_cell()´ in its implementation of `page_insert_array()`
which explains this.

For simplicity, I made a second version of `insert_into_cell()` called `insert_into_cell_during_balance()`
which allows regular cell insertion despite existing overflow cells, since the existing overflow cells are
what caused the balance to happen in the first place and will be cleared as soon as `edit_page()` is done.
2025-07-17 18:26:14 +03:00
pedrocarlo
b80218324d fix merge conflicts 2025-07-17 12:25:31 -03:00
pedrocarlo
7b8eec90bd edit state machine in Btree for freeing pages + Pager state machine for free_page 2025-07-17 12:24:43 -03:00
Jussi Saurio
e8199cb26c btree/vdbe: fold index key info (sort order, collations) into a single struct
These are nearly always used together in some form, so it makes sense to colocate
them, and it also makes many code paths simpler.
2025-07-17 10:58:43 +03:00
Pekka Enberg
af182d9895 Merge 'btree: fix post-balancing seek bug in delete path' from Jussi Saurio
Aftermath of seek-related refactor in #2065, which you can read for
background. The change in this PR is documented pretty well inline - if
we receive a `TryAdvance` seek result when seeking after balancing, we
need to - well - try to advance.
Closes #2116

Closes #2115
2025-07-16 20:08:15 +03:00
Jussi Saurio
8558675c4c page cache: pin pages on the stack 2025-07-16 17:09:05 +03:00
Jussi Saurio
f7b9265c26 btree: fix trying to go upwards when at end of btree 2025-07-16 16:58:42 +03:00
Jussi Saurio
e0d797aac0 btree: use node_states instead of cell_indices (tracks cell count too) 2025-07-16 16:58:41 +03:00
Jussi Saurio
f0145fef5c btree: create BTreeNodeState struct for tracking cell idx and count 2025-07-16 16:58:11 +03:00
Jussi Saurio
ac065a79bb btree: fix post-balancing seek bug in delete path 2025-07-16 14:23:46 +03:00
Pekka Enberg
84d8842fbe Merge 'btree: fix interior cell replacement in btrees with depth >=3' from Jussi Saurio
## Background
When a divider cell is deleted from an index interior page, the
following algorithm is used:
1. Find predecessor: Move to largest key in left subtree of the current
page. This is always a leaf page.
2. Create replacement: Convert this predecessor leaf cell to interior
cell format, using original cell's left child page pointer
3. Replace: Drop original cell from parent page, insert replacement at
same position
4. Cleanup: Delete the taken predecessor cell from the leaf page
<img width="845" height="266" alt="Screenshot 2025-07-16 at 10 39 18"
src="https://github.com/user-
attachments/assets/30517da4-a4dc-471e-a8f5-c27ba0979c86" />
## The faulty code leading to the bug
The error in our logic was that we always expected to only traverse down
one level of the btree:
```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```
This meant that when the deletion happened on say, level 1, and the
replacement cell was taken from level 3, we actually inserted the
replacement cell into level 2 instead of level 1.
## Manifestation of the bug in issue 2106
In #2106, this manifested as the following chain of pages, going from
parent to children:
3 -> 111 -> 119
- Cell to be deleted was on page 3 (whose left pointer is 111)
- Going to the largest key in the left subtree meant traversing from 3
to 111 and then from 111 to 119
- a replacement cell was taken from 119
- incorrectly inserted into 111
- and its left child pointer also set as 111!
- now whenever page 111 wanted to go to its left child page, it would
just traverse back to itself, eventually causing a crash because we have
a hard limit of the number of pages on the page stack.
## The fix
The fix is quite trivial: store the page we are on before we start
traversing down.
Closes #2106

Closes #2108
2025-07-16 13:15:54 +03:00
Jussi Saurio
bd69af7372 btree: ensure re-entrancy of InteriorNodeReplacement 2025-07-16 10:50:22 +03:00
Jussi Saurio
47ef30b22e btree: fix interior cell replacement in btrees with depth >=3
When a divider cell is deleted from an index interior page, the following
algorithm is used:

1. Find predecessor: Move to largest key in left subtree (self.prev())
2. Create replacement: Convert predecessor leaf cell to interior cell format, using original cell's left child pointer
3. Replace: Drop original cell from parent page, insert replacement at same position
4. Cleanup: Delete predecessor from leaf page

The error in our logic was that we always expected to only traverse down
one level of the btree:

```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```

This meant that when the deletion happened on say, level 1, and the replacement
cell was taken from level 3, we actually inserted the replacement cell into
level 2 instead of level 1.

In #2106, this manifested as the following chain of pages, going from parent to children:

3 -> 111 -> 119

Cell was deleted from page 3 (whose left pointer is 111), and a replacement cell was taken
from 119, incorrectly inserted into 111, and its left child pointer also set as 111!

The fix is quite trivial: store the page we are on before we start traversing down.

Closes #2106
2025-07-16 10:12:59 +03:00
Jussi Saurio
6e5b407505 btree: add some assertions related to #2106 2025-07-16 08:02:34 +03:00
Diego Reis
d0af54ae77 refactor: Change CursorResult to IOResult
The reasoning here is to treat I/O operations (Either is "Done" or
yields to IO) with the same generic type.
2025-07-15 20:52:25 -03:00
Jussi Saurio
927a1f158a Merge 'btree: unify table&index seek page boundary handling' from Jussi Saurio
## Background
PR #2065 fixed a bug with table btree seeks concerning boundaries of
leaf pages.
The issue was that if we were e.g. looking for the first key greater
than (GT) 100, we always assumed the key would either be found on the
left child page of a given divider (e.g. divider 102) or not at all,
which is incorrect. #2065 has more discussion and documentation about
this, so read that one for more context.
## This PR
We already had similar handling for index btrees as #2065 introduced for
table btrees, but it was baked into the `BTreeCursor` struct's seek
handling itself, whereas #2065 handled this on the VDBE side.
This PR unifies this handling for both table and index btrees by always
doing the additional cursor advancement in the VDBE.
Unfortunately, unlike table btrees, index btrees may also need to do an
additional advance when they are looking for an exact match. This
resulted in a bigger refactor than anticipated, since there are quite a
few VDBE instructions that may perform a seek, e.g.: `IdxInsert`,
`IdxDelete`, `Found`, `NotFound`, `NoConflict`. All of these can
potentially end up in a similar situation where the cursor needs one
more advance after the initial seek, and they were currently calling
`cursor.seek()` directly and expecting the `BTreeCursor` to handle the
auto-advance fallback internally.
For this reason, I have 1. removed the "TryAdvance"-ish logic from the
index btree internals and 2. extracted a common VDBE helper `fn
seek_internal()` - heavily based on the existing `op_seek_internal()`,
but decoupled from instructions and the program counter - which all the
interested VDBE instructions will call to delegate their seek logic.
Closes #2083

Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2084
2025-07-15 18:02:52 +03:00
meteorgan
a6faab17e9 fix query page size 2025-07-15 16:34:07 +08:00
meteorgan
cf126824de Support set page size 2025-07-15 16:34:07 +08:00
Jussi Saurio
553396e9ca btree: unify table&index seek page boundary handling
PR #2065 fixed a bug with table btree seeks concerning boundaries
of leaf pages.

The issue was that if we were e.g. looking for the first key greater than
(GT) 100, we always assumed the key would either be found on the left child
page of a given divider (e.g. divider 102), which is incorrect. #2065 has more
discussion and documentation about this, so read that one for more context.

Anyway:

We already had similar handling for index btrees, but it was baked into
the `BTreeCursor` struct's seek handling itself, whereas #2065 handled this
on the VDBE side.

This PR unifies this handling for both table and index btrees by always doing
the additional cursor advancement in the VDBE.

Unfortunately, since indexes may also need to do an additional advance when they
are looking for an exact match, this resulted in a bigger refactor than anticipated,
since there are quite a few VDBE instructions that may perform a seek, e.g.:
`IdxInsert`, `IdxDelete`, `Found`, `NotFound`, `NoConflict`.

All of these can potentially end up in a similar situation where the cursor needs
one more advance after the initial seek.

For this reason, I have extracted a common VDBE helper `fn seek_internal()` which
all the interested VDBE instructions will call to delegate their seek logic.
2025-07-14 16:46:43 +03:00
Nikita Sivukhin
5bd3287826 add comments 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
6e2ccdff20 add btree fuzz tests which generate seed file from scratch 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
fc400906d5 handle case when target seek page has no matching entries 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
03b2725cc7 return SeekResult from seek operation
- Apart from regular states Found/NotFound seek result has TryAdvance
  value which tells caller to advance the cursor in necessary direction
  because the leaf page which would hold the entry if it was present
  actually has no matching entry (but neighbouring page can have match)
2025-07-14 13:01:15 +04:00
Krishna Vishal
ea4a4708ea - Address some review comments
- Add docs for `RecordCursor`
2025-07-14 03:28:55 +05:30
Krishna Vishal
b1f27cad94 chore: fix clippy 2025-07-14 03:28:55 +05:30
Krishna Vishal
d3368a28bc fix merge conflicts 2025-07-14 03:28:55 +05:30
Krishna Vishal
e7e5f28c0a chore: Clippy chill 2025-07-14 03:28:54 +05:30
Krishna Vishal
35ed279644 Clean up indexbtree_move_to 2025-07-14 03:28:54 +05:30
Krishna Vishal
f0e8e5871b Replace compare_immutable with compare_records_generic 2025-07-14 03:28:54 +05:30
Krishna Vishal
860de412d9 Add num_columns to BTreeCursor so we can initialize
`Vecs` inside `RecordCursor` to their appropriate to reduce
allocations.
2025-07-14 03:28:54 +05:30
Krishna Vishal
ef147181c2 Working version of the incremental column and "optimal" record
compare functions. Now we optimize them
2025-07-14 03:28:54 +05:30
Krishna Vishal
692f0413eb Stash 2025-07-14 03:28:54 +05:30
Krishna Vishal
c7aa3c3d93 Fix btree to invalidate RecordCursor
Use `read_value` instead of `deserialize_column_data`
Add `sqlite_int_float_compare` which takes care of out of range
floats
2025-07-14 03:28:54 +05:30
Krishna Vishal
2323763a4f Integrate incremental column parsing into btree.rs 2025-07-14 03:28:54 +05:30
Jussi Saurio
a48b6d049a Another post-rebase clippy round with 1.88.0 2025-07-12 19:10:56 +03:00
Nils Koch
1a91966c7e fix clippy errors for rust 1.88.0 (manual fix) 2025-07-12 18:58:55 +03:00
Nils Koch
828d4f5016 fix clippy errors for rust 1.88.0 (auto fix) 2025-07-12 18:58:41 +03:00
Jussi Saurio
0009683566 Merge 'btree/balance/validation: fix divider cell insert validation' from Jussi Saurio
Closes #2047
the validation code was assuming that:
- if the parent has overflow cells after inserting a divider cell
- the exact divider we are validating MUST be in those overflow cells
However, this is not necessarily the case. Imagine:
- First divider gets inserted at index `n`. It is too large to fit, so
it gets pushed to `parent.overflow_cells()`. Parent usable space does
not decrease.
- Second divider gets inserted at index `n+1`. It is smaller, so it
still fits in usable space.
Hence:
Provide information to the validation function about whether the
inserted cell overflowed, and use that to find the left pointer and
assert accordingly.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2050
2025-07-11 15:09:19 +03:00
Pekka Enberg
7b646da82c Merge 'Btree: more balance docs' from Jussi Saurio
Continuation from #2031 -- some variable extractions and comments

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2035
2025-07-11 15:06:45 +03:00
Jussi Saurio
c81522c20f btree/balance/validation: fix divider cell insert validation
the validation code was assuming that:

- if the parent has overflow cells after a inserting divider cell
- the exact divider we are validating MUST be in those overflow cells

However, this is not necessarily the case. Imagine:

- First divider gets inserted at index `n`. It is too large to fit,
  so it gets pushed to `parent.overflow_cells()`. Parent usable space
  does not decrease.
- Second divider gets inserted at index `n+1`. It is smaller, so it
  still fits in usable space.

Hence:

Provide information to the validation function about whether the inserted
cell overflowed, and use that to find the left pointer and assert accordingly.
2025-07-11 11:21:05 +03:00
Jussi Saurio
99df69f603 btree/balance/validation: fix use-after-free of rightmost ptr validation
We can use `right_page_id` directly to perform the validation instead of
carrying a raw pointer around which might be invalidated by the time we
do the validation.
2025-07-10 18:23:54 +03:00
Jussi Saurio
a403e55319 btree: add comment about when left-to-right size balancing is stopped 2025-07-10 15:52:18 +03:00
Jussi Saurio
bc328f9738 btree: one more is_last_sibling doc variable 2025-07-10 15:39:22 +03:00
Jussi Saurio
fc27c08e11 clippy 2025-07-10 15:36:46 +03:00
Jussi Saurio
fba05b1998 btree: add named range variables to make cell movement double-pass clearer 2025-07-10 15:34:41 +03:00
Jussi Saurio
8f1109692f btree: replace a bunch of 'count-1' conditions with is_last_sibling variables 2025-07-10 15:31:53 +03:00
Jussi Saurio
a81d81685f btree: rename divider_cells to divider_cell_payloads for clarity 2025-07-10 15:29:01 +03:00
Jussi Saurio
5d0b410e70 btree: rename constant to mention siblings 2025-07-10 15:28:11 +03:00
Jussi Saurio
78a249c6d0 btree: add MAX_SIBLING_PAGES_TO_BALANCE constant and use it 2025-07-10 15:27:37 +03:00