Commit Graph

882 Commits

Author SHA1 Message Date
Levy A.
0ea7849dca feat: IOExt utility trait 2025-07-19 01:40:42 -03:00
Jussi Saurio
40df1725c5 Fix restore_context() not advancing when required 2025-07-18 13:48:23 +03:00
Jussi Saurio
2a2ab16c52 fix moved_before handling in cursor.insert 2025-07-18 13:48:23 +03:00
Jussi Saurio
28c050dd27 seek before insert to ensure correct location in fuzz test 2025-07-18 13:48:23 +03:00
Jussi Saurio
fdeb15bb9d btree/delete: rightmost_cell_was_dropped logic is not needed since a) if we balance, we seek anyway, and b) if we dont balance, we retreat anyway 2025-07-18 13:48:23 +03:00
Jussi Saurio
4f0ef663e2 btree: add target cell tracking for EQ seeks 2025-07-18 13:48:23 +03:00
Jussi Saurio
2b23495943 btree: allow overwriting index interior cell 2025-07-18 13:48:23 +03:00
Jussi Saurio
e33ff667dc btree: use seek() when inserting -- replaces find_cell() 2025-07-18 13:48:23 +03:00
Jussi Saurio
aeab89bd75 Fix parent page stack location after interior node replacement
Another fix extracted from running simulations on the #1988 branch.

When interior cell replacement happens as described in #2108,
we use the `cursor.prev()` method to locate the largest key in the
left subtree.

There was an error during backwards traversal in the `get_prev_record()`
method where the parent's cell index was set as `i32::MAX` but not properly
set to `cell_count + 1` (indicating that rightmost pointer has been visited).

The reason `i32::MAX` is used is that the cell count of the page is not
necessarily known at the time it is pushed to the stack.

This PR fixes the issue by setting the cell index of the parent properly
when visiting the rightmost child.
2025-07-18 13:30:01 +03:00
Pekka Enberg
02f4bc39b3 Merge 'Reanimate MVCC' from Pekka Enberg
Bit-rot happened. Bring MVCC back from the dead.

Closes #2136
2025-07-18 11:22:49 +03:00
Jussi Saurio
347a9152a6 Merge 'Replace verbose IO Completion methods with helpers' from Preston Thorpe
one of the last remnants of some original verbosity

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2156
2025-07-18 10:52:17 +03:00
Iaroslav Zeigerman
20bdbd5ca5 address suggestions 2025-07-18 07:28:37 +02:00
Iaroslav Zeigerman
78f3bf3475 Core: Introduce external sorting 2025-07-18 07:28:36 +02:00
PThorpe92
dced94aec6 Replace verbose completions with new helpers 2025-07-17 23:47:21 -04:00
Jussi Saurio
2f2ecb3576 microsoft paperclip 2025-07-17 23:48:31 +03:00
Jussi Saurio
483dc27539 Merge 'make most instrumentation levels to be Debug or Trace instead' from Pedro Muniz
Span creation in debug mode is very slow and impacts our ability to run
the Simulator faster.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2146
2025-07-17 23:45:07 +03:00
pedrocarlo
c15f1e02d3 make most instrumentation levels to be Debug or Trace instead. Span creation in debug mode is very slow and impacts our ability to run the Simulator fast enough 2025-07-17 16:48:24 -03:00
pedrocarlo
1f67d69e8e forgot to set the state to NewTrunk if we have more leaf pages than free entries 2025-07-17 15:09:52 -03:00
Jussi Saurio
e56325bf05 Merge 'Implement IO latency correctly in simulator' from Pedro Muniz
Closes #1998. Now I am queuing IO to be run at some later point in time.
Also Latency for some reason is slowing the simulator a looot for some
runs.
This PR also adds a StateMachine variant in Balance as now `free_pages`
is correctly an asynchronous function. With this change, we now need a
state machine in the `Pager` so that `free_pages` can be reentrant.
Lastly, I removed a timeout in `checkpoint_shutdown` as it was
triggering constantly due to the slightly increased latency.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1943
2025-07-17 21:05:17 +03:00
Jussi Saurio
49b9a69c40 fix/btree: fix insert_into_cell() logic
During running simulations for #1988 I ran into a post-balance validation
error where the correct divider cell could not be found from the parent.

This was caused by divider cell insertion happening this way:
- First divider cell caused overflow
- Second technically had space to fit, so we didn't add it to overflow cells

I looked at SQLite source, and it seems SQLite always adds the cell to overflow
cells if there are existing overflow cells:

```c
if( pPage->nOverflow || sz+2>pPage->nFree ){
  ...add to overflow cells...
}
```

So, I changed our implementation to do the same, which fixed the balance validation
issue.

However, then I ran into another issue:

A cell inserted during balancing in the `edit_page()` stage was added to overflow cells,
which should not happen. The reason for this was the changed logic in `insert_into_page()`,
outlined above.

It looks like SQLite doesn't use `insert_into_cell()´ in its implementation of `page_insert_array()`
which explains this.

For simplicity, I made a second version of `insert_into_cell()` called `insert_into_cell_during_balance()`
which allows regular cell insertion despite existing overflow cells, since the existing overflow cells are
what caused the balance to happen in the first place and will be cleared as soon as `edit_page()` is done.
2025-07-17 18:26:14 +03:00
pedrocarlo
b80218324d fix merge conflicts 2025-07-17 12:25:31 -03:00
pedrocarlo
7b8eec90bd edit state machine in Btree for freeing pages + Pager state machine for free_page 2025-07-17 12:24:43 -03:00
pedrocarlo
5771d1a00e disable wal sync timeout on checkpoint 2025-07-17 12:24:43 -03:00
pedrocarlo
dc5f73887e refactor to require Arc<Completion> in file traits so that we can delay IO calls correctly 2025-07-17 12:24:43 -03:00
Pekka Enberg
aa84daabf1 core/storage: Fix BTreeCursor::rowid() with MVCC 2025-07-17 16:23:31 +03:00
Pekka Enberg
cef0195b42 core/storage: Fix BTreeCursor::record() for MVCC
Respect immutable record invalidation.
2025-07-17 16:12:40 +03:00
Pekka Enberg
962987e9a1 core/mvcc: Fix MVCC cursor traversal
Add an explicit rewind() to move to the beginning. Change forward()
semantics so that *after* first forward() call, you are pointing to the
first row, which matches the get_next_record() semantics in B-tree
cursor.
2025-07-17 16:12:40 +03:00
Pekka Enberg
72df538a76 core/storage: Add MVCC asertion to BTreeCursor::seek_to_last() 2025-07-17 14:13:22 +03:00
Pekka Enberg
3aca9c54c7 core/storage: Fix BTreeCursor::record() with MVCC 2025-07-17 14:13:22 +03:00
Pekka Enberg
1fc6126157 core/storage: Allocate page1 lazily for MVCC transactions 2025-07-17 14:13:22 +03:00
Jussi Saurio
01ad75ecd0 page cache: temporarily increase default size until WAL spill is implemented 2025-07-17 12:28:44 +03:00
Jussi Saurio
5a2efa3077 Merge 'refactor/btree&vdbe: fold index key info (sort order, collations) into a single struct' from Jussi Saurio
These are nearly always used together in some form, so it makes sense to
colocate them, and it also makes many code paths simpler, as we don't
separately pass `collations` and `key_sort_order` around
As a side effect, as the bitfield-based `IndexKeySortOrder` is removed,
we now remove the arbitrary 64 column restriction for indexes, see e.g.
this sim failure which fails to 64+ index columns (not sure why it uses
an index if they are disabled):
https://github.com/tursodatabase/turso/actions/runs/16339391964/job/4615
8045158

Closes #2131
2025-07-17 11:55:56 +03:00
Jussi Saurio
e8199cb26c btree/vdbe: fold index key info (sort order, collations) into a single struct
These are nearly always used together in some form, so it makes sense to colocate
them, and it also makes many code paths simpler.
2025-07-17 10:58:43 +03:00
Pekka Enberg
99cdcf5348 Merge 'core: Copy-on-write for in-memory schema' from Levy A.
<img height="400" alt="image" src="https://github.com/user-
attachments/assets/bdd5c0a8-1bbb-4199-9026-57f0e5202d73" />
<img height="400" alt="image" src="https://github.com/user-
attachments/assets/7ea63e58-2ab7-4132-b29e-b20597c7093f" />
We were copying the schema preemptively on each `Database::connect`, now
the schema is shared until a change needs to be made by sharing a single
`Arc` and mutating it via `Arc::make_mut`. This is faster as reduces
memory usage.

Closes #2022
2025-07-17 10:46:46 +03:00
Pekka Enberg
af182d9895 Merge 'btree: fix post-balancing seek bug in delete path' from Jussi Saurio
Aftermath of seek-related refactor in #2065, which you can read for
background. The change in this PR is documented pretty well inline - if
we receive a `TryAdvance` seek result when seeking after balancing, we
need to - well - try to advance.
Closes #2116

Closes #2115
2025-07-16 20:08:15 +03:00
Levy A.
d0e26db01a use lock for database schema 2025-07-16 13:54:39 -03:00
Levy A.
4c77d771ff only copy schema on writes 2025-07-16 13:54:36 -03:00
Jussi Saurio
bb0c017d9f Merge 'btree: fix trying to go upwards when we are already at the end of the entire btree' from Jussi Saurio
## What does this fix
This PR fixes an issue with BTree upwards traversal logic where we would
try to go up to a parent node in `next()` even though we are at the very
end of the btree. This behavior can leave the cursor incorrectly
positioned at an interior node when it should be at the right edge of
the rightmost leaf.
## Why doesn't it cause problems on main
This bug is masked on `main` by every table `insert()` (wastefully)
calling `find_cell()`:
- `op_new_rowid` called, let's say the current max rowid is `666`.
Cursor is left pointing at `666`.
- `insert()` is called with rowid `667`, cursor is currently pointing at
`666`, which is incorrect.
- `find_cell()` does a binary search every time, and hence somewhat
accidentally positions the cursor correctly _after_ `666` so that the
insert goes to the correct place
## Why was this issue found
in #1988, I am removing `find_cell()` entirely in favor of always
performing a seek to the correct location - and skipping `seek` when it
is not required, saving us from wasting a binary search on every insert
- but this change means that we need to call `next()` after
`op_new_rowid` to have the cursor positioned correctly at the new
insertion slot. Doing this surfaces this upwards traversal bug in that
PR branch.
## Details of solution
- Store `cell_count` together with `cell_idx` in pagestack, so that
chlidren can know whether their parents have reached their end without
doing IO
- To make this foolproof, pin pages on `PageStack` so the page cache
cannot evict them during tree traversal
- `cell_indices` renamed to `node_states` since it now carries more
information (cell index AND count, instead of just index)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2005
2025-07-16 19:44:21 +03:00
Diego Reis
b86674adbb Remove cache clearing in cacheflush 2025-07-16 11:11:52 -03:00
Jussi Saurio
8558675c4c page cache: pin pages on the stack 2025-07-16 17:09:05 +03:00
Diego Reis
817ad8d50f Separate user-callable cacheflush from internal cacheflush logic
Cacheflush should only spill pages to WAL as non-commit frames, without checkpointing nor syncing. Check SQLite's sqlite3PagerFlush
2025-07-16 11:08:50 -03:00
Jussi Saurio
f7b9265c26 btree: fix trying to go upwards when at end of btree 2025-07-16 16:58:42 +03:00
Jussi Saurio
e0d797aac0 btree: use node_states instead of cell_indices (tracks cell count too) 2025-07-16 16:58:41 +03:00
Jussi Saurio
f0145fef5c btree: create BTreeNodeState struct for tracking cell idx and count 2025-07-16 16:58:11 +03:00
Jussi Saurio
ac065a79bb btree: fix post-balancing seek bug in delete path 2025-07-16 14:23:46 +03:00
Pekka Enberg
84d8842fbe Merge 'btree: fix interior cell replacement in btrees with depth >=3' from Jussi Saurio
## Background
When a divider cell is deleted from an index interior page, the
following algorithm is used:
1. Find predecessor: Move to largest key in left subtree of the current
page. This is always a leaf page.
2. Create replacement: Convert this predecessor leaf cell to interior
cell format, using original cell's left child page pointer
3. Replace: Drop original cell from parent page, insert replacement at
same position
4. Cleanup: Delete the taken predecessor cell from the leaf page
<img width="845" height="266" alt="Screenshot 2025-07-16 at 10 39 18"
src="https://github.com/user-
attachments/assets/30517da4-a4dc-471e-a8f5-c27ba0979c86" />
## The faulty code leading to the bug
The error in our logic was that we always expected to only traverse down
one level of the btree:
```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```
This meant that when the deletion happened on say, level 1, and the
replacement cell was taken from level 3, we actually inserted the
replacement cell into level 2 instead of level 1.
## Manifestation of the bug in issue 2106
In #2106, this manifested as the following chain of pages, going from
parent to children:
3 -> 111 -> 119
- Cell to be deleted was on page 3 (whose left pointer is 111)
- Going to the largest key in the left subtree meant traversing from 3
to 111 and then from 111 to 119
- a replacement cell was taken from 119
- incorrectly inserted into 111
- and its left child pointer also set as 111!
- now whenever page 111 wanted to go to its left child page, it would
just traverse back to itself, eventually causing a crash because we have
a hard limit of the number of pages on the page stack.
## The fix
The fix is quite trivial: store the page we are on before we start
traversing down.
Closes #2106

Closes #2108
2025-07-16 13:15:54 +03:00
Jussi Saurio
bd69af7372 btree: ensure re-entrancy of InteriorNodeReplacement 2025-07-16 10:50:22 +03:00
Jussi Saurio
47ef30b22e btree: fix interior cell replacement in btrees with depth >=3
When a divider cell is deleted from an index interior page, the following
algorithm is used:

1. Find predecessor: Move to largest key in left subtree (self.prev())
2. Create replacement: Convert predecessor leaf cell to interior cell format, using original cell's left child pointer
3. Replace: Drop original cell from parent page, insert replacement at same position
4. Cleanup: Delete predecessor from leaf page

The error in our logic was that we always expected to only traverse down
one level of the btree:

```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```

This meant that when the deletion happened on say, level 1, and the replacement
cell was taken from level 3, we actually inserted the replacement cell into
level 2 instead of level 1.

In #2106, this manifested as the following chain of pages, going from parent to children:

3 -> 111 -> 119

Cell was deleted from page 3 (whose left pointer is 111), and a replacement cell was taken
from 119, incorrectly inserted into 111, and its left child pointer also set as 111!

The fix is quite trivial: store the page we are on before we start traversing down.

Closes #2106
2025-07-16 10:12:59 +03:00
Jussi Saurio
6e5b407505 btree: add some assertions related to #2106 2025-07-16 08:02:34 +03:00
Diego Reis
0e9771ac07 refactor: Change redundant "Status" enums to IOResult
Let's unify the semantics of "something done" or yields I/O into a
single type
2025-07-15 20:56:18 -03:00