## Background
When a divider cell is deleted from an index interior page, the
following algorithm is used:
1. Find predecessor: Move to the largest key in the left subtree of the current
page (`self.prev()`). This is always a leaf page.
2. Create replacement: Convert this predecessor leaf cell to interior
cell format, using the original cell's left child page pointer.
3. Replace: Drop the original cell from the parent page and insert the
replacement at the same position.
4. Cleanup: Delete the taken predecessor cell from the leaf page.
<img width="845" height="266" alt="Screenshot 2025-07-16 at 10 39 18" src="https://github.com/user-attachments/assets/30517da4-a4dc-471e-a8f5-c27ba0979c86" />
## The faulty code leading to the bug
The error in our logic was that we always expected to only traverse down
one level of the btree:
```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```
This meant that when the deletion happened on, say, level 1, and the
replacement cell was taken from level 3, we actually inserted the
replacement cell into level 2 instead of level 1.
## Manifestation of the bug in issue 2106
In #2106, this manifested as the following chain of pages, going from
parent to children:
3 -> 111 -> 119
- The cell to be deleted was on page 3 (whose left pointer is 111)
- Going to the largest key in the left subtree meant traversing from 3
to 111 and then from 111 to 119
- A replacement cell was taken from 119
- It was incorrectly inserted into 111
- Its left child pointer was also set to 111!
- From then on, whenever page 111 wanted to go to its left child page, it
would just traverse back to itself, eventually causing a crash because we
have a hard limit on the number of pages on the page stack.
## The fix
The fix is quite trivial: store the page we are on before we start
traversing down.
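
To make the difference concrete, here is a minimal sketch of the two ways of picking the page that receives the replacement cell. `PageStack`, `descend`, `buggy_target` and `fixed_target` are hypothetical stand-ins written for this illustration, not the actual cursor code; only the page ids 3, 111 and 119 come from the issue.

```rust
/// Hypothetical stand-in for the cursor's page stack: page ids from the
/// starting page down to the page the cursor is currently on.
struct PageStack {
    pages: Vec<u32>,
}

impl PageStack {
    /// The page the cursor is currently on.
    fn top(&self) -> u32 {
        *self.pages.last().expect("stack is never empty in this sketch")
    }

    /// The page directly above the current one, if any.
    fn parent_page(&self) -> Option<u32> {
        self.pages.len().checked_sub(2).map(|i| self.pages[i])
    }

    /// Descend along `path` (child page ids), as the predecessor search does.
    fn descend(&mut self, path: &[u32]) {
        self.pages.extend_from_slice(path);
    }
}

/// Buggy choice: assumes the leaf is exactly one level below the page
/// that holds the deleted cell.
fn buggy_target(stack: &PageStack) -> u32 {
    stack.parent_page().unwrap()
}

/// Fixed choice: remember the page we are on before traversing down.
/// Returns (page receiving the replacement, leaf the predecessor came from).
fn fixed_target(start: u32, path: &[u32]) -> (u32, u32) {
    let mut stack = PageStack { pages: vec![start] };
    let target = stack.top(); // stored before the descent
    stack.descend(path);
    (target, stack.top())
}
```

With the chain from #2106, the buggy variant picks page 111 while the fixed variant picks page 3 and still takes the predecessor from page 119.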
Closes #2106

Closes #2108
Instead of e.g.:
> Error: failed with error: 'InternalError("`SELECT * FROM sparkling_kuh
WHERE (sparkling_kuh.warmhearted_bowden > 'givhaqhwibn') ` should return
no values for table `sparkling_kuh`")'
Now you get:
> Error: failed with error: 'InternalError("Assertion '`SELECT * FROM
sparkling_kuh WHERE (sparkling_kuh.warmhearted_bowden > 'givhaqhwibn') `
should return no values for table `sparkling_kuh`' failed: expected no
rows but got 1 rows: \"-37703991.25525856, sleek_leeder,
passionate_deleuze, -2463056772.592847, shining_kuo, polite_mcbarron,
X'616D626974696F75735F647261676F6E6F776C', warmhearted_bekken\"")'
Closes #2111
`FaultyQuery` causes frequent false positives in simulator due to the
following chain of events:
- we write rows and flush the WAL to disk
- a fault is injected during fsync, which fails
- the error is returned to the caller, so the simulator thinks those rows
don't exist because the query failed
- we reopen the database, i.e. read the WAL back into memory from disk, and
it has those extra rows we think we didn't write
- the assertion fails because the table has more rows than the simulator
expected

More discussion about fsync behavior in issue #2091

Closes #2110
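
The chain of events can be reproduced with a toy model. Everything below (`FakeWal`, `write_and_sync`, `reopen`, `simulate`) is a hypothetical sketch for illustration and assumes the written frames reach disk before the injected fsync failure is reported; it is not the simulator's real code.

```rust
/// Hypothetical WAL model: rows that physically reached the disk, plus a
/// flag that makes the next fsync report a failure.
struct FakeWal {
    on_disk: Vec<i64>,
    fail_next_fsync: bool,
}

impl FakeWal {
    /// Write rows, then "fsync". The injected fault makes fsync return an
    /// error even though the data is already on disk (assumption of this sketch).
    fn write_and_sync(&mut self, rows: &[i64]) -> Result<(), String> {
        self.on_disk.extend_from_slice(rows);
        if self.fail_next_fsync {
            self.fail_next_fsync = false;
            return Err("injected fsync fault".to_string());
        }
        Ok(())
    }

    /// Reopening the database reads back whatever is on disk.
    fn reopen(&self) -> Vec<i64> {
        self.on_disk.clone()
    }
}

/// Returns (rows the simulator expects, rows actually visible after reopen).
fn simulate() -> (Vec<i64>, Vec<i64>) {
    let mut wal = FakeWal { on_disk: Vec::new(), fail_next_fsync: true };
    let mut expected = Vec::new();
    let rows = [1, 2];
    // The simulator only records rows when the query reports success.
    if wal.write_and_sync(&rows).is_ok() {
        expected.extend_from_slice(&rows);
    }
    (expected, wal.reopen())
}
```

In this model the simulator expects no rows, while the reopened database shows both of them, which is exactly the mismatch the assertion trips over.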
- Add generation for UNION/JOIN
- Rearchitect the oracle calling conventions to simplify the code paths
- Add brute force shrinking option by @echoumcp1
Closes #2049
As nilskch points out in #1807, Rust 1.88.0 is stricter about alignment
checks.
Because Rust integer literals default to `i32`, we were casting a pointer to
an `i32` as a pointer to an `i64`. Dereferencing it panicked due to
misalignment, since Rust expects an `i64` pointer to be 8-byte aligned.
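
As a rough illustration of the size and alignment mismatch, the sketch below uses hypothetical helper names (`widths`, `is_aligned`, `read_widened`) and is not the code from the PR. It widens the value to a real `i64` before taking an `i64` pointer, which is one safe way to avoid the misaligned read.

```rust
use std::mem::size_of;

/// Sizes in bytes of the two integer types involved.
fn widths() -> (usize, usize) {
    (size_of::<i32>(), size_of::<i64>())
}

/// True if `addr` satisfies an alignment requirement of `align` bytes.
fn is_aligned(addr: usize, align: usize) -> bool {
    addr % align == 0
}

/// Safe alternative to reinterpreting `*const i32` as `*const i64`:
/// widen the value first, then point at the real `i64`.
fn read_widened(small: i32) -> i64 {
    let wide: i64 = i64::from(small);
    let ptr: *const i64 = &wide;
    // SAFETY: `ptr` comes from a live, correctly aligned `i64`.
    unsafe { *ptr }
}
```

An address that is only 4-byte aligned, which is all an `i32` guarantees, fails an 8-byte alignment check, so reading it through an `i64` pointer is invalid.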
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #2064
- Fix not being able to create table while importing
* The behavior now aligns with SQLite so that if the table already
exists, all the rows are treated as data. If the table doesn't exist,
the first row is treated as the header from which column names for the
new table are populated.
- Insert in batches instead of one at a time
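
A minimal sketch of the two behaviors described above, assuming the input has already been parsed into rows of strings. The function names `split_header` and `batches` are hypothetical and only illustrate the rule; they are not the import command's actual code.

```rust
/// Decide how to treat the first row: if the table already exists, every
/// row is data; otherwise the first row supplies the column names.
fn split_header(
    table_exists: bool,
    rows: &[Vec<String>],
) -> (Option<&Vec<String>>, &[Vec<String>]) {
    if table_exists {
        return (None, rows);
    }
    match rows.split_first() {
        Some((header, data)) => (Some(header), data),
        None => (None, rows),
    }
}

/// Group rows into batches so each batch can be inserted together instead
/// of one row at a time. `batch_size` must be non-zero.
fn batches(rows: &[Vec<String>], batch_size: usize) -> Vec<&[Vec<String>]> {
    rows.chunks(batch_size).collect()
}
```

For three input rows and a missing table, the first row becomes the header and two data rows remain; for an existing table, all three rows are data.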
This was a pretty quick vibecoding effort tbh :]
Closes #2079

Closes #2094
Small refactoring to reduce confusion (I was caught in this trap and set
`amount` to one in the CDC branch during development).
This PR also fixes slightly broken `concat_ws` emit logic.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2100
This PR unifies the concept of a result that is either done or yields to IO
into a single type.
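
A minimal sketch of what such a unified type can look like. The names `IoResult`, `Countdown` and `run_to_completion` are assumptions made for this illustration and may not match the type introduced in the PR.

```rust
/// Hypothetical unified result: either the operation finished with a value,
/// or it must yield until pending I/O completes.
#[derive(Debug, PartialEq)]
enum IoResult<T> {
    Done(T),
    Io,
}

/// Toy operation that needs `pending_io` I/O completions before finishing.
struct Countdown {
    pending_io: u32,
}

impl Countdown {
    fn step(&mut self) -> IoResult<&'static str> {
        if self.pending_io > 0 {
            self.pending_io -= 1;
            IoResult::Io
        } else {
            IoResult::Done("finished")
        }
    }
}

/// Drive the operation until it is done, counting how often it yielded.
fn run_to_completion(op: &mut Countdown) -> (&'static str, u32) {
    let mut yields = 0;
    loop {
        match op.step() {
            IoResult::Done(value) => return (value, yields),
            IoResult::Io => yields += 1,
        }
    }
}
```

Callers then handle a single `match` instead of a separate result shape per operation.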
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2103
- remove assumptions that record header size fits into 1 byte or serial
type fits into 1 byte
- add tests for record header size calculation
```sql
turso> CREATE TABLE t(x TEXT, y);
CREATE INDEX t_idx ON t(x);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'a', 1); -- 1000 bytes of 'a'
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'b', 2);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'c', 3);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'd', 4);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'e', 5);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'f', 6);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'g', 7);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'h', 8);
SELECT COUNT(*) FROM t WHERE x >= replace(hex(zeroblob(100)), '00', 'a');
┌───────────┐
│ COUNT (*) │
├───────────┤
│ 8 │
└───────────┘
```
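
To see why the one-byte assumption breaks for rows like these, here is a sketch of the header size calculation following the SQLite record format, where both the header size and each serial type are stored as varints. The function names are hypothetical and this is not the code from the PR.

```rust
/// Number of bytes a SQLite varint needs: 7 bits per byte for the first
/// eight bytes, with a ninth byte carrying the remaining 8 bits.
fn varint_len(value: u64) -> usize {
    let mut len = 1;
    let mut rest = value >> 7;
    while rest != 0 && len < 9 {
        len += 1;
        rest >>= 7;
    }
    len
}

/// Serial type of a TEXT value of `n` bytes in the SQLite record format.
fn text_serial_type(n: u64) -> u64 {
    2 * n + 13
}

/// Total header size: the serial type varints plus the header size varint,
/// which counts itself and may therefore need more than one byte.
fn record_header_size(serial_types: &[u64]) -> usize {
    let body: usize = serial_types.iter().map(|&t| varint_len(t)).sum();
    let mut total = body + 1;
    while body + varint_len(total as u64) != total {
        total = body + varint_len(total as u64);
    }
    total
}
```

A 1001-byte text value has serial type 2015, which already needs two varint bytes, and a record with 127 one-byte serial types needs a two-byte header size varint.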
Fixes #2096

Fixes #2088
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes #2098
## Background
PR #2065 fixed a bug with table btree seeks concerning boundaries of
leaf pages.
The issue was that if we were e.g. looking for the first key greater
than (GT) 100, we always assumed the key would either be found on the
left child page of a given divider (e.g. divider 102) or not at all,
which is incorrect. #2065 has more discussion and documentation about
this, so read that one for more context.
## This PR
We already had similar handling for index btrees as #2065 introduced for
table btrees, but it was baked into the `BTreeCursor` struct's seek
handling itself, whereas #2065 handled this on the VDBE side.
This PR unifies this handling for both table and index btrees by always
doing the additional cursor advancement in the VDBE.
Unfortunately, unlike table btrees, index btrees may also need to do an
additional advance when they are looking for an exact match. This
resulted in a bigger refactor than anticipated, since there are quite a
few VDBE instructions that may perform a seek, e.g.: `IdxInsert`,
`IdxDelete`, `Found`, `NotFound`, `NoConflict`. All of these can
potentially end up in a similar situation where the cursor needs one
more advance after the initial seek, but until now they called
`cursor.seek()` directly and expected the `BTreeCursor` to handle the
auto-advance fallback internally.
For this reason, I have 1. removed the "TryAdvance"-ish logic from the
index btree internals and 2. extracted a common VDBE helper `fn
seek_internal()` - heavily based on the existing `op_seek_internal()`,
but decoupled from instructions and the program counter - which all the
interested VDBE instructions will call to delegate their seek logic.
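
The "seek, then advance once more if needed" idea can be sketched on a toy two-level tree. `ToyTree` and its methods are hypothetical and assume non-empty leaves; they only illustrate the GT case from the Background section and are not the real `seek_internal()`.

```rust
/// Toy two-level tree: `dividers[i]` is an upper bound for the keys in
/// `leaves[i]`; the last leaf has no divider. Leaves are assumed non-empty.
struct ToyTree {
    dividers: Vec<i64>,
    leaves: Vec<Vec<i64>>,
}

impl ToyTree {
    /// Leaf chosen by the interior page when seeking the first key > `key`.
    fn leaf_for(&self, key: i64) -> usize {
        self.dividers
            .iter()
            .position(|&d| d > key)
            .unwrap_or(self.leaves.len() - 1)
    }

    /// Incorrect variant: only looks at the chosen leaf.
    fn seek_gt_leaf_only(&self, key: i64) -> Option<i64> {
        self.leaves[self.leaf_for(key)]
            .iter()
            .copied()
            .find(|&k| k > key)
    }

    /// Variant with the extra step: if the chosen leaf has no match,
    /// advance once to the first key of the next leaf.
    fn seek_gt_with_advance(&self, key: i64) -> Option<i64> {
        let leaf = self.leaf_for(key);
        self.leaves[leaf]
            .iter()
            .copied()
            .find(|&k| k > key)
            .or_else(|| self.leaves.get(leaf + 1).and_then(|next| next.first().copied()))
    }
}
```

With divider 102, a left leaf holding 99 and 100, and a right leaf holding 103 and 104, a GT seek for 100 finds nothing in the left leaf and only returns 103 after the additional advance.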
Closes #2083
Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2084
First step toward resolving
https://github.com/tursodatabase/limbo/issues/1643.
### This PR
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.
Additionally, I fixed two bugs related to virtual tables.
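
The argument matching can be pictured as pairing positional arguments with the hidden columns in declaration order. The helper below is a hypothetical sketch of that idea with integer arguments only, not the planner's actual code; the column names `start`, `stop` and `step` are the hidden columns of `generate_series`.

```rust
/// Pair positional table-valued-function arguments with the virtual table's
/// hidden columns, in order. Trailing hidden columns may stay unbound.
fn bind_positional_args(
    hidden_columns: &[&str],
    args: &[i64],
) -> Result<Vec<(String, i64)>, String> {
    if args.len() > hidden_columns.len() {
        return Err(format!(
            "too many arguments: got {}, expected at most {}",
            args.len(),
            hidden_columns.len()
        ));
    }
    Ok(hidden_columns
        .iter()
        .zip(args)
        .map(|(column, &value)| (column.to_string(), value))
        .collect())
}
```

Under this model, `generate_series(5, 50)` binds `start = 5` and `stop = 50`, matching the `WHERE` form of the query.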
### TODO (I'll handle this in a separate PR)
Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.
Adding support for column references is more nuanced for two main
reasons:
* We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the `one` table must be processed by the top-level loop, and `series` must
be nested inside it.
* For outer joins involving TVFs, the arguments must be treated as `ON`
predicates, not `WHERE` predicates.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1727
Closes: #1379
This PR consists of three main changes:
1. Rebuild `Pager` to set the correct page size for `buffer_pool`, `wal`
and other components when the database is uninitialized.
2. Persist the latest page size when allocating page 1.
3. Ensure all pragmas emit the correct transaction instructions,
preventing even a `page_size` read from triggering database
initialization.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2053