Commit Graph

6199 Commits

Author SHA1 Message Date
bit-aloo
5be10bb5bc doc: fix broken contribution guide link and expand language binding links 2025-07-20 22:18:59 +05:30
Pekka Enberg
b03b06107b Turso 0.1.3-pre.2 2025-07-16 20:08:46 +03:00
Pekka Enberg
c378f8a8bb Merge 'compat: add integrity_check' from Pere Diaz Bou
Reviewed-by: Avinash Sajjanshetty (@avinassh)
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2119
2025-07-16 20:08:32 +03:00
Pekka Enberg
e6c3a5a9b8 Merge 'rename operation_xxx to change_xxx to make naming more consistent' from Nikita Sivukhin
This PR renames CDC table column names to use "change"-centric
terminology and avoid using `operation_xxx` column names.
Just a small refactoring to bring more consistency, as `turso-db` refers
to the feature as capture data **changes** and the word "operation" does
not appear there.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2120
2025-07-16 20:08:23 +03:00
Pekka Enberg
af182d9895 Merge 'btree: fix post-balancing seek bug in delete path' from Jussi Saurio
Aftermath of seek-related refactor in #2065, which you can read for
background. The change in this PR is documented pretty well inline - if
we receive a `TryAdvance` seek result when seeking after balancing, we
need to - well - try to advance.
Closes #2116

Closes #2115
2025-07-16 20:08:15 +03:00
Jussi Saurio
bb0c017d9f Merge 'btree: fix trying to go upwards when we are already at the end of the entire btree' from Jussi Saurio
## What does this fix
This PR fixes an issue with BTree upwards traversal logic where we would
try to go up to a parent node in `next()` even though we are at the very
end of the btree. This behavior can leave the cursor incorrectly
positioned at an interior node when it should be at the right edge of
the rightmost leaf.
## Why doesn't it cause problems on main
This bug is masked on `main` by every table `insert()` (wastefully)
calling `find_cell()`:
- `op_new_rowid` called, let's say the current max rowid is `666`.
Cursor is left pointing at `666`.
- `insert()` is called with rowid `667`, cursor is currently pointing at
`666`, which is incorrect.
- `find_cell()` does a binary search every time, and hence somewhat
accidentally positions the cursor correctly _after_ `666` so that the
insert goes to the correct place
## Why was this issue found
In #1988, I am removing `find_cell()` entirely in favor of always
performing a seek to the correct location - and skipping `seek` when it
is not required, saving us from wasting a binary search on every insert
- but this change means that we need to call `next()` after
`op_new_rowid` to have the cursor positioned correctly at the new
insertion slot. Doing this surfaces this upwards traversal bug in that
PR branch.
## Details of solution
- Store `cell_count` together with `cell_idx` in pagestack, so that
children can know whether their parents have reached their end without
doing IO
- To make this foolproof, pin pages on `PageStack` so the page cache
cannot evict them during tree traversal
- `cell_indices` renamed to `node_states` since it now carries more
information (cell index AND count, instead of just index)
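A minimal sketch of the idea in the list above, assuming a simplified page stack. Only `cell_idx`, `cell_count`, `node_states`, `PageStack` and `BTreeNodeState` are names from this log; the methods `has_more_children` and `can_go_up` are hypothetical, and the sketch assumes `cell_idx == cell_count` means the cursor is on the rightmost child:

```rust
/// Hypothetical per-page traversal state: the index of the child being
/// visited and the number of cells on the page, cached so that no IO is
/// needed to inspect an ancestor.
#[derive(Clone, Copy, Debug)]
struct BTreeNodeState {
    cell_idx: usize,
    cell_count: usize,
}

impl BTreeNodeState {
    /// Assumption: an interior page with `cell_count` cells has one extra
    /// rightmost child, so `cell_idx == cell_count` means nothing is left.
    fn has_more_children(&self) -> bool {
        self.cell_idx < self.cell_count
    }
}

struct PageStack {
    node_states: Vec<BTreeNodeState>,
}

impl PageStack {
    /// Going upwards only makes sense if some ancestor still has an
    /// unvisited child; otherwise we are at the end of the whole btree.
    fn can_go_up(&self) -> bool {
        match self.node_states.split_last() {
            Some((_current, ancestors)) => ancestors.iter().any(|s| s.has_more_children()),
            None => false,
        }
    }
}
```

With this state on the stack, `next()` can stay on the rightmost leaf instead of climbing to an interior node when every ancestor is exhausted.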

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2005
2025-07-16 19:44:21 +03:00
Jussi Saurio
43f0ab39dc Merge 'Separate user-callable cacheflush from internal cacheflush logic' from Diego Reis
Cacheflush should only spill pages to WAL as non-commit frames, without
checkpointing nor syncing.
- [docs](https://sqlite.org/c3ref/db_cacheflush.html)
- [sqlite3PagerFlush](https://github.com/sqlite/sqlite/blob/625d0b70febecb0864a81b2a047a961a59e8c17e/src/pager.c#L4669)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2044
2025-07-16 19:44:12 +03:00
Nikita Sivukhin
97b82fe6d8 rename operation_xxx to change_xxx to make naming more consistent 2025-07-16 20:16:24 +04:00
Pere Diaz Bou
d559bf3d9f compat: add integrity_check 2025-07-16 17:19:51 +02:00
Diego Reis
b86674adbb Remove cache clearing in cacheflush 2025-07-16 11:11:52 -03:00
Jussi Saurio
8558675c4c page cache: pin pages on the stack 2025-07-16 17:09:05 +03:00
Diego Reis
5dd571483f Add cacheflush to Rust binding 2025-07-16 11:08:52 -03:00
Diego Reis
817ad8d50f Separate user-callable cacheflush from internal cacheflush logic
Cacheflush should only spill pages to WAL as non-commit frames, without checkpointing nor syncing. Check SQLite's sqlite3PagerFlush
2025-07-16 11:08:50 -03:00
Jussi Saurio
f7b9265c26 btree: fix trying to go upwards when at end of btree 2025-07-16 16:58:42 +03:00
Jussi Saurio
e0d797aac0 btree: use node_states instead of cell_indices (tracks cell count too) 2025-07-16 16:58:41 +03:00
Jussi Saurio
f0145fef5c btree: create BTreeNodeState struct for tracking cell idx and count 2025-07-16 16:58:11 +03:00
Jussi Saurio
ac065a79bb btree: fix post-balancing seek bug in delete path 2025-07-16 14:23:46 +03:00
Pekka Enberg
93634d56ba Turso 0.1.3-pre.1 2025-07-16 13:16:57 +03:00
Pekka Enberg
84d8842fbe Merge 'btree: fix interior cell replacement in btrees with depth >=3' from Jussi Saurio
## Background
When a divider cell is deleted from an index interior page, the
following algorithm is used:
1. Find predecessor: Move to largest key in left subtree of the current
page. This is always a leaf page.
2. Create replacement: Convert this predecessor leaf cell to interior
cell format, using original cell's left child page pointer
3. Replace: Drop original cell from parent page, insert replacement at
same position
4. Cleanup: Delete the taken predecessor cell from the leaf page
<img width="845" height="266" alt="Screenshot 2025-07-16 at 10 39 18" src="https://github.com/user-attachments/assets/30517da4-a4dc-471e-a8f5-c27ba0979c86" />
## The faulty code leading to the bug
The error in our logic was that we always expected to only traverse down
one level of the btree:
```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```
This meant that when the deletion happened on say, level 1, and the
replacement cell was taken from level 3, we actually inserted the
replacement cell into level 2 instead of level 1.
## Manifestation of the bug in issue 2106
In #2106, this manifested as the following chain of pages, going from
parent to children:
3 -> 111 -> 119
- Cell to be deleted was on page 3 (whose left pointer is 111)
- Going to the largest key in the left subtree meant traversing from 3
to 111 and then from 111 to 119
- a replacement cell was taken from 119
- incorrectly inserted into 111
- and its left child pointer also set as 111!
- now whenever page 111 wanted to go to its left child page, it would
just traverse back to itself, eventually causing a crash because we have
a hard limit on the number of pages on the page stack.
## The fix
The fix is quite trivial: store the page we are on before we start
traversing down.
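As a rough illustration of the fix, here is a toy model of the 3 -> 111 -> 119 chain. The `Page` type and `descend_to_predecessor` are hypothetical and not the actual cursor code; the point is only that the target page is captured before the descent rather than read from the stack afterwards:

```rust
use std::collections::HashMap;

/// Toy page: interior pages list their child page ids, leaves list none.
struct Page {
    children: Vec<u32>,
}

/// Descends from `origin` through `left_child` to the rightmost leaf.
/// Returns the page that must receive the replacement cell and the
/// stack of pages visited on the way down.
fn descend_to_predecessor(
    pages: &HashMap<u32, Page>,
    origin: u32,
    left_child: u32,
) -> (u32, Vec<u32>) {
    // Remember where the deletion happens *before* descending.
    let replacement_target = origin;
    let mut stack = vec![origin, left_child];
    let mut current = left_child;
    // Keep following the rightmost child until we reach a leaf.
    while let Some(&rightmost) = pages[&current].children.last() {
        stack.push(rightmost);
        current = rightmost;
    }
    (replacement_target, stack)
}
```

For the chain above, the returned target is page 3, whereas the parent of the stack top after the descent is page 111, which is where the faulty code inserted the replacement cell.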
Closes #2106

Closes #2108
2025-07-16 13:15:54 +03:00
Pekka Enberg
7d94aea3d5 Merge 'make unixepoch to return i64' from Nikita Sivukhin
I see no reason why it should return a string:
```
sqlite> select abs(unixepoch());
1752660221

turso> select abs(unixepoch());
┌────────────────────┐
│ abs (unixepoch ()) │
├────────────────────┤
│                0.0 │
└────────────────────┘
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2112
2025-07-16 13:15:30 +03:00
Pekka Enberg
363a45e3ef Merge 'sim: provide additional context in assertion failures' from Jussi Saurio
Instead of e.g.:
> Error: failed with error: 'InternalError("`SELECT * FROM sparkling_kuh
WHERE (sparkling_kuh.warmhearted_bowden > 'givhaqhwibn') ` should return
no values for table `sparkling_kuh`")'
Now you get:
> Error: failed with error: 'InternalError("Assertion '`SELECT * FROM
sparkling_kuh WHERE (sparkling_kuh.warmhearted_bowden > 'givhaqhwibn') `
should return no values for table `sparkling_kuh`' failed: expected no
rows but got 1 rows: \"-37703991.25525856, sleek_leeder,
passionate_deleuze, -2463056772.592847, shining_kuo, polite_mcbarron,
X'616D626974696F75735F647261676F6E6F776C', warmhearted_bekken\"")'

Closes #2111
2025-07-16 13:15:22 +03:00
Nikita Sivukhin
41482915f6 make unixepoch to return i64 2025-07-16 14:02:56 +04:00
Jussi Saurio
ea427b3b64 sim: provide additional context in assertion failures 2025-07-16 12:05:30 +03:00
Pekka Enberg
192d1efc7f Merge 'btree: add some assertions related to #2106' from Jussi Saurio
Closes #2107
2025-07-16 11:17:23 +03:00
Pekka Enberg
99d61aad3c simulator: Add mention of fsync() issue for disabled fsync faults 2025-07-16 11:15:41 +03:00
Pekka Enberg
b0971f98c2 Merge 'sim: ignore fsync faults' from Jussi Saurio
`FaultyQuery` causes frequent false positives in simulator due to the
following chain of events:
- we write rows and flush WAL to disk
- inject fault during fsync which fails
- error is returned to caller, simulator thinks those rows don't exist
because the query failed
- we reopen the database i.e. read the WAL back to memory from disk, it
has those extra rows we think we didn't write
- assertion fails because table has more rows than simulator expected
More discussion about fsync behavior in issue #2091

Closes #2110
2025-07-16 11:15:23 +03:00
Jussi Saurio
bb0cad459e sim: ignore fsync faults
`FaultyQuery` causes frequent false positives in simulator due to
the following chain of events:

- we write rows and flush WAL to disk
- inject fault during fsync which fails
- error is returned to caller, simulator thinks those rows don't exist because the query failed
- we reopen the database i.e. read the WAL back to memory from disk, it has those extra rows we think we didn't write
- assertion fails because table has more rows than simulator expected

More discussion about fsync behavior in issue #2091
2025-07-16 11:09:54 +03:00
Pekka Enberg
1a8bade9d5 Merge 'Updates to the simulator' from Alperen Keleş
- Add generation for UNION/JOIN
- Rearchitect the oracle calling conventions to simplify the code paths
- Add brute force shrinking option by @echoumcp1

Closes #2049
2025-07-16 11:03:41 +03:00
Jussi Saurio
bd69af7372 btree: ensure re-entrancy of InteriorNodeReplacement 2025-07-16 10:50:22 +03:00
Jussi Saurio
47ef30b22e btree: fix interior cell replacement in btrees with depth >=3
When a divider cell is deleted from an index interior page, the following
algorithm is used:

1. Find predecessor: Move to largest key in left subtree (self.prev())
2. Create replacement: Convert predecessor leaf cell to interior cell format, using original cell's left child pointer
3. Replace: Drop original cell from parent page, insert replacement at same position
4. Cleanup: Delete predecessor from leaf page

The error in our logic was that we always expected to only traverse down
one level of the btree:

```rust
let parent_page = self.stack.parent_page().unwrap();
let leaf_page = self.stack.top();
```

This meant that when the deletion happened on say, level 1, and the replacement
cell was taken from level 3, we actually inserted the replacement cell into
level 2 instead of level 1.

In #2106, this manifested as the following chain of pages, going from parent to children:

3 -> 111 -> 119

Cell was deleted from page 3 (whose left pointer is 111), and a replacement cell was taken
from 119, incorrectly inserted into 111, and its left child pointer also set as 111!

The fix is quite trivial: store the page we are on before we start traversing down.

Closes #2106
2025-07-16 10:12:59 +03:00
Pekka Enberg
f72ceaf177 Merge 'extensions/vtab: fix i32 being passed as i64 across FFI boundary' from Jussi Saurio
As nilskch points out in #1807, Rust 1.88.0 is stricter about alignment
checks.
Because Rust integer literals default to `i32`, we were casting a pointer
to an `i32` as a pointer to an `i64`, causing a panic when dereferenced
due to misalignment, since Rust expects it to be 8-byte aligned.
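To illustrate the kind of mismatch described above, here is a small sketch. The helper names `is_aligned_for` and `read_as_i64` are hypothetical and are not part of the extension API; they only show how to check alignment and how to widen an `i32` without reinterpreting its pointer:

```rust
/// Returns true if `ptr` satisfies the alignment requirement of `T`.
fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % std::mem::align_of::<T>() == 0
}

/// Safe widening: read the value as the type it really is, then convert.
/// This avoids casting `*const i32` to `*const i64`, which both reads
/// past the 4-byte value and may violate `i64`'s alignment.
fn read_as_i64(value: &i32) -> i64 {
    i64::from(*value)
}
```

An alternative with the same effect is to annotate the literal as `i64` at the call site, so that the pointer passed across the FFI boundary already points at an 8-byte value.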

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2064
2025-07-16 08:28:24 +03:00
Pekka Enberg
f4e82df00e Merge 'Fix CSV import in the shell' from Jussi Saurio
- Fix not being able to create table while importing
    * The behavior now aligns with SQLite so that if the table already
exists, all the rows are treated as data. If the table doesn't exist,
the first row is treated as the header from which column names for the
new table are populated.
- Insert in batches instead of one at a time
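The header rule described above can be sketched as follows. `split_header` is a hypothetical helper, not the shell's actual code, and it assumes the CSV has already been parsed into rows of strings:

```rust
/// Splits parsed CSV rows into (column names for a new table, data rows).
/// If the table already exists, every row is data. Otherwise the first
/// row is taken as the header for the table to be created.
fn split_header(
    rows: &[Vec<String>],
    table_exists: bool,
) -> (Option<&[String]>, &[Vec<String>]) {
    if table_exists || rows.is_empty() {
        (None, rows)
    } else {
        (Some(rows[0].as_slice()), &rows[1..])
    }
}
```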
This was a pretty quick vibecoding effort tbh :]
Closes #2079

Closes #2094
2025-07-16 08:26:30 +03:00
Jussi Saurio
6e5b407505 btree: add some assertions related to #2106 2025-07-16 08:02:34 +03:00
alpaylan
04f5b91e87 fix faulty Update generation within delete_select 2025-07-16 00:06:35 -04:00
Jussi Saurio
f482424d77 Merge 'small refactor: rename "amount" to "extra_amount"' from Nikita Sivukhin
Small refactoring to reduce confusion (I was caught in this trap and set
`amount` to one in CDC branch during development)
This PR also fixes the broken `concat_ws` emit logic.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2100
2025-07-16 06:51:35 +03:00
Jussi Saurio
3aae46ccc7 Merge 'refactor: Changes CursorResult to IOResult' from Diego Reis
This PR unifies the concept of a result that either has something done or
yields to IO into a single type.
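A rough sketch of what such a unified type could look like. Only the name `IOResult` comes from this PR; the variant names `Done` and `IO` and the two toy functions are assumptions for illustration:

```rust
/// Hypothetical shape of the unified result: either the operation
/// finished with a value, or it must yield until pending IO completes.
#[derive(Debug, PartialEq)]
enum IOResult<T> {
    Done(T),
    IO,
}

/// Toy operation: produces a value only once the page is in memory.
fn read_cell(page_loaded: bool) -> IOResult<u32> {
    if page_loaded {
        IOResult::Done(42)
    } else {
        IOResult::IO
    }
}

/// A caller passes `IO` upwards unchanged and continues on `Done`.
fn read_cell_doubled(page_loaded: bool) -> IOResult<u32> {
    match read_cell(page_loaded) {
        IOResult::Done(v) => IOResult::Done(v * 2),
        IOResult::IO => IOResult::IO,
    }
}
```

Using one generic type means every layer can propagate "yield to IO" the same way, instead of translating between several near-identical status enums.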

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2103
2025-07-16 06:50:13 +03:00
alpaylan
28ecb083e1 fix faulty Insert::Select generation within delete_select 2025-07-15 22:35:05 -04:00
Diego Reis
0e9771ac07 refactor: Change redundant "Status" enums to IOResult
Let's unify the semantics of "something done" or yields I/O into a
single type
2025-07-15 20:56:18 -03:00
Diego Reis
d0af54ae77 refactor: Change CursorResult to IOResult
The reasoning here is to treat I/O operations (either "Done" or
yielding to IO) with the same generic type.
2025-07-15 20:52:25 -03:00
Nikita Sivukhin
e15f72da2d add simple test for concat_ws bug 2025-07-16 00:52:14 +04:00
Nikita Sivukhin
c018b06bf5 fix bug in concat_ws translation 2025-07-16 00:48:17 +04:00
Nikita Sivukhin
f7fb2aac5e adjust extra_amount for schema translation code 2025-07-16 00:47:59 +04:00
Nikita Sivukhin
be0a607ba8 rename amount -> extra_amount 2025-07-16 00:46:17 +04:00
Jussi Saurio
86b1b0d009 Merge 'fix record header size calculations and incorrect assumptions' from Jussi Saurio
- remove assumptions that record header size fits into 1 byte or serial
type fits into 1 byte
- add tests for record header size calculation
```sql
turso> CREATE TABLE t(x TEXT, y);
CREATE INDEX t_idx ON t(x);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'a', 1); -- 1000 bytes of 'a'
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'b', 2);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'c', 3);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'd', 4);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'e', 5);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'f', 6);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'g', 7);
INSERT INTO t VALUES (replace(zeroblob(1000), x'00', 'a') || 'h', 8);
SELECT COUNT(*) FROM t WHERE x >= replace(hex(zeroblob(100)), '00', 'a');
┌───────────┐
│ COUNT (*) │
├───────────┤
│         8 │
└───────────┘
```
Fixes #2096
Fixes #2088
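To see why the one-byte assumption breaks for the keys above, here is a sketch of the header size arithmetic based on the SQLite record format: a text value of `n` bytes has serial type `2n + 13`, and the header starts with a varint holding the header size, which includes that varint itself. The function names are hypothetical and not taken from the codebase:

```rust
/// Number of bytes a value occupies as a SQLite varint
/// (7 bits per byte for the first 8 bytes, 8 bits in the 9th).
fn varint_len(value: u64) -> usize {
    let mut len = 1;
    let mut rest = value >> 7;
    while rest != 0 && len < 9 {
        len += 1;
        rest >>= 7;
    }
    len
}

/// Serial type for a text value of `len` bytes.
fn text_serial_type(len: u64) -> u64 {
    2 * len + 13
}

/// Total record header size: the serial type varints plus the
/// header-size varint, which counts its own length as well.
fn record_header_size(serial_types: &[u64]) -> usize {
    let body: usize = serial_types.iter().map(|&t| varint_len(t)).sum();
    let mut size_len = 1;
    while varint_len((body + size_len) as u64) > size_len {
        size_len += 1;
    }
    body + size_len
}
```

For a 1001-byte text key the serial type is 2015, which needs a two-byte varint, so code that reads the serial type from a single byte goes wrong.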

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #2098
2025-07-15 19:09:31 +03:00
Jussi Saurio
fda92d43a2 adjust comment in header size test 2025-07-15 18:52:27 +03:00
Jussi Saurio
38183d3b3b tcl: add regression test for large text keys 2025-07-15 18:48:06 +03:00
Jussi Saurio
025ddd98a6 Merge 'bench: add insert benchmark (batch sizes: 1,10,100)' from Jussi Saurio
```
Insert rows in batches/limbo_insert_1_rows
                        time:   [344.71 µs 363.45 µs 379.31 µs]
Insert rows in batches/sqlite_insert_1_rows
                        time:   [575.12 µs 769.16 µs 983.30 µs]

Insert rows in batches/limbo_insert_10_rows
                        time:   [1.4964 ms 1.5694 ms 1.6334 ms]
Insert rows in batches/sqlite_insert_10_rows
                        time:   [510.79 µs 766.56 µs 1.0677 ms]

Insert rows in batches/limbo_insert_100_rows
                        time:   [5.5177 ms 5.6806 ms 5.8619 ms]
Insert rows in batches/sqlite_insert_100_rows
                        time:   [439.91 µs 879.43 µs 1.4260 ms]
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2092
2025-07-15 18:12:59 +03:00
Jussi Saurio
927a1f158a Merge 'btree: unify table&index seek page boundary handling' from Jussi Saurio
## Background
PR #2065 fixed a bug with table btree seeks concerning boundaries of
leaf pages.
The issue was that if we were e.g. looking for the first key greater
than (GT) 100, we always assumed the key would either be found on the
left child page of a given divider (e.g. divider 102) or not at all,
which is incorrect. #2065 has more discussion and documentation about
this, so read that one for more context.
## This PR
We already had similar handling for index btrees as #2065 introduced for
table btrees, but it was baked into the `BTreeCursor` struct's seek
handling itself, whereas #2065 handled this on the VDBE side.
This PR unifies this handling for both table and index btrees by always
doing the additional cursor advancement in the VDBE.
Unfortunately, unlike table btrees, index btrees may also need to do an
additional advance when they are looking for an exact match. This
resulted in a bigger refactor than anticipated, since there are quite a
few VDBE instructions that may perform a seek, e.g.: `IdxInsert`,
`IdxDelete`, `Found`, `NotFound`, `NoConflict`. All of these can
potentially end up in a similar situation where the cursor needs one
more advance after the initial seek, and until now they were calling
`cursor.seek()` directly and expecting the `BTreeCursor` to handle the
auto-advance fallback internally.
For this reason, I have 1. removed the "TryAdvance"-ish logic from the
index btree internals and 2. extracted a common VDBE helper `fn
seek_internal()` - heavily based on the existing `op_seek_internal()`,
but decoupled from instructions and the program counter - which all the
interested VDBE instructions will call to delegate their seek logic.
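The "seek, then maybe advance once" pattern can be sketched on a toy cursor over sorted leaf pages. Only the `TryAdvance` name comes from this log; the `SeekResult` enum shape, the `Cursor` type and the `seek_gt` helper are hypothetical stand-ins for the real `BTreeCursor` and `seek_internal()`, and the sketch assumes non-empty leaves:

```rust
#[derive(Debug, PartialEq)]
enum SeekResult {
    Found,
    TryAdvance,
    NotFound,
}

/// Toy cursor over leaf pages that each hold sorted keys.
struct Cursor<'a> {
    leaves: &'a [Vec<i64>],
    leaf: usize,
    idx: usize,
}

impl<'a> Cursor<'a> {
    /// Looks for the first key greater than `target` on one leaf only.
    fn seek_gt_in_leaf(&mut self, leaf: usize, target: i64) -> SeekResult {
        self.leaf = leaf;
        match self.leaves[leaf].iter().position(|&k| k > target) {
            Some(i) => {
                self.idx = i;
                SeekResult::Found
            }
            None => {
                self.idx = self.leaves[leaf].len();
                if leaf + 1 < self.leaves.len() {
                    // The match may be the first key of the next leaf.
                    SeekResult::TryAdvance
                } else {
                    SeekResult::NotFound
                }
            }
        }
    }

    /// Moves to the first key of the next leaf, if there is one.
    fn next(&mut self) -> Option<i64> {
        self.leaf += 1;
        self.idx = 0;
        self.leaves.get(self.leaf)?.first().copied()
    }
}

/// Caller-side helper: performs the seek and the one extra advance.
fn seek_gt(cursor: &mut Cursor, leaf: usize, target: i64) -> Option<i64> {
    match cursor.seek_gt_in_leaf(leaf, target) {
        SeekResult::Found => Some(cursor.leaves[cursor.leaf][cursor.idx]),
        SeekResult::TryAdvance => cursor.next(),
        SeekResult::NotFound => None,
    }
}
```

Keeping the extra advance in one caller-side helper means each instruction that seeks gets the same boundary handling without the cursor doing it implicitly.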
Closes #2083

Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2084
2025-07-15 18:02:52 +03:00
Jussi Saurio
932536a03f compare_records: fix assumption that header size is 1 byte and serial type is 1 byte 2025-07-15 17:57:52 +03:00
Jussi Saurio
7c353095ed types: fix and unify record header size calculation 2025-07-15 17:38:02 +03:00