Commit Graph

5453 Commits

Author SHA1 Message Date
Pekka Enberg
02023ce821 Merge 'core/storage: Switch page cache queue to linked list' from Pekka Enberg
The page cache implementation uses a pre-allocated vector (`entries`)
with fixed capacity, along with a custom hash map and freelist. This
design requires expensive upfront allocation when creating a new
connection, which severely impacted performance in workloads that open
many short-lived connections (e.g., our concurrent write benchmarks that
create a new connection per transaction).
Therefore, replace the pre-allocated vector with an intrusive doubly-
linked list. This eliminates the page cache initialization overhead from
connection establishment, but also reduces memory usage to entries that
are actually used. Furthermore, the approach allows us to grow the page
cache with much less overhead.
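As a rough illustration of the idea (not the actual patch), the sketch below keeps the queue links inside each cache entry and allocates nothing until the first page is inserted. The names `PageCache` and `Entry`, the LRU eviction policy, and the use of page keys in place of the raw pointers a real intrusive list would use are all assumptions made to keep the example in safe Rust.

```rust
use std::collections::HashMap;

/// A cache entry that carries its own queue links (the "intrusive" part).
/// Keys stand in for the pointers a real intrusive list would store.
struct Entry {
    page: Vec<u8>,
    prev: Option<u64>, // neighbour towards the head (more recently used)
    next: Option<u64>, // neighbour towards the tail (less recently used)
}

/// Hypothetical page cache with an LRU queue threaded through its entries.
pub struct PageCache {
    capacity: usize,
    map: HashMap<u64, Entry>,
    head: Option<u64>, // most recently used
    tail: Option<u64>, // least recently used
}

impl PageCache {
    pub fn new(capacity: usize) -> Self {
        // HashMap::new() does not allocate, so creating a cache is cheap.
        PageCache { capacity, map: HashMap::new(), head: None, tail: None }
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }

    /// Detach `key` from the queue; the entry itself stays in the map.
    fn unlink(&mut self, key: u64) {
        let (prev, next) = {
            let e = &self.map[&key];
            (e.prev, e.next)
        };
        match prev {
            Some(p) => self.map.get_mut(&p).unwrap().next = next,
            None => self.head = next,
        }
        match next {
            Some(n) => self.map.get_mut(&n).unwrap().prev = prev,
            None => self.tail = prev,
        }
    }

    /// Link an already-detached entry in at the head of the queue.
    fn push_front(&mut self, key: u64) {
        let old_head = self.head;
        {
            let e = self.map.get_mut(&key).unwrap();
            e.prev = None;
            e.next = old_head;
        }
        match old_head {
            Some(h) => self.map.get_mut(&h).unwrap().prev = Some(key),
            None => self.tail = Some(key),
        }
        self.head = Some(key);
    }

    /// Insert or replace a page, evicting the least recently used entry
    /// when the cache is full.
    pub fn insert(&mut self, key: u64, page: Vec<u8>) {
        if self.map.contains_key(&key) {
            self.unlink(key);
            self.map.get_mut(&key).unwrap().page = page;
        } else {
            if self.map.len() >= self.capacity {
                if let Some(victim) = self.tail {
                    self.unlink(victim);
                    self.map.remove(&victim);
                }
            }
            self.map.insert(key, Entry { page, prev: None, next: None });
        }
        self.push_front(key);
    }

    /// Look up a page and mark it as most recently used.
    pub fn get(&mut self, key: u64) -> Option<&[u8]> {
        if !self.map.contains_key(&key) {
            return None;
        }
        self.unlink(key);
        self.push_front(key);
        self.map.get(&key).map(|e| e.page.as_slice())
    }
}
```

With this shape, memory is only spent on entries that are actually cached, and growing the cache is a matter of raising `capacity`.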
The patch improves single-threaded throughput in the concurrent write
benchmark by about 4x.
Before:
```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 3.82s (26173.63 inserts/sec)
```
After:
```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 0.90s (110848.46 inserts/sec)
```

Closes #3456
2025-10-01 16:39:47 +03:00
Pekka Enberg
2b168cf7b0 core/storage: Switch page cache queue to linked list
The page cache implementation uses a pre-allocated vector (`entries`)
with fixed capacity, along with a custom hash map and freelist. This
design requires expensive upfront allocation when creating a new
connection, which severely impacted performance in workloads that open
many short-lived connections (e.g., our concurrent write benchmarks that
create a new connection per transaction).

Therefore, replace the pre-allocated vector with an intrusive
doubly-linked list. This eliminates the page cache initialization
overhead from connection establishment, but also reduces memory usage to
entries that are actually used. Furthermore, the approach allows us to
grow the page cache with much less overhead.

The patch improves single-threaded throughput in the concurrent write
benchmark by about 4x.

Before:

```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 3.82s (26173.63 inserts/sec)
```

After:

```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 0.90s (110848.46 inserts/sec)
```
2025-10-01 14:41:35 +03:00
Jussi Saurio
ee6b943586 Merge 'fix/mvcc: set log offset to end of file after recovery finishes' from Jussi Saurio
otherwise we start overwriting existing log entries
Closes #3495

Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3496
2025-10-01 13:52:12 +03:00
Jussi Saurio
480d066147 Merge 'mvcc: dont use mv store for ephemeral tables' from Jussi Saurio
not sure how these would even work with mvcc - either way, an ephemeral
table uses an ephemeral database file and pager so i don't think putting
its writes into MV store makes sense
TBH i have no idea if there are any weird interactions here but the code
we have now for sure does not work
Closes #3486

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #3490
2025-10-01 13:50:30 +03:00
Jussi Saurio
e9f0c59bcc fix/mvcc: set log offset to end of file after recovery finishes
otherwise we start overwriting existing log entries
2025-10-01 12:46:24 +03:00
Jussi Saurio
3bc6311bfd mvcc: dont use mv store for ephemeral tables 2025-10-01 10:16:02 +03:00
Jussi Saurio
28c1ebc128 Add Database::indexes_enabled() 2025-10-01 10:14:05 +03:00
Jussi Saurio
3ff6b44de2 Merge 'Fix index bookkeeping in DROP COLUMN' from Jussi Saurio
Closes #3448. Nasty bug - see issue for details

Closes #3449
2025-10-01 08:57:08 +03:00
Jussi Saurio
fb7e3918b3 Merge 'simplify exec_trim code + only pattern match on whitespace char' from Pedro Muniz
Consolidates the `exec_trim`, `exec_rtrim`, `exec_ltrim` code and only
pattern matches on whitespace character.
Fixes #3319

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3437
2025-10-01 08:56:39 +03:00
Jussi Saurio
27b1c1a1db Merge 'Fix self-insert with nested subquery' from Mikaël Francoeur
There were 2 problems:
1. The SELECT wasn't propagating which register it used for its results,
so sometimes the INSERT read bad data.
2. `TableReferences::contains_table` was only checking the top-level
tables, not the nested tables in FROM queries. This condition is used to
emit "template 4", the bytecode template for self-inserts.
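The second problem can be pictured with a small sketch. The `Source` enum and the free function below are hypothetical, simplified stand-ins for the planner's real types; they only show the difference between checking top-level tables and recursing into nested FROM subqueries.

```rust
/// Hypothetical, simplified stand-in for the entries of a FROM clause.
pub enum Source {
    /// A plain table reference, by name.
    Table(String),
    /// A subquery in FROM, with its own list of sources.
    Subquery(Vec<Source>),
}

/// Returns true if `name` is referenced anywhere in `sources`,
/// including inside nested subqueries (not only at the top level).
pub fn contains_table(sources: &[Source], name: &str) -> bool {
    sources.iter().any(|s| match s {
        Source::Table(t) => t.eq_ignore_ascii_case(name),
        Source::Subquery(inner) => contains_table(inner, name),
    })
}
```

A top-level-only check would miss a self-insert such as `INSERT INTO t SELECT * FROM (SELECT * FROM t)`, where `t` appears only inside the nested source.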
Closes https://github.com/tursodatabase/turso/issues/3312

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3436
2025-10-01 08:56:16 +03:00
Jussi Saurio
8a08f085e8 Merge 'Fix SQLite database file pending byte page' from Pedro Muniz
SQLite has a crazy easter egg: at the 1 GiB file offset, it creates a
`PENDING_BYTE_PAGE` that is used only by the VFS layer and is never
read or written.
To test this properly, I took inspiration from the SQLite testing framework
and defined a helper method that is conditionally compiled when the
`test_helper` feature is enabled.
https://github.com/sqlite/sqlite/blob/7e38287da43ea3b661da3d8c1f431aa907d648c9/src/main.c#L4327
As the `PENDING_BYTE` is normally at the 1 GiB mark, I created a
function that sets the static `PENDING_BYTE` atomic to whatever
value we want. This means we can test this unusual behaviour at any DB
file size we want.
`fuzz_pending_byte_database` is the test that fuzzes different pending
byte offsets and does an integrity check at the end to confirm that we are
compatible with SQLite.
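A minimal sketch of the mechanism described above, assuming the page that holds the pending byte is computed the way SQLite does it (`PENDING_BYTE / page_size + 1`). The function names are hypothetical, and in the real code the setter would sit behind the `test_helper` feature rather than being always available.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Default pending-byte offset used by SQLite: 1 GiB (0x40000000).
pub const DEFAULT_PENDING_BYTE: u64 = 0x4000_0000;

/// Process-wide pending-byte offset, overridable so tests can exercise
/// the special page without building a 1 GiB database file.
static PENDING_BYTE: AtomicU64 = AtomicU64::new(DEFAULT_PENDING_BYTE);

/// Test helper (hypothetical name): move the pending byte to `offset`.
pub fn set_pending_byte(offset: u64) {
    PENDING_BYTE.store(offset, Ordering::SeqCst);
}

/// 1-based number of the page that contains the pending byte, or `None`
/// when the page size is not initialized yet (reported as 0).
pub fn pending_byte_page(page_size: u64) -> Option<u64> {
    if page_size == 0 {
        return None;
    }
    Some(PENDING_BYTE.load(Ordering::SeqCst) / page_size + 1)
}

/// True if `page_no` is the page that must never be read or written.
pub fn is_pending_byte_page(page_no: u64, page_size: u64) -> bool {
    pending_byte_page(page_size) == Some(page_no)
}
```

A fuzz test in this style would pick a small offset with `set_pending_byte`, grow the database past it, and then run an integrity check.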
Closes #2749

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3431
2025-10-01 08:55:44 +03:00
Jussi Saurio
65abe3efdc Merge 'MVCC: Handle table ID / rootpages properly for both checkpointed and non-checkpointed tables' from Jussi Saurio
**Handle table ID / rootpages properly for both checkpointed and non-checkpointed tables**
Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.
**We cannot use root page as the MVCC table ID directly because:**
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
**Hence:**
- MVCC table ids are always negative
- sqlite_schema rows will have a negative rootpage column if the
  table has not been checkpointed yet.
- on checkpoint when the table is allocated a real root page, we update
the row in sqlite_schema and in MV store's internal mapping
**On recovery:**
- All sqlite_schema tables are read directly from disk and assigned
`table_id = -1 * root_page` -- root_page on disk must be positive
- Logical log is deserialized and inserted into MV store
- Schema changes from logical_log are captured into the DB's global
schema
**Note about recovery:**
I changed MVCC recovery to happen on DB initialization which should
prevent any races, so no need for `recover_lock`, right @pereman2 ?
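The ID scheme above could be modelled roughly as follows. This is a sketch under assumptions: `MvTableId` is named in a later commit, but its constructor, the `RootPage` enum, and `classify_rootpage` are hypothetical helpers invented here to show the sign convention.

```rust
/// Newtype for MVCC table IDs, which are always negative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MvTableId(i64);

impl MvTableId {
    /// Accepts only negative raw values.
    pub fn new(raw: i64) -> Option<Self> {
        if raw < 0 { Some(MvTableId(raw)) } else { None }
    }

    /// Recovery rule: `table_id = -1 * root_page`, where the root page
    /// read from disk must be positive.
    pub fn from_root_page(root_page: i64) -> Option<Self> {
        if root_page > 0 { Some(MvTableId(-root_page)) } else { None }
    }

    pub fn raw(self) -> i64 {
        self.0
    }
}

/// Hypothetical interpretation of the `rootpage` column in sqlite_schema.
#[derive(Debug, PartialEq, Eq)]
pub enum RootPage {
    /// The table has a real B-tree root page on the pager.
    Checkpointed(i64),
    /// The table only exists in the MV store so far.
    NotCheckpointed(MvTableId),
}

/// Classify a `rootpage` value by its sign; 0 is treated as invalid here.
pub fn classify_rootpage(rootpage: i64) -> Option<RootPage> {
    match rootpage {
        p if p > 0 => Some(RootPage::Checkpointed(p)),
        p if p < 0 => MvTableId::new(p).map(RootPage::NotCheckpointed),
        _ => None,
    }
}
```

Wrapping the raw integer in a newtype keeps table IDs and root pages from being mixed up by accident at call sites.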

Closes #3419
2025-10-01 08:55:10 +03:00
Preston Thorpe
6fd2ad2f5e Merge 'support multiple conflict clauses in upsert' from Nikita Sivukhin
This PR implements support for a chain of `ON CONFLICT` clauses, e.g.
```
INSERT INTO ct(id, x, y) VALUES (4, 'x', 'y1'), (5, 'a1', 'b'), (3, '_', '_')
  ON CONFLICT(x) DO UPDATE SET x = excluded.x || '-' || x, y = excluded.y || '@' || y, z = 'x' 
  ON CONFLICT(y) DO UPDATE SET x = excluded.x || '+' || x, y = excluded.y || '!' || y, z = 'y' 
  ON CONFLICT DO UPDATE SET x = excluded.x || '#' || x, y = excluded.y || '%' || y, z = 'fallback';
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3453
2025-09-30 19:50:59 -04:00
Jussi Saurio
d4d50b564a fix even more tests 2025-09-30 23:22:07 +03:00
Jussi Saurio
adc5b7b27f remove monkey print 2025-09-30 22:57:21 +03:00
Jussi Saurio
fe871188bf fix tests again 2025-09-30 22:54:48 +03:00
Jussi Saurio
fb2878973f fix sort order of write set 2025-09-30 22:54:36 +03:00
Jussi Saurio
509bde109e mvcc benchmark compilation fix 2025-09-30 22:27:28 +03:00
Jussi Saurio
fd84fd0683 fix test compilation errors 2025-09-30 22:27:28 +03:00
Jussi Saurio
e68c652f8f Add some table ID integrity checks to logical log recovery 2025-09-30 22:27:28 +03:00
pedrocarlo
65cd4d998d page_size can be 0 when it is not initialized, so account for that 2025-09-30 15:58:38 -03:00
Pekka Enberg
229d96abf2 Merge 'core/vdbe: Don't clear parameters in Statement::reset()' from Pekka Enberg
As per SQLite API, sqlite3_reset() does *not* clear bind parameters.
Instead they're persistent across statement reset and only cleared with
sqlite3_clear_bindings().
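The intended semantics can be sketched with a toy statement type. Everything below is a simplified, hypothetical model (the real `Statement` holds a compiled program and much more state); it only shows that `reset()` rewinds execution while bindings survive until `clear_bindings()`.

```rust
use std::collections::BTreeMap;

/// Minimal value type for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
}

/// Toy prepared statement: bound parameters plus some execution state.
pub struct Statement {
    params: BTreeMap<usize, Value>,
    steps_taken: usize,
}

impl Statement {
    pub fn new() -> Self {
        Statement { params: BTreeMap::new(), steps_taken: 0 }
    }

    /// Bind a value to a 1-based parameter index.
    pub fn bind_at(&mut self, index: usize, value: Value) {
        self.params.insert(index, value);
    }

    /// Stand-in for executing one step of the statement.
    pub fn step(&mut self) {
        self.steps_taken += 1;
    }

    pub fn steps_taken(&self) -> usize {
        self.steps_taken
    }

    /// Like sqlite3_reset(): rewind execution, keep the bindings.
    pub fn reset(&mut self) {
        self.steps_taken = 0;
    }

    /// Like sqlite3_clear_bindings(): every parameter reads as NULL again.
    pub fn clear_bindings(&mut self) {
        self.params.clear();
    }

    /// Current value of a parameter; unbound parameters are NULL.
    pub fn param(&self, index: usize) -> Value {
        self.params.get(&index).cloned().unwrap_or(Value::Null)
    }
}
```

This is what lets callers bind once and re-run a statement in a loop with only `reset()` between executions.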

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #3466
2025-09-30 21:57:59 +03:00
Nikita Sivukhin
f4263bf472 fix clippy 2025-09-30 22:43:58 +04:00
Nikita Sivukhin
9ef05adc5e fix upsert conflict handling 2025-09-30 22:39:55 +04:00
Pekka Enberg
25ffd4f01e core/vdbe: Don't clear parameters in Statement::reset()
As per SQLite API, sqlite3_reset() does *not* clear bind parameters.
Instead they're persistent across statement reset and only cleared with
sqlite3_clear_bindings().
2025-09-30 20:22:09 +03:00
pedrocarlo
aa5055e563 fuzz tests for pending_byte 2025-09-30 13:52:40 -03:00
Nikita Sivukhin
73f68dfcfb remove unnecessary log 2025-09-30 20:47:39 +04:00
Nikita Sivukhin
f6d829f52d simplify upsert codegen 2025-09-30 20:47:39 +04:00
Nikita Sivukhin
3590f9882d support multiple conflict clauses in upsert 2025-09-30 20:47:39 +04:00
pedrocarlo
3d5978c718 add special hipp pending page that is supposed to be ignored 2025-09-30 13:43:10 -03:00
Preston Thorpe
3456d61ac0 Merge 'Index search fixes' from Nikita Sivukhin
This PR bundles 2 fixes:
1. Index search must skip NULL values
2. UPDATE must avoid using an index whose column is used in the SET clause
    * This was an optimization to avoid a full scan in the case of `UPDATE t
SET ... WHERE col = ?`, but instead of relying on this hack we must properly
load the updated row set into an ephemeral index and flush it after the update
finishes, rather than modifying the BTree in place
    * So, for now we remove this optimization completely and quietly
wait for a proper optimization to land

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3459
2025-09-30 12:34:52 -04:00
Pekka Enberg
b511b23e70 Merge 'Make encryption opt in via flag' from Avinash Sajjanshetty
We had the encryption feature behind a compiler flag that wasn't
enabled by default. This patch:
- enables the compiler flag by default
- adds an opt-in runtime flag, `experimental-encryption`
- leaves the runtime flag disabled by default

Closes #3457
2025-09-30 19:31:28 +03:00
Nikita Sivukhin
c84486c411 clippy logged in as jussi - so I need to fix more stuff 2025-09-30 18:45:17 +04:00
pedrocarlo
642679889a simplify exec_trim code + only pattern match on whitespace char 2025-09-30 11:09:47 -03:00
Nikita Sivukhin
bf5567de35 fix clippy
- the proper fix is to nuke it actually :)
2025-09-30 18:06:42 +04:00
Jussi Saurio
64ce33bd5c Move resolution of tableid/rootpage inside MvCursor constructor 2025-09-30 17:04:37 +03:00
Nikita Sivukhin
4a9309fe31 fix clippy 2025-09-30 17:58:12 +04:00
Nikita Sivukhin
f1597dea90 fix all combinations of iteration direction and index order to properly handle nulls 2025-09-30 17:57:03 +04:00
Jussi Saurio
7c897d382f Implement MvTableId newtype for better type safety of table ids 2025-09-30 16:54:22 +03:00
Jussi Saurio
0ba4c6c00e use negative table id in mvcc tests 2025-09-30 16:53:12 +03:00
Jussi Saurio
a52dbb7842 Handle table ID / rootpages properly for both checkpointed and non-checkpointed tables
Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.

We cannot use root page as the MVCC table ID directly because:
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.

Hence, we:

- store the mapping between table id and btree rootpage
- sqlite_schema rows will have a negative rootpage column if the
  table has not been checkpointed yet.
2025-09-30 16:53:12 +03:00
Avinash Sajjanshetty
a360efa6e0 enable encryption feature flag by default 2025-09-30 19:04:25 +05:30
Nikita Sivukhin
c211fd1359 handle btree-table search properly
- btree-table doesn't have nulls in keys - so the seek operation does some conversions and we shouldn't emit SeekGT { Null } in this case
2025-09-30 17:05:39 +04:00
Pekka Enberg
9788f6d005 Merge 'core/mvcc: Optimize exclusive transaction check' from Pekka Enberg
The check is in fastpath so switch to an atomic instead.
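One way such a check might look is sketched below. The type name `MvStore`, the method names, and the convention that transaction ID 0 means "no exclusive transaction" are assumptions for illustration, not details taken from the patch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sentinel meaning "no exclusive transaction is active" (assumed convention;
/// real transaction IDs are taken to be non-zero).
const NO_EXCLUSIVE_TX: u64 = 0;

/// Hypothetical store that tracks the current exclusive transaction in an
/// atomic, so the fast-path check is a single load with no lock.
pub struct MvStore {
    exclusive_tx: AtomicU64,
}

impl MvStore {
    pub fn new() -> Self {
        MvStore { exclusive_tx: AtomicU64::new(NO_EXCLUSIVE_TX) }
    }

    /// Fast-path check: one atomic load.
    pub fn has_exclusive_tx(&self) -> bool {
        self.exclusive_tx.load(Ordering::Acquire) != NO_EXCLUSIVE_TX
    }

    /// Claim exclusivity for `tx_id`; fails if another transaction holds it.
    pub fn try_acquire_exclusive(&self, tx_id: u64) -> bool {
        self.exclusive_tx
            .compare_exchange(NO_EXCLUSIVE_TX, tx_id, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Release exclusivity; only succeeds for the transaction that holds it.
    pub fn release_exclusive(&self, tx_id: u64) -> bool {
        self.exclusive_tx
            .compare_exchange(tx_id, NO_EXCLUSIVE_TX, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}
```

Compared with a mutex or read-write lock around an `Option`, every transaction that merely needs to ask "is anyone exclusive?" avoids taking a lock.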

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3458
2025-09-30 16:02:42 +03:00
Avinash Sajjanshetty
c8111f9555 Put encryption behind an opt in (runtime) flag 2025-09-30 18:29:18 +05:30
Jussi Saurio
81e7c26f55 Merge 'Anonymous params fix' from Nikita Sivukhin
This PR auto-assigns ids for anonymous variables directly in the parser.
Otherwise, it's pretty easy to mess up the traversal order in the core
code and assign ids incorrectly.
For example, before the fix, the following code worked incorrectly because
parameter values were assigned to the conflict clause first instead of
the VALUES list:
```rs
let mut stmt = conn.prepare("INSERT INTO test VALUES (?, ?), (?, ?) ON CONFLICT DO UPDATE SET v = ?")?;
stmt.bind_at(1.try_into()?, Value::Integer(1));
stmt.bind_at(2.try_into()?, Value::Integer(20));
stmt.bind_at(3.try_into()?, Value::Integer(3));
stmt.bind_at(4.try_into()?, Value::Integer(40));
stmt.bind_at(5.try_into()?, Value::Integer(66));
```
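To illustrate why assigning ids at parse time avoids the ordering problem, here is a deliberately naive sketch that numbers anonymous `?` placeholders in the order they appear in the SQL text. The function is hypothetical and is not the parser's real logic: it only skips single-quoted string literals and ignores comments, quoted identifiers, and numbered or named parameters.

```rust
/// Assign 1-based ids to anonymous `?` parameters in textual order.
/// Returns `(byte_offset, id)` pairs. Naive sketch: only single-quoted
/// string literals are skipped.
pub fn assign_anonymous_ids(sql: &str) -> Vec<(usize, u32)> {
    let mut ids = Vec::new();
    let mut next_id = 1u32;
    let mut in_string = false;
    for (pos, ch) in sql.char_indices() {
        match ch {
            // Toggle on every quote; an escaped quote ('') toggles twice
            // and therefore leaves the state unchanged.
            '\'' => in_string = !in_string,
            '?' if !in_string => {
                ids.push((pos, next_id));
                next_id += 1;
            }
            _ => {}
        }
    }
    ids
}
```

Because the ids follow source order, the placeholder in the `ON CONFLICT DO UPDATE SET v = ?` clause gets the last id regardless of the order in which later stages traverse the statement.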

Closes #3455
2025-09-30 15:48:35 +03:00
Nikita Sivukhin
a32ed53bd8 remove optimization
- even if the index search returns only 1 row, it will call next in the loop, and we can incorrectly process the same row values multiple times
- the following query failed with this optimization:

turso> CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, k TEXT, c0 INT);
turso> CREATE UNIQUE INDEX idx_p1_0 ON t(c0);
turso> insert into t values (null, 'uu', -1);
turso> insert into t values (null, 'uu', -2);
turso> UPDATE t SET c0 = NULL WHERE c0 = -1;
turso> SELECT * FROM t
┌────┬────┬────┐
│ id │ k  │ c0 │
├────┼────┼────┤
│  1 │ uu │    │
├────┼────┼────┤
│  2 │ uu │    │
└────┴────┴────┘
2025-09-30 16:37:41 +04:00
Nikita Sivukhin
e9b8b0265d skip NULL in case of search over index 2025-09-30 16:16:04 +04:00
Pekka Enberg
3d327ba63c core/mvcc: Optimize exclusive transaction check
The check is in fastpath so switch to an atomic instead.
2025-09-30 15:00:24 +03:00
Nikita Sivukhin
e111226f3b add comment 2025-09-30 15:28:50 +04:00