Commit Graph

3650 Commits

Author SHA1 Message Date
Diego Reis
0346c65a72 Fix clippy 2025-07-28 14:48:52 -03:00
Pekka Enberg
e2d4cbbe48 Merge 'core: Enforce shared database object per database file' from Pekka Enberg
We need to ensure that there is a single, shared `Database` object per
database file. This is necessary because it is not safe to have multiple
independent WAL files open when coordination happens via process-level
POSIX file advisory locks.
Fixes #2267

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2299
2025-07-28 19:34:35 +03:00
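One common way to enforce a single shared object per file is a global registry of weak references keyed by path. This is only a minimal sketch of that idea, not turso's actual implementation; the `Database`, `registry`, and `open_database` names here are hypothetical.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex, OnceLock, Weak};

// Hypothetical stand-in for the real Database struct.
struct Database {
    _path: PathBuf,
}

// Global registry mapping database file paths to the one live Database object.
fn registry() -> &'static Mutex<HashMap<PathBuf, Weak<Database>>> {
    static REG: OnceLock<Mutex<HashMap<PathBuf, Weak<Database>>>> = OnceLock::new();
    REG.get_or_init(|| Mutex::new(HashMap::new()))
}

fn open_database(path: &str) -> Arc<Database> {
    let key = PathBuf::from(path); // real code would canonicalize the path first
    let mut reg = registry().lock().unwrap();
    if let Some(db) = reg.get(&key).and_then(Weak::upgrade) {
        return db; // reuse the existing shared object
    }
    let db = Arc::new(Database { _path: key.clone() });
    reg.insert(key, Arc::downgrade(&db));
    db
}

fn main() {
    let a = open_database("app.db");
    let b = open_database("app.db");
    // both opens observe the same shared Database object
    assert!(Arc::ptr_eq(&a, &b));
    // a different file gets its own object
    assert!(!Arc::ptr_eq(&a, &open_database("other.db")));
}
```

Using `Weak` in the registry lets a `Database` be dropped once every connection closes, without the registry itself keeping it alive.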
Pekka Enberg
50e03ee90e core: Clean up Connection::open_with_flags()
Co-authored-by: bit-aloo <84662239+Shourya742@users.noreply.github.com>
2025-07-28 19:16:01 +03:00
Pekka Enberg
ab1a152100 core: Enforce single shared database object per database file
We need to ensure that there is a single, shared `Database` object per
database file. This is necessary because it is not safe to have multiple
independent WAL files open when coordination happens via process-level
POSIX file advisory locks.

Fixes #2267
Co-authored-by: ultraman <sunhuayangak47@gmail.com>
2025-07-28 19:13:53 +03:00
Pekka Enberg
9b67eb0e77 core: Fix transaction cleanup in Connection::close() 2025-07-28 19:13:53 +03:00
Pekka Enberg
5b6a30c1df core/storage: Fix B-Tree test cases to use ":memory:"
...otherwise they all share the same `Database` object.
2025-07-28 19:13:53 +03:00
Nikita Sivukhin
3614b022ab add WalInsertInfo type 2025-07-28 17:20:10 +04:00
Nikita Sivukhin
09b18f6b6e add WAL API methods to the rust bindings and extend result of wal_insert_frame method 2025-07-28 17:20:10 +04:00
Jussi Saurio
1e4e8c243a Merge 'btree/pager: Improve update performance by reusing freelist pages in allocate_page()' from Jussi Saurio
Closes #2225.
## What
We currently do not use pages in the
[freelist](https://www.sqlite.org/fileformat.html#the_freelist) at all
when allocating new pages.
## Why is this bad
The effect of this is that (1) UPDATEs with overflow pages become really
slow, and (2) the database size grows really quickly. See #2225 for an
extreme example comparison with SQLite.
## The fix
Whenever `allocate_page()` is called, we first check if we have pages in
the freelist, and if we do, we recycle one of those pages instead of
creating a new one. If there are no freelist pages, we allocate a new
page as normal.
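The check-freelist-first policy can be sketched as follows. This is a simplified, hypothetical sketch: the real `allocate_page()` is re-entrant and returns an `IOResult`, and the `Pager`/`AllocatedPage` shapes here are illustrative only.

```rust
// Hypothetical sketch of freelist-aware allocation (the real allocate_page()
// is asynchronous and returns IOResult; types here are simplified).
enum AllocatedPage {
    Recycled(u32), // page number taken from the freelist
    Fresh(u32),    // brand-new page appended to the file
}

struct Pager {
    freelist: Vec<u32>, // free page numbers recorded in the freelist
    db_size: u32,       // current number of pages in the database file
}

impl Pager {
    fn allocate_page(&mut self) -> AllocatedPage {
        // prefer recycling a freelist page over growing the file
        if let Some(pgno) = self.freelist.pop() {
            AllocatedPage::Recycled(pgno)
        } else {
            self.db_size += 1;
            AllocatedPage::Fresh(self.db_size)
        }
    }
}

fn main() {
    let mut pager = Pager { freelist: vec![7, 3], db_size: 10 };
    assert!(matches!(pager.allocate_page(), AllocatedPage::Recycled(3)));
    assert!(matches!(pager.allocate_page(), AllocatedPage::Recycled(7)));
    // freelist exhausted: the file grows as before
    assert!(matches!(pager.allocate_page(), AllocatedPage::Fresh(11)));
    assert_eq!(pager.db_size, 11);
}
```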
## Implementation notes
- `allocate_page()` now needs to return an `IOResult`, which means all
of its callers also need to return an `IOResult`, necessitating quite a
bit of new state machine logic to ensure re-entrancy.
- I left a few "synchronous IO hacks" in the `balance()` routine because
the size of this PR would balloon even more than it already has if I
were to fix those immediately in this PR.
- `fill_cell_payload()` uses some `unsafe` code to avoid lifetime
issues, and adds an unfortunate double-indirection via
`Arc<Mutex<Vec<T>>>` because the existing btree code constantly clones
`WriteState`, and we must ensure the underlying buffers referenced by
raw pointers in `fill_cell_payload` remain valid.
**Follow-up cleanups:**
1. remove synchronous IO hacks that would require even more state
machines and are best left for another PR
2. remove `Clone` from `WriteState` and implement it better
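The `Arc<Mutex<Vec<T>>>` double-indirection mentioned above can be illustrated with a minimal sketch. The `WriteState` here is hypothetical and much smaller than the real one; the point is only that shared ownership keeps the payload allocation alive across clones, so raw pointers into it cannot dangle.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical sketch: because the btree code clones WriteState freely, a raw
// pointer into a plainly-owned Vec could dangle when one clone is dropped.
// Sharing the buffer behind Arc<Mutex<_>> makes the allocation outlive every
// clone of the state.
#[derive(Clone)]
struct WriteState {
    payload: Arc<Mutex<Vec<u8>>>,
}

fn main() {
    let state = WriteState {
        payload: Arc::new(Mutex::new(vec![0xAB; 4])),
    };
    let clone = state.clone(); // cheap: only the Arc refcount bumps
    drop(state);               // original dropped, allocation still alive
    assert_eq!(clone.payload.lock().unwrap().len(), 4);
}
```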
## Perf comparison
`main`: 33 seconds
```
jussi@Jussis-MacBook-Pro limbo % time target/release/tursodb --experimental-indexes apinatest_main.db <<'EOF'
create table t(x, y, z unique);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
EOF
Turso v0.1.3-pre.3
Enter ".help" for usage hints.
This software is ALPHA, only use for development, testing, and experimentation.
target/release/tursodb --experimental-indexes apinatest_main.db <<<''  6.81s user 21.18s system 83% cpu 33.643 total
```
PR: 13 seconds
```
jussi@Jussis-MacBook-Pro limbo % time target/release/tursodb --experimental-indexes apinatest_PR.db <<'EOF'
create table t(x, y, z unique);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
EOF
Turso v0.1.3-pre.3
Enter ".help" for usage hints.
This software is ALPHA, only use for development, testing, and experimentation.

target/release/tursodb --experimental-indexes apinatest_PR.db <<<''  3.89s user 7.83s system 89% cpu 13.162 total
```
(sqlite: 2 seconds 🤡 )
---
TODO:
- [x] Fix whatever issue the simulator caught in CI (#2238 )
- [x] Post a performance comparison
- [x] Fix autovacuum test failure
- [x] Improve docs
- [x] Fix `fill_cell_payload` re-entrancy issue when allocating overflow
pages
- [x] Add proper PR description

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2233
2025-07-28 15:30:13 +03:00
Jussi Saurio
d8a133a1a5 Merge 'VDBE/op_column: use references to cursor payload instead of cloning' from Jussi Saurio
Instead, use RefValue to refer to the record payload directly and then copy
to the register as necessary.
my local:
```
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Warming u
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Collectin
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1
                        time:   [491.64 ns 492.54 ns 493.64 ns]
                        change: [-3.6642% -3.3050% -2.9558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Warming
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Collecti
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Analyzin
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10
                        time:   [2.7923 µs 2.8001 µs 2.8114 µs]
                        change: [-14.643% -14.282% -13.878%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  4 (4.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Warming
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Collecti
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Analyzin
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50
                        time:   [13.452 µs 13.496 µs 13.550 µs]
                        change: [-15.768% -15.471% -15.182%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Warming
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Collect
Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Analyzi
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
                        time:   [27.110 µs 27.162 µs 27.226 µs]
                        change: [-15.878% -15.604% -15.336%] (p = 0.00 < 0.05)
                        Performance has improved.
```
ci, main:
```
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
                        time:   [70.671 µs 71.741 µs 72.910 µs]
```
ci, branch:
```
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
                        time:   [53.969 µs 54.013 µs 54.061 µs]
```

Reviewed-by: bit-aloo (@Shourya742)

Closes #2205
2025-07-28 14:13:54 +03:00
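The borrow-then-copy idea behind this change can be sketched with simplified types. The `RefValue`/`Value` shapes below are illustrative, not turso's actual definitions: the opcode inspects a borrowed view of the cursor's record payload and materializes an owned value only when storing into a register.

```rust
// Hypothetical sketch: borrow the record payload, copy once at register-store
// time instead of cloning eagerly in op_column.
enum RefValue<'a> {
    Text(&'a str),
    Blob(&'a [u8]),
}

enum Value {
    Text(String),
    Blob(Vec<u8>),
}

impl<'a> RefValue<'a> {
    // the single copy, performed only when the register needs ownership
    fn materialize(&self) -> Value {
        match self {
            RefValue::Text(s) => Value::Text(s.to_string()),
            RefValue::Blob(b) => Value::Blob(b.to_vec()),
        }
    }
}

fn main() {
    let payload = b"hello".to_vec();
    let col = RefValue::Blob(&payload); // zero-copy view into the record
    match col.materialize() {
        Value::Blob(owned) => assert_eq!(owned, payload),
        _ => unreachable!(),
    }
}
```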
Pekka Enberg
aca6ffa042 Merge 'io/unix: wrap file with Mutex' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2301
2025-07-28 12:53:38 +03:00
Pere Diaz Bou
f458f622a5 io/unix: wrap file with Mutex 2025-07-28 11:33:57 +02:00
Pere Diaz Bou
752a876f9a change every Rc to Arc in schema internals 2025-07-28 10:51:17 +02:00
Pere Diaz Bou
d273de483f comment clone for schema 2025-07-28 10:50:50 +02:00
Pere Diaz Bou
6ec80b3364 clone everything in schema 2025-07-28 10:27:45 +02:00
Jussi Saurio
111c0032ae Always extend texts and blobs 2025-07-28 11:06:16 +03:00
Jussi Saurio
ae5470f1d0 use default directly 2025-07-28 11:01:26 +03:00
Jussi Saurio
12d8b266a1 Define some helper traits to reduce duplication 2025-07-28 11:01:26 +03:00
Jussi Saurio
7eb52c65d3 Add missing program counter increment 2025-07-28 11:01:26 +03:00
Jussi Saurio
b14124ad3b VDBE/op_column: avoid first cloning text/blob and then copying it again
Instead, use RefValue to refer to the record payload directly and then copy
to the register as necessary.
2025-07-28 11:01:26 +03:00
Jussi Saurio
36e0ca5a9f pager: remove unnecessary LoadFreelistTrunkPage state 2025-07-28 10:11:57 +03:00
Jussi Saurio
e7b07c1357 pager: reset allocate_page_state in reset_internal_states() 2025-07-28 10:11:57 +03:00
Jussi Saurio
c349a9d689 Ensure underlying payload vec cannot be copied so that raw pointers remain valid 2025-07-28 10:11:57 +03:00
Pekka Enberg
fd2a7f9098 core: Switch to unreachable for invalid enum variants
The parser unfortunately outputs Stmt, which has some enum variants that
we never actually encounter in some parts of the core. Switch to
unreachable instead of todo.
2025-07-28 09:52:20 +03:00
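The distinction is small but meaningful: `todo!` implies work left to do, while `unreachable!` documents an invariant. A minimal sketch, with a hypothetical `Stmt` subset:

```rust
// Sketch: the parser's Stmt enum has variants this code path never sees.
enum Stmt {
    Select,
    Insert,
    Detach,
}

fn translate_write(stmt: &Stmt) -> &'static str {
    match stmt {
        Stmt::Insert => "insert",
        Stmt::Select => "select",
        // unreachable! states an invariant ("the parser never routes Detach
        // here"), whereas todo! would wrongly imply missing functionality
        Stmt::Detach => unreachable!("Detach is handled before translation"),
    }
}

fn main() {
    assert_eq!(translate_write(&Stmt::Insert), "insert");
    assert_eq!(translate_write(&Stmt::Select), "select");
}
```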
Jussi Saurio
08d5b3b4bc btree: make fill_cell_payload() re-entrant (overflow pages may require IO) 2025-07-28 09:00:59 +03:00
Jussi Saurio
927aca7857 Fix incorrect autovacuum test 2025-07-28 09:00:59 +03:00
Jussi Saurio
e2e25a48f6 Pager: document origins of BtreePageAllocMode 2025-07-28 09:00:59 +03:00
Jussi Saurio
5ce65bf8e7 btree/pager: reuse freelist pages in allocate_page() to fix UPDATE perf 2025-07-28 09:00:59 +03:00
Pekka Enberg
a02a590f88 Merge 'core/translate: Handle Expr::Id in CREATE INDEX' from Kristofer
I am running into issues when creating indexes and made this PR with a
possible fix.
`Error: cannot use expressions in CREATE INDEX`
In my setup, running on `wasm32-unknown-unknown` (not in the browser), I
can reproduce the issue like this. First, creating a table:
```rust
conn.execute(
    r#"
    CREATE TABLE IF NOT EXISTS users (
        name TEXT,
        created DATETIME DEFAULT CURRENT_TIMESTAMP
    )
    "#,
    (),
)
.await
.unwrap();
```
Here, creating an index for that table:
```rust
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_users_name ON users(name)",
    (),
)
.await
.unwrap();
```
## Findings
I had a closer look at `resolve_sorted_columns`. In this bit, it checks
the expression of the sorted column.
https://github.com/tursodatabase/turso/blob/a2a31a520ff6e228a00e785026dae19b5b2cced7/core/translate/index.rs#L252-L257
```rust
let ident = normalize_ident(match &sc.expr {
    // SQLite supports indexes on arbitrary expressions, but we don't (yet).
    // See "How to use indexes on expressions" in https://www.sqlite.org/expridx.html
    Expr::Name(ast::Name::Ident(col_name)) | Expr::Name(ast::Name::Quoted(col_name)) => {
        col_name
    }
    _ => crate::bail_parse_error!("Error: cannot use expressions in CREATE INDEX"),
});
```
If the expression is not an `Expr::Name`, the function fails.
But the `sc.expr` I am getting is not `Expr::Name` but `Expr::Id`, which
seems expected rather than surprising: reading up on the `sqlite3_parser`
AST, both `Name` and `Id` can appear here.
Adding `Expr::Id` to the check fixes the issue.
```rust
let ident = normalize_ident(match &sc.expr {
    // SQLite supports indexes on arbitrary expressions, but we don't (yet).
    // See "How to use indexes on expressions" in https://www.sqlite.org/expridx.html
    Expr::Id(ast::Name::Ident(col_name))
    | Expr::Id(ast::Name::Quoted(col_name))
    | Expr::Name(ast::Name::Ident(col_name))
    | Expr::Name(ast::Name::Quoted(col_name)) => col_name,
    _ => crate::bail_parse_error!("Error: cannot use expressions in CREATE INDEX"),
});
```

Closes #2294
2025-07-28 08:54:45 +03:00
Pekka Enberg
d92ebd6d37 Merge 'Fix writing wal header for async IO' from Preston Thorpe
We were previously making another inline completion inside io_uring.rs.
I thought this wouldn't be needed anymore because of the Arc that now
wraps the RefCell<Buffer>, but in the case of the WAL header, which is
not pinned to a page in the cache, there is nothing to keep the buffer
alive and we would write a corrupt WAL header.
```rust
        #[allow(clippy::arc_with_non_send_sync)]
        Arc::new(RefCell::new(buffer))
    };

    let write_complete = move |bytes_written: i32| {
     turso_assert!(
            bytes_written == WAL_HEADER_SIZE as i32,
            "wal header wrote({bytes_written}) != expected({WAL_HEADER_SIZE})"
        );
    };
// buffer is never referenced again, this works for sync IO but io_uring writes junk bytes
```
(screenshot attached in the original PR)

Closes #2297
2025-07-28 08:47:12 +03:00
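The fix boils down to having the completion own a handle to the buffer. A minimal sketch of the idea, with a hypothetical `submit_write` standing in for the io_uring submission path:

```rust
use std::sync::Arc;

// Hypothetical sketch of the fix: the completion closure captures its own Arc
// clone of the buffer, so even if the caller drops every other handle before
// the asynchronous write finishes, the allocation stays alive.
fn submit_write(buffer: Arc<Vec<u8>>) -> impl FnOnce() -> usize {
    move || buffer.len() // closure owns `buffer` until it runs
}

fn main() {
    let buf = Arc::new(vec![0u8; 32]);
    let complete = submit_write(Arc::clone(&buf));
    drop(buf); // caller's last handle gone; completion still pins the buffer
    assert_eq!(complete(), 32); // the "write" completes against a live buffer
}
```

Without that captured clone, a buffer not pinned by the page cache is freed as soon as the caller returns, and the kernel writes from (or the callback reads) freed memory.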
PThorpe92
b08c465450 Fix writing wal header for async IO 2025-07-27 21:52:13 -04:00
Levy A.
1f57ab02cf feat: instrument WindowsIO functions 2025-07-27 20:39:49 -03:00
Levy A.
c95c6b67ee fix: thread-safe WindowsFile 2025-07-27 20:39:49 -03:00
Kristofer Lund
cbd5a26cf7 Adding Expr::Id as an allowed Expr when creating an index.
2025-07-27 22:54:20 +02:00
Pekka Enberg
6bf6cc28e4 Merge 'Implement the Returning statement for inserts and updates' from Glauber Costa
They are very similar. DELETE is very different, so we'll do that one later.

Closes #2276
2025-07-27 09:11:16 +03:00
Pekka Enberg
86c97fca6d Merge 'Fix sum() to follow the SQLite semantics' from FamHaggs
### Follow SUM [spec](https://sqlite.org/lang_aggfunc.html)
This PR updates the `SUM` aggregation logic to follow the
[Kahan–Babushka–Neumaier summation
algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm),
consistent with SQLite’s implementation. It improves the numerical
stability of floating-point summation. This fixes issue #2252. I added a
fuzz test to ensure the compatibility of the implementations.
I also fixed the return types for `SUM` to match SQLite’s documented
behavior. This was previously discussed in
[#2182](https://github.com/tursodatabase/turso/pull/2182), but part of
the logic was later unintentionally overwritten by
[#2265](https://github.com/tursodatabase/turso/pull/2265).
I introduced two helper functions, `apply_kbn_step` and
`apply_kbn_step_int`, in `vdbe/execute.rs` to handle floating-point and
integer accumulation respectively. However, I’m new to this codebase and
would welcome constructive feedback on whether there’s a better place
for these helpers.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2270
2025-07-27 09:08:34 +03:00
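The Kahan–Babuška–Neumaier algorithm the PR adopts can be sketched compactly. The `kbn_sum` function below is illustrative, not turso's actual helper; it carries a separate compensation term that recovers low-order bits lost by each addition.

```rust
// Sketch of Kahan–Babuška–Neumaier compensated summation.
fn kbn_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &x in values {
        let t = sum + x;
        if sum.abs() >= x.abs() {
            c += (sum - t) + x; // low-order bits of x were lost
        } else {
            c += (x - t) + sum; // low-order bits of sum were lost
        }
        sum = t;
    }
    sum + c // apply the accumulated correction once at the end
}

fn main() {
    let vals = [1.0, 1e100, 1.0, -1e100];
    // naive left-to-right summation loses both 1.0s and returns 0.0
    assert_eq!(vals.iter().sum::<f64>(), 0.0);
    // compensated summation recovers the exact answer
    assert_eq!(kbn_sum(&vals), 2.0);
}
```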
Pekka Enberg
ab39ea54c7 Merge 'Fix error handling when binding column references while translating the UPDATE statement' from Iaroslav Zeigerman
Closes #1968

Reviewed-by: bit-aloo (@Shourya742)

Closes #2273
2025-07-27 09:05:17 +03:00
Pekka Enberg
6d88c6851b Merge 'io_uring: use Arc pointer for user data of entries' from Preston Thorpe
Trying to pull bite-sized adjustments out of other open PRs.

Closes #2281
2025-07-27 09:04:35 +03:00
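io_uring identifies completions by a `u64` user_data slot, and an `Arc` can be threaded through it by converting to and from a raw pointer. A minimal sketch under that assumption (the `Completion` type and helper names are illustrative):

```rust
use std::sync::Arc;

struct Completion {
    id: u64,
}

// Move ownership of an Arc through a u64 user_data slot: into_raw on
// submission, from_raw on completion.
fn to_user_data(c: Arc<Completion>) -> u64 {
    Arc::into_raw(c) as u64
}

// Safety: `ud` must come from a matching `to_user_data` call, consumed
// exactly once, or the refcount is corrupted.
unsafe fn from_user_data(ud: u64) -> Arc<Completion> {
    Arc::from_raw(ud as *const Completion)
}

fn main() {
    let c = Arc::new(Completion { id: 7 });
    let ud = to_user_data(c); // ownership parked in the ring's user_data
    let back = unsafe { from_user_data(ud) }; // completion side reclaims it
    assert_eq!(back.id, 7);
}
```

This avoids a lookup table mapping ids to completions, which is the indirection the commit message refers to removing.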
PThorpe92
e6737d923d Return correct value for pragma checkpoint 2025-07-26 23:09:40 -04:00
PThorpe92
fb611390c0 Update test to use realistic expectations for should_checkpoint in cacheflush 2025-07-26 23:03:51 -04:00
PThorpe92
7c027fed8c Keep should_checkpoint logic for now until greater checkpointing is fixed 2025-07-26 23:03:51 -04:00
PThorpe92
6644036be4 Stop checkpointing after every write when wal frame size > threshold 2025-07-26 23:03:47 -04:00
Glauber Costa
b8ee38868d implement the pragma encoding
Do not allow setting it. That ship has sailed around 2005.
2025-07-26 19:37:39 -05:00
PThorpe92
735026b502 Use Arc pointer for user data and save indirection when processing sqe/cqes 2025-07-26 16:35:40 -04:00
Glauber Costa
5d8d08d1b6 Implement the Returning statement for inserts and updates
They are very similar. DELETE is very different, so we'll do that one later.
2025-07-26 09:01:09 -05:00
Iaroslav Zeigerman
6f63327320 fix overlooked tests 2025-07-26 04:51:44 -07:00
Iaroslav Zeigerman
f13b9105b9 Fix error handling when binding column references while translating the UPDATE statement 2025-07-26 04:51:42 -07:00
Pekka Enberg
cc5d4dc3ba Merge 'support doubly qualified identifiers' from Glauber Costa
Closes #2271
2025-07-26 11:31:42 +03:00
Glauber Costa
b5927dcfd5 support doubly qualified identifiers 2025-07-25 14:52:45 -05:00
FHaggs
54edfa09d5 Replicate the sqlite Kahan-Babuška-Neumaier algorithm 2025-07-25 15:25:29 -03:00