We need to ensure that there is a single, shared `Database` object per
database file. This is necessary because it is not safe to have multiple
independent WAL files open: coordination relies on POSIX advisory file
locks, which are held at the process level.
Fixes #2267
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2299
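As an illustration only (not the code from this PR), one way to get a single shared object per file is a process-wide registry keyed by path. The `Database` struct, the `registry` helper, and `open_shared` below are hypothetical names for this sketch, and it assumes that canonicalizing the path is enough to identify a file.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock, Weak};

/// Hypothetical stand-in for the real `Database` type.
pub struct Database {
    pub path: PathBuf,
}

/// Process-wide registry holding at most one live `Database` per path.
fn registry() -> &'static Mutex<HashMap<PathBuf, Weak<Database>>> {
    static REGISTRY: OnceLock<Mutex<HashMap<PathBuf, Weak<Database>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the shared `Database` for `path`, creating it on first use.
pub fn open_shared(path: &Path) -> Arc<Database> {
    // Canonicalize when possible so different spellings of the same path
    // share one entry; fall back to the raw path if the file does not exist yet.
    let key = std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf());
    let mut map = registry().lock().unwrap();
    // Reuse the existing object if some caller still holds it.
    if let Some(db) = map.get(&key).and_then(Weak::upgrade) {
        return db;
    }
    let db = Arc::new(Database { path: key.clone() });
    // Store only a weak reference so the object is freed once all users drop it.
    map.insert(key, Arc::downgrade(&db));
    db
}
```

Two calls to `open_shared` with the same path return the same `Arc`, so all connections in the process go through one object and one set of file locks.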
We need to ensure that there is a single, shared `Database` object per
database file. This is necessary because it is not safe to have multiple
independent WAL files open: coordination relies on POSIX advisory file
locks, which are held at the process level.
Fixes #2267
Co-authored-by: ultraman <sunhuayangak47@gmail.com>
Closes #2225.
## What
We currently do not use pages in the
[freelist](https://www.sqlite.org/fileformat.html#the_freelist) at all
when allocating new pages.
## Why is this bad
As a result, (1) UPDATEs that involve overflow pages become very slow,
and (2) the database file grows very quickly. See #2225 for an
extreme example comparison with SQLite.
## The fix
Whenever `allocate_page()` is called, we first check if we have pages in
the freelist, and if we do, we recycle one of those pages instead of
creating a new one. If there are no freelist pages, we allocate a new
page as normal.
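The freelist-first policy described above can be sketched as follows. This is a deliberately simplified model, not the PR's implementation: the `Pager` struct here is hypothetical, tracks page numbers only, and keeps the freelist in memory as a `Vec`, whereas the real freelist lives in trunk and leaf pages on disk.

```rust
/// Hypothetical, heavily simplified pager: page numbers only, no I/O.
pub struct Pager {
    /// Number of pages currently in the database file.
    page_count: u32,
    /// Page numbers that were freed earlier and can be reused.
    freelist: Vec<u32>,
}

impl Pager {
    pub fn new() -> Self {
        // Page 1 always exists (it holds the database header).
        Pager { page_count: 1, freelist: Vec::new() }
    }

    /// Recycle a freelist page if one exists, otherwise grow the file.
    pub fn allocate_page(&mut self) -> u32 {
        if let Some(page_no) = self.freelist.pop() {
            return page_no;
        }
        self.page_count += 1;
        self.page_count
    }

    /// Return a page to the freelist instead of leaving it unused.
    pub fn free_page(&mut self, page_no: u32) {
        self.freelist.push(page_no);
    }

    pub fn page_count(&self) -> u32 {
        self.page_count
    }
}
```

In this model, freeing a page and then allocating again returns the freed page number and leaves `page_count` unchanged, which is why the file stops growing on update-heavy workloads.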
## Implementation notes
- `allocate_page()` now needs to return an `IOResult`, which means all
of its callers also need to return an `IOResult`, necessitating quite a
bit of new state machine logic to ensure re-entrancy.
- I left a few "synchronous IO hacks" in the `balance()` routine because
the size of this PR would balloon even more than it already has if I
were to fix those immediately in this PR.
- `fill_cell_payload()` uses some `unsafe` code to avoid lifetime
issues, and adds an unfortunate double-indirection via
`Arc<Mutex<Vec<T>>>` because the existing btree code constantly clones
`WriteState`, and we must ensure the underlying buffers referenced by
raw pointers in `fill_cell_payload` remain valid.
**Follow-up cleanups:**
1. remove synchronous IO hacks that would require even more state
machines and are best left for another PR
2. remove `Clone` from `WriteState` and implement it better
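To show why returning an `IOResult` forces state machine logic on callers, here is a minimal sketch of a re-entrant allocation routine. The `IOResult` enum below is a simplified stand-in defined for this example, and `Allocator`, `AllocState`, and the `io_complete` flag are hypothetical; the flag simulates a pending read finishing.

```rust
/// Simplified stand-in: either a finished value, or a signal that I/O is
/// pending and the caller must call the same function again later.
pub enum IOResult<T> {
    Done(T),
    IO,
}

/// Where the allocation was when it last yielded.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AllocState {
    Start,
    WaitingForFreelistPage { page_no: u32 },
}

pub struct Allocator {
    state: AllocState,
    freelist: Vec<u32>,
    page_count: u32,
    /// Hypothetical flag simulating completion of the pending page read.
    pub io_complete: bool,
}

impl Allocator {
    pub fn new(freelist: Vec<u32>, page_count: u32) -> Self {
        Allocator { state: AllocState::Start, freelist, page_count, io_complete: false }
    }

    /// Safe to call repeatedly: progress is stored in `self.state`, so a
    /// re-entry after `IOResult::IO` resumes instead of starting over.
    pub fn allocate_page(&mut self) -> IOResult<u32> {
        loop {
            match self.state {
                AllocState::Start => match self.freelist.pop() {
                    // Reusing a page requires reading it first, which may block.
                    Some(page_no) => {
                        self.state = AllocState::WaitingForFreelistPage { page_no };
                    }
                    // No free pages: extend the file, no I/O wait needed here.
                    None => {
                        self.page_count += 1;
                        return IOResult::Done(self.page_count);
                    }
                },
                AllocState::WaitingForFreelistPage { page_no } => {
                    if !self.io_complete {
                        return IOResult::IO;
                    }
                    // Reset for the next allocation before returning.
                    self.state = AllocState::Start;
                    self.io_complete = false;
                    return IOResult::Done(page_no);
                }
            }
        }
    }
}
```

Every caller of such a function has to propagate `IOResult::IO` upward and keep its own resumable state in the same way, which is where most of the extra code in a change like this comes from.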
## Perf comparison
`main`: 33 seconds
```
jussi@Jussis-MacBook-Pro limbo % time target/release/tursodb --experimental-indexes apinatest_main.db <<'EOF'
create table t(x, y, z unique);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
EOF
Turso v0.1.3-pre.3
Enter ".help" for usage hints.
This software is ALPHA, only use for development, testing, and experimentation.
target/release/tursodb --experimental-indexes apinatest_main.db <<<'' 6.81s user 21.18s system 83% cpu 33.643 total
```
PR: 13 seconds
```
jussi@Jussis-MacBook-Pro limbo % time target/release/tursodb --experimental-indexes apinatest_PR.db <<'EOF'
create table t(x, y, z unique);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
update t set x = x + 1 WHERE z > randomblob(1024*128);
EOF
Turso v0.1.3-pre.3
Enter ".help" for usage hints.
This software is ALPHA, only use for development, testing, and experimentation.
target/release/tursodb --experimental-indexes apinatest_PR.db <<<'' 3.89s user 7.83s system 89% cpu 13.162 total
```
(sqlite: 2 seconds 🤡 )
---
TODO:
- [x] Fix whatever issue the simulator caught in CI (#2238)
- [x] Post a performance comparison
- [x] Fix autovacuum test failure
- [x] Improve docs
- [x] Fix `fill_cell_payload` re-entrancy issue when allocating overflow
pages
- [x] Add proper PR description
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2233
The parser unfortunately outputs `Stmt`, which has some enum variants
that we never actually encounter in some parts of the core. Switch to
`unreachable!` instead of `todo!` for those variants.
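A small sketch of the distinction, using a made-up `Stmt` enum and a hypothetical `translate_query` function rather than the real parser types: `todo!` signals work that is still planned, while `unreachable!` documents that the variant cannot arrive at this point.

```rust
/// Hypothetical subset of statement kinds produced by a parser.
pub enum Stmt {
    Select,
    Insert,
    /// Handled earlier in the pipeline, never passed to `translate_query`.
    Begin,
}

/// Translates only query-like statements; callers filter out the rest.
pub fn translate_query(stmt: &Stmt) -> &'static str {
    match stmt {
        Stmt::Select => "select plan",
        Stmt::Insert => "insert plan",
        // Not "unimplemented": reaching this arm would be a bug in the caller.
        Stmt::Begin => unreachable!("BEGIN is handled before query translation"),
    }
}
```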
I am running into issues when creating indexes and made this PR with a
possible fix.
`Error: cannot use expressions in CREATE INDEX`
In my setup, running on `wasm32-unknown-unknown` (not in the browser), I
can reproduce the issue like this. First, creating a table:
```rust
conn.execute(
    r#"
    CREATE TABLE IF NOT EXISTS users (
        name TEXT,
        created DATETIME DEFAULT CURRENT_TIMESTAMP
    )
    "#,
    (),
)
.await
.unwrap();
```
Here, creating an index for that table:
```rust
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_users_name ON users(name)",
    (),
)
.await
.unwrap();
```
## Findings
I had a closer look at `resolve_sorted_columns`. In this bit, it checks
the expression of the sorted column.
https://github.com/tursodatabase/turso/blob/a2a31a520ff6e228a00e785026dae19b5b2cced7/core/translate/index.rs#L252-L257
```rust
let ident = normalize_ident(match &sc.expr {
    // SQLite supports indexes on arbitrary expressions, but we don't (yet).
    // See "How to use indexes on expressions" in https://www.sqlite.org/expridx.html
    Expr::Name(ast::Name::Ident(col_name)) | Expr::Name(ast::Name::Quoted(col_name)) => {
        col_name
    }
    _ => crate::bail_parse_error!("Error: cannot use expressions in CREATE INDEX"),
});
```
If the expression is not an `Expr::Name`, the function fails.
However, the `sc.expr` I am getting is an `Expr::Id`, not an `Expr::Name`.
This appears to be expected rather than a bug: reading up on the
`sqlite3_parser` AST, it seems that both `Name` and `Id` can occur
here.
Adding `Expr::Id` to the check fixes the issue.
```rust
let ident = normalize_ident(match &sc.expr {
    // SQLite supports indexes on arbitrary expressions, but we don't (yet).
    // See "How to use indexes on expressions" in https://www.sqlite.org/expridx.html
    Expr::Id(ast::Name::Ident(col_name))
    | Expr::Id(ast::Name::Quoted(col_name))
    | Expr::Name(ast::Name::Ident(col_name))
    | Expr::Name(ast::Name::Quoted(col_name)) => col_name,
    _ => crate::bail_parse_error!("Error: cannot use expressions in CREATE INDEX"),
});
```
Closes #2294
Previously, `io_uring.rs` created an additional inline completion. I
thought this would no longer be needed now that an `Arc` wraps the
`RefCell<Buffer>`. However, in the case of the WAL header, the buffer is
not pinned to a page in the cache, so nothing keeps it alive and we end
up writing a corrupt WAL header.
```rust
    #[allow(clippy::arc_with_non_send_sync)]
    Arc::new(RefCell::new(buffer))
};
let write_complete = move |bytes_written: i32| {
    turso_assert!(
        bytes_written == WAL_HEADER_SIZE as i32,
        "wal header wrote({bytes_written}) != expected({WAL_HEADER_SIZE})"
    );
};
// buffer is never referenced again, this works for sync IO but io_uring writes junk bytes
```
<img width="881" height="134" alt="image" src="https://github.com/user-attachments/assets/0ff06ad5-411a-43d2-abac-caf9e23ceaeb" />
Closes #2297
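The lifetime problem can be modeled without io_uring. In the sketch below, which is an illustration and not the code in `io_uring.rs`, a hypothetical `PendingWrite` holds only a `Weak` reference to the buffer, standing in for the raw pointer handed to the kernel. If the completion closure captures a clone of the `Arc`, the buffer survives until the completion runs; if it does not, the buffer is already gone by then.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Weak};

type Buffer = Vec<u8>;

/// Hypothetical in-flight write. The `Weak` models a raw pointer given to
/// the kernel: it does not keep the buffer alive on its own.
pub struct PendingWrite {
    data: Weak<RefCell<Buffer>>,
    on_complete: Box<dyn FnOnce(i32)>,
}

/// Queue a write; nothing is written until `complete` is called later.
pub fn submit(buf: &Arc<RefCell<Buffer>>, on_complete: Box<dyn FnOnce(i32)>) -> PendingWrite {
    PendingWrite { data: Arc::downgrade(buf), on_complete }
}

impl PendingWrite {
    /// Returns the bytes that were "written", or `None` if the buffer had
    /// already been freed (real io_uring would write junk bytes instead).
    pub fn complete(self) -> Option<Vec<u8>> {
        // The closure is still alive here, so anything it captured is too.
        let written = self.data.upgrade().map(|b| b.borrow().clone());
        let n = written.as_ref().map_or(-1, |w| w.len() as i32);
        (self.on_complete)(n);
        written
    }
}
```

With a completion such as `Box::new(move |_n| { let _hold = &keep_alive; })`, where `keep_alive` is a clone of the buffer's `Arc`, dropping the caller's own handle before `complete` runs is harmless. With a closure that captures nothing, `complete` finds no buffer.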
### Follow SUM [spec](https://sqlite.org/lang_aggfunc.html)
This PR updates the `SUM` aggregation logic to follow the
[Kahan–Babuška–Neumaier summation
algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm),
consistent with SQLite’s implementation. It improves the numerical
stability of floating-point summation. This fixes issue #2252. I added a
fuzz test to check that the two implementations stay compatible.
I also fixed the return types for `SUM` to match SQLite’s documented
behavior. This was previously discussed in
[#2182](https://github.com/tursodatabase/turso/pull/2182), but part of
the logic was later unintentionally overwritten by
[#2265](https://github.com/tursodatabase/turso/pull/2265).
I introduced two helper functions, `apply_kbn_step` and
`apply_kbn_step_int`, in `vdbe/execute.rs` to handle floating-point and
integer accumulation respectively. However, I’m new to this codebase and
would welcome constructive feedback on whether there’s a better place
for these helpers.
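For readers unfamiliar with the algorithm, here is a minimal standalone sketch of Kahan–Babuška–Neumaier summation for `f64`. It is not the `apply_kbn_step` helper from this PR; the `KbnSum` type and `kbn_sum` function are names made up for the example.

```rust
/// Running state: the plain sum plus a compensation term that collects
/// the low-order bits lost in each addition.
#[derive(Debug, Default, Clone, Copy)]
pub struct KbnSum {
    sum: f64,
    compensation: f64,
}

impl KbnSum {
    /// Add one value, recording the rounding error of this step.
    pub fn add(&mut self, value: f64) {
        let t = self.sum + value;
        if self.sum.abs() >= value.abs() {
            // Low-order bits of `value` were lost in `t`.
            self.compensation += (self.sum - t) + value;
        } else {
            // Low-order bits of `self.sum` were lost in `t`.
            self.compensation += (value - t) + self.sum;
        }
        self.sum = t;
    }

    /// The compensation is applied once, at the end.
    pub fn total(&self) -> f64 {
        self.sum + self.compensation
    }
}

/// Convenience wrapper summing a slice with compensation.
pub fn kbn_sum(values: &[f64]) -> f64 {
    let mut acc = KbnSum::default();
    for &v in values {
        acc.add(v);
    }
    acc.total()
}
```

For the input `[1.0, 1e100, 1.0, -1e100]`, naive left-to-right addition yields `0.0` because both `1.0` terms are absorbed by `1e100`, while `kbn_sum` returns `2.0`.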
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #2270