Commit Graph

4840 Commits

Author SHA1 Message Date
Piotr Rzysko
6d84cbedc2 Fix delimiter handling in group_concat and string_agg
Non-literal delimiters must be translated by AggArgumentSource.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
110ffba2a1 Fix accumulator reset when arguments outnumber aggregates
Previously, while resetting accumulator registers, we would also
reset subsequent registers. This happened because the number of registers
to reset was computed as the sum of arguments rather than the number of
aggregate functions.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6224cdbbd3 Support WalkControl in walk_expr_mut
Now walk_expr_mut can use WalkControl to skip parts of the expression
tree. This makes it consistent with walk_expr.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
b911e80607 Add AggValue instruction
Adds the AggValue instruction, which computes the current aggregate
result and writes it to a dedicated destination register.

Unlike AggFinal, it does not overwrite or clear the accumulator
register. This makes it possible to retrieve aggregate results multiple
times—needed when processing window functions—while preserving the
accumulator state.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
c5a12f52c2 Don't mutate state in op_agg_final
Previously, only the External and Avg aggregates mutated state during
AggFinal. This is unnecessary because AggFinal runs only once per group,
so caching the result provides no performance benefit.

By avoiding state mutation, we can also reuse op_agg_final for the
AggValue instruction that will be added soon.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
458172220e Remove unused method from AggContext 2025-09-13 10:49:14 +02:00
Piotr Rzysko
867bef55d8 Add ResetSorter instruction
This instruction isn't used yet, but it will be needed for window
functions, since they heavily rely on ephemeral tables.
2025-09-13 10:44:56 +02:00
Piotr Rzysko
ea9599681e Add OpenDup instruction
The instruction isn’t used yet, but it’ll be needed for window functions,
since they heavily rely on ephemeral tables.
2025-09-13 10:35:33 +02:00
Preston Thorpe
b1420904bb Merge 'fix(btree): advance cursor after interior node replacement in delete' from Jussi Saurio
## Problem
When a delete replaces an index interior cell, the replacement key is LT
the deleted key. Currently on the main branch, after the deletion
happens, the following call to BTreeCursor::next() stops at the replaced
interior cell.
This is incorrect - imagine the following sequence:
- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
<key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which now
has <key=5>, and we incorrectly delete it even though our WHERE
condition is key > 5.
## Solution
This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once so
the next invocation of next() will skip the replaced cell properly
i.e. we prevent next() from landing on the replaced content and ensures
iteration continues with the next logical record.
## Details
This problem only became apparent once we started using indexes as valid
iteration cursors for DELETE operations in #2981
Closes #3045

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3049
2025-09-12 17:37:01 -04:00
Pekka Enberg
ad6157028e Merge 'core/vdbe: Fix BEGIN CONCURRENT transactions' from Pekka Enberg
The transaction upgrade logic in Transaction opcode is total nonsense
for concurrent transactions so just drop it.
Fixes #3061

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3070
2025-09-12 23:11:12 +03:00
Pekka Enberg
a0921c4221 Merge 'core/storage: Remove unused import warning' from Pekka Enberg
Closes #3069
2025-09-12 23:11:05 +03:00
Pekka Enberg
5e2b1bc0d3 Merge 'Fix incompatible math functions' from Levy A.
Fixes #1817, #2068, #1326, #1397.
The solution is very much not ideal, but fixes all math function related
incompatibilities.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3033
2025-09-12 21:28:08 +03:00
Pekka Enberg
86dcdad3d0 core/vdbe: Fix BEGIN CONCURRENT transactions
The transaction upgrade logic in Transaction opcode is total nonsense
for concurrent transactions so just drop it.

Fixes #3061
2025-09-12 21:19:34 +03:00
Pekka Enberg
2bc8c0c850 core/storage: Remove unused import warning 2025-09-12 21:09:38 +03:00
Pekka Enberg
dcd43ab8fc Merge 'Handle EXPLAIN QUERY PLAN like SQLite' from Lâm Hoàng Phúc
After this PR:
```
turso> EXPLAIN QUERY PLAN SELECT 1;
QUERY PLAN
`--SCAN CONSTANT ROW
turso> EXPLAIN QUERY PLAN SELECT 1 UNION SELECT 1;
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  `--SCAN CONSTANT ROW
   `--UNION USING TEMP B-TREE
      `--SCAN CONSTANT ROW
turso> CREATE TABLE x(y);
turso> CREATE TABLE z(y);
turso> EXPLAIN QUERY PLAN SELECT * from x,z;
QUERY PLAN
|--SCAN x
`--SCAN z
turso> EXPLAIN QUERY PLAN SELECT * from x,z ON x.y = z.y;
QUERY PLAN
|--SCAN x
`--SEARCH z USING INDEX ephemeral_z_t2
turso>
```

Closes #3057
2025-09-12 20:41:23 +03:00
PThorpe92
b04c364981 Fix clippy error 2025-09-12 11:43:38 -04:00
PThorpe92
7a14c7394f Remove the header copy stored on the WalFile, fix fast_path 2025-09-12 11:29:43 -04:00
PThorpe92
25e7c719f1 Update checkpoint_seq on each checkpoint, not just when log restarts
This was causing checkpoint_seq to be 0 when we had already successfully
ran a passive checkpoint, and causing us to use improper pages from the
cache.
2025-09-12 11:29:42 -04:00
Pekka Enberg
14da283e36 Merge 'MVCC: remove reliance on BTreeCursor::has_record()' from Jussi Saurio
Closes #3051
Closes #3032

Closes #3056
2025-09-12 17:31:15 +03:00
Pekka Enberg
54b4c9f30b Merge 'Implement the balance_quick algorithm' from Jussi Saurio
Fast balancing routine for the common special case where the rightmost
leaf page of a given subtree overflows such that the overflowing cell
would be the rightmost cell on the page -- i.e. an append. In this case
we just add a new leaf page as the right sibling of that page, put the
overflow cell there, and insert a new divider cell into the parent. The
high level steps are:
1. Allocate a new leaf page and insert the overflow cell payload in it.
2. Create a new divider cell in the parent - it contains the page number
of the old rightmost leaf, plus the largest rowid on that page.
3. Update the rightmost pointer of the parent to point to the new leaf
page.
4. Continue balance from the parent page (inserting the new divider cell
may have overflowed the parent

Closes #3041
2025-09-12 17:30:52 +03:00
Pekka Enberg
443720c74a Merge 'benchmark: introduce simple 1 thread concurrent benchmark for mvcc/sq…' from Pere Diaz Bou
…lite/wal
This is considerably simpler with 1 thread as we just try to yield
control when I/O happens and we only run io.run_once when all
connections tried to do some work. This allows connections to
cooperatively progress.

Closes #3060
2025-09-12 17:27:41 +03:00
Pekka Enberg
7fdb116d41 Merge 'core/mvcc: queue mvcc txns on pager's end_tx' from Pere Diaz Bou
Flushing mvcc changes to disk requires serialization. To do so we simply
introduce a lock for pager.end_tx, which will take ownership of flushing
to WAL. Once this is finished we can simply release lock.
When multiple tx writes happen concurrently in mvcc, max frame will be
updated. This new max_frame makes is the point of view of the other
transaction return busy because his current wal snapshot is outdated.

Closes #3059
2025-09-12 17:27:17 +03:00
Pere Diaz Bou
ec2cff2026 benchmark: introduce simple 1 thread concurrent benchmark for mvcc/sqlite/wal
This is considerably simpler with 1 thread as we just try to yield
control when I/O happens and we only run io.run_once when all
connections tried to do some work. This allows connections to
cooperatively progress.
2025-09-12 14:02:57 +00:00
Pere Diaz Bou
39fb5913e0 core/mvcc: queue write txn commits in mvcc on pager end_tx
Flushing mvcc changes to disk requires serialization. To do so we simply
introduce a lock for pager.end_tx, which will take ownership of flushing
to WAL. Once this is finished we can simply release lock.
2025-09-12 14:00:02 +00:00
Pere Diaz Bou
e87226548c core/mvcc: fix concurrent tests mvcc 2025-09-12 13:49:40 +00:00
Pere Diaz Bou
9b6d181be4 wal: add hacky update max frame for mvcc use
When multiple tx writes happen concurrently in mvcc, max frame will be
updated. This new max_frame makes is the point of view of the other
transaction return busy because his current wal snapshot is outdated.
2025-09-12 13:49:14 +00:00
Pere Diaz Bou
66b5630870 vdbe/mvcc: rollback mvcc txn on vdbe error 2025-09-12 13:47:45 +00:00
Jussi Saurio
305b2f55ae MVCC: remove reliance on BTreeCursor::has_record() 2025-09-12 16:03:55 +03:00
TcMits
9dac467b40 support EXPLAIN QUERY PLAN 2025-09-12 19:58:45 +07:00
PThorpe92
5849819a59 Fix tests for views 2025-09-12 08:20:40 -04:00
Preston Thorpe
b09dcceeef Merge 'Fixes views' from Glauber Costa
This is a collection of fixes for materialized views ahead of adding
support for JOINs.
It is mostly issues with how we assume there is a single table, with a
single delta, but we have to send more than one.
Those are things that are just objectively wrong, so I am sending it
separately to make the JOIN PR smaller.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3009
2025-09-12 07:43:32 -04:00
Preston Thorpe
16a3410934 Merge 'Fix checkpoint fast-path, don't use cached pages w/o write lock' from Preston Thorpe
closes #3024
Don't use pages from the cache unless we hold an exclusive write lock,
because a page could be updated by a writer in-memory at any point
before we backfill it.
Clear the WAL tag in other areas to prevent any stale tags. Also, we
will just snapshot the page when we determine that it's eligible, and
pay a memcpy instead of the read from disk, but this further prevents
any in-memory changes to the page/TOCTOU issues, and we also assert that
it's still eligible after we copy it to a new buffer.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3036
2025-09-12 07:39:32 -04:00
Preston Thorpe
f55023acc8 Merge 'Refactor UPSERT to use wal_expr_mut to walk AST.' from Preston Thorpe
Working on https://github.com/tursodatabase/turso/issues/2964 I came
upon `walk_expr_mut`, I don't think it existed last time I really spent
much time in the translator. So quickly went back and cleaned this up.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3044
2025-09-12 06:45:13 -04:00
PThorpe92
f60ca3970f Remove old comment from wal 2025-09-12 06:39:59 -04:00
PThorpe92
faf3531a4e Fix checkpoint fast-path, don't use cached pages w/o write lock
closes #3024
Also we snapshot the page when we determine that it's eligible, and pay a
memcpy instead of the read from disk, but this further prevents any in-memory
changes to the page/TOCTOU issues.
2025-09-12 06:38:02 -04:00
TcMits
29d8d04d58 Merge branch 'main' into explain-query-plan 2025-09-12 17:34:11 +07:00
TcMits
5dddc5e00b introduce OP_Explain 2025-09-12 17:31:50 +07:00
Pekka Enberg
6a992e551c Merge 'core: Fix reprepare to properly reset statement cursors and registers' from Pedro Muniz
Before we were not updating the number of registers and cursors, which
meant that on a schema change the Program could now open an additional
cursor and we would not have space for it in the ProgramState, which
lead to the panic.
Closes #3002

Closes #3034
2025-09-12 12:29:53 +03:00
Jussi Saurio
9f6e1a2e7c fix(btree): advance cursor after interior node replacement in delete
When a delete replaces an interior cell, the replacement key is LT the
deleted key. Currently on the main branch, after the deletion happens,
the following call to BTreeCursor::next() stops at the replaced interior
cell.

This is incorrect - imagine the following sequence:

- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
  <key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which
  now has <key=5>, and we incorrectly delete it even though our
  WHERE condition is key > 5.

This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once
  so the next invocation of next() will skip the replaced cell properly

i.e. we prevent next() from landing on the replaced content and ensures iteration continues with the next logical record.

Closes #3045
2025-09-12 10:49:44 +03:00
Pekka Enberg
aa32574554 core/mvcc: Fix begin_exclusive_tx()
The RwLock elimination patches conflicted with the BEGIN CONCURRENT
changes.
2025-09-12 08:42:14 +03:00
Pekka Enberg
a9a48f6272 Merge 'core/schema: Optimize get_dependent_materialized_views() when no views' from Pekka Enberg
Eliminates get_dependent_materialized_views() overhead when there are no
views. Note that we need to optimize the case when there are views as
well because this ends up being pretty hot in write-intensive workloads.

Closes #3046
2025-09-12 08:29:24 +03:00
Pekka Enberg
06371d8894 Merge 'Add BEGIN CONCURRENT support for MVCC mode' from Pekka Enberg
Currently, when MVCC is enabled, every transaction mode supports
concurrent reads and writes, which makes it hard to adopt for existing
applications that use `BEGIN DEFERRED` or `BEGIN IMMEDIATE`.
Therefore, add support for `BEGIN CONCURRENT` transactions when MVCC is
enabled. The transaction mode allows multiple concurrent read/write
transactions that don't block each other, with conflicts resolved at
commit time. Furthermore, implement the correct semantics for `BEGIN
DEFERRED` and `BEGIN IMMEDIATE` by taking advantage of the pager level
write lock when transaction upgrades to write. This means that now
concurrent MVCC transactions are serialized against the legacy ones when
needed.
The implementation includes:
- Parser support for CONCURRENT keyword in BEGIN statements
- New Concurrent variant in TransactionMode to distinguish from regular
read/write transactions
- MVCC store tracking of exclusive transactions to support IMMEDIATE and
EXCLUSIVE modes alongside CONCURRENT
- Proper transaction state management for all transaction types in MVCC
This enables better concurrency for applications that can handle
optimistic concurrency control, while still supporting traditional
SQLite transaction semantics via IMMEDIATE and EXCLUSIVE modes.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3021
2025-09-12 07:38:53 +03:00
Pekka Enberg
d80814fa2c core/schema: Optimize get_dependent_materialized_views() when no views
Eliminates get_dependent_materialized_views() overhead when there are no
views. Note that we need to optimize the case when there are views as
well because this ends up being pretty hot in write-intensive workloads.
2025-09-12 07:22:18 +03:00
Preston Thorpe
f9f7a44955 Merge 'add explicit usize type annotation to range iterator in test' from Denizhan Dakılır
very small fix when i was reading the codebase with rust-analyser while
trying to find a bug for simulator.
original error:
`non-primitive cast: <Range<i32> as Iterator>::Item as i32 rust-analyzer
E0605`

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3043
2025-09-11 21:12:32 -04:00
PThorpe92
36425b2ada Refactor UPSERT to use wal_expr_mut to walk AST.
Working on https://github.com/tursodatabase/turso/issues/2964 I came
upon `walk_expr_mut`, I don't think it existed last time I really spent
much time in the translator. So quickly went back and cleaned this up.
2025-09-11 21:08:11 -04:00
Denizhan Dakılır
70102f5f6e add explicit usize type annotation to range iterator in test 2025-09-12 02:18:49 +03:00
Jussi Saurio
9b14c0022d Implement the balance_quick algorithm
Fast balancing routine for the common special case where the rightmost leaf page of a given subtree overflows (= an append).
In this case we just add a new leaf page as the right sibling of that page, and insert a new divider cell into the parent.
The high level steps are:
1. Allocate a new leaf page and insert the overflow cell payload in it.
2. Create a new divider cell in the parent - it contains the page number of the old rightmost leaf, plus the largest rowid on that page.
3. Update the rightmost pointer of the parent to point to the new leaf page.
4. Continue balance from the parent page (inserting the new divider cell may have overflowedImplement the balance_quick algorithm
2025-09-12 00:42:27 +03:00
Preston Thorpe
e9944f5d1f Merge 'Fix automatic indexes' from Jussi Saurio
Closes #2993
## Background
When a `CREATE TABLE` statement specifies constraints like `col UNIQUE`,
`col PRIMARY KEY`, `UNIQUE (col1, col2)`, `PRIMARY KEY(col3, col4)`,
SQLite creates indexes for these constraints automatically with the
naming scheme `sqlite_autoindex_<table_name>_<increasing_number>`.
## Problem
SQLite expects these indexes to be created in table definition order.
For example:
```sql
CREATE TABLE t(x UNIQUE, y PRIMARY KEY, c, d, UNIQUE(c,d));
```
Should result in:
```sql
sqlite_autoindex_t_1 -- x UNIQUE
sqlite_autoindex_t_2 -- y PRIMARY KEY
sqlite_autoindex_t_3-- UNIQUE(c,d)
```
However, `tursodb` currently doesn't uphold this invariant -- for
example: the PRIMARY KEY index is always constructed first. SQLite flags
this as a corruption error (see #2993).
## Solution
- Process "unique sets" in table definition order. "Unique sets" are
groups of 1-n columns that are part of either a UNIQUE or a PRIMARY KEY
constraint.
- Deduplicate unique sets properly: a PRIMARY KEY of a rowid alias
(INTEGER PRIMARY KEY) is not a unique set. `UNIQUE (a desc, b)` and
`PRIMARY KEY(a, b)` are a single unique set, not two.
- Unify logic for creating automatic indexes and parsing them - remove
separate logic in `check_automatic_pk_index_required()` and use the
existing `create_table()` utility in both index creation and
deserialization.
- Deserialize a single automatic index per unique set, and assert that
`unique_sets.len() == autoindexes.len()`.
- Verify consistent behavior by adding a fuzz tests that creates 1000
databases with 1 table each and runs `PRAGMA integrity_check` on all of
them with SQLite.
## Trivia
Apart from fixing the exact issue #2993, this PR also fixes other bugs
related to autoindex construction - namely cases where too many indexes
were created due to improper deduplication of unique sets.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3018
2025-09-11 17:04:53 -04:00
pedrocarlo
dbb7d6f532 reprepare optimization using reset() 2025-09-11 16:59:53 -03:00
pedrocarlo
c04cf535b0 flip is_done to is_busy 2025-09-11 14:58:51 -03:00