The previous implementation of CompletionGroup would call the group's
callback function directly when the last completion finished:
```rust
if prev == 1 {
    let group_result = group.result.get().and_then(|e| *e);
    (group.complete)(group_result.map_or(Ok(0), Err));
}
```
This broke nested completion groups because parent groups track their
children via the Completion::callback() method. By calling the function
pointer directly, we bypassed the completion chain and parent groups
never received notification that their child had completed.
The fix stores a reference to the group's own Completion object in
self_completion during build(). When the last child finishes, we call
group_completion.callback() instead of invoking the function directly.
This properly propagates through the completion hierarchy, ensuring
parent groups decrement their outstanding count and eventually complete.
This matches the behavior of individual completions and maintains the
invariant that all completions notify their parents through the unified
callback() mechanism.
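Roughly, the fixed dispatch looks like this (a sketch reconstructed from the description above; the exact signature of `callback()` and the accessor for `self_completion` are assumptions):
```rust
if prev == 1 {
    let group_result = group.result.get().and_then(|e| *e);
    // Route through the unified callback() so any parent group observes
    // the child completing and decrements its own outstanding count.
    let group_completion = group.self_completion.get().expect("set in build()");
    group_completion.callback(group_result.map_or(Ok(0), Err));
}
```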
The previous implementation of CompletionGroup::add() would filter out
successfully-finished completions:
```rust
if !completion.finished() || completion.failed() {
    self.completions.push(completion.clone());
}
```
This caused a problem when combined with drain() in the calling code.
Completions that were already finished would be removed from the source
vector by drain() but not added to the group, effectively losing track
of them.
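The interaction looked roughly like this (hypothetical calling code, for illustration only):
```rust
// drain() unconditionally removes completions from `pending`, but the
// old add() silently dropped any that had already finished successfully.
for completion in pending.drain(..) {
    group.add(&completion); // finished completions were lost right here
}
```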
This breaks the invariant that all completions passed to a group must
be tracked, regardless of their state. The build() method already
handles finished completions correctly by not including them in the
outstanding count.
The fix is to always add all completions and let build() handle their
state appropriately, matching the behavior of the old io_yield_many!()
macro.
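Sketched out, the fixed add() is just unconditional tracking (shape assumed from the snippet above):
```rust
pub fn add(&mut self, completion: &Completion) {
    // Track every completion regardless of state; build() later skips
    // already-finished ones when computing the outstanding count.
    self.completions.push(completion.clone());
}
```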
Closes #3687
Previously, the `try_fold_expr_to_i64` function cast `NULL` to `0` when
evaluating expressions in `LIMIT` or `OFFSET` clauses. I removed this
function since evaluating the expression directly and relying on the
MustBeInt operation for casting seems to handle everything.
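For example (illustrative; the exact error surfaced via MustBeInt may differ):
```sql
-- Previously NULL was folded to 0, making this equivalent to LIMIT 0;
-- now the expression is evaluated and MustBeInt enforces integerness.
SELECT * FROM t LIMIT NULL;
```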
Closes #3695
We merged two concurrent fixes to `nchange` handling last night and
AFAICT the fix in #3692 was incorrect because it doesn't count UPDATEs
in cases where the original row was DELETEd as part of the UPDATE
statement.
The correct fix was in 87434b8.
EDIT: okay, the change in #3692 isn't strictly _incorrect_ I guess; I
just think it's more intuitive to increment the change counter for
UPDATE in the `insert` opcode, because that's what performs the actual
update.
Closes #3735
Closes #2600
## Problem
Every btree has a key it is sorted by - this is the integer `rowid` for
tables and an arbitrary-sized, potentially multi-column key for indexes.
Executing an UPDATE in a loop is not safe if the update modifies any
part of the key of the btree that is used for iterating the rows in said
loop. For example:
- Using the table itself to iterate rows is not safe if the UPDATE
modifies the rowid (or rowid alias) of a row, because it modifies the
iteration order itself and may cause rows to be skipped:
```sql
CREATE TABLE t(x INTEGER PRIMARY KEY, y);
INSERT INTO t SELECT value, value FROM generate_series(1, 1000);
UPDATE t SET y = RANDOM() where x > 100; -- safe to iterate 't', 'y' is not being modified
UPDATE t SET x = RANDOM() where x > 100; -- not safe to iterate 't', 'x' is being modified
```
- Using an index to iterate rows is not safe if the UPDATE modifies any
of the columns in the index key:
```sql
CREATE TABLE t(x, y, z);
CREATE INDEX txy ON t (x,y);
INSERT INTO t SELECT value, value, value FROM generate_series(1, 1000);
UPDATE t SET z = RANDOM() where x = 100 and y > 0; -- safe to iterate txy, neither x nor y is being modified
UPDATE t SET x = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'x' is being modified
UPDATE t SET y = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'y' is being modified
```
## Current solution in tursodb
Our current `main` code recognizes this issue and adopts this pseudocode
algorithm from SQLite:
- open a table or index for reading the rows of the source table,
- for each row that matches the condition in the UPDATE statement, write
the row into a temporary table
- then use that temporary table for iteration in the UPDATE loop.
This guarantees that the iteration order will not be affected by the
UPDATEs, because the ephemeral table itself is never modified during
the loop.
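In rough SQL terms (a conceptual sketch only; the real plan uses an internal ephemeral btree, not user-visible SQL):
```sql
CREATE TEMP TABLE matching AS SELECT rowid AS r FROM t WHERE x > 100;
UPDATE t SET x = RANDOM() WHERE rowid IN (SELECT r FROM matching);
```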
## Problem with current solution
Our `main` code special-cases the ephemeral table solution to rowids /
rowid aliases only. Using indexes for UPDATE iteration was disabled in
an earlier PR (#2599) due to the safety issue mentioned above, which
means that many UPDATE statements become full table scans:
```sql
turso> create table t(x PRIMARY KEY);
turso> insert into t select value from generate_series(1,10000);
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 28 0 0 Start at 28
1 OpenWrite 0 2 0 0 root=2; iDb=0
2 OpenWrite 1 3 0 0 root=3; iDb=0
-- scan entire 't' despite very narrow update range!
3 Rewind 0 27 0 0 Rewind table t
...
```
## Solution
We move the ephemeral table logic to _after_ the optimizer has selected
the best access path for the table, and then, if the UPDATE modifies the
key of the chosen access path (table or index; whichever was selected by
the optimizer), we change the plan to include the ephemeral table
prepopulation. Hence, the same query from above becomes:
```sql
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 35 0 0 Start at 35
1 OpenEphemeral 0 1 0 0 cursor=0 is_table=true
2 OpenRead 1 3 0 0 index=sqlite_autoindex_t_1, root=3, iDb=0
3 Integer 50 2 0 0 r[2]=50
-- index seek on PRIMARY KEY index
4 SeekGT 1 10 2 0 key=[2..2]
5 Integer 60 2 0 0 r[2]=60
6 IdxGE 1 10 2 0 key=[2..2]
7 IdxRowId 1 1 0 0 r[1]=cursor 1 for index sqlite_autoindex_t_1.rowid
8 Insert 0 3 1 ephemeral_scratch 2 intkey=r[1] data=r[3]
9 Next 1 6 0 0
10 OpenWrite 2 2 0 0 root=2; iDb=0
11 OpenWrite 3 3 0 0 root=3; iDb=0
-- only scan rows that were inserted into the ephemeral table
12 Rewind 0 34 0 0 Rewind table ephemeral_scratch
13 RowId 0 5 0 0 r[5]=ephemeral_scratch.rowid
```
Note that an ephemeral table does not have to be used if the key of the
chosen index is not affected:
```sql
turso> create table t(x PRIMARY KEY, data);
turso> explain update t set data = 'some_data' where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 15 0 0 Start at 15
1 OpenWrite 0 2 0 0 root=2; iDb=0
2 OpenWrite 1 3 0 0 root=3; iDb=0
3 Integer 50 1 0 0 r[1]=50
-- direct index seek
4 SeekGT 1 14 1 0 key=[1..1]
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #3728
This PR contains NO semantic changes at all; it simply refactors
existing INSERT code to be easier to reason about.
Very sorry, I know I've been working on `INSERT OR IGNORE|REPLACE|etc..`
for days now, but the insert translation was literally unbearable. I
got it working, but I could barely wrap my head around that whole
`translate_insert` function, so I spent a bunch of time refactoring the
whole INSERT handling out into different "Plans"... which turned into a
whole different clusterf***... So I just went back, made the existing
insert emission more modular, and created some context that makes it
easier to reason about.
This should be quick to merge.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3731
When populating an ephemeral table for UPDATE, the population step may
open a cursor on the (permanent) source table - in that case we don't
need to open it again in the UPDATE loop.
- Encode information about the ephemeral source table in OperationMode::UPDATE
if present (see the sketch below)
- Use the OperationMode information to correctly resolve cursors in UPDATE
The decision to use an ephemeral table in UPDATE will be made after the
optimizer has decided which index to use; this will be implemented in a
later commit.
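A rough sketch of what the enriched operation mode might look like (type and field names here are illustrative, not the actual turso definitions; the variant is written `Update` per Rust naming convention):
```rust
// Hypothetical cursor handle type, just for the sketch.
type CursorID = usize;

enum OperationMode {
    Update {
        // Cursor already opened while prepopulating the ephemeral table;
        // the UPDATE loop reuses it instead of opening a second one.
        ephemeral_source: Option<CursorID>,
    },
    // ...other modes elided
}
```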
This adds support for running the simulator under Miri to detect UB.
There are a few things to note about Miri and its limitations:
- It has limited `libc` coverage, so it's not really possible to have
Miri help with `UringIO`/`UringFile` or `UnixIO`/`UnixFile`. That's a
big gap ☹️
- It **can** work for `GenericIO`/`GenericFile`, which only uses `std`
- It can't call external C libraries, so even using `sqlite` is out
(hence adding `--disable-integrity-check` to the simulator for Miri use)
- It runs on nightly; consequently, there are a few new lints that don't
exist in turso's pinned version of rustc
Some questions I have about this MR:
- I made `GenericFile::{lock_file,unlock_file}` no-ops so I could use
`GenericIO`. This isn't great, but if/when you update from Rust 1.88.0
to 1.89.0, `std::fs::File::{lock,lock_shared,unlock}` will be stabilized
and available. Should I note that as a TODO or something?
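For reference, once on 1.89+ the replacement could be as simple as this (a sketch, not the actual GenericFile code):
```rust
use std::fs::File;
use std::io;

// Exclusive lock for writers; File::lock_shared would serve readers.
fn lock_file(file: &File) -> io::Result<()> {
    file.lock()
}

fn unlock_file(file: &File) -> io::Result<()> {
    file.unlock()
}
```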
- Previously, the sim runner shelled out to `git` to get stuff like the
current git hash and the repo directory. For Miri, that's out, and so is
`git2`. Unfortunately, `gix` is also out since it has a required
dependency that uses inline assembly, which Miri doesn't like. I wrote a
hacky shim that uses only std to look for `.git` and find the hash that
HEAD is pointing to. It doesn't deal with stuff like packed-refs or the
repo being a secondary one made with `git worktree`. I'm happy to
support that, but wanted to hear from maintainers before doing more
work.
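The shim boils down to something like this (a simplified std-only sketch with the same limitations noted above, not the exact code):
```rust
use std::{fs, path::Path};

// Resolve the commit hash HEAD points to. Ignores packed-refs and
// `git worktree` checkouts, as noted above.
fn head_commit(repo_root: &Path) -> Option<String> {
    let head = fs::read_to_string(repo_root.join(".git/HEAD")).ok()?;
    let head = head.trim();
    match head.strip_prefix("ref: ") {
        // Symbolic ref: read the loose ref file it names.
        Some(r) => {
            let hash = fs::read_to_string(repo_root.join(".git").join(r)).ok()?;
            Some(hash.trim().to_string())
        }
        // Detached HEAD: the file already contains the hash.
        None => Some(head.to_string()),
    }
}
```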
Two bugs I already found:
- `TursoRwLock::read` used `AtomicU64::compare_exchange_weak`, which is
(evidently) [allowed to spuriously fail](https://doc.rust-lang.org/std/sync/atomic/struct.AtomicU64.html#method.compare_exchange_weak)
in exchange for perf. Miri forces this behavior, which triggers trivial
read deadlocks even with zero readers/writers. I changed it to
`compare_exchange`, but I'm not an atomics expert.
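To illustrate the failure mode (a simplified sketch, not the actual TursoRwLock code):
```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Single-shot read-lock attempt, reader count in the low bits. With
// compare_exchange_weak and no retry loop, a spurious failure reports
// "locked" even when the lock is free; compare_exchange only fails if
// another thread actually changed `state`.
fn try_read(state: &AtomicU64) -> bool {
    let cur = state.load(Ordering::Acquire);
    state
        .compare_exchange(cur, cur + 1, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}
```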
- Uninitialized read in non-Unix
`core::storage::buffer_pool::arena::alloc`. This is a simple one,
resolved by using `std::alloc::alloc_zeroed` instead of
`std::alloc::alloc`.
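The shape of the fix (simplified sketch, not the actual arena code):
```rust
use std::alloc::{alloc_zeroed, Layout};

// alloc() returns uninitialized memory, so reading it before writing is
// UB; alloc_zeroed() hands back zeroed (initialized) bytes instead.
unsafe fn arena_block(size: usize) -> *mut u8 {
    let layout = Layout::from_size_align(size, 8).expect("valid layout");
    // Previously: alloc(layout)
    unsafe { alloc_zeroed(layout) }
}
```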
Moving forward, I'd be interested in potentially getting the tests to
run in Miri, too. `tokio` looks like a good example of a project with
partial coverage that runs Miri where it can. They have some extra test
config to let as many tests as possible run under Miri, with
appropriately scaled-down parameter values, since Miri is super slow.
Closes #3720