Commit Graph

10251 Commits

Author SHA1 Message Date
Pekka Enberg
d3e8285d93 core/io: Never skip a completion in CompletionGroup::add()
The previous implementation of CompletionGroup::add() would filter out
successfully-finished completions:

    if !completion.finished() || completion.failed() {
        self.completions.push(completion.clone());
    }

This caused a problem when combined with drain() in the calling code.
Completions that were already finished would be removed from the source
vector by drain() but not added to the group, effectively losing track
of them.

This breaks the invariant that all completions passed to a group must
be tracked, regardless of their state. The build() method already
handles finished completions correctly by not including them in the
outstanding count.

The fix is to always add all completions and let build() handle their
state appropriately, matching the behavior of the old io_yield_many!()
macro.
2025-10-15 10:47:16 +03:00
Jussi Saurio
0d8a3dda8c Merge 'sql_generation: Fix implementation of LTValue and GTValue for Text types' from Jussi Saurio
## Background
Simulator wants to create predicates that it knows will be Greater or
Less than some known value. It uses `LTValue` and `GTValue` for
generating these.
## Problem
Current implementation simply decrements or increments a random char by
1, and can thus generate strings with control characters like null
terminators that result in parse errors, as seen in e.g. this CI run htt
ps://github.com/tursodatabase/turso/actions/runs/18459131141/job/5258630
5749?pr=3702 of PR #3702
EDIT: I realized the _actual_ problem is in `GTValue` when it decides to
make the string longer, it uses a random char value from `0..255` which
can include null terminators etc. Fixed that too. I think in general
this PR's approach is a bit more predictable so let's keep it.
## Solution
Restrict string mutations to ascii string characters so that the
mutation always results in another ascii string character.

Closes #3708
2025-10-15 09:25:17 +03:00
Jussi Saurio
b1cb897216 Merge 'Fix another "should have been rewritten" translation panic' from Jussi Saurio
Closes #2158

Closes #3702
2025-10-15 09:25:01 +03:00
Preston Thorpe
e5a74b347a Merge 'relax check in the vector test' from Nikita Sivukhin
- fixes https://github.com/tursodatabase/turso/issues/3732

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3733
2025-10-14 16:14:33 -04:00
Preston Thorpe
74bbb0d5a3 Merge 'Allow using indexes to iterate rows in UPDATE statements' from Jussi Saurio
Closes #2600
## Problem
Every btree has a key it is sorted by - this is the integer `rowid` for
tables and an arbitrary-sized, potentially multi-column key for indexes.
Executing an UPDATE in a loop is not safe if the update modifies any
part of the key of the btree that is used for iterating the rows in said
loop. For example:
- Using the table itself to iterate rows is not safe if the UPDATE
modifies the rowid (or rowid alias) of a row, because since it modifies
the iteration order itself, it may cause rows to be skipped:
```sql
CREATE TABLE t(x INTEGER PRIMARY KEY, y);
INSERT <something>
UPDATE t SET y = RANDOM() where x > 100; // safe to iterate 't', 'y' is not being modified
UPDATE t SET x = RANDOM() where x > 100; // not safe to iterate 't', 'x' is being modified
```
- Using an index to iterate rows is not safe if the UPDATE modifies any
of the columns in the index key
```sql
CREATE TABLE t(x, y, z);
CREATE INDEX txy ON t (x,y);
INSERT <something>
UPDATE t SET z = RANDOM() where x = 100 and y > 0; // safe to iterate txy, neither x or y is being modified
UPDATE t SET x = RANDOM() where x = 100 and y > 0; // not safe to iterate txy, 'x' is being modified
UPDATE t SET y = RANDOM() where x = 100 and y > 0; // not safe to iterate txy, 'y' is being modified
```
## Current solution in tursodb
Our current `main` code recognizes this issue and adopts this pseudocode
algorithm from SQLite:
- open a table or index for reading the rows of the source table,
- for each row that matches the condition in the UPDATE statement, write
the row into a temporary table
- then use that temporary table for iteration in the UPDATE loop.
This guarantees that the iteration order will not be affected by the
UPDATEs because the ephemeral table is not under modification.
## Problem with current solution
Our `main` code specialcases the ephemeral table solution to rowids /
rowid aliases only. Using indexes for UPDATE iteration was disabled in
an earlier PR (#2599) due to the safety issue mentioned above, which
means that many UPDATE statements become full table scans:
```sql
turso> create table t(x PRIMARY KEY);
turso> insert into t select value from generate_series(1,10000);
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     28    0                    0   Start at 28
1     OpenWrite          0     2     0                    0   root=2; iDb=0
2     OpenWrite          1     3     0                    0   root=3; iDb=0
-- scan entire 't' despite very narrow update range!
3     Rewind             0     27    0                    0   Rewind table t
...
```
## Solution
We move the ephemeral table logic to _after_ the optimizer has selected
the best access path for the table, and then, if the UPDATE modifies the
key of the chosen access path (table or index; whichever was selected by
the optimizer), we change the plan to include the ephemeral table
prepopulation. Hence, the same query from above becomes:
```sql
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     35    0                    0   Start at 35
1     OpenEphemeral      0     1     0                    0   cursor=0 is_table=true
2     OpenRead           1     3     0                    0   index=sqlite_autoindex_t_1, root=3, iDb=0
3     Integer            50    2     0                    0   r[2]=50
-- index seek on PRIMARY KEY index
4     SeekGT             1     10    2                    0   key=[2..2]
5       Integer          60    2     0                    0   r[2]=60
6       IdxGE            1     10    2                    0   key=[2..2]
7       IdxRowId         1     1     0                    0   r[1]=cursor 1 for index sqlite_autoindex_t_1.rowid
8       Insert           0     3     1     ephemeral_scratch  2   intkey=r[1] data=r[3]
9     Next               1     6     0                    0   
10    OpenWrite          2     2     0                    0   root=2; iDb=0
11    OpenWrite          3     3     0                    0   root=3; iDb=0
-- only scan rows that were inserted to ephemeral index
12    Rewind             0     34    0                    0   Rewind table ephemeral_scratch
13      RowId            0     5     0                    0   r[5]=ephemeral_scratch.rowid
```
Note that an ephemeral index does not have to be used if the index is
not affected:
```sql
turso> create table t(x PRIMARY KEY, data);
turso> explain update t set data = 'some_data' where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenWrite          0     2     0                    0   root=2; iDb=0
2     OpenWrite          1     3     0                    0   root=3; iDb=0
3     Integer            50    1     0                    0   r[1]=50
-- direct index seek
4     SeekGT             1     14    1                    0   key=[1..1]
```

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3728
2025-10-14 16:11:25 -04:00
Preston Thorpe
a6b0778fb3 Merge 'Refactor INSERT translation to a modular setup with emitter context' from Preston Thorpe
This PR contains NO semantic changes at all, this simply refactors
existing INSERT code to be easier to reason about.
Very sorry I know I've been working on `INSERT OR IGNORE|REPLACE|etc..`
for days now but the insert translation was literally unbearable and I
got it working but I barely could wrap my head around that whole
`translate_insert` function, so I spent a bunch of time refactoring the
whole INSERT handling out into different "Plans"...  which turned into a
whole different clusterf***... So I just went back and made the existing
insert emission more modular and created some context that can make it
easier to reason about.
This should be able to just be merged quickly

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3731
2025-10-14 16:10:43 -04:00
Pere Diaz Bou
1a464664a7 Merge 'increment Changes() only once conditionally ' from Pavan Nambi
closes #3688

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3692
2025-10-14 20:26:04 +02:00
Nikita Sivukhin
427a145663 fmt 2025-10-14 22:22:14 +04:00
Pere Diaz Bou
a2097188f0 Merge 'make comparison case sensitive' from Pavan Nambi
closes https://github.com/tursodatabase/turso/issues/3672

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3686
2025-10-14 20:20:02 +02:00
Nikita Sivukhin
9dac7e00ba relax check in the vector test
- fixes https://github.com/tursodatabase/turso/issues/3732
2025-10-14 22:19:19 +04:00
PThorpe92
792877d421 add doc comments to InsertEmitCtx 2025-10-14 13:22:32 -04:00
PThorpe92
20bdb1133d fix clippy warnings 2025-10-14 13:00:31 -04:00
PThorpe92
22e98964cc Refactor INSERT translation to a modular setup with emitter context 2025-10-14 12:48:34 -04:00
Jussi Saurio
3cbdf433a9 fuzz: update multiple columns in table_index_mutation_fuzz 2025-10-14 17:26:21 +03:00
Jussi Saurio
0ae4425e4c fuzz: create multi-column indices in table_index_mutation_fuzz 2025-10-14 17:23:21 +03:00
Jussi Saurio
b3b07252dc Add TCL smoke tests for UPDATEs affecting indexes 2025-10-14 16:25:05 +03:00
Jussi Saurio
b3be21f472 Do not count ephemeral table INSERTs as changes 2025-10-14 16:15:20 +03:00
Jussi Saurio
87434b8a72 Do not count DELETEs occuring in an UPDATE stmt as separate changes 2025-10-14 16:11:43 +03:00
Jussi Saurio
0173d31c04 clippy: collapse nested if 2025-10-14 15:51:31 +03:00
Jussi Saurio
4b80678898 Allow case where cursor for btree is already opened
When populating an ephemeral table for UPDATE, it may open a cursor
on the (permanent) table - in this case we don't need to open it
again in the UPDATE loop
2025-10-14 15:32:48 +03:00
Jussi Saurio
495e66e12b fuzz: run rusqlite integrity check after each DML operation 2025-10-14 15:32:22 +03:00
Jussi Saurio
3465a01bf5 fuzz: sometimes make UPDATEd value a function of the old value 2025-10-14 15:31:43 +03:00
Jussi Saurio
f5ee4807da Properly differentiate between source and target in UPDATE
- Encode information about ephemeral source table in OperationMode::UPDATE
  if present
- Use OperationMode information to correctly resolve cursors in UPDATE
2025-10-14 14:17:28 +03:00
Jussi Saurio
691dce6b8a Make decision about UpdatePlan::ephemeral_plan _after_ optimizer
An ephemeral table is required if the b-tree key of the table (rowid)
or the index (index key) is affected by the UPDATE.
2025-10-14 14:17:28 +03:00
Jussi Saurio
c2fe13ad4f Update documentation of UpdatePlan::ephemeral_plan
It now better reflects when it is used.
2025-10-14 12:18:53 +03:00
Jussi Saurio
bc80ac1754 require &mut ProgramBuilder argument in optimize_plan()
this will be used for ephemeral plan construction for UPDATE in
a later commit.
2025-10-14 12:18:13 +03:00
Jussi Saurio
29770382f9 temporarily remove ephemeral plan construction from prepare_update_plan
the decision to use an ephemeral table in UPDATE will be made after
the optimizer has made the decision about which index to use. this will
be implemented in a later commit.
2025-10-14 12:14:15 +03:00
Pekka Enberg
7cf51e74ca Merge 'core/mvcc: implement CursorTrait on MVCC cursor' from Pere Diaz Bou
Closes #3714
2025-10-14 10:24:42 +03:00
Pekka Enberg
07b94faeb7 Merge 'Add test case for vector() format crash' from Pedro Muniz
Added test to close #1454. The Go code incorrectly, did not quote the
vector array.

Closes #3716
2025-10-14 09:37:11 +03:00
Pekka Enberg
9822fc2c90 Merge 'bindings/rust: Bump version recommendation to 0.2' from Kyle Kelley
Bump version number for crate docs starter setup

Closes #3711
2025-10-14 09:32:07 +03:00
Pekka Enberg
9a1bd2112d Merge 'Run simulator under Miri' from Bob Peterson
This adds support for running the simulator under Miri to detect UB.
There are a few things to note about Miri and its limitations
- It has limited `libc` coverage, so it's not really possible to have
Miri help with `UringIO`/`UringFile` or `UnixIO`/`UnixFile`. That's a
big gap ☹️
- It **can** work for `GenericIO`/`GenericFile`, which only uses `std`
- It can't call external C libraries, so even using `sqlite` is out
(hence adding `--disable-integrity-check` to the simulator for Miri use)
- It runs on nightly, consequently there are a few new lints that don't
exist on turso's pinned version of rustc
Some questions I have about this MR
- I made `GenericFile::{lock_file,unlock_file}` noops so I could use
`GenericIO`. This isn't great, but if/when you update from Rust 1.88.0
to 1.89.0, `std::File::{lock,lock_shared,unlock}` will be stabilized and
available. Should I note that as a TODO or something?
- Previously, the sim runner shelled out to `git` to get stuff like the
current git hash and the repo directory. For Miri, that's out, and so is
`git2`. Unfortunately, `gix` is also out since it has a required
dependency that uses inline assembly, which Miri doesn't like. I wrote a
hacky shim that uses only std to look for `.git` and find the hash that
HEAD is pointing to. It doesn't deal with stuff like packed-refs or the
repo being a secondary one made with `git worktree`. I'm happy to
support that, but wanted to hear from maintainers before doing more
work.
Two UB occurrences I already found:
- `TursoRwLock::read` used `AtomicU64::compare_exchange_weak`, which is
(evidently) [allowed to spuriously fail](https://doc.rust-lang.org/std/s
ync/atomic/struct.AtomicU64.html#method.compare_exchange_weak) in
exchange for perf. Miri forces this behavior, which triggers trivial
read deadlocks even with zero readers/writers. I changed it to
`compare_exchange`, but I'm not an atomics expert.
- Uninitialized read in non-Unix
`core::storage::buffer_pool::arena::alloc`. This is a simple one,
resolved by using `std::alloc::alloc_zeroed` instead of
`std::alloc::alloc`
Moving forward, I'd be interested in potentially getting the tests to
run in Miri, too. `tokio` looks like a good example of a project with
partial coverage that runs it where they can. They have some extra test
config to allow as many as possible to run under Miri, with
appropriately scaled-down parameter values since Miri is super slow

Closes #3720
2025-10-14 09:26:55 +03:00
Pekka Enberg
829ca291f9 Merge 'Import workspace crates by name and not path' from Pedro Muniz
Reviewed-by: bit-aloo (@Shourya742)

Closes #3725
2025-10-14 09:25:13 +03:00
Jussi Saurio
4e34c6be51 Merge 'names shall not be shared between tables,indexs,vtabs,views' from Pavan Nambi
closes #3675

Closes #3681
2025-10-14 07:30:37 +03:00
Jussi Saurio
bd15fee1f8 Merge 'Get aliases to where shall they be used' from Pavan Nambi
closes #3678

Closes #3680
2025-10-14 07:28:09 +03:00
Jussi Saurio
cce2bf9328 Merge 'Add correct unique constraint test for tcl' from Pedro Muniz
Closes #1710
We should use `do_execsql_test_in_memory_error_content` to test errors
instead

Closes #3718
2025-10-14 07:26:48 +03:00
Jussi Saurio
1aed9c9694 Merge 'remove cfg for MAP_ANONYMOUS' from Pedro Muniz
Related to #2587

Reviewed-by: Preston Thorpe <preston@turso.tech>
Reviewed-by: bit-aloo (@Shourya742)

Closes #3721
2025-10-14 07:26:09 +03:00
Jussi Saurio
ebc4ddb2a2 Merge 'Simulator: fix alter table shadowing to modify index column name ' from Pedro Muniz
Forgot to modify the column name referenced in the indexes when
shadowing

Reviewed-by: bit-aloo (@Shourya742)

Closes #3712
2025-10-14 07:25:29 +03:00
Jussi Saurio
a710d2f124 Merge 'Simulator: Drop Index' from Pedro Muniz
Added the ability for us to generate `Drop Index` queries in the
simulator. Most of the code is just boilerplate and some checks to make
sure we do not generate `Drop Index` when we have no indexes to drop

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: bit-aloo (@Shourya742)

Closes #3713
2025-10-14 07:25:02 +03:00
Jussi Saurio
61109963fc Merge 'fix backwards compatible rowid alias behaviour' from Pedro Muniz
Closes #3665

Closes #3723
2025-10-14 07:24:42 +03:00
pedrocarlo
5b2cce946a do not reference workspace package by path 2025-10-13 21:07:15 -03:00
pedrocarlo
2e722af93c proof issue 1710 2025-10-13 20:51:21 -03:00
pedrocarlo
83dde9b55c fix backwards compatible rowid alias behaviour 2025-10-13 20:41:45 -03:00
pedrocarlo
0ef5ec007c remove cfg for MAP_ANONYMOUS 2025-10-13 18:05:18 -03:00
Bob Peterson
4d843804b7 Add --disable-integrity-check option to simulator
Miri can't execute sqlite via the FFI, so this needs to be configurable
2025-10-13 14:54:16 -05:00
Bob Peterson
3d4c10df40 Document using Miri to run the simulator 2025-10-13 14:54:16 -05:00
Bob Peterson
dfc77b0350 Non-Unix arena: use zeroed alloc to avoid UB
Reads to the arena were flagged by Miri as UB since it contained
uninitialized memory
2025-10-13 14:54:16 -05:00
Bob Peterson
74ef9ad5ca Drop weak in TursoRwLock::read's compare_exchange
compare_exchange_weak can spuriously fail, which Miri obliges us with,
causing a read deadlock
2025-10-13 14:54:16 -05:00
Bob Peterson
ce2f286df0 Replace git shell commands with std shims
gix doesn't work here, since while it's pure Rust, it has a
non-configurable dependency on crates using inline assembly, which Miri
does not support. This commit is a bit of a hack, and only works in
non-bare git repos without e.g packed-refs.
2025-10-13 14:54:16 -05:00
Bob Peterson
cd56f52bd6 Add cfg attributes for running under Miri 2025-10-13 14:54:16 -05:00
pedrocarlo
2798fafa6c proof issue 1454 2025-10-13 16:14:29 -03:00