Commit Graph

1039 Commits

Author SHA1 Message Date
Jussi Saurio
b1cb897216 Merge 'Fix another "should have been rewritten" translation panic' from Jussi Saurio
Closes #2158

Closes #3702
2025-10-15 09:25:01 +03:00
Preston Thorpe
74bbb0d5a3 Merge 'Allow using indexes to iterate rows in UPDATE statements' from Jussi Saurio
Closes #2600
## Problem
Every btree has a key it is sorted by - this is the integer `rowid` for
tables and an arbitrary-sized, potentially multi-column key for indexes.
Executing an UPDATE in a loop is not safe if the update modifies any
part of the key of the btree that is used for iterating the rows in said
loop. For example:
- Using the table itself to iterate rows is not safe if the UPDATE
modifies the rowid (or rowid alias) of a row: because that changes the
iteration order itself, it may cause rows to be skipped:
```sql
CREATE TABLE t(x INTEGER PRIMARY KEY, y);
INSERT INTO t SELECT value, value FROM generate_series(1, 1000);
UPDATE t SET y = RANDOM() where x > 100; -- safe to iterate 't', 'y' is not being modified
UPDATE t SET x = RANDOM() where x > 100; -- not safe to iterate 't', 'x' is being modified
```
- Using an index to iterate rows is not safe if the UPDATE modifies any
of the columns in the index key:
```sql
CREATE TABLE t(x, y, z);
CREATE INDEX txy ON t (x,y);
INSERT INTO t SELECT value % 200, value, value FROM generate_series(1, 1000);
UPDATE t SET z = RANDOM() where x = 100 and y > 0; -- safe to iterate txy, neither x nor y is being modified
UPDATE t SET x = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'x' is being modified
UPDATE t SET y = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'y' is being modified
```
## Current solution in tursodb
Our current `main` code recognizes this issue and adopts this pseudocode
algorithm from SQLite:
- open a table or index for reading the rows of the source table,
- for each row that matches the condition in the UPDATE statement, write
the row into a temporary table
- then use that temporary table for iteration in the UPDATE loop.
This guarantees that the iteration order will not be affected by the
UPDATEs because the ephemeral table is not under modification.
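The following is a minimal Python sketch of why the ephemeral table helps. It is not tursodb code. It models a btree as a sorted list of integer keys and treats a cursor step as "seek to the first key greater than the last one visited". The names `update_in_place` and `update_via_ephemeral` are hypothetical, and PRIMARY KEY collisions are ignored. If the loop iterates the btree it is rewriting, the cursor can run into rows again under their new key. If it iterates a snapshot of the matching keys, that cannot happen:

```python
import bisect

def update_in_place(keys, lo, hi, delta):
    """UPDATE t SET x = x + delta WHERE x > lo AND x < hi, iterating t itself.

    The cursor is modeled as "seek to the first key greater than the last key
    visited", which is how stepping forward through a sorted btree behaves.
    PRIMARY KEY conflicts are ignored in this toy model.
    """
    keys = sorted(keys)
    last = lo
    while True:
        i = bisect.bisect_right(keys, last)  # first key > last visited key
        if i == len(keys) or keys[i] >= hi:  # left the WHERE range: stop
            break
        k = keys.pop(i)                      # delete the row under its old key...
        bisect.insort(keys, k + delta)       # ...and re-insert it under the new key
        last = k
    return keys

def update_via_ephemeral(keys, lo, hi, delta):
    """Same UPDATE, but snapshot the matching keys first (the ephemeral table)."""
    keys = sorted(keys)
    scratch = [k for k in keys if lo < k < hi]  # pass 1: prepopulate the scratch table
    for k in scratch:                           # pass 2: iterate the snapshot only
        keys.remove(k)
        bisect.insort(keys, k + delta)
    return keys

print(update_in_place([1, 2, 3], 0, 20, 10))       # [21, 22, 23]: every row updated twice
print(update_via_ephemeral([1, 2, 3], 0, 20, 10))  # [11, 12, 13]
```

The in-place version shows the converse of skipping rows. Rewritten keys that still satisfy `x < 20` are met a second time and updated again.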
## Problem with current solution
Our `main` code special-cases the ephemeral table solution to rowids /
rowid aliases only. Using indexes for UPDATE iteration was disabled in
an earlier PR (#2599) due to the safety issue mentioned above, which
means that many UPDATE statements become full table scans:
```sql
turso> create table t(x PRIMARY KEY);
turso> insert into t select value from generate_series(1,10000);
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     28    0                    0   Start at 28
1     OpenWrite          0     2     0                    0   root=2; iDb=0
2     OpenWrite          1     3     0                    0   root=3; iDb=0
-- scan entire 't' despite very narrow update range!
3     Rewind             0     27    0                    0   Rewind table t
...
```
## Solution
We move the ephemeral table logic to _after_ the optimizer has selected
the best access path for the table, and then, if the UPDATE modifies the
key of the chosen access path (table or index; whichever was selected by
the optimizer), we change the plan to include the ephemeral table
prepopulation. Hence, the same query from above becomes:
```sql
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     35    0                    0   Start at 35
1     OpenEphemeral      0     1     0                    0   cursor=0 is_table=true
2     OpenRead           1     3     0                    0   index=sqlite_autoindex_t_1, root=3, iDb=0
3     Integer            50    2     0                    0   r[2]=50
-- index seek on PRIMARY KEY index
4     SeekGT             1     10    2                    0   key=[2..2]
5       Integer          60    2     0                    0   r[2]=60
6       IdxGE            1     10    2                    0   key=[2..2]
7       IdxRowId         1     1     0                    0   r[1]=cursor 1 for index sqlite_autoindex_t_1.rowid
8       Insert           0     3     1     ephemeral_scratch  2   intkey=r[1] data=r[3]
9     Next               1     6     0                    0   
10    OpenWrite          2     2     0                    0   root=2; iDb=0
11    OpenWrite          3     3     0                    0   root=3; iDb=0
-- only scan rows that were inserted into the ephemeral table
12    Rewind             0     34    0                    0   Rewind table ephemeral_scratch
13      RowId            0     5     0                    0   r[5]=ephemeral_scratch.rowid
```
Note that the ephemeral table is not needed if the UPDATE does not
modify the key of the chosen index:
```sql
turso> create table t(x PRIMARY KEY, data);
turso> explain update t set data = 'some_data' where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenWrite          0     2     0                    0   root=2; iDb=0
2     OpenWrite          1     3     0                    0   root=3; iDb=0
3     Integer            50    1     0                    0   r[1]=50
-- direct index seek
4     SeekGT             1     14    1                    0   key=[1..1]
```
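The planner decision described above can be sketched in a few lines of Python. `AccessPath` and `needs_ephemeral_table` are hypothetical names, not tursodb's API. The check asks only one question: does the UPDATE write any column of the key belonging to the btree the optimizer chose to iterate?

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessPath:
    name: str
    key_columns: tuple  # rowid alias for a table scan, indexed columns for an index

def needs_ephemeral_table(path, updated_columns):
    """True if the UPDATE rewrites any column of the key we iterate by."""
    return any(col in updated_columns for col in path.key_columns)

pk_index = AccessPath("sqlite_autoindex_t_1", ("x",))
print(needs_ephemeral_table(pk_index, {"x"}))     # True: SET x = x + 100000
print(needs_ephemeral_table(pk_index, {"data"}))  # False: SET data = 'some_data'
```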

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3728
2025-10-14 16:11:25 -04:00
Pere Diaz Bou
1a464664a7 Merge 'increment Changes() only once conditionally ' from Pavan Nambi
closes #3688

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3692
2025-10-14 20:26:04 +02:00
Pere Diaz Bou
a2097188f0 Merge 'make comparison case sensitive' from Pavan Nambi
closes https://github.com/tursodatabase/turso/issues/3672

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3686
2025-10-14 20:20:02 +02:00
Jussi Saurio
b3b07252dc Add TCL smoke tests for UPDATEs affecting indexes 2025-10-14 16:25:05 +03:00
Pekka Enberg
07b94faeb7 Merge 'Add test case for vector() format crash' from Pedro Muniz
Added test to close #1454. The Go code incorrectly did not quote the
vector array.

Closes #3716
2025-10-14 09:37:11 +03:00
Jussi Saurio
4e34c6be51 Merge 'names shall not be shared between tables,indexs,vtabs,views' from Pavan Nambi
closes #3675

Closes #3681
2025-10-14 07:30:37 +03:00
Jussi Saurio
bd15fee1f8 Merge 'Get aliases to where shall they be used' from Pavan Nambi
closes #3678

Closes #3680
2025-10-14 07:28:09 +03:00
Jussi Saurio
cce2bf9328 Merge 'Add correct unique constraint test for tcl' from Pedro Muniz
Closes #1710
We should use `do_execsql_test_in_memory_error_content` to test errors
instead

Closes #3718
2025-10-14 07:26:48 +03:00
pedrocarlo
2e722af93c proof issue 1710 2025-10-13 20:51:21 -03:00
pedrocarlo
83dde9b55c fix backwards compatible rowid alias behaviour 2025-10-13 20:41:45 -03:00
pedrocarlo
2798fafa6c proof issue 1454 2025-10-13 16:14:29 -03:00
Pavan-Nambi
57a06835bf add test and fmt and clippy
i was stupid

remove comment
2025-10-13 18:07:51 +05:30
Jussi Saurio
c54e150a52 Merge 'Fix: Table entry is not removed from sqlite_schema when a table is dropped' from
Fixes #3682.
Ignore case of table name when dropping table.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3683
2025-10-13 15:02:29 +03:00
Jussi Saurio
523b155df1 Fix another "should have been rewritten" translation panic
Closes #2158
2025-10-13 11:02:42 +03:00
Pavan-Nambi
295612feea add test 2025-10-12 23:11:28 +05:30
Pavan-Nambi
e1f23aeb2c fmt and add tests 2025-10-12 22:23:04 +05:30
Pavan-Nambi
7e8dabaee5 make comparison case sensitive 2025-10-12 18:02:03 +05:30
Pavan-Nambi
36bf88119f add tests
clippy

expect err to make clippy happy

cleanup
2025-10-12 16:38:12 +05:30
Pavan-Nambi
bd9ce7c485 add test 2025-10-12 15:58:10 +05:30
rajajisai
9061024fad add test 2025-10-11 21:39:46 -04:00
Jussi Saurio
74e04634aa Fix incorrectly using an equality constraint twice for index seek
Prevents something like `WHERE x = 5 AND x = 5` from becoming a
two-component index key.
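One way to express the invariant behind this fix is as a Python sketch: each index column contributes at most one component to the seek key. `build_seek_key` and its (column, value) input format are hypothetical and do not reflect the actual translator code:

```python
def build_seek_key(index_columns, equality_terms):
    """Build an index seek key from WHERE-clause equality terms.

    `equality_terms` holds (column, value) pairs, so `WHERE x = 5 AND x = 5`
    arrives as [("x", 5), ("x", 5)]. Each index column contributes at most one
    key component, and the key prefix ends at the first unconstrained column.
    """
    key = []
    for col in index_columns:
        values = [v for c, v in equality_terms if c == col]
        if not values:
            break
        key.append((col, values[0]))  # one component per column, never two
    return key

print(build_seek_key(["x"], [("x", 5), ("x", 5)]))  # [('x', 5)]
```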

Closes #3656
2025-10-10 13:19:19 +03:00
Diego Reis
da323fa0c4 Some clean ups and correctly working on WHERE clauses 2025-10-09 11:57:15 -03:00
Jussi Saurio
7948259d37 Merge 'optimizer: optimize range scans to use upper and lower bounds more efficiently' from Jussi Saurio
Made a new PR based on @sivukhin's PR #2869 that had a lot of
conflicts. You can check out the PR description from there.
## The main idea is:
Before, if we had an index on `x` and had a query like `WHERE x > 100
and x < 200`, the plan would be something like:
```
- Seek to first row where x > 100
- Then, for every row, discard the row if x >= 200
```
This is highly wasteful in cases where there are a lot of rows where `x
>= 200`. Since our index is sorted on `x`, we know that once we hit the
_first_ row where `x >= 200`, we can stop iterating entirely.
So, the new plan is:
```
- Seek to first row where x > 100
- Then, iterate rows until x >= 200, and then stop
```
This also improves the situation for multi-column indexes. Imagine index
on `(x,y)` and a condition like `WHERE x = 100 and y > 100 and y < 200`.
Before, the plan was:
```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and discard the row if y >= 200
- Stop when x > 100
```
This also suffers from a problem where if there are a lot of rows where
`x=100` and `y >= 200`, we go through those rows unnecessarily. The new
plan is:
```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and y < 200
- Stop when either x > 100 or y >= 200
```
This prevents us from iterating rows like `x=100, y=666`
unnecessarily: because the index is sorted on `(x,y)`, once we
hit any row where `x>100` OR `x=100, y >= 200`, we can stop.
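The new multi-column plan can be sketched in Python, with the `(x,y)` index modeled as a sorted list of tuples. This is an illustration and not the optimizer's code; `range_scan` is a hypothetical name. The scan seeks once and then stops at the first entry past the upper bound. Python's lexicographic tuple comparison expresses "`x>100` OR `x=100, y >= 200`" as a single test:

```python
import bisect

def range_scan(index, x_val, y_lo, y_hi):
    """Scan a sorted (x, y) index for x = x_val AND y > y_lo AND y < y_hi.

    Seeks once, then stops at the first entry past the upper bound instead of
    reading and discarding every remaining entry. Returns (matches, entries_read).
    """
    start = bisect.bisect_right(index, (x_val, y_lo))  # seek: first entry > (x_val, y_lo)
    matches, entries_read = [], 0
    for entry in index[start:]:
        entries_read += 1
        # Tuples compare lexicographically: "x > x_val OR (x = x_val AND y >= y_hi)".
        if entry >= (x_val, y_hi):
            break
        matches.append(entry)
    return matches, entries_read

index = sorted((x, y) for x in (99, 100, 101) for y in range(0, 1000, 50))
print(range_scan(index, 100, 100, 200))  # ([(100, 150)], 2)
```

Under the old plan, the scan would have kept reading and discarding the remaining `x=100` entries up to `y=950`.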

Closes #3644
2025-10-09 14:47:15 +03:00
Jussi Saurio
e726803ab4 Merge 'translate: make bind_and_rewrite_expr() reject unbound identifiers if no referenced tables exist' from Jussi Saurio
Before, we just skipped evaluating `Id`, `Qualified` and
`DoublyQualified` if `referenced_tables` was `None`, leading to shit
like #3621. Let's eagerly return `"No such column"` parse errors in
these cases instead, and punch exceptions for cases where that doesn't
cleanly work.
Top tip: use the `Hide whitespace` toggle when inspecting the diff of this
PR.
Closes #3621

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3626
2025-10-09 12:45:16 +03:00
Nikita Sivukhin
4313f57ecb Optimize range scans 2025-10-09 11:47:41 +03:00
Pavan-Nambi
f0d9ead19f add more tests
refactor and use sort_unstable_by_key
2025-10-09 08:28:59 +05:30
Pavan-Nambi
f138448da2 don't allow duplicate col names in create table 2025-10-09 08:09:31 +05:30
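The follow-up commit mentions switching to `sort_unstable_by_key`. That suggests a "sort, then compare neighbours" duplicate check. The Python sketch below is written under that assumption. `find_duplicate_column` is a hypothetical name, and column names are compared case-insensitively, as SQL identifiers are:

```python
def find_duplicate_column(names):
    """Return a column name that appears twice (case-insensitively), else None."""
    ordered = sorted(names, key=str.lower)  # equal names become adjacent
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.lower() == cur.lower():
            return cur
    return None

print(find_duplicate_column(["id", "Name", "name"]))  # name
print(find_duplicate_column(["id", "name"]))          # None
```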
PThorpe92
a232e3cc7a Implement proper handling of deferred foreign keys 2025-10-07 16:45:23 -04:00
PThorpe92
f56f37fae5 Add more tests for self-referencing FKs and remove unneeded FkIfZero checks/labels in emitter 2025-10-07 16:45:23 -04:00
PThorpe92
99ae96c5f6 Fix self-referential FK relationships and validation of FKs 2025-10-07 16:45:22 -04:00
PThorpe92
ae975afe49 Remove unnecessary FK resolution on schema parsing 2025-10-07 16:45:16 -04:00
Jussi Saurio
a343dacaaf translate: make bind_and_rewrite_expr() reject identifiers if no referenced tables exist 2025-10-07 23:34:26 +03:00
PThorpe92
16d19fd39e Add tcl tests for foreign keys 2025-10-07 16:28:04 -04:00
Pekka Enberg
dacb8e3350 Merge 'Fix attach I/O error with in-memory databases' from Preston Thorpe
closes #3540

Closes #3602
2025-10-07 09:00:02 +03:00
PThorpe92
addb9ef65b Add regression test for #3540 attach issue 2025-10-06 21:33:42 -04:00
Glauber Costa
beb44e8e8c fix mviews with re-insertion of data with the same key
There is a bug in our materialized view implementation that is
triggered when we delete a row and then insert another row with the
same primary key.

Our insert code needs to detect updates and generate a DELETE +
INSERT. But in this case, after the initial DELETE, the fresh insert
generates another DELETE.

We ended up with the wrong response for aggregations (and I am pretty
sure even filter-only views would manifest the bug as well), where
groups that should still be present just disappeared because of the
extra delete.
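The effect can be illustrated by modeling view maintenance as weighted deltas: +1 for an insert and -1 for a delete, in the style of a Z-set. This is an assumption about the model, not tursodb's operator code. In the Python sketch below, `count_view` is hypothetical and folds a delta stream into a `GROUP BY` count. A single spurious DELETE drives the group's weight to zero, and the group vanishes:

```python
from collections import Counter

def count_view(deltas):
    """Fold (group, weight) deltas into SELECT grp, COUNT(*) ... GROUP BY grp.

    Inserts carry weight +1 and deletes -1; a group is visible while its
    accumulated weight is positive.
    """
    weights = Counter()
    for grp, weight in deltas:
        weights[grp] += weight
    return {grp: w for grp, w in weights.items() if w > 0}

# Row (pk=1, grp='a') is inserted, deleted, then inserted again with the same key.
correct = [("a", +1), ("a", -1), ("a", +1)]
# The bug: the re-insert also emits a DELETE, as if it were updating a live row.
buggy = [("a", +1), ("a", -1), ("a", -1), ("a", +1)]

print(count_view(correct))  # {'a': 1}
print(count_view(buggy))    # {}: group 'a' has vanished
```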

A new test case is added that fails without the fix.
2025-10-06 20:12:49 -05:00
Pekka Enberg
b063d0d41a Merge 'Don't panic if doing INSERT INTO ... SELECT rowid' from Jussi Saurio
Backport: 0.2
Closes #3567

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3572
2025-10-04 10:11:09 +03:00
Pekka Enberg
50607607fa Merge 'Actually enforce uniqueness in create unique index' from Jussi Saurio
we just weren't doing it 🤡
Backport: 0.2
Closes #3568

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3571
2025-10-04 10:07:44 +03:00
Jussi Saurio
81b437c690 Don't panic if doing INSERT INTO ... SELECT rowid
Closes #3567
2025-10-03 23:12:24 +03:00
Jussi Saurio
8dac1ba21a Fix: actually enforce uniqueness in CREATE UNIQUE INDEX
...we just didn't do it
2025-10-03 22:58:42 +03:00
Pekka Enberg
1b42f77300 Merge 'Add short writes to unreliable-libc' from FamHaggs
Add short writes in the faulty_libc.
As @PThorpe92 stated in #3209, this should be implemented here instead
of in the memory IO of the simulator. Running this in the stress test, I
caught a logic bug in `try_pwritev_raw`; I will create a PR for that
small fix. I will close #3209 in favor of this PR.
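A short write is a write call that returns fewer bytes than requested, and the caller must retry the remainder. The Python sketch below is a single-buffer simplification of what the injected faults exercise; the same idea applies across the buffers of `pwritev`. Both `write_all` and the fault-injecting `make_short_writer` are hypothetical helpers and not the faulty_libc API:

```python
def write_all(pwrite, data, offset):
    """Call `pwrite(buf, offset)` until every byte of `data` is written.

    `pwrite` returns the number of bytes it wrote, which may be fewer than
    len(buf) (a short write); the remainder must be retried at the next offset.
    """
    view = memoryview(data)
    while len(view) > 0:
        n = pwrite(view, offset)
        if n <= 0:
            raise OSError("pwrite made no progress")
        view = view[n:]  # drop the bytes that were written
        offset += n

def make_short_writer(file, max_chunk):
    """Fault-injecting pwrite into a bytearray that writes at most max_chunk bytes."""
    def pwrite(buf, offset):
        n = min(len(buf), max_chunk)
        end = offset + n
        if len(file) < end:
            file.extend(b"\0" * (end - len(file)))
        file[offset:end] = bytes(buf[:n])
        return n
    return pwrite

disk = bytearray()
write_all(make_short_writer(disk, 3), b"hello world", 0)
print(bytes(disk))  # b'hello world'
```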

Closes #3569
2025-10-03 21:52:47 +03:00
FHaggs
dd6e092a5c Add short writes to pwritev in faulty_libc. 2025-10-03 18:35:03 +02:00
Jussi Saurio
d2f5e67b25 Merge 'Fix COLLATE' from Jussi Saurio
Fixes the following problems with COLLATE:
- Fix: incorrectly used e.g. `x COLLATE NOCASE = 'fOo'` as index
constraint on an index whose column was not case-insensitively collated
- Fix: various ephemeral indexes (in GROUP BY, ORDER BY, DISTINCT) and
subqueries did not retain proper collation information of columns
- Fix: collation of a given expression was not determined properly
according to SQLite's rules
Adds TCL tests and fuzz test
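SQLite's rule for choosing the collation of a binary comparison can be written as a short Python sketch; the `Operand` type is hypothetical. An explicit postfix `COLLATE` takes precedence, with the left operand first. Failing that, a column operand's collation is used, again left first. Otherwise `BINARY` applies. A column without a declared collation counts as `BINARY`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    explicit_collation: Optional[str] = None  # from a postfix COLLATE clause
    column_collation: Optional[str] = None    # only for column references; "BINARY" if undeclared

def comparison_collation(left: Operand, right: Operand) -> str:
    """Collating sequence used to evaluate `left <op> right`."""
    for side in (left, right):  # 1. an explicit COLLATE wins, left operand first
        if side.explicit_collation is not None:
            return side.explicit_collation
    for side in (left, right):  # 2. otherwise a column's collation, left operand first
        if side.column_collation is not None:
            return side.column_collation
    return "BINARY"             # 3. otherwise BINARY

# x COLLATE NOCASE = 'fOo', where column x was declared without a collation.
x_nocase = Operand(explicit_collation="NOCASE", column_collation="BINARY")
print(comparison_collation(x_nocase, Operand()))  # NOCASE
```

In the first fixed case, the comparison uses `NOCASE` while an index on `x` is ordered by `BINARY`. As a result, that index's ordering cannot serve the constraint.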
Closes #3476
Closes #1524
Closes #3305

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3538
2025-10-03 09:34:24 +03:00
Jussi Saurio
ce7fe54841 Collate: add more TCL tests 2025-10-02 21:49:33 +03:00
PThorpe92
361bd70a26 Add regression test for rowid affinity 2025-10-02 14:31:22 -04:00
Pekka Enberg
dc1463c70d Merge 'Improve error handling for cyclic views' from Duy Dang
The cycle is detected by marking a seen view: if a seen view is processed
again, that's a cycle and we throw an error.
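The following is a Python sketch of marking views during expansion. `expand_view` and the `views` mapping are hypothetical. A view is marked only while it is on the current expansion path and unmarked afterwards. With that design, a view referenced twice through different paths (a diamond) is not mistaken for a cycle:

```python
def expand_view(name, views, visiting=None):
    """Return the base tables a view reads from, raising on a view cycle.

    `views` maps a view name to the names it selects from; any name that is
    not a key of `views` is treated as a base table.
    """
    if visiting is None:
        visiting = set()
    if name not in views:
        return {name}
    if name in visiting:
        raise ValueError(f"cycle detected while expanding view {name}")
    visiting.add(name)
    try:
        tables = set()
        for dep in views[name]:
            tables |= expand_view(dep, views, visiting)
        return tables
    finally:
        visiting.discard(name)  # unmark so a view reused on another path is fine

views = {"v1": ["t"], "v2": ["v1", "v1"], "a": ["b"], "b": ["a"]}
print(expand_view("v2", views))  # {'t'}
```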
Close #3404

Closes #3467
2025-10-02 19:33:12 +03:00
FHaggs
095e72eac9 Add short write to pwrite in faulty_libc. 2025-10-02 16:11:15 +02:00
Jussi Saurio
30e6524c4e Fix: JOIN USING should pick columns from left table, not right
Closes #3468
Closes #3479
2025-10-02 06:56:52 +03:00
Jussi Saurio
c0da38e24a Merge 'Clear WhereTerm 'from_outer_join' state when LEFT JOIN is optimized to INNER JOIN' from Jussi Saurio
Closes #3470
## Background
In a query like `SELECT * FROM t LEFT JOIN s ON t.a=s.a WHERE s.a =
'foo'` we can remove the LEFT JOIN and replace it with an `INNER JOIN`
because NULL values will never be equal to 'foo'. Rewriting as `INNER
JOIN` allows the optimizer to also reorder the table join order to come
up with a more efficient query plan. In fact, we have this optimization
already.
## Problem
However, there is a dumb bug where `WhereTerm`s involving this join
still retain their `from_outer_join` state, resulting in forcing the
evaluation of those terms at the original join index, which results in
completely wrong bytecode if the join optimizer decides to reorder the
join as `s JOIN t` instead. Effectively it will evaluate `t.a=s.a` after
table `s` is open but table `t` is not open yet.
## Fix
This PR fixes that issue by clearing `from_outer_join` properly from the
relevant `WhereTerm`s.
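The fix can be sketched in Python. `WhereTerm` and `from_outer_join` follow the names above. The `null_rejecting_for` field and `simplify_left_joins` are hypothetical simplifications, not the optimizer's real structures. When a WHERE term rejects the NULL-extended rows of a LEFT JOINed table, the join becomes an INNER JOIN. The ON-clause terms of that join then lose their `from_outer_join` marker, so a reordered plan is free to evaluate them wherever both tables are open:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WhereTerm:
    expr: str
    # Set to the right-hand table of the LEFT JOIN whose ON clause produced this term.
    from_outer_join: Optional[str] = None
    # Tables for which a NULL-extended row makes this term false (hypothetical field).
    null_rejecting_for: frozenset = frozenset()

def simplify_left_joins(left_joined, terms):
    """Turn LEFT JOINs into INNER JOINs where a WHERE term rejects NULL rows.

    Returns the set of tables that remain LEFT JOINed. ON-clause terms of every
    converted join lose their `from_outer_join` marker, so they are no longer
    pinned to the original join position if the optimizer reorders the joins.
    """
    converted = {
        table for table in left_joined
        if any(term.from_outer_join is None and table in term.null_rejecting_for
               for term in terms)
    }
    for term in terms:
        if term.from_outer_join in converted:
            term.from_outer_join = None  # the fix: clear the stale outer-join state
    return left_joined - converted

# SELECT * FROM t LEFT JOIN s ON t.a = s.a WHERE s.a = 'foo'
on_term = WhereTerm("t.a = s.a", from_outer_join="s", null_rejecting_for=frozenset({"s", "t"}))
where_term = WhereTerm("s.a = 'foo'", null_rejecting_for=frozenset({"s"}))
print(simplify_left_joins({"s"}, [on_term, where_term]))  # set()
print(on_term.from_outer_join)                            # None
```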

Closes #3475
2025-10-02 06:56:07 +03:00