This optimization reuses an existing cursor when op_open_write() is
called on the same table/index (same root_page). This is safe because
the cursor position doesn't matter - op_rewind() is always called after
op_open_write() to position the cursor at the beginning of the
table/index before any operations are performed.
This change speeds up op_open_write() by avoiding unnecessary cursor re-
initialization.
Closes#3815
This optimization reuses an existing cursor when op_open_write() is
called on the same table/index (same root_page). This is safe because
the cursor position doesn't matter - op_rewind() is always called
after op_open_write() to position the cursor at the beginning of the
table/index before any operations are performed.
This change speeds up op_open_write() by avoiding unnecessary cursor
re-initialization.
This PR implements simple heap-sort approach for query plans like
`SELECT ... FROM t WHERE ... ORDER BY ... LIMIT N` in order to maintain
small set of top N elements in the ephemeral B-tree and avoid sort and
materialization of whole dataset.
I removed all optimizations not related to this particular change in
order to make branch lightweight.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3726
## Gist
This PR implements _statement subtransactions_, which means that a
single statement within an interactive transaction can individually be
rolled back.
## Background
The default constraint violation resolution strategy in SQLite is
`ABORT`, which means to rollback the statement that caused the conflict.
For example:
```sql
CREATE TABLE t(x UNIQUE);
INSERT INTO t VALUES (1);
BEGIN;
INSERT INTO t VALUES (2),(3); -- ok
INSERT INTO t VALUES (4),(1); -- conflict on 1, this statement should rollback
INSERT INTO t VALUES (5); -- ok
COMMIT; -- ok
SELECT * FROM t;
1
2
3
5
```
So far we haven't been able to support this due to lack of support for
subtransactions, and have used the `ROLLBACK` strategy, which means to
rollback the entire transaction on any constraint error.
## Problem
Although PRIMARY KEY and UNIQUE constraints allow defining the conflict
resolution strategy (e.g. `id INTEGER PRIMARY KEY ON CONFLICT
ROLLBACK`), FOREIGN KEY violations do not support this: they always use
`ABORT` i.e. statement subtransaction rollback. For this reason alone it
is important to implement this mechanism now rather than later, since we
already have FOREIGN KEY support implemented.
## Details
This PR implements statement subtransactions with _anonymous
savepoints_. This means that whenever a statement begins, it will open a
new savepoint which will write "page undo images" into a temporary file
called a _subjournal_. Whenever the statement marks a page as dirty, it
will write the before-image of the page into the subjournal so that its
modifications can be undone in the event of an ABORT (statement
rollback).
- Right now, only anonymous savepoints are supported, so the explicit
`SAVEPOINT` syntax is not.
- Due to the above, there can be only one savepoint open per pager, and
this is enforced with assertions.
- The subjournal file is currently entirely in memory. If it were not,
we would either have to block on IO or refactor many usages of code to
account for potentially pending completions.
- Constraint errors no longer cause transactions to abort nor do they
cause the page cache to be cleared - instead, subjournaled pages will be
brought back into the page cache which effectively handles the same
behavior albeit more fine-grained.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3792
We don't want something like `BEGIN IMMEDIATE` to start a subtransaction,
so instead we will open it if:
- Statement is write, AND
a) Statement has >0 table_references, or
b) The statement is an INSERT (INSERT doesn't track table_references in
the same way as other program types)
1. The number of deferred FK violations when the statement started.
When a statement subtransaction rolls back, the connection's
deferred violation counter will be reset to this value.
2. The number of immediate FK violations that occurred during the
statement. In practice we just need to know whether this number
is nonzero, and if it is, the statement subtransaction will roll
back.
Statement subtransactions will be implemented in future commits.
Every transaction was reading page 1 from the WAL to check the schema cookie
in op_transaction, causing unnecessary WAL lookups.
This commit caches the schema_cookie in Pager as AtomicU64, similar to how
page_size and reserved_space are already cached. The cache is updated when the
header is read/modified and invalidated in begin_read_tx() when WAL changes
are detected from other connections.
This matches SQLite's approach of caching frequently accessed header fields to
avoid repeated page 1 reads. Improves write throughput by 5% in our
benchmarks.
Depends on #3584 to use the most up-to-date implementation of
`ThreadRng`
- Add `fill_bytes` method to `IO`
- use `thread_rng` instead of `getrandom`, as `getrandom` is much slower
and `thread_rng` offers enough security
- modify `exec_randomblob`, `exec_random` and random_rowid generation to
use methods from IO for determinism
- modified simulator IO to implement `fill_bytes`
This the PRNG for sqlite if someone is curious. It is similar to
`thread_rng`:
```c
/* Initialize the state of the random number generator once,
** the first time this routine is called.
*/
if( wsdPrng.s[0]==0 ){
sqlite3_vfs *pVfs = sqlite3_vfs_find(0);
static const u32 chacha20_init[] = {
0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
};
memcpy(&wsdPrng.s[0], chacha20_init, 16);
if( NEVER(pVfs==0) ){
memset(&wsdPrng.s[4], 0, 44);
}else{
sqlite3OsRandomness(pVfs, 44, (char*)&wsdPrng.s[4]);
}
wsdPrng.s[15] = wsdPrng.s[12];
wsdPrng.s[12] = 0;
wsdPrng.n = 0;
}
assert( N>0 );
while( 1 /* exit by break */ ){
if( N<=wsdPrng.n ){
memcpy(zBuf, &wsdPrng.out[wsdPrng.n-N], N);
wsdPrng.n -= N;
break;
}
if( wsdPrng.n>0 ){
memcpy(zBuf, wsdPrng.out, wsdPrng.n);
N -= wsdPrng.n;
zBuf += wsdPrng.n;
}
wsdPrng.s[12]++;
chacha_block((u32*)wsdPrng.out, wsdPrng.s);
wsdPrng.n = 64;
}
sqlite3_mutex_leave(mutex);
```
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#3799
closes#3666
and probably other issues i'll have to go digging through to see if
there is any others.
<img width="948" height="445" alt="image" src="https://github.com/user-
attachments/assets/2844e09b-109a-4a70-bd18-d8a814e49ea0" />
Any ALTER COLUMN stmt will now update the constraints on the table
(primary key, foreign key, unique)
Closes#3776
DEFERRED was a bit too deferred - it allowed the dirty pages to be
written out to WAL before checking for violations, resulting in the
violations effectively being committed even though the transaction
ended up aborting
Closes#3687 .
Previously, the `try_fold_expr_to_i64` function casted `NULL` as `0`
when evaluating expressions in `LIMIT` or `OFFSET` clauses. I removed
this function since evaluating the expression directly and relying on
the MustBeInt operation for casting seems to handle everything.
Closes#3695
We merged two concurrent fixes to `nchange` handling last night and
AFAICT the fix in #3692 was incorrect because it doesn't count UPDATEs
in cases where the original row was DELETEd as part of the UPDATE
statement.
The correct fix was in 87434b8
Closes#2600
## Problem
Every btree has a key it is sorted by - this is the integer `rowid` for
tables and an arbitrary-sized, potentially multi-column key for indexes.
Executing an UPDATE in a loop is not safe if the update modifies any
part of the key of the btree that is used for iterating the rows in said
loop. For example:
- Using the table itself to iterate rows is not safe if the UPDATE
modifies the rowid (or rowid alias) of a row, because since it modifies
the iteration order itself, it may cause rows to be skipped:
```sql
CREATE TABLE t(x INTEGER PRIMARY KEY, y);
INSERT <something>
UPDATE t SET y = RANDOM() where x > 100; // safe to iterate 't', 'y' is not being modified
UPDATE t SET x = RANDOM() where x > 100; // not safe to iterate 't', 'x' is being modified
```
- Using an index to iterate rows is not safe if the UPDATE modifies any
of the columns in the index key
```sql
CREATE TABLE t(x, y, z);
CREATE INDEX txy ON t (x,y);
INSERT <something>
UPDATE t SET z = RANDOM() where x = 100 and y > 0; // safe to iterate txy, neither x or y is being modified
UPDATE t SET x = RANDOM() where x = 100 and y > 0; // not safe to iterate txy, 'x' is being modified
UPDATE t SET y = RANDOM() where x = 100 and y > 0; // not safe to iterate txy, 'y' is being modified
```
## Current solution in tursodb
Our current `main` code recognizes this issue and adopts this pseudocode
algorithm from SQLite:
- open a table or index for reading the rows of the source table,
- for each row that matches the condition in the UPDATE statement, write
the row into a temporary table
- then use that temporary table for iteration in the UPDATE loop.
This guarantees that the iteration order will not be affected by the
UPDATEs because the ephemeral table is not under modification.
## Problem with current solution
Our `main` code specialcases the ephemeral table solution to rowids /
rowid aliases only. Using indexes for UPDATE iteration was disabled in
an earlier PR (#2599) due to the safety issue mentioned above, which
means that many UPDATE statements become full table scans:
```sql
turso> create table t(x PRIMARY KEY);
turso> insert into t select value from generate_series(1,10000);
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 28 0 0 Start at 28
1 OpenWrite 0 2 0 0 root=2; iDb=0
2 OpenWrite 1 3 0 0 root=3; iDb=0
-- scan entire 't' despite very narrow update range!
3 Rewind 0 27 0 0 Rewind table t
...
```
## Solution
We move the ephemeral table logic to _after_ the optimizer has selected
the best access path for the table, and then, if the UPDATE modifies the
key of the chosen access path (table or index; whichever was selected by
the optimizer), we change the plan to include the ephemeral table
prepopulation. Hence, the same query from above becomes:
```sql
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 35 0 0 Start at 35
1 OpenEphemeral 0 1 0 0 cursor=0 is_table=true
2 OpenRead 1 3 0 0 index=sqlite_autoindex_t_1, root=3, iDb=0
3 Integer 50 2 0 0 r[2]=50
-- index seek on PRIMARY KEY index
4 SeekGT 1 10 2 0 key=[2..2]
5 Integer 60 2 0 0 r[2]=60
6 IdxGE 1 10 2 0 key=[2..2]
7 IdxRowId 1 1 0 0 r[1]=cursor 1 for index sqlite_autoindex_t_1.rowid
8 Insert 0 3 1 ephemeral_scratch 2 intkey=r[1] data=r[3]
9 Next 1 6 0 0
10 OpenWrite 2 2 0 0 root=2; iDb=0
11 OpenWrite 3 3 0 0 root=3; iDb=0
-- only scan rows that were inserted to ephemeral index
12 Rewind 0 34 0 0 Rewind table ephemeral_scratch
13 RowId 0 5 0 0 r[5]=ephemeral_scratch.rowid
```
Note that an ephemeral index does not have to be used if the index is
not affected:
```sql
turso> create table t(x PRIMARY KEY, data);
turso> explain update t set data = 'some_data' where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 15 0 0 Start at 15
1 OpenWrite 0 2 0 0 root=2; iDb=0
2 OpenWrite 1 3 0 0 root=3; iDb=0
3 Integer 50 1 0 0 r[1]=50
-- direct index seek
4 SeekGT 1 14 1 0 key=[1..1]
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3728