Commit Graph

738 Commits

Author SHA1 Message Date
PThorpe92
5c207618a7 Fix extensions py test 2025-11-09 11:35:57 -05:00
PThorpe92
b443b09516 Remove VRollback and VCommit as they are unused opcodes in sqlite 2025-11-09 11:27:09 -05:00
PThorpe92
993c9d34b4 Rollback vtab txns when when err code is present in Halt 2025-11-09 11:07:43 -05:00
PThorpe92
f35ccfba17 Add support for renaming virtual tables 2025-11-09 11:07:42 -05:00
PThorpe92
e09d9eb720 Add VBegin, VRename, VRollback and VCommit opcodes 2025-11-09 11:07:42 -05:00
Preston Thorpe
4e8b4c96d3 Merge 'use dyn DatabaseStorage instead of DatabaseFile' from Nikita Sivukhin
Partial sync for sync engine will need to implement its own version of
`DatabaseStorage` which willl load database pages on demand

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3922
2025-11-06 15:11:13 -05:00
Nikita Sivukhin
da61fa32b4 use dyn DatabaseStorage instead of DatabaseFile 2025-11-06 17:42:03 +04:00
PThorpe92
c5a3e590f7 Fix rewriting sql to persist for foreign keys in alter table func 2025-11-03 09:47:28 -05:00
PThorpe92
ef24911824 Handle renaming child foreign keys on op_rename_table 2025-11-03 09:47:28 -05:00
PThorpe92
481d86f567 Optimize and refactor schema::Column type 2025-11-02 20:46:02 -05:00
PThorpe92
23496f0bea Fix incorrect unreachable precondition for affinity char in op_seek_rowid 2025-11-01 20:43:44 -04:00
Jussi Saurio
ad723b615f Merge 'index_method: fully integrate into query planner' from Nikita Sivukhin
This PR completely integrate custom indices to the query planner.
In order to do that new `Cursor::IndexMethod` is introduced with few
correlated changes in the VM implementation:
1. Added special `IndexMethod{Create,Destroy,Query}` opcodes to handle
index method creation, deletion and query
2. `Next` , `IdxRowid` , `IdxInsert`, `IdxDelete` opcodes updated to
properly handle new cursor case

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3827
2025-10-29 09:42:37 +02:00
Pekka Enberg
810ed8ad60 Merge 'Don't allow autovacuum to be flipped on non-empty databases' from Pavan Nambi
Turso incorrectly creates the first table in an autovacuumed table in
page 2.
(Note: this is on collaboration with @LeMikaelF)
SQLite does not allow enabling or disabling auto-vacuum after the first
table has been created
(https://sqlite.org/pragma.html#pragma_auto_vacuum). This is because the
sequence of the pages in the databases is different when auto-vacuum is
enabled, because the first b-tree page must be page 3 instead of 2, to
make room for the first [Pointer Map
page](https://sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages).
But Turso doesn't currently consider this, which can lead to data loss.
The simplest way to reproduce this is to create an autovacuumed
databases with either `pragma auto_vacuum=full` so that autovacuum runs
on each commit, and then create a table with some data. Turso will
incorrectly create the new table on page 2. After this, every time a new
page is created, either through a page split or because a new table is
created, Turso will write a 5-byte pointer in page 2, starting from the
top of the page, thereby overwriting existing data.
For example, let's start with a clean database and the first bytes of
page 2. It starts with `0d`, the discriminator for a leaf page
([source](https://www.sqlite.org/fileformat.html#b_tree_pages)). The
next interesting number is the number of cells contained in this page
(`01`) at offset 5.
```
$ cargo run -- /tmp/a.db
turso> create table t(a);
turso> insert into t values ('myvalue');

$ dbtotxt /tmp/a.db
| size 8192 pagesize 4096 filename a.db
| page 1 offset 0
# ...snip...
| page 2 offset 4096
|      0: 0d 00 00 00 01 0f f5 00 0f f5 00 00 00 00 00 00   ................
|   4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65   .........myvalue
| end a.db
```
Pointer map pages are located every N pages, starting from page 2, and
contain a list of 5-byte pointers that represent the parent page of a
certain page. So whenever Turso or SQLite needs to add a page, it will
overwrite 5 bytes of page 2. This means that for data loss to occur, it
is sufficient to add a single page to the database, for example by
creating a table. Offset 5 will then be zeroed out:
```
$ cargo run -- /tmp/a.db
turso> create table t(a);
turso> insert into t values ('myvalue');
turso> pragma auto_vacuum=full;
turso> create table tt(a);

$ dbtotxt /tmp/a.db
| size 12288 pagesize 4096 filename a.db
| page 1 offset 0
# ...snip...
| page 2 offset 4096
|      0: 01 00 00 00 00 0f f5 00 0f f5 00 00 00 00 00 00   ................
|   4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65   .........myvalue
```
Creating more tables, or adding more B-tree pages, will keep overwriting
the rest of the page, until the cells themselves are also overwritten.
## Reproducing the issue in the simulator
We have been unable to reproduce this exact corruption mode in the
simulator, but patching it shows many failure modes, all of which don't
occur with the unpatched simulator. The following seeds are failing. The
following seeds are showing the issue when the patched simulator is ran
against `main`:
- `11522841279124073062`, with "Assertion 'table inquisitive_graham_159
should contain all of its expected values' failed: table
inquisitive_graham_159 does not contain the expected values, the
simulator model has more rows than the database"
- `7057400018220918989`, `16028085350691325843`, `7721542713659053944`,
and `203017821863546118`, with "Failed to read ptrmap key=XXX"
- `12533694709304969540`, `18357088553315413457`, `3108945730906932377`,
with "Integrity Check Failed: Cell N in page 2 is out of range."
    - `4757352625344646473`, with "dirty pages should be empty for read
txn"
- `7083498604824302257`, with "header_size: 6272, header_len_bytes: 2,
payload.len(): 13"
- `17881876827470741581`, with "ParseError("no such table:
focused_historians_416")"
- `2092231500503735693`, with "range end index 4789 out of range for
slice of length 4096"
- `7555257419378470845`, with malformed database schema
(imaginative_ontivero\u{1})"
- `12905270229511147245`, with "index out of bounds: the len is 4096 but
the index is 4096"
## Fixing the issue
- When DB is opened, we read the `auto_vacuum` state, instead of
assuming `auto_vacuum=none`.
- Don't allow auto_vacuum to be flipped on non-empty databases as if we
allow this it could cause overlap with existing bits.(ptrmap could
overwrite existing data)
- Modify integrity check to avoid reporting that page 2 is orphaned in
auto-vacuumed databases.
Fixes #3752

Closes #3830
2025-10-28 14:48:35 +02:00
Nikita Sivukhin
8ea733f917 fix bug with cursor allocation 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
8acbe3de66 make query_start method to return bool - if result will have some rows or not 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
67c1855ba8 fix bug 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
6206294584 fix clippy 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
8dd2644c07 add support for new cursor type in existing op codes and also implement new opcodes in the VM 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
408ca235d1 small refactoring 2025-10-27 12:43:38 +04:00
Nikita Sivukhin
906bbdd1c4 support deep nestedness 2025-10-27 11:37:42 +04:00
Pekka Enberg
7d035f27d8 Merge 'Strict numeric cast for op_must_be_int' from bit-aloo
closes: #3302

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3771
2025-10-26 16:42:35 +02:00
Pekka Enberg
6603f5318a Merge 'core/vdbe: Reuse cursor in op_open_write()' from Pekka Enberg
This optimization reuses an existing cursor when op_open_write() is
called on the same table/index (same root_page). This is safe because
the cursor position doesn't matter - op_rewind() is always called after
op_open_write() to position the cursor at the beginning of the
table/index before any operations are performed.
This change speeds up op_open_write() by avoiding unnecessary cursor re-
initialization.

Closes #3815
2025-10-26 12:29:20 +02:00
Pekka Enberg
ca073b5ecd Merge 'core: Switch RwLock<Arc<Pager>> to ArcSwap<Pager>' from Pekka Enberg
We don't actually need the RwLock locking capabilities, just the ability
to swap the instance.

Closes #3814
2025-10-26 12:29:11 +02:00
Pavan-Nambi
277a989a71 fmt 2025-10-24 21:34:17 +05:30
Pavan-Nambi
7dda783006 clippy - gotta feature autovaccuum n ptrmaps 2025-10-24 21:30:34 +05:30
Pavan-Nambi
8d0ae362da Merge branch 'main' of github.com:tursodatabase/turso into avcm 2025-10-24 18:58:30 +05:30
Pekka Enberg
c3fb867173 core: Switch RwLock<Arc<Pager>> to ArcSwap<Pager>
We don't actually need the RwLock locking capabilities, just the ability
to swap the instance.
2025-10-24 14:10:08 +03:00
bit-aloo
64bbca9e12 Fix op_must_be_int to use strict numeric cast 2025-10-24 16:08:15 +05:30
Pekka Enberg
413c582b41 core/vdbe: Reuse cursor in op_open_write()
This optimization reuses an existing cursor when op_open_write() is
called on the same table/index (same root_page). This is safe because
the cursor position doesn't matter - op_rewind() is always called
after op_open_write() to position the cursor at the beginning of the
table/index before any operations are performed.

This change speeds up op_open_write() by avoiding unnecessary cursor
re-initialization.
2025-10-23 16:34:42 +03:00
Jussi Saurio
ae22468d8b Merge 'Order by heap sort' from Nikita Sivukhin
This PR implements simple heap-sort approach for query plans like
`SELECT ... FROM t WHERE ... ORDER BY ... LIMIT N` in order to maintain
small set of top N elements in the ephemeral B-tree and avoid sort and
materialization of whole dataset.
I removed all optimizations not related to this particular change in
order to make branch lightweight.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3726
2025-10-23 15:00:42 +03:00
Jussi Saurio
64560a61c3 Merge 'Support statement-level rollback via anonymous savepoints' from Jussi Saurio
## Gist
This PR implements _statement subtransactions_, which means that a
single statement within an interactive transaction can individually be
rolled back.
## Background
The default constraint violation resolution strategy in SQLite is
`ABORT`, which means to rollback the statement that caused the conflict.
For example:
```sql
CREATE TABLE t(x UNIQUE);
INSERT INTO t VALUES (1);

BEGIN;
  INSERT INTO t VALUES (2),(3); -- ok
  INSERT INTO t VALUES (4),(1); -- conflict on 1, this statement should rollback
  INSERT INTO t VALUES (5); -- ok
COMMIT; -- ok

SELECT * FROM t;
1
2
3
5
```
 So far we haven't been able to support this due to lack of support for
subtransactions, and have used the `ROLLBACK` strategy, which means to
rollback the entire transaction on any constraint error.
## Problem
Although PRIMARY KEY and UNIQUE constraints allow defining the conflict
resolution strategy (e.g. `id INTEGER PRIMARY KEY ON CONFLICT
ROLLBACK`), FOREIGN KEY violations do not support this: they always use
`ABORT` i.e. statement subtransaction rollback. For this reason alone it
is important to implement this mechanism now rather than later, since we
already have FOREIGN KEY support implemented.
## Details
This PR implements statement subtransactions with _anonymous
savepoints_. This means that whenever a statement begins, it will open a
new savepoint which will write "page undo images" into a temporary file
called a _subjournal_. Whenever the statement marks a page as dirty, it
will write the before-image of the page into the subjournal so that its
modifications can be undone in the event of an ABORT (statement
rollback).
- Right now, only anonymous savepoints are supported, so the explicit
`SAVEPOINT` syntax is not.
- Due to the above, there can be only one savepoint open per pager, and
this is enforced with assertions.
- The subjournal file is currently entirely in memory. If it were not,
we would either have to block on IO or refactor many usages of code to
account for potentially pending completions.
- Constraint errors no longer cause transactions to abort nor do they
cause the page cache to be cleared - instead, subjournaled pages will be
brought back into the page cache which effectively handles the same
behavior albeit more fine-grained.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3792
2025-10-23 15:00:11 +03:00
Jussi Saurio
2b73260dd9 Handle cases where DB grows or shrinks due to savepoint rollback 2025-10-22 23:40:45 +03:00
Jussi Saurio
fe51804e6b Implement crude way of making opening subtransaction conditional
We don't want something like `BEGIN IMMEDIATE` to start a subtransaction,
so instead we will open it if:

- Statement is write, AND

a) Statement has >0 table_references, or
b) The statement is an INSERT (INSERT doesn't track table_references in
   the same way as other program types)
2025-10-22 23:40:45 +03:00
Jussi Saurio
7376475cb3 Do not start statement subtransactions when MVCC is enabled
MVCC does not support statement-level rollback.
2025-10-22 23:40:45 +03:00
Jussi Saurio
d8cc57cf14 clippy: Remove unnecessary referencing 2025-10-22 23:40:45 +03:00
Jussi Saurio
086ba8c946 VDBE: begin statement subtransaction in op_transaction 2025-10-22 23:40:45 +03:00
Jussi Saurio
904cbe535d VDBE: handle subtransaction commits/aborts in op_halt 2025-10-22 23:40:45 +03:00
Jussi Saurio
ad80285437 Rename is_scope to deferred and invert respective boolean logic
Much clearer name for what it is/does
2025-10-22 23:40:44 +03:00
Jussi Saurio
d4a9797f79 Store two foreign key counters in ProgramState
1. The number of deferred FK violations when the statement started.
   When a statement subtransaction rolls back, the connection's
   deferred violation counter will be reset to this value.
2. The number of immediate FK violations that occurred during the
   statement. In practice we just need to know whether this number
   is nonzero, and if it is, the statement subtransaction will roll
   back.

Statement subtransactions will be implemented in future commits.
2025-10-22 23:40:44 +03:00
Nikita Sivukhin
91ffb4e249 Revert "avoid allocations"
This reverts commit dba195bdfa.
2025-10-22 20:21:39 +04:00
Nikita Sivukhin
53957b6d22 Revert "simplify serial_type size calculation"
This reverts commit f19c73822e.
2025-10-22 20:21:00 +04:00
Nikita Sivukhin
8e1cec5104 Revert "alternative read_variant implementation"
This reverts commit 68650cf594.
2025-10-22 19:30:43 +04:00
Pekka Enberg
5dd503b7b9 core/storage: Cache schema cookie in Pager
Every transaction was reading page 1 from the WAL to check the schema cookie
in op_transaction, causing unnecessary WAL lookups.

This commit caches the schema_cookie in Pager as AtomicU64, similar to how
page_size and reserved_space are already cached. The cache is updated when the
header is read/modified and invalidated in begin_read_tx() when WAL changes
are detected from other connections.

This matches SQLite's approach of caching frequently accessed header fields to
avoid repeated page 1 reads. Improves write throughput by 5% in our
benchmarks.
2025-10-22 16:51:15 +03:00
Nikita Sivukhin
671d266dd6 Revert "wip"
This reverts commit dd34f7fd50.
2025-10-22 11:47:46 +04:00
Nikita Sivukhin
bf77862fab Merge branch 'main' into order-by-heap-sort 2025-10-22 11:44:55 +04:00
Pekka Enberg
1aad1b224a Merge 'core/io: Make random generation deterministically simulated' from Pedro Muniz
Depends on #3584 to use the most up-to-date implementation of
`ThreadRng`
- Add `fill_bytes` method to `IO`
- use `thread_rng` instead of `getrandom`, as `getrandom` is much slower
and `thread_rng` offers enough security
- modify `exec_randomblob`, `exec_random` and random_rowid generation to
use methods from IO for determinism
- modified simulator IO to implement `fill_bytes`
This the PRNG for sqlite if someone is curious. It is similar to
`thread_rng`:
```c
/* Initialize the state of the random number generator once,
  ** the first time this routine is called.
  */
  if( wsdPrng.s[0]==0 ){
    sqlite3_vfs *pVfs = sqlite3_vfs_find(0);
    static const u32 chacha20_init[] = {
      0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
    };
    memcpy(&wsdPrng.s[0], chacha20_init, 16);
    if( NEVER(pVfs==0) ){
      memset(&wsdPrng.s[4], 0, 44);
    }else{
      sqlite3OsRandomness(pVfs, 44, (char*)&wsdPrng.s[4]);
    }
    wsdPrng.s[15] = wsdPrng.s[12];
    wsdPrng.s[12] = 0;
    wsdPrng.n = 0;
  }

  assert( N>0 );
  while( 1 /* exit by break */ ){
    if( N<=wsdPrng.n ){
      memcpy(zBuf, &wsdPrng.out[wsdPrng.n-N], N);
      wsdPrng.n -= N;
      break;
    }
    if( wsdPrng.n>0 ){
      memcpy(zBuf, wsdPrng.out, wsdPrng.n);
      N -= wsdPrng.n;
      zBuf += wsdPrng.n;
    }
    wsdPrng.s[12]++;
    chacha_block((u32*)wsdPrng.out, wsdPrng.s);
    wsdPrng.n = 64;
  }
  sqlite3_mutex_leave(mutex);
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3799
2025-10-22 09:10:36 +03:00
Pere Diaz Bou
3227caaa1d Merge 'core: move BTreeCursor under MVCC cursor' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3756
2025-10-21 19:20:49 +02:00
pedrocarlo
8c0b9c6979 add additional fill_bytes method to IO to deterministically generate
random bytes and modify random functions to use them
2025-10-21 14:10:38 -03:00
pedrocarlo
8501bc930a use workspace rand version 2025-10-21 14:10:05 -03:00
Pere Diaz Bou
ea04e9033a core/mvcc: add btree_cursor under MVCC cursor 2025-10-21 18:22:37 +02:00