Commit Graph

1627 Commits

Author SHA1 Message Date
Pavan-Nambi
8d0ae362da Merge branch 'main' of github.com:tursodatabase/turso into avcm 2025-10-24 18:58:30 +05:30
Pekka Enberg
4c59f29931 Merge 'core/storage: Fix WAL already enabled issue' from Pekka Enberg
If WAL is already enabled, let's just continue execution instead of
erroring out.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3819
2025-10-23 20:56:57 +03:00
Pekka Enberg
87069fde93 core/storage: Fix WAL already enabled issue
If WAL is already enabled, let's just continue execution instead of erroring out.
2025-10-23 19:35:46 +03:00
Jussi Saurio
ae22468d8b Merge 'Order by heap sort' from Nikita Sivukhin
This PR implements simple heap-sort approach for query plans like
`SELECT ... FROM t WHERE ... ORDER BY ... LIMIT N` in order to maintain
small set of top N elements in the ephemeral B-tree and avoid sort and
materialization of whole dataset.
I removed all optimizations not related to this particular change in
order to make branch lightweight.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3726
2025-10-23 15:00:42 +03:00
Jussi Saurio
64560a61c3 Merge 'Support statement-level rollback via anonymous savepoints' from Jussi Saurio
## Gist
This PR implements _statement subtransactions_, which means that a
single statement within an interactive transaction can individually be
rolled back.
## Background
The default constraint violation resolution strategy in SQLite is
`ABORT`, which means to rollback the statement that caused the conflict.
For example:
```sql
CREATE TABLE t(x UNIQUE);
INSERT INTO t VALUES (1);

BEGIN;
  INSERT INTO t VALUES (2),(3); -- ok
  INSERT INTO t VALUES (4),(1); -- conflict on 1, this statement should rollback
  INSERT INTO t VALUES (5); -- ok
COMMIT; -- ok

SELECT * FROM t;
1
2
3
5
```
 So far we haven't been able to support this due to lack of support for
subtransactions, and have used the `ROLLBACK` strategy, which means to
rollback the entire transaction on any constraint error.
## Problem
Although PRIMARY KEY and UNIQUE constraints allow defining the conflict
resolution strategy (e.g. `id INTEGER PRIMARY KEY ON CONFLICT
ROLLBACK`), FOREIGN KEY violations do not support this: they always use
`ABORT` i.e. statement subtransaction rollback. For this reason alone it
is important to implement this mechanism now rather than later, since we
already have FOREIGN KEY support implemented.
## Details
This PR implements statement subtransactions with _anonymous
savepoints_. This means that whenever a statement begins, it will open a
new savepoint which will write "page undo images" into a temporary file
called a _subjournal_. Whenever the statement marks a page as dirty, it
will write the before-image of the page into the subjournal so that its
modifications can be undone in the event of an ABORT (statement
rollback).
- Right now, only anonymous savepoints are supported, so the explicit
`SAVEPOINT` syntax is not.
- Due to the above, there can be only one savepoint open per pager, and
this is enforced with assertions.
- The subjournal file is currently entirely in memory. If it were not,
we would either have to block on IO or refactor many usages of code to
account for potentially pending completions.
- Constraint errors no longer cause transactions to abort nor do they
cause the page cache to be cleared - instead, subjournaled pages will be
brought back into the page cache which effectively handles the same
behavior albeit more fine-grained.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3792
2025-10-23 15:00:11 +03:00
Pekka Enberg
418fc90f8a Merge 'core/storage: Cache schema cookie in Pager' from Pekka Enberg
Every transaction was reading page 1 from the WAL to check the schema
cookie in op_transaction, causing unnecessary WAL lookups.
This commit caches the schema_cookie in Pager as AtomicU64, similar to
how page_size and reserved_space are already cached. The cache is
updated when the header is read/modified and invalidated in
begin_read_tx() when WAL changes are detected from other connections.
This matches SQLite's approach of caching frequently accessed header
fields to avoid repeated page 1 reads. Improves write throughput by 5%
in our benchmarks.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3727
2025-10-23 14:00:27 +03:00
Jussi Saurio
2b73260dd9 Handle cases where DB grows or shrinks due to savepoint rollback 2025-10-22 23:40:45 +03:00
Jussi Saurio
fe51804e6b Implement crude way of making opening subtransaction conditional
We don't want something like `BEGIN IMMEDIATE` to start a subtransaction,
so instead we will open it if:

- Statement is write, AND

a) Statement has >0 table_references, or
b) The statement is an INSERT (INSERT doesn't track table_references in
   the same way as other program types)
2025-10-22 23:40:45 +03:00
Jussi Saurio
e04c6c9b46 Mark pages_to_balance as dirty only after loading 2025-10-22 23:40:45 +03:00
Jussi Saurio
a14bbdecf2 Add assertion that page is loaded when pager.add_dirty() is called 2025-10-22 23:40:45 +03:00
Jussi Saurio
d8cc57cf14 clippy: Remove unnecessary referencing 2025-10-22 23:40:45 +03:00
Jussi Saurio
25f8ba0025 Pager: clear savepoints when tx rolls back 2025-10-22 23:40:45 +03:00
Jussi Saurio
a8cf8e4594 Pager: subjournal page if required when it's marked as dirty 2025-10-22 23:40:45 +03:00
Jussi Saurio
97177dae02 add missing imports 2025-10-22 23:40:44 +03:00
Jussi Saurio
f4af7c2242 Pager: add begin_statement() method 2025-10-22 23:40:44 +03:00
Jussi Saurio
a19c5c22ac Pager: add rollback_to_newest_savepoint() method 2025-10-22 23:40:44 +03:00
Jussi Saurio
86d5ad6815 pager: allow upserted cached page not to be dirty 2025-10-22 23:40:44 +03:00
Jussi Saurio
5b01605fae Pager: add subjournal_page_if_required() method 2025-10-22 23:40:44 +03:00
Jussi Saurio
e8226c0e4b Pager: add clear_savepoint() method 2025-10-22 23:40:44 +03:00
Jussi Saurio
aa1eebbfcb Pager: add open_savepoint() and release_savepoint() methods 2025-10-22 23:40:44 +03:00
Jussi Saurio
77be1f08ae Pager: add open_subjournal method 2025-10-22 23:40:44 +03:00
Jussi Saurio
2a03c1a617 Add subjournal and savepoints to Pager struct 2025-10-22 23:40:44 +03:00
Jussi Saurio
8b15a06a85 Add Savepoint struct 2025-10-22 23:40:44 +03:00
Jussi Saurio
459c01f93c Add subjournal module
The subjournal is a temporary file where stmt subtransactions write an
'undo log' of pages before modifying them. If a stmt subtransaction
rolls back, the pages are restored from the subjournal.
2025-10-22 23:40:44 +03:00
Nikita Sivukhin
6aa67c6ea0 Revert "slight reorder of operations"
This reverts commit 8e107ab18e.
2025-10-22 20:21:52 +04:00
Nikita Sivukhin
8e1cec5104 Revert "alternative read_variant implementation"
This reverts commit 68650cf594.
2025-10-22 19:30:43 +04:00
Pekka Enberg
5dd503b7b9 core/storage: Cache schema cookie in Pager
Every transaction was reading page 1 from the WAL to check the schema cookie
in op_transaction, causing unnecessary WAL lookups.

This commit caches the schema_cookie in Pager as AtomicU64, similar to how
page_size and reserved_space are already cached. The cache is updated when the
header is read/modified and invalidated in begin_read_tx() when WAL changes
are detected from other connections.

This matches SQLite's approach of caching frequently accessed header fields to
avoid repeated page 1 reads. Improves write throughput by 5% in our
benchmarks.
2025-10-22 16:51:15 +03:00
PThorpe92
a8b257c664 Replace several RwLock<Enum> values with new AtomicEnums 2025-10-22 09:35:26 -04:00
Nikita Sivukhin
671d266dd6 Revert "wip"
This reverts commit dd34f7fd50.
2025-10-22 11:47:46 +04:00
Nikita Sivukhin
bf77862fab Merge branch 'main' into order-by-heap-sort 2025-10-22 11:44:55 +04:00
Pekka Enberg
1aad1b224a Merge 'core/io: Make random generation deterministically simulated' from Pedro Muniz
Depends on #3584 to use the most up-to-date implementation of
`ThreadRng`
- Add `fill_bytes` method to `IO`
- use `thread_rng` instead of `getrandom`, as `getrandom` is much slower
and `thread_rng` offers enough security
- modify `exec_randomblob`, `exec_random` and random_rowid generation to
use methods from IO for determinism
- modified simulator IO to implement `fill_bytes`
This the PRNG for sqlite if someone is curious. It is similar to
`thread_rng`:
```c
/* Initialize the state of the random number generator once,
  ** the first time this routine is called.
  */
  if( wsdPrng.s[0]==0 ){
    sqlite3_vfs *pVfs = sqlite3_vfs_find(0);
    static const u32 chacha20_init[] = {
      0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
    };
    memcpy(&wsdPrng.s[0], chacha20_init, 16);
    if( NEVER(pVfs==0) ){
      memset(&wsdPrng.s[4], 0, 44);
    }else{
      sqlite3OsRandomness(pVfs, 44, (char*)&wsdPrng.s[4]);
    }
    wsdPrng.s[15] = wsdPrng.s[12];
    wsdPrng.s[12] = 0;
    wsdPrng.n = 0;
  }

  assert( N>0 );
  while( 1 /* exit by break */ ){
    if( N<=wsdPrng.n ){
      memcpy(zBuf, &wsdPrng.out[wsdPrng.n-N], N);
      wsdPrng.n -= N;
      break;
    }
    if( wsdPrng.n>0 ){
      memcpy(zBuf, wsdPrng.out, wsdPrng.n);
      N -= wsdPrng.n;
      zBuf += wsdPrng.n;
    }
    wsdPrng.s[12]++;
    chacha_block((u32*)wsdPrng.out, wsdPrng.s);
    wsdPrng.n = 64;
  }
  sqlite3_mutex_leave(mutex);
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3799
2025-10-22 09:10:36 +03:00
Pere Diaz Bou
3227caaa1d Merge 'core: move BTreeCursor under MVCC cursor' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3756
2025-10-21 19:20:49 +02:00
pedrocarlo
8501bc930a use workspace rand version 2025-10-21 14:10:05 -03:00
Pere Diaz Bou
ea04e9033a core/mvcc: add btree_cursor under MVCC cursor 2025-10-21 18:22:37 +02:00
Nikita Sivukhin
00e382a7c7 avoid unnecessary allocations 2025-10-21 20:13:39 +04:00
Pekka Enberg
f764f3061d Merge 'Add Miri support for turso_stress, with bash scripts to run' from Bob Peterson
It was mentioned in https://github.com/tursodatabase/turso/pull/3720
that adding Miri support for `turso_stress` would be useful. And, that a
bash script to start Miri with the right config would be a big help.
Notable changes:
- `antithesis_sdk`'s default features are disabled at the workspace
level, and only enabled as needed with the `antithesis` feature flag in
the various turso crates. Miri needs the noop version of
`antithesis_sdk` to run `turso_stress`, and feature unification
previously prevented this. I'm not able to ensure locally that all the
Antithesis stuff is still happy with these changes.
- Bash script to run `turso_stress` - this is barebones for now, see
below
- Bash script to run `simulator` - this passes any args to the `cargo
run` invocation inside, intercepting `--seed` if it's present, and
generating one from `/dev/random` if it's not. The seed is passed to
both Miri and the simulator to keep the overall execution reproducible.
(I checked this with a simple case)
- A `const fn`, `normal_or_miri` to supply different defaults in things
like CLI args for normal operation and Miri, since it's so slow. (An
idea I stole from tokio.) Right now the relevant values are 100x smaller
for Miri, although Miri is probably 1000 to 10,000x slower overall from
a rough estimation.
Caught UB from running `turso_stress` with Miri:
- An unsafe cast of a `*u8` to `*u32` inside the BTree implementation
resulted in the `*u32` making an unaligned read: `read()` ->
`read_unaligned()` fixes this
Future work - Making `turso_stress` reproducible under Miri:
- Right now `turso_stress` is plugged in to Antithesis, which is great!
But, `antithesis_sdk`'s noop mode (`default-features = false`) turns
`antithesis_sdk::random::get_random()` into `rand::random<u64>()`, which
isn't seedable/reproducible. It's more work than I wanted to take on in
this PR, but I'd like to instead conditionally replace `get_random` with
a seedable `ChaCha8Rng` like in the simulator, if Miri is being used.
Comment:
- On a machine without all necessary dependencies, running the bash
scripts fails in a way that cargo prompts you through installing the
nightly toolchain, Miri, etc. until it works
- Below is a snippet of the output from Miri on the Btree alignment
issue. Because turso_stress isn't yet deterministic/reproducible under
Miri, I can't always reproduce it. (It doesn't always happen like the
ones in my last MR)
```
error: Undefined Behavior: accessing memory based on pointer with alignment 1, but alignment 4 is required
    --> /home/rwp/git/turso/core/storage/btree.rs:2860:50
     |
2860 |                     let mut pgno: u32 = unsafe { right_pointer.cast::<u32>().read().swap_bytes() };
     |                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

```

Closes #3790
2025-10-21 11:53:49 +03:00
Bob Peterson
2cb0a9b34b Use read_unaligned with *u8 cast to *u32
Avoids undefined behavior due to unaligned read caught with Miri
2025-10-20 22:50:57 -05:00
pedrocarlo
ba9a1ebbef add mutable scoped locking for SharedWalFile 2025-10-20 10:45:14 -03:00
pedrocarlo
b00a276960 add scoped locking for SharedWalFile to avoid holding locks for longer than needed 2025-10-20 10:45:14 -03:00
Pavan-Nambi
9841f487a6 dont allow autovacuum on nonempty dbs adds a is_db_empty fn 2025-10-18 19:01:21 +05:30
Pavan-Nambi
1a058a1531 get autovacuum mode from db header on existing dbs
if autovaccum on, look for ptrmap pages
2025-10-18 18:47:30 +05:30
Pekka Enberg
e03f6dbf94 core/storage: Reduce logging level 2025-10-17 20:09:00 +03:00
Jussi Saurio
2ca388d78d WAL: don't hold shared lock across IO operations
Without this change and running:

```
cd stress
cargo run -- --nr-threads=4 -i 1000 --verbose --busy-timeout=0
```

I can produce a deadlock quite reliably.

With this change, I can't.

Even with 5 second busy timeout (the default), the run makes progress although it is slow as hell because of the busy timeout.
2025-10-16 22:00:01 +03:00
Pekka Enberg
afa89c66c0 Merge 'Replace io_yield_many with completion groups' from Pekka Enberg
Reviewed-by: Pedro Muniz (@pedrocarlo)

Closes #3703
2025-10-16 17:17:43 +03:00
Pere Diaz Bou
57eb63cee0 core/bree: remove duplicated code in BTreeCursor 2025-10-16 14:50:08 +02:00
Pekka Enberg
bf5de920f2 core: Unsafe Send and Sync pushdown
This patch pushes unsafe Send and Sync to individual components instead
of doing it at Database level. This makes it easier for us to
incrementally fix thread-safety, but avoid developers adding more thread
unsafe code.
2025-10-16 11:26:50 +03:00
Nikita Sivukhin
dd34f7fd50 wip 2025-10-15 17:27:22 +04:00
Nikita Sivukhin
68650cf594 alternative read_variant implementation
- it faster in benchmark (who knows why)
- also seems bit faster for some my query
- let's test on CI
2025-10-15 17:27:22 +04:00
Nikita Sivukhin
a6a5ffd821 move read_varint_fast closer to the read_varint impl 2025-10-15 17:27:22 +04:00
Nikita Sivukhin
8e107ab18e slight reorder of operations 2025-10-15 17:27:22 +04:00