turso

mirror of https://github.com/aljazceru/turso.git synced 2026-01-04 17:04:18 +01:00

Author	SHA1	Message	Date
Pavan-Nambi	8d0ae362da	Merge branch 'main' of github.com:tursodatabase/turso into avcm	2025-10-24 18:58:30 +05:30
Pekka Enberg	4c59f29931	Merge 'core/storage: Fix WAL already enabled issue' from Pekka Enberg If WAL is already enabled, let's just continue execution instead of erroring out. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #3819	2025-10-23 20:56:57 +03:00
Pekka Enberg	87069fde93	core/storage: Fix WAL already enabled issue If WAL is already enabled, let's just continue execution instead of erroring out.	2025-10-23 19:35:46 +03:00
Jussi Saurio	ae22468d8b	Merge 'Order by heap sort' from Nikita Sivukhin This PR implements simple heap-sort approach for query plans like `SELECT ... FROM t WHERE ... ORDER BY ... LIMIT N` in order to maintain small set of top N elements in the ephemeral B-tree and avoid sort and materialization of whole dataset. I removed all optimizations not related to this particular change in order to make branch lightweight. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3726	2025-10-23 15:00:42 +03:00
Jussi Saurio	64560a61c3	Merge 'Support statement-level rollback via anonymous savepoints' from Jussi Saurio ## Gist This PR implements _statement subtransactions_, which means that a single statement within an interactive transaction can individually be rolled back. ## Background The default constraint violation resolution strategy in SQLite is `ABORT`, which means to rollback the statement that caused the conflict. For example: ```sql CREATE TABLE t(x UNIQUE); INSERT INTO t VALUES (1); BEGIN; INSERT INTO t VALUES (2),(3); -- ok INSERT INTO t VALUES (4),(1); -- conflict on 1, this statement should rollback INSERT INTO t VALUES (5); -- ok COMMIT; -- ok SELECT * FROM t; 1 2 3 5 ``` So far we haven't been able to support this due to lack of support for subtransactions, and have used the `ROLLBACK` strategy, which means to rollback the entire transaction on any constraint error. ## Problem Although PRIMARY KEY and UNIQUE constraints allow defining the conflict resolution strategy (e.g. `id INTEGER PRIMARY KEY ON CONFLICT ROLLBACK`), FOREIGN KEY violations do not support this: they always use `ABORT` i.e. statement subtransaction rollback. For this reason alone it is important to implement this mechanism now rather than later, since we already have FOREIGN KEY support implemented. ## Details This PR implements statement subtransactions with _anonymous savepoints_. This means that whenever a statement begins, it will open a new savepoint which will write "page undo images" into a temporary file called a _subjournal_. Whenever the statement marks a page as dirty, it will write the before-image of the page into the subjournal so that its modifications can be undone in the event of an ABORT (statement rollback). - Right now, only anonymous savepoints are supported, so the explicit `SAVEPOINT` syntax is not. - Due to the above, there can be only one savepoint open per pager, and this is enforced with assertions. - The subjournal file is currently entirely in memory. If it were not, we would either have to block on IO or refactor many usages of code to account for potentially pending completions. - Constraint errors no longer cause transactions to abort nor do they cause the page cache to be cleared - instead, subjournaled pages will be brought back into the page cache which effectively handles the same behavior albeit more fine-grained. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3792	2025-10-23 15:00:11 +03:00
Pekka Enberg	418fc90f8a	Merge 'core/storage: Cache schema cookie in Pager' from Pekka Enberg Every transaction was reading page 1 from the WAL to check the schema cookie in op_transaction, causing unnecessary WAL lookups. This commit caches the schema_cookie in Pager as AtomicU64, similar to how page_size and reserved_space are already cached. The cache is updated when the header is read/modified and invalidated in begin_read_tx() when WAL changes are detected from other connections. This matches SQLite's approach of caching frequently accessed header fields to avoid repeated page 1 reads. Improves write throughput by 5% in our benchmarks. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3727	2025-10-23 14:00:27 +03:00
Jussi Saurio	2b73260dd9	Handle cases where DB grows or shrinks due to savepoint rollback	2025-10-22 23:40:45 +03:00
Jussi Saurio	fe51804e6b	Implement crude way of making opening subtransaction conditional We don't want something like `BEGIN IMMEDIATE` to start a subtransaction, so instead we will open it if: - Statement is write, AND a) Statement has >0 table_references, or b) The statement is an INSERT (INSERT doesn't track table_references in the same way as other program types)	2025-10-22 23:40:45 +03:00
Jussi Saurio	e04c6c9b46	Mark pages_to_balance as dirty only after loading	2025-10-22 23:40:45 +03:00
Jussi Saurio	a14bbdecf2	Add assertion that page is loaded when pager.add_dirty() is called	2025-10-22 23:40:45 +03:00
Jussi Saurio	d8cc57cf14	clippy: Remove unnecessary referencing	2025-10-22 23:40:45 +03:00
Jussi Saurio	25f8ba0025	Pager: clear savepoints when tx rolls back	2025-10-22 23:40:45 +03:00
Jussi Saurio	a8cf8e4594	Pager: subjournal page if required when it's marked as dirty	2025-10-22 23:40:45 +03:00
Jussi Saurio	97177dae02	add missing imports	2025-10-22 23:40:44 +03:00
Jussi Saurio	f4af7c2242	Pager: add begin_statement() method	2025-10-22 23:40:44 +03:00
Jussi Saurio	a19c5c22ac	Pager: add rollback_to_newest_savepoint() method	2025-10-22 23:40:44 +03:00
Jussi Saurio	86d5ad6815	pager: allow upserted cached page not to be dirty	2025-10-22 23:40:44 +03:00
Jussi Saurio	5b01605fae	Pager: add subjournal_page_if_required() method	2025-10-22 23:40:44 +03:00
Jussi Saurio	e8226c0e4b	Pager: add clear_savepoint() method	2025-10-22 23:40:44 +03:00
Jussi Saurio	aa1eebbfcb	Pager: add open_savepoint() and release_savepoint() methods	2025-10-22 23:40:44 +03:00
Jussi Saurio	77be1f08ae	Pager: add open_subjournal method	2025-10-22 23:40:44 +03:00
Jussi Saurio	2a03c1a617	Add subjournal and savepoints to Pager struct	2025-10-22 23:40:44 +03:00
Jussi Saurio	8b15a06a85	Add Savepoint struct	2025-10-22 23:40:44 +03:00
Jussi Saurio	459c01f93c	Add subjournal module The subjournal is a temporary file where stmt subtransactions write an 'undo log' of pages before modifying them. If a stmt subtransaction rolls back, the pages are restored from the subjournal.	2025-10-22 23:40:44 +03:00
Nikita Sivukhin	6aa67c6ea0	Revert "slight reorder of operations" This reverts commit `8e107ab18e`.	2025-10-22 20:21:52 +04:00
Nikita Sivukhin	8e1cec5104	Revert "alternative read_variant implementation" This reverts commit `68650cf594`.	2025-10-22 19:30:43 +04:00
Pekka Enberg	5dd503b7b9	core/storage: Cache schema cookie in Pager Every transaction was reading page 1 from the WAL to check the schema cookie in op_transaction, causing unnecessary WAL lookups. This commit caches the schema_cookie in Pager as AtomicU64, similar to how page_size and reserved_space are already cached. The cache is updated when the header is read/modified and invalidated in begin_read_tx() when WAL changes are detected from other connections. This matches SQLite's approach of caching frequently accessed header fields to avoid repeated page 1 reads. Improves write throughput by 5% in our benchmarks.	2025-10-22 16:51:15 +03:00
PThorpe92	a8b257c664	Replace several RwLock<Enum> values with new AtomicEnums	2025-10-22 09:35:26 -04:00
Nikita Sivukhin	671d266dd6	Revert "wip" This reverts commit `dd34f7fd50`.	2025-10-22 11:47:46 +04:00
Nikita Sivukhin	bf77862fab	Merge branch 'main' into order-by-heap-sort	2025-10-22 11:44:55 +04:00
Pekka Enberg	1aad1b224a	Merge 'core/io: Make random generation deterministically simulated' from Pedro Muniz Depends on #3584 to use the most up-to-date implementation of `ThreadRng` - Add `fill_bytes` method to `IO` - use `thread_rng` instead of `getrandom`, as `getrandom` is much slower and `thread_rng` offers enough security - modify `exec_randomblob`, `exec_random` and random_rowid generation to use methods from IO for determinism - modified simulator IO to implement `fill_bytes` This is the PRNG for SQLite if someone is curious. It is similar to `thread_rng`: ```c /* Initialize the state of the random number generator once, ** the first time this routine is called. */ if( wsdPrng.s[0]==0 ){ sqlite3_vfs *pVfs = sqlite3_vfs_find(0); static const u32 chacha20_init[] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 }; memcpy(&wsdPrng.s[0], chacha20_init, 16); if( NEVER(pVfs==0) ){ memset(&wsdPrng.s[4], 0, 44); }else{ sqlite3OsRandomness(pVfs, 44, (char*)&wsdPrng.s[4]); } wsdPrng.s[15] = wsdPrng.s[12]; wsdPrng.s[12] = 0; wsdPrng.n = 0; } assert( N>0 ); while( 1 /* exit by break */ ){ if( N<=wsdPrng.n ){ memcpy(zBuf, &wsdPrng.out[wsdPrng.n-N], N); wsdPrng.n -= N; break; } if( wsdPrng.n>0 ){ memcpy(zBuf, wsdPrng.out, wsdPrng.n); N -= wsdPrng.n; zBuf += wsdPrng.n; } wsdPrng.s[12]++; chacha_block((u32*)wsdPrng.out, wsdPrng.s); wsdPrng.n = 64; } sqlite3_mutex_leave(mutex); ``` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #3799	2025-10-22 09:10:36 +03:00
Pere Diaz Bou	3227caaa1d	Merge 'core: move BTreeCursor under MVCC cursor' from Pere Diaz Bou Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3756	2025-10-21 19:20:49 +02:00
pedrocarlo	8501bc930a	use workspace rand version	2025-10-21 14:10:05 -03:00
Pere Diaz Bou	ea04e9033a	core/mvcc: add btree_cursor under MVCC cursor	2025-10-21 18:22:37 +02:00
Nikita Sivukhin	00e382a7c7	avoid unnecessary allocations	2025-10-21 20:13:39 +04:00
Pekka Enberg	f764f3061d	Merge 'Add Miri support for turso_stress, with bash scripts to run' from Bob Peterson It was mentioned in https://github.com/tursodatabase/turso/pull/3720 that adding Miri support for `turso_stress` would be useful. And, that a bash script to start Miri with the right config would be a big help. Notable changes: - `antithesis_sdk`'s default features are disabled at the workspace level, and only enabled as needed with the `antithesis` feature flag in the various turso crates. Miri needs the noop version of `antithesis_sdk` to run `turso_stress`, and feature unification previously prevented this. I'm not able to ensure locally that all the Antithesis stuff is still happy with these changes. - Bash script to run `turso_stress` - this is barebones for now, see below - Bash script to run `simulator` - this passes any args to the `cargo run` invocation inside, intercepting `--seed` if it's present, and generating one from `/dev/random` if it's not. The seed is passed to both Miri and the simulator to keep the overall execution reproducible. (I checked this with a simple case) - A `const fn`, `normal_or_miri` to supply different defaults in things like CLI args for normal operation and Miri, since it's so slow. (An idea I stole from tokio.) Right now the relevant values are 100x smaller for Miri, although Miri is probably 1000 to 10,000x slower overall from a rough estimation. Caught UB from running `turso_stress` with Miri: - An unsafe cast of a `u8` to `u32` inside the BTree implementation resulted in the `*u32` making an unaligned read: `read()` -> `read_unaligned()` fixes this Future work - Making `turso_stress` reproducible under Miri: - Right now `turso_stress` is plugged in to Antithesis, which is great! But, `antithesis_sdk`'s noop mode (`default-features = false`) turns `antithesis_sdk::random::get_random()` into `rand::random<u64>()`, which isn't seedable/reproducible. It's more work than I wanted to take on in this PR, but I'd like to instead conditionally replace `get_random` with a seedable `ChaCha8Rng` like in the simulator, if Miri is being used. Comment: - On a machine without all necessary dependencies, running the bash scripts fails in a way that cargo prompts you through installing the nightly toolchain, Miri, etc. until it works - Below is a snippet of the output from Miri on the Btree alignment issue. Because turso_stress isn't yet deterministic/reproducible under Miri, I can't always reproduce it. (It doesn't always happen like the ones in my last MR) ``` error: Undefined Behavior: accessing memory based on pointer with alignment 1, but alignment 4 is required --> /home/rwp/git/turso/core/storage/btree.rs:2860:50 | 2860 | let mut pgno: u32 = unsafe { right_pointer.cast::<u32>().read().swap_bytes() }; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here | = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information ``` Closes #3790	2025-10-21 11:53:49 +03:00
Bob Peterson	2cb0a9b34b	Use read_unaligned with u8 cast to u32 Avoids undefined behavior due to unaligned read caught with Miri	2025-10-20 22:50:57 -05:00
pedrocarlo	ba9a1ebbef	add mutable scoped locking for SharedWalFile	2025-10-20 10:45:14 -03:00
pedrocarlo	b00a276960	add scoped locking for SharedWalFile to avoid holding locks for longer than needed	2025-10-20 10:45:14 -03:00
Pavan-Nambi	9841f487a6	dont allow autovacuum on nonempty dbs adds a is_db_empty fn	2025-10-18 19:01:21 +05:30
Pavan-Nambi	1a058a1531	get autovacuum mode from db header on existing dbs if autovacuum on, look for ptrmap pages	2025-10-18 18:47:30 +05:30
Pekka Enberg	e03f6dbf94	core/storage: Reduce logging level	2025-10-17 20:09:00 +03:00
Jussi Saurio	2ca388d78d	WAL: don't hold shared lock across IO operations Without this change and running: ``` cd stress cargo run -- --nr-threads=4 -i 1000 --verbose --busy-timeout=0 ``` I can produce a deadlock quite reliably. With this change, I can't. Even with 5 second busy timeout (the default), the run makes progress although it is slow as hell because of the busy timeout.	2025-10-16 22:00:01 +03:00
Pekka Enberg	afa89c66c0	Merge 'Replace io_yield_many with completion groups' from Pekka Enberg Reviewed-by: Pedro Muniz (@pedrocarlo) Closes #3703	2025-10-16 17:17:43 +03:00
Pere Diaz Bou	57eb63cee0	core/btree: remove duplicated code in BTreeCursor	2025-10-16 14:50:08 +02:00
Pekka Enberg	bf5de920f2	core: Unsafe Send and Sync pushdown This patch pushes unsafe Send and Sync to individual components instead of doing it at Database level. This makes it easier for us to incrementally fix thread-safety, but avoid developers adding more thread unsafe code.	2025-10-16 11:26:50 +03:00
Nikita Sivukhin	dd34f7fd50	wip	2025-10-15 17:27:22 +04:00
Nikita Sivukhin	68650cf594	alternative read_variant implementation - it is faster in the benchmark (who knows why) - it also seems a bit faster for some of my queries - let's test on CI	2025-10-15 17:27:22 +04:00
Nikita Sivukhin	a6a5ffd821	move read_varint_fast closer to the read_varint impl	2025-10-15 17:27:22 +04:00
Nikita Sivukhin	8e107ab18e	slight reorder of operations	2025-10-15 17:27:22 +04:00
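
The statement-rollback mechanism described in commit 64560a61c3 (before-images written to a subjournal when a page is first dirtied, restored on ABORT) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Turso's implementation: `ToyPager`, `write_page`, `release_statement` and `rollback_statement` are hypothetical names, pages live in an in-memory `HashMap`, and only one savepoint may be open at a time, as the PR description states.

```rust
use std::collections::HashMap;

type PageId = u32;

/// Hypothetical, simplified pager used only for illustration.
#[derive(Default)]
struct ToyPager {
    /// Current page contents, keyed by page number.
    pages: HashMap<PageId, Vec<u8>>,
    /// Before-images recorded since the statement began.
    /// `None` means no statement savepoint is open.
    /// An inner `None` means the page did not exist before the statement.
    subjournal: Option<HashMap<PageId, Option<Vec<u8>>>>,
}

impl ToyPager {
    /// Opens the anonymous savepoint for a new statement.
    fn begin_statement(&mut self) {
        assert!(self.subjournal.is_none(), "only one savepoint may be open");
        self.subjournal = Some(HashMap::new());
    }

    /// Writes a page, journaling its before-image the first time it is touched.
    fn write_page(&mut self, id: PageId, data: Vec<u8>) {
        let before = self.pages.get(&id).cloned();
        if let Some(journal) = self.subjournal.as_mut() {
            // Keep only the oldest image: later writes must not overwrite it.
            journal.entry(id).or_insert(before);
        }
        self.pages.insert(id, data);
    }

    /// The statement succeeded: discard the undo images.
    fn release_statement(&mut self) {
        self.subjournal = None;
    }

    /// The statement failed: restore every journaled page.
    fn rollback_statement(&mut self) {
        if let Some(journal) = self.subjournal.take() {
            for (id, before) in journal {
                match before {
                    Some(image) => {
                        self.pages.insert(id, image);
                    }
                    // The page was created by the statement, so remove it.
                    None => {
                        self.pages.remove(&id);
                    }
                }
            }
        }
    }
}
```

Calling `begin_statement`, writing pages, and then calling `rollback_statement` leaves `pages` as it was before the statement, while earlier writes in the same transaction are untouched. That is the difference between `ABORT` and a full `ROLLBACK`.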


1627 Commits
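
The unaligned-read fix in commit 2cb0a9b34b can be reproduced in isolation. The sketch below assumes a hypothetical helper, `read_be_u32`, that reads a big-endian page number from a byte slice. It uses `read_unaligned()` as the commit does, but converts with `u32::from_be` rather than the `swap_bytes()` call shown in the Miri output. That substitution is made here so the result does not depend on host endianness, and it is not taken from the repository.

```rust
/// Reads a big-endian `u32` from the first four bytes of `buf`.
/// Hypothetical helper for illustration; not a function from the Turso codebase.
fn read_be_u32(buf: &[u8]) -> u32 {
    assert!(buf.len() >= 4, "need at least four bytes");
    // SAFETY: the assert above guarantees four readable bytes, and
    // `read_unaligned` places no alignment requirement on the pointer.
    // A plain `read()` here would be undefined behavior whenever the
    // slice does not start on a 4-byte boundary.
    let raw = unsafe { buf.as_ptr().cast::<u32>().read_unaligned() };
    // Interpret the bytes as big-endian regardless of the host byte order.
    u32::from_be(raw)
}

/// Safe equivalent that avoids raw pointers entirely.
fn read_be_u32_safe(buf: &[u8]) -> u32 {
    u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]])
}
```

For example, `read_be_u32(&page[1..])` is valid even though a slice starting at offset 1 of a byte array is not guaranteed to be 4-byte aligned. The safe variant returns the same value and is usually the simpler choice when performance allows.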