The previous implementation of CompletionGroup::add() would filter out
successfully-finished completions:
```rust
if !completion.finished() || completion.failed() {
    self.completions.push(completion.clone());
}
```
This caused a problem when combined with drain() in the calling code.
Completions that were already finished would be removed from the source
vector by drain() but not added to the group, effectively losing track
of them.
This breaks the invariant that all completions passed to a group must
be tracked, regardless of their state. The build() method already
handles finished completions correctly by not including them in the
outstanding count.
The fix is to always add all completions and let build() handle their
state appropriately, matching the behavior of the old io_yield_many!()
macro.
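As a minimal sketch of the fix (the surrounding types are assumed from context; the real signatures live in the completion module):
```rust
impl CompletionGroup {
    /// Track every completion unconditionally, regardless of state.
    /// build() already skips finished completions when computing the
    /// outstanding count, so no filtering is needed here.
    fn add(&mut self, completion: &Completion) {
        self.completions.push(completion.clone());
    }
}
```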
## Background
The simulator wants to create predicates that it knows will be Greater or
Less than some known value. It uses `LTValue` and `GTValue` to generate
these.
## Problem
The current implementation simply decrements or increments a random char
by 1, and can thus generate strings with control characters (like NUL
terminators) that result in parse errors, as seen e.g. in this CI run of
PR #3702:
https://github.com/tursodatabase/turso/actions/runs/18459131141/job/52586305749?pr=3702
EDIT: I realized the _actual_ problem is in `GTValue`: when it decides to
make the string longer, it uses a random char value from `0..255`, which
can include NUL terminators etc. Fixed that too. I think this PR's
approach is in general a bit more predictable, so let's keep it.
## Solution
Restrict string mutations to ASCII string characters so that a mutation
always results in another ASCII string character.
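As a minimal sketch of the idea (the constants and helper names are illustrative, not the simulator's actual API):
```rust
/// Printable ASCII bounds; mutations are clamped to this range so a
/// decrement/increment can never produce a control character such as NUL.
const ASCII_MIN: u8 = b' '; // 0x20
const ASCII_MAX: u8 = b'~'; // 0x7E

/// Decrement a byte without leaving the printable ASCII range
/// (used when generating a value less than a known one).
fn decrement_ascii(c: u8) -> u8 {
    c.saturating_sub(1).max(ASCII_MIN)
}

/// Increment a byte without leaving the printable ASCII range
/// (used when generating a value greater than a known one).
fn increment_ascii(c: u8) -> u8 {
    c.saturating_add(1).min(ASCII_MAX)
}
```
At the range boundaries the result equals the input, so a real generator would have to fall back to mutating a different position or changing the string length; this sketch only shows the clamping.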
Closes#3708
Closes#2600
## Problem
Every btree has a key it is sorted by - this is the integer `rowid` for
tables and an arbitrary-sized, potentially multi-column key for indexes.
Executing an UPDATE in a loop is not safe if the update modifies any
part of the key of the btree that is used for iterating the rows in said
loop. For example:
- Using the table itself to iterate rows is not safe if the UPDATE
modifies the rowid (or rowid alias) of a row, because that modifies the
iteration order itself and may cause rows to be skipped:
```sql
CREATE TABLE t(x INTEGER PRIMARY KEY, y);
INSERT <something>
UPDATE t SET y = RANDOM() where x > 100; -- safe to iterate 't', 'y' is not being modified
UPDATE t SET x = RANDOM() where x > 100; -- not safe to iterate 't', 'x' is being modified
```
- Using an index to iterate rows is not safe if the UPDATE modifies any
of the columns in the index key
```sql
CREATE TABLE t(x, y, z);
CREATE INDEX txy ON t (x,y);
INSERT <something>
UPDATE t SET z = RANDOM() where x = 100 and y > 0; -- safe to iterate txy, neither x nor y is being modified
UPDATE t SET x = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'x' is being modified
UPDATE t SET y = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'y' is being modified
```
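The safety condition boils down to a set-intersection check; as a minimal sketch (names are illustrative, not the actual planner code):
```rust
/// Returns true if the UPDATE's SET list touches any column of the key that
/// the chosen access path (table rowid or index key) iterates by; in that
/// case iterating that btree directly is unsafe.
fn update_modifies_iteration_key(set_columns: &[&str], key_columns: &[&str]) -> bool {
    set_columns.iter().any(|c| key_columns.contains(c))
}
```
For the examples above, `update_modifies_iteration_key(&["z"], &["x", "y"])` is false (safe to iterate `txy`), while `update_modifies_iteration_key(&["x"], &["x", "y"])` is true (unsafe).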
## Current solution in tursodb
Our current `main` code recognizes this issue and adopts this pseudocode
algorithm from SQLite:
- open a table or index for reading the rows of the source table,
- for each row that matches the condition in the UPDATE statement, write
the row into a temporary table
- then use that temporary table for iteration in the UPDATE loop.
This guarantees that the iteration order will not be affected by the
UPDATEs because the ephemeral table is not under modification.
## Problem with current solution
Our `main` code special-cases the ephemeral table solution to rowids /
rowid aliases only. Using indexes for UPDATE iteration was disabled in
an earlier PR (#2599) due to the safety issue mentioned above, which
means that many UPDATE statements become full table scans:
```sql
turso> create table t(x PRIMARY KEY);
turso> insert into t select value from generate_series(1,10000);
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 28 0 0 Start at 28
1 OpenWrite 0 2 0 0 root=2; iDb=0
2 OpenWrite 1 3 0 0 root=3; iDb=0
-- scan entire 't' despite very narrow update range!
3 Rewind 0 27 0 0 Rewind table t
...
```
## Solution
We move the ephemeral table logic to _after_ the optimizer has selected
the best access path for the table, and then, if the UPDATE modifies the
key of the chosen access path (table or index; whichever was selected by
the optimizer), we change the plan to include the ephemeral table
prepopulation. Hence, the same query from above becomes:
```sql
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 35 0 0 Start at 35
1 OpenEphemeral 0 1 0 0 cursor=0 is_table=true
2 OpenRead 1 3 0 0 index=sqlite_autoindex_t_1, root=3, iDb=0
3 Integer 50 2 0 0 r[2]=50
-- index seek on PRIMARY KEY index
4 SeekGT 1 10 2 0 key=[2..2]
5 Integer 60 2 0 0 r[2]=60
6 IdxGE 1 10 2 0 key=[2..2]
7 IdxRowId 1 1 0 0 r[1]=cursor 1 for index sqlite_autoindex_t_1.rowid
8 Insert 0 3 1 ephemeral_scratch 2 intkey=r[1] data=r[3]
9 Next 1 6 0 0
10 OpenWrite 2 2 0 0 root=2; iDb=0
11 OpenWrite 3 3 0 0 root=3; iDb=0
-- only scan rows that were inserted to ephemeral table
12 Rewind 0 34 0 0 Rewind table ephemeral_scratch
13 RowId 0 5 0 0 r[5]=ephemeral_scratch.rowid
```
Note that an ephemeral table does not have to be used if the index is
not affected:
```sql
turso> create table t(x PRIMARY KEY, data);
turso> explain update t set data = 'some_data' where x > 50 and x < 60;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 15 0 0 Start at 15
1 OpenWrite 0 2 0 0 root=2; iDb=0
2 OpenWrite 1 3 0 0 root=3; iDb=0
3 Integer 50 1 0 0 r[1]=50
-- direct index seek
4 SeekGT 1 14 1 0 key=[1..1]
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3728
This PR contains NO semantic changes at all; it simply refactors
existing INSERT code to be easier to reason about.
Very sorry, I know I've been working on `INSERT OR IGNORE|REPLACE|etc..`
for days now, but the insert translation was literally unbearable. I got
it working, but I could barely wrap my head around that whole
`translate_insert` function, so I spent a bunch of time refactoring the
whole INSERT handling out into different "Plans"... which turned into a
whole different clusterf***... So I just went back and made the existing
insert emission more modular and created some context that makes it
easier to reason about.
This should be able to just be merged quickly.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3731
When populating an ephemeral table for UPDATE, we may open a cursor
on the (permanent) table; in this case we don't need to open it
again in the UPDATE loop:
- Encode information about the ephemeral source table in `OperationMode::UPDATE`,
if present
- Use the `OperationMode` information to correctly resolve cursors in UPDATE
(see the sketch after this list)
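A minimal sketch of the idea (the variant fields and resolver are assumptions for illustration, not the actual turso-core definitions):
```rust
/// Which statement kind a plan is being built for (illustrative subset).
enum OperationMode {
    Select,
    Update {
        /// Some(id) when the ephemeral-table population phase already opened
        /// a cursor on the permanent table; None otherwise.
        reusable_table_cursor: Option<usize>,
    },
}

/// Reuse the already-open permanent-table cursor when the population phase
/// provides one, instead of opening a second cursor in the UPDATE loop.
fn resolve_update_cursor(mode: &OperationMode, open_new: impl FnOnce() -> usize) -> usize {
    match mode {
        OperationMode::Update { reusable_table_cursor: Some(id) } => *id,
        _ => open_new(),
    }
}
```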
The decision to use an ephemeral table in UPDATE will be made after
the optimizer has decided which index to use. This will be implemented
in a later commit.
This adds support for running the simulator under Miri to detect UB.
There are a few things to note about Miri and its limitations:
- It has limited `libc` coverage, so it's not really possible to have
Miri help with `UringIO`/`UringFile` or `UnixIO`/`UnixFile`. That's a
big gap ☹️
- It **can** work for `GenericIO`/`GenericFile`, which only uses `std`
- It can't call external C libraries, so even using `sqlite` is out
(hence adding `--disable-integrity-check` to the simulator for Miri use)
- It runs on nightly; consequently there are a few new lints that don't
exist in turso's pinned version of rustc
Some questions I have about this PR:
- I made `GenericFile::{lock_file,unlock_file}` no-ops so I could use
`GenericIO`. This isn't great, but if/when you update from Rust 1.88.0
to 1.89.0, `std::File::{lock,lock_shared,unlock}` will be stabilized and
available (see the first sketch after this list). Should I note that as
a TODO or something?
- Previously, the sim runner shelled out to `git` to get things like the
current git hash and the repo directory. For Miri, that's out, and so is
`git2`. Unfortunately, `gix` is also out, since it has a required
dependency that uses inline assembly, which Miri doesn't like. I wrote a
hacky shim that uses only std to look for `.git` and find the hash that
HEAD is pointing to (sketched after this list). It doesn't handle things
like packed-refs or the repo being a secondary one made with
`git worktree`. I'm happy to support that, but wanted to hear from
maintainers before doing more work.
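On the first question, a minimal sketch of what the no-ops could become on Rust 1.89+ (the free functions are illustrative; `File::lock`/`File::unlock` are std's stabilized advisory-locking methods):
```rust
use std::fs::File;
use std::io;

/// On Rust 1.89+, GenericFile's no-op lock/unlock could delegate to std's
/// now-stable advisory file locks.
fn lock_file(file: &File) -> io::Result<()> {
    file.lock() // exclusive advisory lock, stable since Rust 1.89.0
}

fn unlock_file(file: &File) -> io::Result<()> {
    file.unlock()
}
```
On the second, the shim resolves HEAD roughly like this (a std-only sketch matching the description above; the function name and error handling are illustrative, and packed-refs/worktrees are deliberately unhandled):
```rust
use std::fs;
use std::path::Path;

/// Resolve the commit hash HEAD points to, using only std.
fn head_commit_hash(repo_root: &Path) -> Option<String> {
    let head = fs::read_to_string(repo_root.join(".git/HEAD")).ok()?;
    let head = head.trim();
    if let Some(ref_path) = head.strip_prefix("ref: ") {
        // Symbolic ref, e.g. "ref: refs/heads/main": read the loose ref file.
        let hash = fs::read_to_string(repo_root.join(".git").join(ref_path)).ok()?;
        Some(hash.trim().to_string())
    } else {
        // Detached HEAD: the file contains the hash itself.
        Some(head.to_string())
    }
}
```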
Two UB occurrences I already found:
- `TursoRwLock::read` used `AtomicU64::compare_exchange_weak`, which is
(evidently) [allowed to spuriously fail](https://doc.rust-lang.org/std/sync/atomic/struct.AtomicU64.html#method.compare_exchange_weak)
in exchange for perf. Miri forces this behavior, which triggers trivial
read deadlocks even with zero readers/writers. I changed it to
`compare_exchange` (see the first sketch after this list), but I'm not
an atomics expert.
- Uninitialized read in the non-Unix
`core::storage::buffer_pool::arena::alloc`. This is a simple one,
resolved by using `std::alloc::alloc_zeroed` instead of
`std::alloc::alloc` (see the second sketch after this list)
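To make the first issue concrete, a sketch (illustrative names, not the actual `TursoRwLock` internals):
```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// A single-shot read-lock attempt. With compare_exchange_weak this could
/// fail spuriously even while `state` still equals `current`; Miri exercises
/// that failure mode, so a caller that treats one failed CAS as "lock held"
/// can deadlock with zero readers and writers. compare_exchange only fails
/// when the value actually differs.
fn try_acquire(state: &AtomicU64, current: u64, desired: u64) -> bool {
    state
        .compare_exchange(current, desired, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
}
```
(The alternative is to keep `compare_exchange_weak` but retry it in a loop, which is the pattern its documentation recommends.) The arena fix boils down to this (layout parameters are illustrative, not the actual arena code):
```rust
use std::alloc::{alloc_zeroed, Layout};

/// Allocate an arena chunk whose bytes may be read before every byte is
/// written. `std::alloc::alloc` returns uninitialized memory, making such
/// reads UB; `alloc_zeroed` gives the chunk defined (zeroed) contents.
unsafe fn alloc_chunk(size: usize) -> *mut u8 {
    let layout = Layout::from_size_align(size, 8).expect("valid layout");
    // SAFETY: caller guarantees `size` is non-zero and frees the chunk with
    // the same layout.
    unsafe { alloc_zeroed(layout) }
}
```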
Moving forward, I'd be interested in potentially getting the tests to
run in Miri, too. `tokio` looks like a good example of a project with
partial coverage that runs it where they can. They have some extra test
config to allow as many as possible to run under Miri, with
appropriately scaled-down parameter values, since Miri is super slow.
Closes#3720
Added the ability for us to generate `Drop Index` queries in the
simulator. Most of the code is just boilerplate, plus some checks to make
sure we do not generate `Drop Index` when we have no indexes to drop.
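The guard is essentially this (a minimal sketch with illustrative names, not the simulator's actual generation API):
```rust
/// Only emit a DROP INDEX when the simulated schema has indexes to drop.
fn generate_drop_index(indexes: &[String], pick: usize) -> Option<String> {
    if indexes.is_empty() {
        return None; // Drop Index is simply not a candidate query kind
    }
    Some(format!("DROP INDEX {}", indexes[pick % indexes.len()]))
}
```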
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: bit-aloo (@Shourya742)
Closes#3713
gix doesn't work here, since while it's pure Rust, it has a
non-configurable dependency on crates using inline assembly, which Miri
does not support. This commit is a bit of a hack, and only works in
non-bare git repos without e.g. packed-refs.