turso

mirror of https://github.com/aljazceru/turso.git synced 2026-02-09 02:04:22 +01:00

Author	SHA1	Message	Date
Jussi Saurio	16097e7355	Merge 'Add RowSet<Add/Read/Test> instructions and rowset implementation' from Jussi Saurio ## What Rowsets are used in SQLite for two purposes: 1. for membership tests on a set of `i64`s, 2. for in-order iteration of a set of `i64`s, Both in cases where we can just use rowids (which are `i64`) instead of building an entire ephemeral btree from a table's contents. For example, in cases where a `DELETE FROM tbl WHERE ...` is performed on a table that has any `BEFORE DELETE` triggers, SQLite collects the table's rowids into a RowSet before actually performing the deletion. This is similar to how an UPDATE that modifies rowids (or the index used to iterate the UPDATE loop) will first collect the rows into an ephemeral index, and same with `INSERT INTO ... SELECT`. ## Details RowSet uses a "batch" concept where insertions of a given batch must be guaranteed by caller to contain no duplicates and will be pushed into a vector for O(1). When a new batch is started, the previous batch is folded into a `BTreeSet` so that membership tests can be performed in O(logn). As far as I can tell, the "in-order iteration" use case doesn't use this batch logic at all. ## AI disclosure This entire PR description was written by me - no AIs were harmed in the production of it. However, the code itself was mostly vibecoded using two agents in Cursor: - Composer 1: given the SQLite opcode documentation and rowset.c source code, and asked to implement the VDBE instructions and the RowSet module. - GPT-5: given the same SQLite docs and source code, and asked to review Composer 1's work and write feedback into a separate markdown file. This loop was run for roughly 4-5 iterations, where each time GPT-5's feedback was given to Composer 1, until GPT-5 found nothing to comment anymore. After this, I instructed Composer 1 to improve the documentation to be less stupid. After that, I made a manual editing pass over the runtime code to e.g. 
change boolean flags to a `RowSetMode` enum to make clearer that the rowset has two distinct mutually exclusive purposes (membership tests and in-order iteration), plus cleaned up some other dumb shit and added comments. I am still not sure if this saved time or not. Closes #3938	2025-11-12 13:02:00 +02:00
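The batch behaviour described in the RowSet commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the type name `BatchedRowSet`, its fields, and the rule that a changed batch number triggers the fold are all assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch of a rowset used for membership tests.
/// Names and layout are illustrative, not taken from the Turso source.
#[derive(Debug, Default)]
pub struct BatchedRowSet {
    /// Rowids from all earlier batches, searchable in O(log n).
    folded: BTreeSet<i64>,
    /// Rowids added during the current batch. The caller guarantees
    /// that a single batch contains no duplicates.
    pending: Vec<i64>,
    /// Identifier of the batch currently being filled (assumed to start at 0).
    current_batch: i32,
}

impl BatchedRowSet {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a rowid to the current batch in amortized O(1).
    pub fn insert(&mut self, rowid: i64) {
        self.pending.push(rowid);
    }

    /// Tests whether `rowid` was inserted in an earlier batch.
    /// Seeing a new batch number folds the pending vector into the set first.
    pub fn test(&mut self, batch: i32, rowid: i64) -> bool {
        if batch != self.current_batch {
            self.folded.extend(self.pending.drain(..));
            self.current_batch = batch;
        }
        self.folded.contains(&rowid)
    }
}
```

In this sketch, rowids inserted during a batch only become visible to `test` once a later batch begins, which is what allows insertion to skip duplicate checks.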
Jussi Saurio	da92982f41	Add RowSet<Add/Read/Test> instructions and rowset implementation Rowsets are used in SQLite for two purposes: 1. for membership tests on a set of `i64`s, 2. for in-order iteration of a set of `i64`s, Both in cases where we can just use rowids (which are `i64`) instead of building an entire ephemeral btree from a table's contents. For example, in cases where a `DELETE FROM tbl WHERE ...` is performed on a table that has any `BEFORE DELETE` triggers, SQLite collects the table's rowids into a RowSet before actually performing the deletion. This is similar to how an UPDATE that modifies rowids (or the index used to iterate the UPDATE loop) will first collect the rows into an ephemeral index, and same with `INSERT INTO ... SELECT`. This entire PR description was written by me - no AIs were harmed in the production of it. However, the code itself was mostly vibecoded using two agents in Cursor: - Composer 1: given the SQLite opcode documentation and rowset.c source code, and asked to implement the VDBE instructions and the RowSet module. - GPT-5: given the same SQLite docs and source code, and asked to review Composer 1's work and write feedback into a separate markdown file. This loop was run for roughly 4-5 iterations, where each time GPT-5's feedback was given to Composer 1, until GPT-5 found nothing to comment anymore. After this, I instructed Composer 1 to improve the documentation to be less stupid. After that, I made a manual editing pass over the runtime code to e.g. change boolean flags to a `RowSetMode` enum to make clearer that the rowset has two distinct mutually exclusive purposes (membership tests and in-order iteration), plus cleaned up some other dumb shit and added comments. I am still not sure if this saved time or not.	2025-11-12 11:39:40 +02:00
pedrocarlo	e1d36a2221	clippy fix	2025-11-11 16:11:46 -03:00
pedrocarlo	84268c155b	convert json functions to use `AsValueRef`	2025-11-11 16:11:46 -03:00
pedrocarlo	98d268cdc6	change datetime functions to accept `AsValueRef` and not registers	2025-11-11 16:11:46 -03:00
pedrocarlo	505a6ba5ea	convert vector functions to use `AsValueRef`	2025-11-11 16:11:46 -03:00
pedrocarlo	4a94ce89e3	Change `ValueRef::Text` to use a `&str` instead of `&[u8]`	2025-11-11 16:11:46 -03:00
pedrocarlo	1db13889e3	Change `Value::Text` to use a `Cow<'static, str>` instead of `Vec<u8>`	2025-11-11 16:11:46 -03:00
Pere Diaz Bou	b581519be4	more clippy	2025-11-10 17:20:15 +01:00
Pere Diaz Bou	2fd4407a03	core/execute: map negative root page to positive if we can	2025-11-10 16:51:01 +01:00
Pekka Enberg	b74ddf30f9	Merge 'extensions/vtabs: implement remaining opcodes' from Preston Thorpe The only real benefit right now here is the ability to rename virtual tables. Then this now properly calls `VBegin` at the start of a vtab write transaction, despite none of our extensions needing or implementing transactions at this point. ```console explain insert into t values ('key','value'); addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 10 0 0 Start at 10 1 VOpen 0 0 0 0 t 2 VBegin 0 0 0 0 3 Null 0 1 0 0 r[1]=NULL 4 Null 0 3 0 0 r[3]=NULL 5 String8 0 4 0 key 0 r[4]='key' 6 String8 0 5 0 value 0 r[5]='value' 7 VUpdate 0 5 1 0 args=r[1..5] 8 Close 0 0 0 0 9 Halt 0 0 0 0 10 Transaction 0 2 1 0 iDb=0 tx_mode=Write 11 Goto 0 1 0 0 Exiting Turso SQL Shell. ``` Closes #3930	2025-11-10 09:03:07 +02:00
Pekka Enberg	7891be96fd	Merge 'Refactor affinity conversions for reusability' from Pedro Muniz Depends on #3920 Moves some code around so it is easier to reuse and less cluttered in `execute.rs`, and changes how `compare` works. Instead of mutating some register, we now just return the possible `ValueRef` representation of that affinity. This allows other parts of the codebase to reuse this logic without needing to have an owned `Value` or a `&mut Register` Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3923	2025-11-10 09:02:22 +02:00
Pekka Enberg	2be515247f	Merge 'Create `AsValueRef` trait to allow us to be agnostic over ownership of `Value` or `ValueRef`' from Pedro Muniz Depends on #3919 Also change `op_compare` to reuse the same compare_immutable logic First step to finish #2304 Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3920	2025-11-10 09:01:59 +02:00
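The idea of being agnostic over ownership, as described in the `AsValueRef` merge above, can be illustrated with a small sketch. The enum variants, the trait method name `as_value_ref`, and the helper `text_length` are simplified assumptions for illustration. The actual Turso types carry more variants and may be shaped differently.

```rust
/// Simplified owned value (assumed shape, not the real Turso `Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
    Text(String),
}

/// Simplified borrowed view of a value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ValueRef<'a> {
    Null,
    Integer(i64),
    Text(&'a str),
}

/// Anything that can hand out a borrowed view of itself.
pub trait AsValueRef {
    fn as_value_ref(&self) -> ValueRef<'_>;
}

impl AsValueRef for Value {
    fn as_value_ref(&self) -> ValueRef<'_> {
        match self {
            Value::Null => ValueRef::Null,
            Value::Integer(i) => ValueRef::Integer(*i),
            Value::Text(s) => ValueRef::Text(s.as_str()),
        }
    }
}

impl AsValueRef for ValueRef<'_> {
    fn as_value_ref(&self) -> ValueRef<'_> {
        // Already a borrowed view, so copying it is enough.
        *self
    }
}

/// Hypothetical scalar function written once for both owned and borrowed inputs.
pub fn text_length(v: &impl AsValueRef) -> Option<usize> {
    match v.as_value_ref() {
        ValueRef::Text(s) => Some(s.chars().count()),
        _ => None,
    }
}
```

A function written against the trait accepts either representation without cloning the owned value first.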
Pekka Enberg	4bb0edac5e	Merge 'Move value functions to separate file' from Pedro Muniz Makes it easier to visualize what is related to Value and what is related to opcodes. This will also facilitate my next PR, which generalizes certain functions over `Value` and `ValueRef` as listed in #2304 Closes #3919	2025-11-10 09:01:29 +02:00
PThorpe92	5c207618a7	Fix extensions py test	2025-11-09 11:35:57 -05:00
PThorpe92	b443b09516	Remove VRollback and VCommit as they are unused opcodes in sqlite	2025-11-09 11:27:09 -05:00
PThorpe92	993c9d34b4	Rollback vtab txns when err code is present in Halt	2025-11-09 11:07:43 -05:00
PThorpe92	f35ccfba17	Add support for renaming virtual tables	2025-11-09 11:07:42 -05:00
PThorpe92	e09d9eb720	Add VBegin, VRename, VRollback and VCommit opcodes	2025-11-09 11:07:42 -05:00
PThorpe92	a012e98bfa	core/translate remove unused ParamState and some minor refactoring	2025-11-07 19:18:10 -05:00
pedrocarlo	9007340e99	change convert function to accept 1 value	2025-11-07 12:47:39 -03:00
pedrocarlo	9f350f7fd9	change Text variant in `ValueRef` to hold a `TextRef` that can automatically convert to &str avoiding string allocations everywhere	2025-11-07 12:47:39 -03:00
pedrocarlo	5cfc898049	clippy	2025-11-07 12:47:39 -03:00
pedrocarlo	af05d9ba10	move more affinity logic to separate file and avoid more clones	2025-11-07 12:47:39 -03:00
pedrocarlo	61036d5f51	move affinity handling to separate file	2025-11-07 12:47:39 -03:00
pedrocarlo	99c596d340	separate part of comparison logic for reuse later with seek operations	2025-11-07 12:47:39 -03:00
pedrocarlo	ce3527df40	change `RecordCompare::compare` to use an iterator	2025-11-07 12:47:39 -03:00
pedrocarlo	e5e97a5b0a	for `op_compare` reuse `compare_immutable`	2025-11-07 12:44:57 -03:00
pedrocarlo	9c2324cbd8	move some more functions to be scoped under `Value`	2025-11-07 12:10:27 -03:00
pedrocarlo	44cab91722	move Value functions to separate file	2025-11-07 12:10:27 -03:00
Preston Thorpe	4e8b4c96d3	Merge 'use dyn DatabaseStorage instead of DatabaseFile' from Nikita Sivukhin Partial sync for the sync engine will need to implement its own version of `DatabaseStorage`, which will load database pages on demand Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3922	2025-11-06 15:11:13 -05:00
Nikita Sivukhin	da61fa32b4	use dyn DatabaseStorage instead of DatabaseFile	2025-11-06 17:42:03 +04:00
PThorpe92	c5a3e590f7	Fix rewriting sql to persist for foreign keys in alter table func	2025-11-03 09:47:28 -05:00
PThorpe92	ef24911824	Handle renaming child foreign keys on op_rename_table	2025-11-03 09:47:28 -05:00
PThorpe92	481d86f567	Optimize and refactor schema::Column type	2025-11-02 20:46:02 -05:00
PThorpe92	23496f0bea	Fix incorrect unreachable precondition for affinity char in op_seek_rowid	2025-11-01 20:43:44 -04:00
Nikita Sivukhin	4c98861590	adjust logs	2025-10-29 16:24:05 +04:00
Nikita Sivukhin	a2d11f9263	reset cursors when statement is reset	2025-10-29 15:13:00 +04:00
Jussi Saurio	ad723b615f	Merge 'index_method: fully integrate into query planner' from Nikita Sivukhin This PR completely integrates custom indices into the query planner. To do that, a new `Cursor::IndexMethod` is introduced, with a few correlated changes in the VM implementation: 1. Added special `IndexMethod{Create,Destroy,Query}` opcodes to handle index method creation, deletion and query 2. Updated the `Next`, `IdxRowid`, `IdxInsert`, `IdxDelete` opcodes to properly handle the new cursor case Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3827	2025-10-29 09:42:37 +02:00
Pekka Enberg	810ed8ad60	Merge 'Don't allow autovacuum to be flipped on non-empty databases' from Pavan Nambi Turso incorrectly creates the first table in an autovacuumed database on page 2. (Note: this is in collaboration with @LeMikaelF) SQLite does not allow enabling or disabling auto-vacuum after the first table has been created (https://sqlite.org/pragma.html#pragma_auto_vacuum). This is because the sequence of the pages in the database is different when auto-vacuum is enabled: the first b-tree page must be page 3 instead of 2, to make room for the first [Pointer Map page](https://sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages). But Turso doesn't currently consider this, which can lead to data loss. The simplest way to reproduce this is to create an autovacuumed database with `pragma auto_vacuum=full`, so that autovacuum runs on each commit, and then create a table with some data. Turso will incorrectly create the new table on page 2. After this, every time a new page is created, either through a page split or because a new table is created, Turso will write a 5-byte pointer in page 2, starting from the top of the page, thereby overwriting existing data. For example, let's start with a clean database and the first bytes of page 2. It starts with `0d`, the discriminator for a leaf page ([source](https://www.sqlite.org/fileformat.html#b_tree_pages)). The next interesting number is the number of cells contained in this page (`01`), stored as a two-byte integer at offset 3. ``` $ cargo run -- /tmp/a.db turso> create table t(a); turso> insert into t values ('myvalue'); $ dbtotxt /tmp/a.db | size 8192 pagesize 4096 filename a.db | page 1 offset 0 # ...snip... | page 2 offset 4096 | 0: 0d 00 00 00 01 0f f5 00 0f f5 00 00 00 00 00 00 ................
| 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue | end a.db ``` Pointer map pages are located every N pages, starting from page 2, and contain a list of 5-byte pointers that represent the parent page of a certain page. So whenever Turso or SQLite needs to add a page, it will overwrite 5 bytes of page 2. This means that for data loss to occur, it is sufficient to add a single page to the database, for example by creating a table. The cell count will then be zeroed out: ``` $ cargo run -- /tmp/a.db turso> create table t(a); turso> insert into t values ('myvalue'); turso> pragma auto_vacuum=full; turso> create table tt(a); $ dbtotxt /tmp/a.db | size 12288 pagesize 4096 filename a.db | page 1 offset 0 # ...snip... | page 2 offset 4096 | 0: 01 00 00 00 00 0f f5 00 0f f5 00 00 00 00 00 00 ................ | 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue ``` Creating more tables, or adding more B-tree pages, will keep overwriting the rest of the page, until the cells themselves are also overwritten. ## Reproducing the issue in the simulator We have been unable to reproduce this exact corruption mode in the simulator, but patching it shows many failure modes, none of which occur with the unpatched simulator. The following seeds show the issue when the patched simulator is run against `main`: - `11522841279124073062`, with "Assertion 'table inquisitive_graham_159 should contain all of its expected values' failed: table inquisitive_graham_159 does not contain the expected values, the simulator model has more rows than the database" - `7057400018220918989`, `16028085350691325843`, `7721542713659053944`, and `203017821863546118`, with "Failed to read ptrmap key=XXX" - `12533694709304969540`, `18357088553315413457`, `3108945730906932377`, with "Integrity Check Failed: Cell N in page 2 is out of range."
- `4757352625344646473`, with "dirty pages should be empty for read txn" - `7083498604824302257`, with "header_size: 6272, header_len_bytes: 2, payload.len(): 13" - `17881876827470741581`, with "ParseError("no such table: focused_historians_416")" - `2092231500503735693`, with "range end index 4789 out of range for slice of length 4096" - `7555257419378470845`, with "malformed database schema (imaginative_ontivero\u{1})" - `12905270229511147245`, with "index out of bounds: the len is 4096 but the index is 4096" ## Fixing the issue - When DB is opened, we read the `auto_vacuum` state, instead of assuming `auto_vacuum=none`. - Don't allow auto_vacuum to be flipped on non-empty databases, because allowing this could let ptrmap entries overwrite existing data. - Modify integrity check to avoid reporting that page 2 is orphaned in auto-vacuumed databases. Fixes #3752 Closes #3830	2025-10-28 14:48:35 +02:00
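The pointer map layout behind the corruption described in the commit above can be worked through with a short sketch. It follows the layout in the SQLite file format documentation (5-byte entries made of a type byte plus a 4-byte big-endian parent page number, first pointer map page at page 2). The function names are hypothetical and the pending-byte page special case is deliberately ignored.

```rust
/// One pointer-map entry: 1 type byte + 4-byte big-endian parent page number.
const PTRMAP_ENTRY_SIZE: u32 = 5;

/// Returns the pointer-map page that covers `page_no` (which must be >= 2).
/// Simplification: the pending-byte page adjustment is not handled.
pub fn ptrmap_page_for(page_no: u32, usable_size: u32) -> u32 {
    assert!(page_no >= 2, "page 1 has no pointer-map entry");
    // Each group is one pointer-map page plus the pages it describes.
    let pages_per_group = usable_size / PTRMAP_ENTRY_SIZE + 1;
    let group = (page_no - 2) / pages_per_group;
    group * pages_per_group + 2
}

/// Returns the byte offset of the entry for `page_no` inside its pointer-map
/// page, or `None` if `page_no` is itself a pointer-map page.
pub fn ptrmap_entry_offset(page_no: u32, usable_size: u32) -> Option<u32> {
    let map_page = ptrmap_page_for(page_no, usable_size);
    if page_no == map_page {
        return None;
    }
    Some((page_no - map_page - 1) * PTRMAP_ENTRY_SIZE)
}

/// Encodes a single entry as it would appear on disk.
pub fn encode_ptrmap_entry(page_type: u8, parent: u32) -> [u8; 5] {
    let p = parent.to_be_bytes();
    [page_type, p[0], p[1], p[2], p[3]]
}
```

With a 4096-byte usable size, the entry for page 3 lands at offset 0 of page 2, and a root-page entry (type 1, parent 0) encodes as `01 00 00 00 00`. That matches the first five bytes of page 2 in the second dump above.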
Nikita Sivukhin	8ea733f917	fix bug with cursor allocation	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	8acbe3de66	make query_start method return bool indicating whether the result will have any rows	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	67c1855ba8	fix bug	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	6206294584	fix clippy	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	d6972a9cf3	fix explain	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	8dd2644c07	add support for new cursor type in existing op codes and also implement new opcodes in the VM	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	b994e2cbd8	add new Cursor type	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	5af10e6ccb	add IndexMethod specific VM instructions	2025-10-28 11:27:35 +04:00
Jussi Saurio	9c87b20cb2	Merge 'Where clause subquery support' from Jussi Saurio Closes #1282 # Support for WHERE clause subqueries This PR implements support for subqueries that appear in the WHERE clause of SELECT statements. ## What are those lol 1. EXISTS subqueries: `WHERE EXISTS (SELECT ...)` 2. Row value subqueries: `WHERE x = (SELECT ...)` or `WHERE (x, y) = (SELECT ...)`. The latter are not yet supported - only the single-column ("scalar subquery") case is. 3. IN subqueries: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN (SELECT ...)` ## Correlated vs Uncorrelated Subqueries - Uncorrelated subqueries reference only their own tables and can be evaluated once. - Correlated subqueries reference columns from the outer query (e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must be re-evaluated for each row of the outer query ## Implementation ### Planning During query planning, the WHERE clause is walked to find subquery expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each subquery is: 1. Assigned a unique internal ID 2. Compiled into its own `SelectPlan` with outer query tables provided as available references 3. Replaced in the AST with an `Expr::SubqueryResult` node that references the subquery with its internal ID 4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan` For IN subqueries, an ephemeral index is created to store the subquery results; for other kinds, the results are stored in register(s). ### Translation Before emitting bytecode, we need to determine when each subquery should be evaluated: - Uncorrelated: Evaluated once before opening any table cursors - Correlated: Evaluated at the appropriate nested loop depth after all referenced outer tables are in scope This is calculated by examining which outer query tables the subquery references and finding the right-most (innermost) loop that opens those tables - using similar mechanisms that we use for figuring out when to evaluate other `WhereTerm`s too. 
### Code Generation - EXISTS: Sets a register to 1 if any row is produced, 0 otherwise. Has new `QueryDestination::ExistsSubqueryResult` variant. - IN: Results stored in an ephemeral index and the index is probed. - RowValue: Results stored in a range of registers. Has new `QueryDestination::RowValueSubqueryResult` variant. ## Annoying details ### Which cursor to read from in a subquery? Sometimes a query will use a covering index, i.e. skip opening the table cursor at all if the index contains All The Needed Stuff. Correlated subqueries reading columns from outer tables is a bit problematic in this regard: with our current translation code, the subquery doesn't know whether the outer query opened a table cursor, index cursor, or both. So, for now, we try to find a table cursor first, then fall back to finding any index cursor for that table. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3847	2025-10-28 06:36:55 +02:00
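The rule from the Translation section of the commit above, that a subquery is evaluated at the innermost loop opening any outer table it references, can be sketched as follows. The `EvalPoint` enum, the function name, and the use of table names as plain strings are assumptions for illustration and do not reflect the planner's real data structures.

```rust
/// Where a WHERE-clause subquery gets evaluated (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum EvalPoint {
    /// Uncorrelated: evaluate once, before any table cursor is opened.
    BeforeLoops,
    /// Correlated: evaluate inside the loop at this index of the join order.
    InsideLoop(usize),
}

/// Picks the evaluation point for a subquery.
/// `join_order` lists outer tables from outermost to innermost loop;
/// `referenced` lists the outer tables the subquery refers to.
pub fn subquery_eval_point(join_order: &[&str], referenced: &[&str]) -> EvalPoint {
    referenced
        .iter()
        // Map each referenced table to its loop depth in the join order.
        .filter_map(|t| join_order.iter().position(|j| j == t))
        // The innermost (right-most) referenced loop decides the placement.
        .max()
        .map_or(EvalPoint::BeforeLoops, EvalPoint::InsideLoop)
}
```

For a join order of `t1, t2, t3`, a subquery referencing `t1` and `t3` would be placed inside the third loop, while one referencing no outer table would run once up front.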
Jussi Saurio	e7aa7ee2ff	ProgramBuilder: add a few utility methods needed for correlated subqueries	2025-10-27 14:03:41 +02:00


1570 Commits