turso

mirror of https://github.com/aljazceru/turso.git synced 2026-01-11 20:24:21 +01:00

Author	SHA1	Message	Date
Nikita Sivukhin	4c98861590	adjust logs	2025-10-29 16:24:05 +04:00
Nikita Sivukhin	a2d11f9263	reset cursors when statement is reset	2025-10-29 15:13:00 +04:00
Jussi Saurio	ad723b615f	Merge 'index_method: fully integrate into query planner' from Nikita Sivukhin This PR fully integrates custom indices into the query planner. To do that, a new `Cursor::IndexMethod` is introduced, with a few correlated changes in the VM implementation: 1. Added special `IndexMethod{Create,Destroy,Query}` opcodes to handle index method creation, deletion and query 2. Updated the `Next`, `IdxRowid`, `IdxInsert`, `IdxDelete` opcodes to properly handle the new cursor case Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3827	2025-10-29 09:42:37 +02:00
Pekka Enberg	810ed8ad60	Merge 'Don't allow autovacuum to be flipped on non-empty databases' from Pavan Nambi Turso incorrectly creates the first table of an autovacuumed database on page 2. (Note: this is in collaboration with @LeMikaelF) SQLite does not allow enabling or disabling auto-vacuum after the first table has been created (https://sqlite.org/pragma.html#pragma_auto_vacuum). This is because the sequence of pages in the database is different when auto-vacuum is enabled: the first b-tree page must be page 3 instead of 2, to make room for the first [Pointer Map page](https://sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages). But Turso doesn't currently consider this, which can lead to data loss. The simplest way to reproduce this is to create an autovacuumed database with `pragma auto_vacuum=full`, so that autovacuum runs on each commit, and then create a table with some data. Turso will incorrectly create the new table on page 2. After this, every time a new page is created, either through a page split or because a new table is created, Turso will write a 5-byte pointer in page 2, starting from the top of the page, thereby overwriting existing data. For example, let's start with a clean database and the first bytes of page 2. It starts with `0d`, the discriminator for a leaf page ([source](https://www.sqlite.org/fileformat.html#b_tree_pages)). The next interesting number is the number of cells contained in this page, the two-byte value `00 01` at offset 3. ``` $ cargo run -- /tmp/a.db turso> create table t(a); turso> insert into t values ('myvalue'); $ dbtotxt /tmp/a.db \| size 8192 pagesize 4096 filename a.db \| page 1 offset 0 # ...snip... \| page 2 offset 4096 \| 0: 0d 00 00 00 01 0f f5 00 0f f5 00 00 00 00 00 00 ................ 
\| 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue \| end a.db ``` Pointer map pages are located every N pages, starting from page 2, and contain a list of 5-byte pointers that represent the parent page of a certain page. So whenever Turso or SQLite needs to add a page, it will overwrite 5 bytes of page 2. This means that for data loss to occur, it is sufficient to add a single page to the database, for example by creating a table. The cell count will then be zeroed out: ``` $ cargo run -- /tmp/a.db turso> create table t(a); turso> insert into t values ('myvalue'); turso> pragma auto_vacuum=full; turso> create table tt(a); $ dbtotxt /tmp/a.db \| size 12288 pagesize 4096 filename a.db \| page 1 offset 0 # ...snip... \| page 2 offset 4096 \| 0: 01 00 00 00 00 0f f5 00 0f f5 00 00 00 00 00 00 ................ \| 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue ``` Creating more tables, or adding more B-tree pages, will keep overwriting the rest of the page, until the cells themselves are also overwritten. ## Reproducing the issue in the simulator We have been unable to reproduce this exact corruption mode in the simulator, but patching it shows many failure modes, none of which occur with the unpatched simulator. The following seeds show the issue when the patched simulator is run against `main`: - `11522841279124073062`, with "Assertion 'table inquisitive_graham_159 should contain all of its expected values' failed: table inquisitive_graham_159 does not contain the expected values, the simulator model has more rows than the database" - `7057400018220918989`, `16028085350691325843`, `7721542713659053944`, and `203017821863546118`, with "Failed to read ptrmap key=XXX" - `12533694709304969540`, `18357088553315413457`, `3108945730906932377`, with "Integrity Check Failed: Cell N in page 2 is out of range." 
- `4757352625344646473`, with "dirty pages should be empty for read txn" - `7083498604824302257`, with "header_size: 6272, header_len_bytes: 2, payload.len(): 13" - `17881876827470741581`, with "ParseError("no such table: focused_historians_416")" - `2092231500503735693`, with "range end index 4789 out of range for slice of length 4096" - `7555257419378470845`, with "malformed database schema (imaginative_ontivero\u{1})" - `12905270229511147245`, with "index out of bounds: the len is 4096 but the index is 4096" ## Fixing the issue - When the DB is opened, we read the `auto_vacuum` state, instead of assuming `auto_vacuum=none`. - Don't allow auto_vacuum to be flipped on non-empty databases, because the ptrmap could then overwrite existing data. - Modify integrity check to avoid reporting that page 2 is orphaned in auto-vacuumed databases. Fixes #3752 Closes #3830	2025-10-28 14:48:35 +02:00
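The pointer-map layout described in the commit above can be sketched as follows. This is a minimal illustration, not Turso's code: the function names are hypothetical, the arithmetic follows the published SQLite file format (5-byte entries, first pointer-map page at page 2), and the pending-byte page special case is deliberately ignored.

```rust
/// Size of one pointer-map entry: 1 type byte + 4-byte big-endian parent page.
const PTRMAP_ENTRY_SIZE: usize = 5;
/// Entry type used for a b-tree root page in the SQLite file format.
const PTRMAP_ROOTPAGE: u8 = 1;

/// Page number of the pointer-map page that covers `page_no`.
/// Assumption: `page_no >= 2`; the pending-byte page is not handled.
fn ptrmap_page_for(page_no: u32, usable_size: u32) -> u32 {
    // One pointer-map page plus the pages whose entries it holds.
    let pages_per_group = usable_size / PTRMAP_ENTRY_SIZE as u32 + 1;
    let group = (page_no - 2) / pages_per_group;
    group * pages_per_group + 2
}

/// Byte offset of `page_no`'s entry inside its pointer-map page.
/// Assumption: `page_no > ptrmap_page`.
fn ptrmap_entry_offset(page_no: u32, ptrmap_page: u32) -> usize {
    PTRMAP_ENTRY_SIZE * (page_no - ptrmap_page - 1) as usize
}

/// Write one entry into the raw bytes of a pointer-map page.
fn write_ptrmap_entry(ptrmap_bytes: &mut [u8], offset: usize, entry_type: u8, parent: u32) {
    ptrmap_bytes[offset] = entry_type;
    ptrmap_bytes[offset + 1..offset + 5].copy_from_slice(&parent.to_be_bytes());
}
```

With a 4096-byte usable size, page 3 maps to pointer-map page 2 at offset 0. Writing a root-page entry with parent 0 there turns the leaf header bytes `0d 00 00 00 01` into `01 00 00 00 00`, which matches the second dump in the commit message.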
Nikita Sivukhin	8ea733f917	fix bug with cursor allocation	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	8acbe3de66	make query_start method return bool - whether the result will have any rows	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	67c1855ba8	fix bug	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	6206294584	fix clippy	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	d6972a9cf3	fix explain	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	8dd2644c07	add support for new cursor type in existing op codes and also implement new opcodes in the VM	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	b994e2cbd8	add new Cursor type	2025-10-28 11:27:35 +04:00
Nikita Sivukhin	5af10e6ccb	add IndexMethod specific VM instructions	2025-10-28 11:27:35 +04:00
Jussi Saurio	9c87b20cb2	Merge 'Where clause subquery support' from Jussi Saurio Closes #1282 # Support for WHERE clause subqueries This PR implements support for subqueries that appear in the WHERE clause of SELECT statements. ## What are those lol 1. EXISTS subqueries: `WHERE EXISTS (SELECT ...)` 2. Row value subqueries: `WHERE x = (SELECT ...)` or `WHERE (x, y) = (SELECT ...)`. The latter are not yet supported - only the single-column ("scalar subquery") case is. 3. IN subqueries: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN (SELECT ...)` ## Correlated vs Uncorrelated Subqueries - Uncorrelated subqueries reference only their own tables and can be evaluated once. - Correlated subqueries reference columns from the outer query (e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must be re-evaluated for each row of the outer query ## Implementation ### Planning During query planning, the WHERE clause is walked to find subquery expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each subquery is: 1. Assigned a unique internal ID 2. Compiled into its own `SelectPlan` with outer query tables provided as available references 3. Replaced in the AST with an `Expr::SubqueryResult` node that references the subquery with its internal ID 4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan` For IN subqueries, an ephemeral index is created to store the subquery results; for other kinds, the results are stored in register(s). ### Translation Before emitting bytecode, we need to determine when each subquery should be evaluated: - Uncorrelated: Evaluated once before opening any table cursors - Correlated: Evaluated at the appropriate nested loop depth after all referenced outer tables are in scope This is calculated by examining which outer query tables the subquery references and finding the right-most (innermost) loop that opens those tables - using similar mechanisms that we use for figuring out when to evaluate other `WhereTerm`s too. 
### Code Generation - EXISTS: Sets a register to 1 if any row is produced, 0 otherwise. Has new `QueryDestination::ExistsSubqueryResult` variant. - IN: Results stored in an ephemeral index and the index is probed. - RowValue: Results stored in a range of registers. Has new `QueryDestination::RowValueSubqueryResult` variant. ## Annoying details ### Which cursor to read from in a subquery? Sometimes a query will use a covering index, i.e. skip opening the table cursor at all if the index contains All The Needed Stuff. Correlated subqueries reading columns from outer tables is a bit problematic in this regard: with our current translation code, the subquery doesn't know whether the outer query opened a table cursor, index cursor, or both. So, for now, we try to find a table cursor first, then fall back to finding any index cursor for that table. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3847	2025-10-28 06:36:55 +02:00
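The "Translation" step in the commit above, deciding where a subquery is evaluated, can be sketched like this. It is a simplified illustration under stated assumptions: tables are identified by name, the join order is a plain slice from outermost to innermost loop, and `EvalAt` and `eval_position` are hypothetical names rather than Turso's planner types.

```rust
/// Where a WHERE-clause subquery is evaluated relative to the outer
/// query's nested loops. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum EvalAt {
    /// Uncorrelated: evaluate once, before any table cursor is opened.
    BeforeLoops,
    /// Correlated: evaluate inside the loop at this index (0 = outermost).
    InsideLoop(usize),
}

/// Find the innermost loop that opens any outer table the subquery references.
fn eval_position(join_order: &[&str], referenced_outer: &[&str]) -> EvalAt {
    let innermost = join_order
        .iter()
        .enumerate()
        .filter(|(_, table)| referenced_outer.contains(*table))
        .map(|(idx, _)| idx)
        .max();
    match innermost {
        Some(idx) => EvalAt::InsideLoop(idx),
        None => EvalAt::BeforeLoops,
    }
}
```

For a join order `t1, t2, t3`, a subquery referencing `t1` and `t2` is placed inside loop 1, since that is the first point at which both tables are in scope.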
Jussi Saurio	e7aa7ee2ff	ProgramBuilder: add a few utility methods needed for correlated subqueries	2025-10-27 14:03:41 +02:00
Nikita Sivukhin	408ca235d1	small refactoring	2025-10-27 12:43:38 +04:00
Nikita Sivukhin	906bbdd1c4	support deep nestedness	2025-10-27 11:37:42 +04:00
Pekka Enberg	7d035f27d8	Merge 'Strict numeric cast for op_must_be_int' from bit-aloo closes: #3302 Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3771	2025-10-26 16:42:35 +02:00
Pekka Enberg	6603f5318a	Merge 'core/vdbe: Reuse cursor in op_open_write()' from Pekka Enberg This optimization reuses an existing cursor when op_open_write() is called on the same table/index (same root_page). This is safe because the cursor position doesn't matter - op_rewind() is always called after op_open_write() to position the cursor at the beginning of the table/index before any operations are performed. This change speeds up op_open_write() by avoiding unnecessary cursor re-initialization. Closes #3815	2025-10-26 12:29:20 +02:00
Pekka Enberg	ca073b5ecd	Merge 'core: Switch RwLock<Arc<Pager>> to ArcSwap<Pager>' from Pekka Enberg We don't actually need the RwLock locking capabilities, just the ability to swap the instance. Closes #3814	2025-10-26 12:29:11 +02:00
Pavan-Nambi	277a989a71	fmt	2025-10-24 21:34:17 +05:30
Pavan-Nambi	7dda783006	clippy - gotta feature autovacuum n ptrmaps	2025-10-24 21:30:34 +05:30
Pavan-Nambi	8d0ae362da	Merge branch 'main' of github.com:tursodatabase/turso into avcm	2025-10-24 18:58:30 +05:30
Pekka Enberg	c3fb867173	core: Switch RwLock<Arc<Pager>> to ArcSwap<Pager> We don't actually need the RwLock locking capabilities, just the ability to swap the instance.	2025-10-24 14:10:08 +03:00
bit-aloo	64bbca9e12	Fix op_must_be_int to use strict numeric cast	2025-10-24 16:08:15 +05:30
Pekka Enberg	413c582b41	core/vdbe: Reuse cursor in op_open_write() This optimization reuses an existing cursor when op_open_write() is called on the same table/index (same root_page). This is safe because the cursor position doesn't matter - op_rewind() is always called after op_open_write() to position the cursor at the beginning of the table/index before any operations are performed. This change speeds up op_open_write() by avoiding unnecessary cursor re-initialization.	2025-10-23 16:34:42 +03:00
Jussi Saurio	ae22468d8b	Merge 'Order by heap sort' from Nikita Sivukhin This PR implements a simple heap-sort approach for query plans like `SELECT ... FROM t WHERE ... ORDER BY ... LIMIT N` in order to maintain a small set of top N elements in the ephemeral B-tree and avoid sorting and materializing the whole dataset. I removed all optimizations not related to this particular change in order to make the branch lightweight. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3726	2025-10-23 15:00:42 +03:00
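The bounded top-N idea from the commit above can be illustrated with a standard-library ordered set standing in for the ephemeral B-tree. This is a sketch only: `top_n` is a hypothetical function, rows are reduced to `(sort_key, rowid)` pairs, and ascending order is assumed.

```rust
use std::collections::BTreeSet;

/// Keep only the `limit` smallest `(sort_key, rowid)` pairs seen so far,
/// so memory stays proportional to `limit` rather than to the input size.
fn top_n(rows: impl IntoIterator<Item = (i64, u64)>, limit: usize) -> Vec<(i64, u64)> {
    let mut best: BTreeSet<(i64, u64)> = BTreeSet::new();
    for row in rows {
        best.insert(row);
        if best.len() > limit {
            // Evict the current largest entry; it can never be in the result.
            best.pop_last();
        }
    }
    // The set iterates in ascending order, i.e. already sorted for ORDER BY.
    best.into_iter().collect()
}
```

Including the rowid in the key keeps rows with equal sort keys distinct, which a plain set of keys would otherwise collapse.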
Jussi Saurio	64560a61c3	Merge 'Support statement-level rollback via anonymous savepoints' from Jussi Saurio ## Gist This PR implements _statement subtransactions_, which means that a single statement within an interactive transaction can individually be rolled back. ## Background The default constraint violation resolution strategy in SQLite is `ABORT`, which means to rollback the statement that caused the conflict. For example: ```sql CREATE TABLE t(x UNIQUE); INSERT INTO t VALUES (1); BEGIN; INSERT INTO t VALUES (2),(3); -- ok INSERT INTO t VALUES (4),(1); -- conflict on 1, this statement should rollback INSERT INTO t VALUES (5); -- ok COMMIT; -- ok SELECT * FROM t; 1 2 3 5 ``` So far we haven't been able to support this due to lack of support for subtransactions, and have used the `ROLLBACK` strategy, which means to rollback the entire transaction on any constraint error. ## Problem Although PRIMARY KEY and UNIQUE constraints allow defining the conflict resolution strategy (e.g. `id INTEGER PRIMARY KEY ON CONFLICT ROLLBACK`), FOREIGN KEY violations do not support this: they always use `ABORT` i.e. statement subtransaction rollback. For this reason alone it is important to implement this mechanism now rather than later, since we already have FOREIGN KEY support implemented. ## Details This PR implements statement subtransactions with _anonymous savepoints_. This means that whenever a statement begins, it will open a new savepoint which will write "page undo images" into a temporary file called a _subjournal_. Whenever the statement marks a page as dirty, it will write the before-image of the page into the subjournal so that its modifications can be undone in the event of an ABORT (statement rollback). - Right now, only anonymous savepoints are supported, so the explicit `SAVEPOINT` syntax is not. - Due to the above, there can be only one savepoint open per pager, and this is enforced with assertions. - The subjournal file is currently entirely in memory. 
If it were not, we would either have to block on IO or refactor many usages of code to account for potentially pending completions. - Constraint errors no longer cause transactions to abort nor do they cause the page cache to be cleared - instead, subjournaled pages will be brought back into the page cache which effectively handles the same behavior albeit more fine-grained. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3792	2025-10-23 15:00:11 +03:00
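The subjournal mechanism described above can be sketched with a toy in-memory pager. All names here (`Pager`, `Savepoint`, `write_page`, and so on) are hypothetical and far simpler than the real pager; the sketch only handles a database that grows during the statement, not one that shrinks.

```rust
use std::collections::HashMap;

/// Before-images collected for one statement (the "subjournal").
struct Savepoint {
    /// Page count when the statement began, so growth can be undone.
    page_count_at_start: usize,
    /// Original contents of each page the statement has dirtied.
    subjournal: HashMap<usize, Vec<u8>>,
}

/// Toy pager: pages live in memory and are indexed from 0.
struct Pager {
    pages: Vec<Vec<u8>>,
    /// At most one anonymous savepoint may be open at a time.
    savepoint: Option<Savepoint>,
}

impl Pager {
    fn begin_statement(&mut self) {
        assert!(self.savepoint.is_none(), "only one savepoint may be open");
        self.savepoint = Some(Savepoint {
            page_count_at_start: self.pages.len(),
            subjournal: HashMap::new(),
        });
    }

    /// Write a page, journaling its before-image the first time it is dirtied.
    fn write_page(&mut self, page_no: usize, data: Vec<u8>) {
        if page_no >= self.pages.len() {
            self.pages.resize(page_no + 1, Vec::new());
        }
        if let Some(sp) = self.savepoint.as_mut() {
            // Pages created by this statement are undone by truncation instead.
            if page_no < sp.page_count_at_start && !sp.subjournal.contains_key(&page_no) {
                sp.subjournal.insert(page_no, self.pages[page_no].clone());
            }
        }
        self.pages[page_no] = data;
    }

    /// Statement succeeded: discard the before-images.
    fn end_statement(&mut self) {
        self.savepoint = None;
    }

    /// Statement failed (ABORT): undo growth and restore before-images.
    fn rollback_statement(&mut self) {
        if let Some(sp) = self.savepoint.take() {
            self.pages.truncate(sp.page_count_at_start);
            for (page_no, before) in sp.subjournal {
                self.pages[page_no] = before;
            }
        }
    }
}
```

Only the first write to a page within a statement is journaled, so a rollback restores the state from before the statement began rather than some intermediate state.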
Jussi Saurio	2b73260dd9	Handle cases where DB grows or shrinks due to savepoint rollback	2025-10-22 23:40:45 +03:00
Jussi Saurio	fe51804e6b	Implement crude way of making opening subtransaction conditional We don't want something like `BEGIN IMMEDIATE` to start a subtransaction, so instead we will open it if: - Statement is write, AND a) Statement has >0 table_references, or b) The statement is an INSERT (INSERT doesn't track table_references in the same way as other program types)	2025-10-22 23:40:45 +03:00
Jussi Saurio	7376475cb3	Do not start statement subtransactions when MVCC is enabled MVCC does not support statement-level rollback.	2025-10-22 23:40:45 +03:00
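Taken together, the two commits above amount to a small predicate. The sketch below restates it with hypothetical names (`StatementInfo`, `should_open_subtransaction`); the real check lives in the VDBE and may consult different fields.

```rust
/// Hypothetical summary of a compiled statement.
struct StatementInfo {
    is_write: bool,
    is_insert: bool,
    table_references: usize,
}

/// Decide whether a statement subtransaction should be opened.
fn should_open_subtransaction(stmt: &StatementInfo, mvcc_enabled: bool) -> bool {
    // MVCC does not support statement-level rollback.
    if mvcc_enabled {
        return false;
    }
    // INSERT is special-cased because it does not track table_references
    // the same way other program types do.
    stmt.is_write && (stmt.table_references > 0 || stmt.is_insert)
}
```

Under this rule a statement such as `BEGIN IMMEDIATE`, which writes but references no tables, does not open a subtransaction.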
Jussi Saurio	d8cc57cf14	clippy: Remove unnecessary referencing	2025-10-22 23:40:45 +03:00
Jussi Saurio	086ba8c946	VDBE: begin statement subtransaction in op_transaction	2025-10-22 23:40:45 +03:00
Jussi Saurio	904cbe535d	VDBE: handle subtransaction commits/aborts in op_halt	2025-10-22 23:40:45 +03:00
Jussi Saurio	f0548c280f	ProgramState: add begin_statement() and end_statement()	2025-10-22 23:40:45 +03:00
Jussi Saurio	734eeb5bab	VDBE: constraint errors do not cause a tx rollback by default	2025-10-22 23:40:45 +03:00
Jussi Saurio	ad80285437	Rename is_scope to deferred and invert respective boolean logic Much clearer name for what it is/does	2025-10-22 23:40:44 +03:00
Jussi Saurio	d4a9797f79	Store two foreign key counters in ProgramState 1. The number of deferred FK violations when the statement started. When a statement subtransaction rolls back, the connection's deferred violation counter will be reset to this value. 2. The number of immediate FK violations that occurred during the statement. In practice we just need to know whether this number is nonzero, and if it is, the statement subtransaction will roll back. Statement subtransactions will be implemented in future commits.	2025-10-22 23:40:44 +03:00
Nikita Sivukhin	91ffb4e249	Revert "avoid allocations" This reverts commit `dba195bdfa`.	2025-10-22 20:21:39 +04:00
Nikita Sivukhin	53957b6d22	Revert "simplify serial_type size calculation" This reverts commit `f19c73822e`.	2025-10-22 20:21:00 +04:00
Nikita Sivukhin	b32d22a2fd	Revert "move more possible option higher" This reverts commit `c0fdaeb475`.	2025-10-22 20:20:54 +04:00
Nikita Sivukhin	8e1cec5104	Revert "alternative read_variant implementation" This reverts commit `68650cf594`.	2025-10-22 19:30:43 +04:00
Pekka Enberg	5dd503b7b9	core/storage: Cache schema cookie in Pager Every transaction was reading page 1 from the WAL to check the schema cookie in op_transaction, causing unnecessary WAL lookups. This commit caches the schema_cookie in Pager as AtomicU64, similar to how page_size and reserved_space are already cached. The cache is updated when the header is read/modified and invalidated in begin_read_tx() when WAL changes are detected from other connections. This matches SQLite's approach of caching frequently accessed header fields to avoid repeated page 1 reads. Improves write throughput by 5% in our benchmarks.	2025-10-22 16:51:15 +03:00
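A cached header field of this kind might look like the sketch below. The commit only states that the cookie is cached in an `AtomicU64`; the `u64::MAX` sentinel for "not cached", the type name `SchemaCookieCache`, and its methods are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed sentinel meaning "no cached value"; a 32-bit cookie never equals it.
const COOKIE_UNSET: u64 = u64::MAX;

/// Hypothetical cache for the schema cookie stored in the database header.
struct SchemaCookieCache {
    value: AtomicU64,
}

impl SchemaCookieCache {
    fn new() -> Self {
        Self { value: AtomicU64::new(COOKIE_UNSET) }
    }

    /// Return the cached cookie, calling `read_header` only on a cache miss.
    fn get_or_load(&self, read_header: impl FnOnce() -> u32) -> u32 {
        let cached = self.value.load(Ordering::Acquire);
        if cached != COOKIE_UNSET {
            return cached as u32;
        }
        let fresh = read_header();
        self.value.store(u64::from(fresh), Ordering::Release);
        fresh
    }

    /// Update the cache when the header is modified locally.
    fn set(&self, cookie: u32) {
        self.value.store(u64::from(cookie), Ordering::Release);
    }

    /// Drop the cached value, e.g. when another connection changed the WAL.
    fn invalidate(&self) {
        self.value.store(COOKIE_UNSET, Ordering::Release);
    }
}
```

The closure passed to `get_or_load` stands in for the page 1 read that the commit avoids on the common path.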
Nikita Sivukhin	689c11a21a	cargo fmt	2025-10-22 17:45:49 +04:00
Nikita Sivukhin	671d266dd6	Revert "wip" This reverts commit `dd34f7fd50`.	2025-10-22 11:47:46 +04:00
Nikita Sivukhin	bf77862fab	Merge branch 'main' into order-by-heap-sort	2025-10-22 11:44:55 +04:00
Pekka Enberg	1aad1b224a	Merge 'core/io: Make random generation deterministically simulated' from Pedro Muniz Depends on #3584 to use the most up-to-date implementation of `ThreadRng` - Add `fill_bytes` method to `IO` - use `thread_rng` instead of `getrandom`, as `getrandom` is much slower and `thread_rng` offers enough security - modify `exec_randomblob`, `exec_random` and random_rowid generation to use methods from IO for determinism - modified simulator IO to implement `fill_bytes` This is the PRNG for sqlite if someone is curious. It is similar to `thread_rng`: ```c /* Initialize the state of the random number generator once, ** the first time this routine is called. */ if( wsdPrng.s[0]==0 ){ sqlite3_vfs *pVfs = sqlite3_vfs_find(0); static const u32 chacha20_init[] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 }; memcpy(&wsdPrng.s[0], chacha20_init, 16); if( NEVER(pVfs==0) ){ memset(&wsdPrng.s[4], 0, 44); }else{ sqlite3OsRandomness(pVfs, 44, (char*)&wsdPrng.s[4]); } wsdPrng.s[15] = wsdPrng.s[12]; wsdPrng.s[12] = 0; wsdPrng.n = 0; } assert( N>0 ); while( 1 /* exit by break */ ){ if( N<=wsdPrng.n ){ memcpy(zBuf, &wsdPrng.out[wsdPrng.n-N], N); wsdPrng.n -= N; break; } if( wsdPrng.n>0 ){ memcpy(zBuf, wsdPrng.out, wsdPrng.n); N -= wsdPrng.n; zBuf += wsdPrng.n; } wsdPrng.s[12]++; chacha_block((u32*)wsdPrng.out, wsdPrng.s); wsdPrng.n = 64; } sqlite3_mutex_leave(mutex); ``` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #3799	2025-10-22 09:10:36 +03:00
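Routing randomness through the IO layer, as the commit above does, lets a simulator substitute a seeded generator. The sketch below shows the shape of that idea with a hypothetical `Io` trait and a SplitMix64 generator; the trait signature is an assumption, not Turso's actual `IO` trait, and SplitMix64 is not cryptographically secure.

```rust
/// Hypothetical, simplified IO abstraction with a randomness hook.
trait Io {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

/// Simulator IO: a seeded SplitMix64 generator, so runs are reproducible.
struct SimulatedIo {
    state: u64,
}

impl SimulatedIo {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// One SplitMix64 step.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

impl Io for SimulatedIo {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        // Fill 8 bytes at a time; the last chunk may be shorter.
        for chunk in dest.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Example consumer: a random rowid drawn through the IO layer.
fn random_rowid(io: &mut dyn Io) -> i64 {
    let mut buf = [0u8; 8];
    io.fill_bytes(&mut buf);
    i64::from_le_bytes(buf)
}
```

Two `SimulatedIo` values built from the same seed yield identical byte streams, which is the property a deterministic simulator needs to replay a failing seed.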
Pere Diaz Bou	3227caaa1d	Merge 'core: move BTreeCursor under MVCC cursor' from Pere Diaz Bou Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3756	2025-10-21 19:20:49 +02:00
pedrocarlo	8c0b9c6979	add additional `fill_bytes` method to `IO` to deterministically generate random bytes and modify random functions to use them	2025-10-21 14:10:38 -03:00
pedrocarlo	8501bc930a	use workspace rand version	2025-10-21 14:10:05 -03:00
Pere Diaz Bou	ea04e9033a	core/mvcc: add btree_cursor under MVCC cursor	2025-10-21 18:22:37 +02:00

1534 Commits