Turso incorrectly creates the first table of an auto-vacuumed database on
page 2.
(Note: this is in collaboration with @LeMikaelF.)
SQLite does not allow enabling or disabling auto-vacuum after the first
table has been created
(https://sqlite.org/pragma.html#pragma_auto_vacuum). This is because the
page layout of the database differs when auto-vacuum is enabled: the
first b-tree page must be page 3 instead of page 2, to make room for the
first [Pointer Map
page](https://sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages).
Turso currently doesn't account for this, which can lead to data loss.
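As a minimal illustration of the layout difference (the function below is a hypothetical sketch, not Turso's actual code):
```rust
/// Hypothetical sketch: where the first b-tree root page must go.
/// With auto-vacuum enabled, page 2 is reserved for the first pointer
/// map page, so the first table's root page has to be page 3.
fn first_btree_root_page(auto_vacuum_enabled: bool) -> u32 {
    if auto_vacuum_enabled {
        3 // page 2 is the first ptrmap page
    } else {
        2 // page 1 holds the database header and schema; page 2 is free
    }
}

fn main() {
    assert_eq!(first_btree_root_page(false), 2);
    assert_eq!(first_btree_root_page(true), 3);
}
```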
The simplest way to reproduce this is to create an auto-vacuumed
database with `pragma auto_vacuum=full`, so that auto-vacuum runs on
each commit, and then create a table with some data. Turso will
incorrectly create the new table on page 2. After this, every time a new
page is created, either through a page split or because a new table is
created, Turso will write a 5-byte pointer map entry into page 2,
starting from the top of the page, thereby overwriting existing data.
For example, let's start with a clean database and look at the first
bytes of page 2. The page starts with `0d`, the discriminator for a
table leaf page
([source](https://www.sqlite.org/fileformat.html#b_tree_pages)). The
next interesting number is the cell count of the page, the two-byte
value `00 01` at offsets 3-4.
```
$ cargo run -- /tmp/a.db
turso> create table t(a);
turso> insert into t values ('myvalue');
$ dbtotxt /tmp/a.db
| size 8192 pagesize 4096 filename a.db
| page 1 offset 0
# ...snip...
| page 2 offset 4096
| 0: 0d 00 00 00 01 0f f5 00 0f f5 00 00 00 00 00 00 ................
| 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue
| end a.db
```
Pointer map pages are located every N pages, starting from page 2, and
contain a list of 5-byte entries that record the parent page of each
page. So whenever Turso or SQLite needs to add a page, it will overwrite
5 bytes of page 2. This means that for data loss to occur, it is
sufficient to add a single page to the database, for example by creating
a table. The first five bytes of the page, including the page-type byte
and the cell count, are then overwritten:
```
$ cargo run -- /tmp/a.db
turso> create table t(a);
turso> insert into t values ('myvalue');
turso> pragma auto_vacuum=full;
turso> create table tt(a);
$ dbtotxt /tmp/a.db
| size 12288 pagesize 4096 filename a.db
| page 1 offset 0
# ...snip...
| page 2 offset 4096
| 0: 01 00 00 00 00 0f f5 00 0f f5 00 00 00 00 00 00 ................
| 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue
```
Creating more tables, or adding more B-tree pages, will keep overwriting
the rest of the page, until the cells themselves are also overwritten.
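For reference, here is a small sketch of the pointer map layout this relies on, modeled on the layout described in the SQLite file format documentation (illustrative only; the pending-byte-page special case is omitted):
```rust
/// Number of 5-byte pointer map entries that fit on one ptrmap page.
fn entries_per_ptrmap_page(usable_size: u32) -> u32 {
    usable_size / 5
}

/// Which pointer map page holds the entry for `pgno`. Ptrmap pages start
/// at page 2 and repeat every (entries_per_page + 1) pages.
fn ptrmap_page_for(pgno: u32, usable_size: u32) -> u32 {
    let group = entries_per_ptrmap_page(usable_size) + 1;
    ((pgno - 2) / group) * group + 2
}

/// Byte offset of the 5-byte entry for `pgno` inside its ptrmap page.
fn ptrmap_entry_offset(pgno: u32, usable_size: u32) -> u32 {
    5 * (pgno - ptrmap_page_for(pgno, usable_size) - 1)
}

fn main() {
    // With a 4096-byte usable page size, page 2 maps pages 3..=821, and
    // the entry for page 3 starts at byte 0 of page 2 -- exactly the
    // bytes that hold the b-tree page header in the corrupted dump.
    assert_eq!(ptrmap_page_for(3, 4096), 2);
    assert_eq!(ptrmap_entry_offset(3, 4096), 0);
    assert_eq!(ptrmap_entry_offset(4, 4096), 5);
}
```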
## Reproducing the issue in the simulator
We have been unable to reproduce this exact corruption mode in the
simulator, but a patched simulator exposes many failure modes, none of
which occur with the unpatched simulator. The following seeds show the
issue when the patched simulator is run against `main`:
- `11522841279124073062`, with "Assertion 'table inquisitive_graham_159
should contain all of its expected values' failed: table
inquisitive_graham_159 does not contain the expected values, the
simulator model has more rows than the database"
- `7057400018220918989`, `16028085350691325843`, `7721542713659053944`,
and `203017821863546118`, with "Failed to read ptrmap key=XXX"
- `12533694709304969540`, `18357088553315413457`, `3108945730906932377`,
with "Integrity Check Failed: Cell N in page 2 is out of range."
- `4757352625344646473`, with "dirty pages should be empty for read
txn"
- `7083498604824302257`, with "header_size: 6272, header_len_bytes: 2,
payload.len(): 13"
- `17881876827470741581`, with "ParseError("no such table:
focused_historians_416")"
- `2092231500503735693`, with "range end index 4789 out of range for
slice of length 4096"
- `7555257419378470845`, with "malformed database schema
(imaginative_ontivero\u{1})"
- `12905270229511147245`, with "index out of bounds: the len is 4096 but
the index is 4096"
## Fixing the issue
- When the DB is opened, we read the `auto_vacuum` state instead of
assuming `auto_vacuum=none` (see the sketch after this list).
- Don't allow `auto_vacuum` to be flipped on non-empty databases: if we
allowed this, pointer map entries could overwrite existing data.
- Modify the integrity check so it doesn't report page 2 as orphaned in
auto-vacuumed databases.
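A minimal sketch of reading the auto-vacuum state from the database header on open, assuming the documented header layout (byte offset 52 holds the largest root b-tree page, non-zero iff auto-vacuum is on, and offset 64 holds the incremental-vacuum flag); the names are illustrative, not Turso's actual code:
```rust
/// Hedged sketch: derive the auto-vacuum mode from the 100-byte SQLite
/// database header instead of assuming `auto_vacuum=none`.
#[derive(Debug, PartialEq)]
enum AutoVacuumMode {
    None,
    Full,
    Incremental,
}

fn auto_vacuum_mode_from_header(header: &[u8; 100]) -> AutoVacuumMode {
    // Offset 52: largest root b-tree page (non-zero iff auto-vacuum).
    let largest_root = u32::from_be_bytes(header[52..56].try_into().unwrap());
    // Offset 64: incremental-vacuum flag.
    let incremental = u32::from_be_bytes(header[64..68].try_into().unwrap());
    if largest_root == 0 {
        AutoVacuumMode::None
    } else if incremental != 0 {
        AutoVacuumMode::Incremental
    } else {
        AutoVacuumMode::Full
    }
}
```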
Fixes #3752
Closes #3830
## Gist
This PR implements _statement subtransactions_, which means that a
single statement within an interactive transaction can individually be
rolled back.
## Background
The default constraint violation resolution strategy in SQLite is
`ABORT`, which rolls back only the statement that caused the conflict.
For example:
```sql
CREATE TABLE t(x UNIQUE);
INSERT INTO t VALUES (1);
BEGIN;
INSERT INTO t VALUES (2),(3); -- ok
INSERT INTO t VALUES (4),(1); -- conflict on 1, this statement should rollback
INSERT INTO t VALUES (5); -- ok
COMMIT; -- ok
SELECT * FROM t;
1
2
3
5
```
So far we haven't been able to support this due to the lack of
subtransactions, and have instead used the `ROLLBACK` strategy, which
rolls back the entire transaction on any constraint error.
## Problem
Although PRIMARY KEY and UNIQUE constraints allow defining the conflict
resolution strategy (e.g. `id INTEGER PRIMARY KEY ON CONFLICT
ROLLBACK`), FOREIGN KEY violations do not support this: they always use
`ABORT` i.e. statement subtransaction rollback. For this reason alone it
is important to implement this mechanism now rather than later, since we
already have FOREIGN KEY support implemented.
## Details
This PR implements statement subtransactions with _anonymous
savepoints_. This means that whenever a statement begins, it will open a
new savepoint which will write "page undo images" into a temporary file
called a _subjournal_. Whenever the statement marks a page as dirty, it
will write the before-image of the page into the subjournal so that its
modifications can be undone in the event of an ABORT (statement
rollback).
- Right now, only anonymous savepoints are supported, so the explicit
`SAVEPOINT` syntax is not.
- Due to the above, there can be only one savepoint open per pager, and
this is enforced with assertions.
- The subjournal file is currently kept entirely in memory. If it were
not, we would either have to block on IO or refactor many call sites to
account for potentially pending completions.
- Constraint errors no longer cause the transaction to abort, nor do
they cause the page cache to be cleared. Instead, subjournaled pages are
brought back into the page cache, which achieves the same behavior in a
more fine-grained way (a rough sketch of the flow follows below).
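As a rough illustration of that flow (the types and method names below are hypothetical, not Turso's actual pager API):
```rust
use std::collections::HashMap;

/// Hypothetical sketch of an anonymous statement savepoint backed by an
/// in-memory subjournal of page before-images.
struct Subjournal {
    /// page number -> page contents as they were when the statement began
    before_images: HashMap<u32, Vec<u8>>,
}

impl Subjournal {
    /// Opened when a statement begins (the anonymous savepoint).
    fn open() -> Self {
        Self { before_images: HashMap::new() }
    }

    /// Called the first time the statement dirties a page: record the
    /// before-image so the statement's changes can be undone on ABORT.
    fn record_before_image(&mut self, pgno: u32, contents: &[u8]) {
        self.before_images
            .entry(pgno)
            .or_insert_with(|| contents.to_vec());
    }

    /// Statement rollback (ABORT): bring the subjournaled pages back into
    /// the page cache instead of clearing the whole cache.
    fn rollback_statement(self, page_cache: &mut HashMap<u32, Vec<u8>>) {
        for (pgno, image) in self.before_images {
            page_cache.insert(pgno, image);
        }
    }

    /// Statement completed successfully: the before-images are dropped.
    fn release(self) {}
}
```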
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3792
Every transaction was reading page 1 from the WAL to check the schema
cookie in op_transaction, causing unnecessary WAL lookups.
This commit caches the schema_cookie in Pager as AtomicU64, similar to
how page_size and reserved_space are already cached. The cache is
updated when the header is read/modified and invalidated in
begin_read_tx() when WAL changes are detected from other connections.
This matches SQLite's approach of caching frequently accessed header
fields to avoid repeated page 1 reads. Improves write throughput by 5%
in our benchmarks.
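A minimal sketch of this caching pattern, assuming illustrative field and method names (not necessarily the ones used in the codebase):
```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sentinel meaning "cache invalid, re-read the header from page 1".
const SCHEMA_COOKIE_INVALID: u64 = u64::MAX;

struct Pager {
    schema_cookie: AtomicU64,
    // ... page_size, reserved_space, etc. are cached the same way
}

impl Pager {
    /// Called whenever the header is read or modified.
    fn cache_schema_cookie(&self, cookie: u32) {
        self.schema_cookie.store(cookie as u64, Ordering::Release);
    }

    /// Called from begin_read_tx() when the WAL has changed since the
    /// last read transaction (another connection wrote).
    fn invalidate_schema_cookie(&self) {
        self.schema_cookie.store(SCHEMA_COOKIE_INVALID, Ordering::Release);
    }

    /// op_transaction checks the cached value first and only falls back
    /// to reading page 1 when the cache has been invalidated.
    fn cached_schema_cookie(&self) -> Option<u32> {
        match self.schema_cookie.load(Ordering::Acquire) {
            SCHEMA_COOKIE_INVALID => None,
            v => Some(v as u32),
        }
    }
}
```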
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3727
We don't want something like `BEGIN IMMEDIATE` to start a subtransaction,
so instead we open one only if the following holds (see the sketch below):
- the statement is a write, AND
  - a) the statement has >0 table_references, or
  - b) the statement is an INSERT (INSERT doesn't track table_references
    in the same way as other program types).
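For illustration only (the `Statement` struct and its fields below are hypothetical), the condition amounts to:
```rust
/// Hypothetical representation of the properties checked here.
struct Statement {
    is_write: bool,
    table_references: usize,
    is_insert: bool,
}

/// Open an anonymous statement savepoint only for statements that can
/// actually modify b-tree pages; plain `BEGIN IMMEDIATE` should not.
fn should_open_subtransaction(stmt: &Statement) -> bool {
    stmt.is_write && (stmt.table_references > 0 || stmt.is_insert)
}
```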
This patch pushes unsafe `Send` and `Sync` impls down to individual
components instead of declaring them at the Database level. This makes
it easier for us to fix thread-safety incrementally, while preventing
developers from adding more thread-unsafe code.
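A small sketch of the difference (the component types below are placeholders, not the actual Turso structs):
```rust
use std::cell::RefCell;

// Placeholder component types standing in for the real ones.
struct Pager { state: RefCell<u64> }
struct Wal { state: RefCell<u64> }

// Before: one blanket claim at the top level hides which parts are
// actually thread-safe:
//
//     unsafe impl Send for Database {}
//     unsafe impl Sync for Database {}
//
// After: each component makes (and documents) its own claim, so
// thread-safety can be fixed component by component, and new code
// doesn't silently inherit a blanket `unsafe impl`.
unsafe impl Send for Pager {}
unsafe impl Sync for Pager {}

unsafe impl Send for Wal {}
unsafe impl Sync for Wal {}
```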
If we don't clear the dirty pages, we will initiate a rollback. During
the rollback we attempt to clear the whole page cache, which then panics
because there are still dirty pages left over from the failed `writev`.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3189
Yield is a completion that does not allocate any inner state. By design
it is completed from the start and can never produce an error. This
allows yielding cheaply, without taking any locks or heap-allocating any
inner state.
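A minimal sketch of the idea, assuming a simplified completion enum (illustrative only; the real type in the codebase differs):
```rust
/// Illustrative stand-in for a completion type: most variants carry
/// heap-allocated state for pending I/O, but `Yield` carries nothing.
enum Completion {
    /// Pending I/O with inner state behind an allocation.
    Io(Box<IoState>),
    /// Already complete, infallible, zero-allocation: just yield control.
    Yield,
}

struct IoState {
    // buffers, callbacks, etc.
}

impl Completion {
    fn is_completed(&self) -> bool {
        match self {
            Completion::Yield => true, // completed from the start
            Completion::Io(_) => false,
        }
    }
}
```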
This PR adds proper program abort in case of an unfinished statement
reset or interruption.
Also, this PR makes rollback methods non-failing, because otherwise it
is usually unclear to their callers what state the
statement/connection/transaction is in if a rollback fails.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3591
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
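As a hypothetical illustration of what that could look like (not the actual representation in the codebase):
```rust
/// Hypothetical sketch: an i64 root page lets MVCC refer to tables whose
/// b-tree has not been allocated on disk yet, using negative numbers as
/// logical placeholders.
#[derive(Clone, Copy)]
struct RootPage(i64);

impl RootPage {
    fn is_allocated(self) -> bool {
        self.0 > 0
    }

    /// Resolve a placeholder to a real page number once the b-tree is
    /// actually created (the allocation callback is illustrative).
    fn resolve(self, allocate: impl FnOnce() -> i64) -> i64 {
        if self.is_allocated() { self.0 } else { allocate() }
    }
}
```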