turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-29 05:54:21 +01:00

Author	SHA1	Message	Date
Pere Diaz Bou	d616a375ee	core/mvcc: commit_tx state machine	2025-08-01 12:36:02 +02:00
Jussi Saurio	69fc1ea238	Merge 'perf/btree: improve performance of rowid() function' from Jussi Saurio if the table is an intkey table, we can read the rowid directly without deserializing the full cell, and we also don't need to start deserializing the record if only the rowid is requested. ```sql Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Collecting 100 samples in estimated 5.0007 s (11M i Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1 time: [469.38 ns 470.77 ns 472.40 ns] change: [-5.8959% -5.5232% -5.1840%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Collecting 100 samples in estimated 5.0088 s (1.9M Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10 time: [2.6523 µs 2.6596 µs 2.6685 µs] change: [-8.7117% -8.4083% -8.0949%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Collecting 100 samples in estimated 5.0197 s (399k Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50 time: [12.514 µs 12.545 µs 12.578 µs] change: [-9.5243% -9.0562% -8.6227%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Collecting 100 samples in estimated 5.0600 s (202 Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100 time: [25.135 µs 25.291 µs 25.470 µs] change: [-8.8822% -8.3943% -7.8854%] (p = 0.00 < 0.05) Performance has improved. ``` "only" 4x slower than sqlite on `SELECT * FROM users LIMIT 100` after this! 
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2382	2025-08-01 13:35:02 +03:00
Jussi Saurio	e1dd028136	Merge 'Fix vector deserialization alignment and blob/text empty mismatch' from bit-aloo * Previously, deserializing an empty vector used `Vec::new()`, resulting in zero capacity, which is not guaranteed to be aligned for `f32`/`f64`. This could lead to undefined behavior when interpreting the data. * We also inconsistently treated empty input: `"[]"` (text) was accepted as a zero-length vector, but empty blobs (`&[]`) were rejected. * Now: * We initialize empty vectors with at least one element’s capacity to preserve alignment. * We allow zero-sized blobs and treat them the same as `"[]"` input, as empty vectors. Closes #2371	2025-08-01 13:03:20 +03:00
Pere Diaz Bou	0cefb01395	mvcc_benchmark: clippy	2025-08-01 11:01:29 +02:00
Jussi Saurio	c9a3a65942	perf/btree: don't waste time reading contents twice	2025-08-01 11:49:41 +03:00
Jussi Saurio	111c1e64c4	perf/btree: improve performance of rowid() function if the table is an intkey table, we can read the rowid directly without deserializing the full cell, and we also don't need to start deserializing the record if only the rowid is requested. ```sql Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1: Collecting 100 samples in estimated 5.0007 s (11M i Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1 time: [469.38 ns 470.77 ns 472.40 ns] change: [-5.8959% -5.5232% -5.1840%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10: Collecting 100 samples in estimated 5.0088 s (1.9M Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10 time: [2.6523 µs 2.6596 µs 2.6685 µs] change: [-8.7117% -8.4083% -8.0949%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50: Collecting 100 samples in estimated 5.0197 s (399k Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50 time: [12.514 µs 12.545 µs 12.578 µs] change: [-9.5243% -9.0562% -8.6227%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Benchmarking Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100: Collecting 100 samples in estimated 5.0600 s (202 Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100 time: [25.135 µs 25.291 µs 25.470 µs] change: [-8.8822% -8.3943% -7.8854%] (p = 0.00 < 0.05) Performance has improved. ```	2025-08-01 11:44:53 +03:00
Pere Diaz Bou	c807b035c5	core/mvcc: fix tests again had to create connections for every different txn	2025-08-01 10:44:19 +02:00
Pere Diaz Bou	5ad7d10790	core/mvcc: fix use of rwlock	2025-08-01 10:38:41 +02:00
Pere Diaz Bou	b518e1f839	core/mvcc: add missing arc import	2025-08-01 10:38:41 +02:00
Pere Diaz Bou	c4318cac36	core/mvcc: fix tests	2025-08-01 10:38:41 +02:00
Pere Diaz Bou	49a00ff338	core/mvcc: load table's rowid on initialization We need to load rowids into mvcc's store in order before doing any read in case there are rows. This has a performance penalty for now as expected because we should, ideally, scan for row ids lazily instead.	2025-08-01 10:38:41 +02:00
Pere Diaz Bou	b399ddea1b	core/mvcc: begin pager read txn on mvcc begin_txn	2025-08-01 10:38:41 +02:00
Pere Diaz Bou	b4ac38cd25	core/mvcc: persist writes on mvcc commit On Mvcc `commit_txn` we need to persist changes to database, for this case we re-use pager's semantics of transactions: 1. If there are no conflicts, we start `pager.begin_write_txn` 2. `pager.end_txn`: We flush changes to WAL 3. We finish Mvcc transaction by marking rows with new timestamp.	2025-08-01 10:38:41 +02:00
Jussi Saurio	2233bb41c3	Merge 'fix/wal: reset ongoing checkpoint state when checkpoint fails' from Jussi Saurio ## What The following sequence of actions is possible: ```sql -- TRUNCATE checkpoint fails during WAL restart, -- but OngoingCheckpoint.state is still left at Done for conn 0 Connection 0(op=23): PRAGMA wal_checkpoint(TRUNCATE) Connection 0(op=23) Checkpoint TRUNCATE: OK: false, wal_page_count: NULL, checkpointed_count: NULL -- TRUNCATE checkpoint succeeds for conn 1 Connection 1(op=26): PRAGMA wal_checkpoint(TRUNCATE) Connection 1(op=26) Checkpoint TRUNCATE: OK: true, wal_page_count: 0, checkpointed_count: 0 -- Conn 0 now does a PASSIVE checkpoint, and immediately thinks -- it's in the Done state, and thinks it checkpointed 17 frames. -- since mode is PASSIVE, it now thinks both the WAL and the DB have those 17 frames -- so the first 17 frames of the WAL can be ignored from now on. Connection 0(op=27): PRAGMA wal_checkpoint(PASSIVE) Connection 0(op=27) Checkpoint PASSIVE: OK: true, wal_page_count: 0, checkpointed_count: 17 -- Connection 0 starts a txn with min=18 (ignore first 17 frames in WAL), -- and deletes rowid=690, which becomes WAL frame number 1 Connection 0(op=28): DELETE FROM test_table WHERE id = 690 begin_read_tx(min=18, max=0, slot=1, max_frame_in_wal=0) -- Connection 1 starts a txn with min=18 (ignore first 17 frames in WAL), -- and inserts rowid=1128, which becomes WAL frame number 2 Connection 1(op=28): INSERT INTO test_table (id, text) VALUES (1128, text_560) begin_read_tx(min=18, max=1, slot=1, max_frame_in_wal=1) -- Connection 0 again starts tx with min=18, and performs a read, and two wrong things happen: -- 1. it doesn't see row 690 as deleted, because it's in WAL frame 1, which it ignores -- 2. 
it doesn't see the new row 1128, because it's in WAL frame 2, which it ignores Connection 0(op=29): SELECT * FROM test_table begin_read_tx(min=18, max=2, slot=1, max_frame_in_wal=2) ``` ## Fix Reset `ongoing_checkpoint.state` to `Start` when checkpoint fails. Issue found in #2364 . Reviewed-by: bit-aloo (@Shourya742) Closes #2380	2025-08-01 11:28:04 +03:00
Jussi Saurio	d465abeced	Merge 'Open a temporary on-disk file for ephemeral tables' from Jussi Saurio Closes #2219 ## What Ephemeral tables and indexes should use a temporary database file instead of being backed only by memory. ## Why This makes them able to spill to disk when necessary when their page cache is nearing its memory limit. However, they should spill directly to the temporary database file without WAL journaling, since a WAL is not necessary (or even desirable) for ephemeral tables. Spilling is not implemented yet for any use case - this is just an enabler for it. ## Implementation details - Create random filename using `io.generate_random_number()` in platform-specific temporary directory - Make `pager.wal` an optional property again, removing `DummyWAL` - Remove `FileMemoryStorage` as it is never used Closes #2315	2025-08-01 11:06:08 +03:00
Jussi Saurio	c19e7d20c1	Merge 'Force Sqlite to parse schema on connection benchmark' from Levy A. Resolves #2312. <img width="973" height="213" alt="image" src="https://github.com/user-attachments/assets/a243d61c-9987-4520-9155-6bef5d162179" /> ``` Open/Connect/limbo_schema/ time: [11.669 ms 11.683 ms 11.700 ms] change: [-4.3350% -1.8204% +0.2040%] (p = 0.11 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe Open/Connect/sqlite_schema/ time: [10.479 ms 10.693 ms 10.969 ms] change: [+0.3783% +2.4616% +5.2808%] (p = 0.02 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe ``` Closes #2375	2025-08-01 10:24:03 +03:00
Jussi Saurio	7259751eba	Merge 'Support the OFFSET clause for Compound select' from meteorgan Closes #2376	2025-08-01 10:18:13 +03:00
Jussi Saurio	77666b1eb5	Merge 'Fix parser error for repetition in row values' from Diego Reis Closes #1948 This PR also adds pretty basic support for [row values in UPDATE statements](https://sqlite.org/rowvalue.html#row_values_in_update_statements), but it only accepts expressions like: ```sql UPDATE t SET (a, b) = (2 + 2, 'joe'); ``` While SQLite accepts whole new statements, like: ```sql UPDATE tab3 SET (a,b,c) = (SELECT x,y,z FROM tab4 WHERE tab4.w=tab3.d) WHERE tab3.e BETWEEN 55 AND 66; ``` I noticed we don't explicitly have the concept of row values, maybe doing some plumbing in that matter could solve it? If there is a way to implement that with our current infrastructure (a.k.a. skill issue from my side) please comment here. Closes #2355	2025-08-01 10:17:05 +03:00
Jussi Saurio	456b7404fb	storage: remove FileMemoryStorage as it is never used	2025-08-01 10:14:36 +03:00
Jussi Saurio	e147494642	pager: make WAL optional again and remove DummyWAL	2025-08-01 10:14:35 +03:00
Jussi Saurio	8c6293ebb7	VDBE: use temporary on-disk file for OpenEphemeral	2025-08-01 10:14:01 +03:00
Jussi Saurio	e6528f2664	fix/wal: reset ongoing checkpoint state when checkpoint fails	2025-08-01 08:39:34 +03:00
meteorgan	6262ff4267	support offset for values	2025-08-01 00:46:46 +08:00
Levy A.	cf91e36ed3	fix: force sqlite to parse schema on connection benchmark	2025-07-31 13:24:59 -03:00
bit-aloo	86b72758ff	fix clippy	2025-07-31 20:51:43 +05:30
bit-aloo	a3d3a21030	allow empty vector blobs by removing is_empty check in vector_type	2025-07-31 20:24:59 +05:30
bit-aloo	78d291b73f	assert empty vector concat returns empty vector	2025-07-31 20:24:59 +05:30
bit-aloo	09542c9be0	ensure f64 slice view is properly aligned and sized	2025-07-31 20:24:59 +05:30
bit-aloo	6b7b1f43a4	ensure f32 slice view is properly aligned and sized	2025-07-31 20:24:59 +05:30
pedrocarlo	1abe8fd70c	state machine `seek_to_last`	2025-07-31 11:51:17 -03:00
pedrocarlo	543cdb3e2c	underscoring completions and IOResult to avoid warning messages	2025-07-31 11:51:17 -03:00
pedrocarlo	6bfba2518e	state machine for `move_to_rightmost`	2025-07-31 11:49:12 -03:00
pedrocarlo	966b96882e	`move_to_root` should return completion	2025-07-31 11:49:12 -03:00
pedrocarlo	cf951e24cd	add state machine for `is_empty_table` in preparation for IO Completion refactor	2025-07-31 11:49:12 -03:00
pedrocarlo	7012860800	create separate state machines file	2025-07-31 11:49:12 -03:00
Preston Thorpe	bd9df6262f	Merge 'IN queries' from Glauber Costa Implement IN queries. It is currently a todo!(), but my main motivation is that, while scavenging through EXPLAINs, I found that this pattern, at least in simple queries like SELECT ... IN (1,2,3), uses the AddImm instruction we just added. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #2342	2025-07-31 10:00:18 -04:00
Jussi Saurio	eeceefe49d	Merge 'fix/wal: only rollback WAL if txn was write + fix start state for WalFile' from Jussi Saurio Closes #2363 ## What The following sequence of actions is possible: ``` Some committed frames already exist in the WAL. shared.pages_in_frames.len() > 0. Brand new connection does this: BEGIN ^-- deferred, no read tx started yet, so its `self.start_pages_in_frames` is `0` because it's a brand new WalFile instance ROLLBACK <-- calls `wal.rollback()` and truncates `shared.pages_in_frames` to length `0` PRAGMA wal_checkpoint(); ^-- because `pages_in_frames` is empty, it doesn't actually checkpoint anything but still sets shared.max_frame to 0, effectively causing data loss ``` ## Fix - Only call `wal.rollback()` for write transactions - Set `start_pages_in_frames` correctly so that this doesn't happen even if a regression starts calling `wal.rollback()` again Reviewed-by: Preston Thorpe (@PThorpe92) Closes #2366	2025-07-31 16:16:20 +03:00
Jussi Saurio	998d288cb8	Merge 'vdbe: Disallow checkpointing in transaction' from Jussi Saurio Closes #2358 Reviewed-by: Preston Thorpe (@PThorpe92) Closes #2365	2025-07-31 16:12:49 +03:00
Glauber Costa	9d41fa4489	implement IN patterns for non-conditional SELECT queries Extracts the core logic of IN from the conditional version, and uses the conditional metadata to determine the jump. Then Uses the AddImm operator we just added to force the integer conversion at the end (like SQLite does).	2025-07-31 08:11:41 -05:00
Glauber Costa	9e8ba5263b	Implement the AddImm opcode It is a simple opcode. The hard part was finding a sqlite statement that uses it =)	2025-07-31 08:08:07 -05:00
Jussi Saurio	62e804480e	fix/wal: make db_changed check detect cases where max frame happens to be the same	2025-07-31 14:37:33 +03:00
Jussi Saurio	e88707c6fd	fix/wal: only rollback WAL if txn was write	2025-07-31 14:18:43 +03:00
Jussi Saurio	9e1fca2eba	vdbe: disallow checkpointing in interactive tx	2025-07-31 13:16:33 +03:00
Jussi Saurio	39dec647a7	fix/wal: reset page cache when another connection checkpointed in between	2025-07-31 12:44:22 +03:00
meteorgan	a0f5554b08	support the OFFSET clause for Compound select	2025-07-31 17:43:54 +08:00
Jussi Saurio	7d082ab614	small fix after header accessor refactor	2025-07-31 10:05:52 +03:00
Jussi Saurio	f619556344	Merge 'Direct `DatabaseHeader` reads and writes – `with_header` and `with_header_mut`' from Levy A. This PR introduces two methods to pager. Very much inspired by `with_schema` and `with_schema_mut`. `Pager::with_header` and `Pager::with_header_mut` will give to the closure a shared and unique reference respectively that are transmuted references from the `PageRef` buffer. This PR also adds type-safe wrappers for `Version`, `PageSize`, `CacheSize` and `TextEncoding`, as they have special in-memory representations. Writing the `DatabaseHeader` is just a single `memcpy` now. ```rs pub fn write_database_header(&self, header: &DatabaseHeader) { let buf = self.as_ptr(); buf[0..DatabaseHeader::SIZE].copy_from_slice(bytemuck::bytes_of(header)); } ``` `HeaderRef` and `HeaderRefMut` are used in the `with_header*` methods, but also can be used on its own when there are multiple reads and writes to the header, where putting everything in a closure would add too much nesting. Reviewed-by: Preston Thorpe (@PThorpe92) Closes #2234	2025-07-31 10:02:47 +03:00
Jussi Saurio	62d79e8c16	Merge 'refactor/btree: simplify get_next_record()/get_prev_record()' from Jussi Saurio When traversing, we are only interested in the following things: - Is the page a leaf or not - Is the page an index or table page - If not a leaf, what is the left child page This means we don't have to read the entire cell, just the left child page. Reviewed-by: Preston Thorpe (@PThorpe92) Closes #2317	2025-07-31 10:02:08 +03:00
Jussi Saurio	99e20e46bb	Merge 'Accumulate/batch vectored writes when backfilling during checkpoint' from Preston Thorpe After significant digging into what was causing the io_uring back-end (particularly writes) to be so much slower, it was determined that checkpointing in particular was incredibly slow, for several reasons. One is that we essentially end up calling `submit_and_wait` for every page. This PR (of course, heavily conflicts with my other open PR) attempts to remedy this by adding `pwritev` to the File trait for IO back-ends that want to support it, and aggregating contiguous writes into a series of `pwritev` calls instead of issuing them individually. ### Performance: `make bench-vfs SQL="insert into products (name,price) values (randomblob(4096), randomblob(2048));" N=1000` # Update: main <img width="505" height="194" alt="image" src="https://github.com/user-attachments/assets/8e4a27af-0bb6-4e01-8725-00bc9f8a82d6" /> this branch <img width="555" height="197" alt="image" src="https://github.com/user-attachments/assets/fad1f685-3cb0-4e06-aa9d-f797a0db8c63" /> The same test (any test with writes) on this updated branch is now roughly as fast as the syscall IO back-end, and runs will often be faster. Illustrating a checkpoint: every `count=N` where N > 1 is M syscalls saved, where M = N - 1 (roughly ~850 syscalls saved). <img width="590" height="534" alt="image" src="https://github.com/user-attachments/assets/a6171ac9-1192-4d3e-a6bf-eeda3f43af07" /> (If you are wondering why it didn't add 12000-399 and 12400-417, it's because there is a `512` page batch limit that was hit, to prevent hitting `IOV_MAX` in the rare case that it's lower than 1024 and the entire checkpoint is a single run.) Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #2278	2025-07-31 07:30:57 +03:00
Diego Reis	ab01b4e8ca	Refactor `UPDATE .. SET` row values logic and add some comments	2025-07-31 00:08:15 -03:00
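
Note on commit 111c1e64c4 (perf/btree: improve performance of rowid() function): the message says that for an intkey table the rowid can be read directly, without deserializing the full cell. The following is a minimal sketch of that idea, assuming the standard SQLite table-leaf cell layout (payload-size varint, then rowid varint, then the record). The names `read_varint` and `rowid_of_table_leaf_cell` are hypothetical and are not taken from the turso code base.

```rust
/// Decode a SQLite-style varint: big-endian, 1 to 9 bytes, 7 value bits in
/// each of the first eight bytes and all 8 bits of the ninth.
/// Returns the decoded value and the number of bytes consumed,
/// or `None` if the buffer ends before the varint does.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for i in 0..8 {
        let byte = *buf.get(i)?;
        value = (value << 7) | u64::from(byte & 0x7f);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    // Ninth byte contributes all eight of its bits.
    let last = *buf.get(8)?;
    Some(((value << 8) | u64::from(last), 9))
}

/// Hypothetical helper: return only the rowid of a table-leaf cell.
/// The payload-size varint is skipped and the record itself is never parsed.
fn rowid_of_table_leaf_cell(cell: &[u8]) -> Option<i64> {
    let (_payload_size, consumed) = read_varint(cell)?;
    let (rowid, _) = read_varint(&cell[consumed..])?;
    Some(rowid as i64)
}
```

For example, a cell starting with the bytes `0x05 0x81 0x00` has a payload size of 5 and a rowid of 128; the five payload bytes that follow are not inspected.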


3828 Commits
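
Note on commit 99e20e46bb (Accumulate/batch vectored writes when backfilling during checkpoint): the message describes aggregating contiguous page writes into runs, each issued as one `pwritev` call, with a 512-page cap per batch. The grouping step alone might look like the sketch below. The function name `coalesce_runs` and its signature are hypothetical, and the actual I/O submission is left out.

```rust
/// Group ascending page numbers into `(first_page, page_count)` runs.
/// A page extends the current run only if it is contiguous with it and the
/// run is still below `max_batch` pages; otherwise a new run is started.
/// Assumes `sorted_pages` is sorted ascending with no duplicates.
fn coalesce_runs(sorted_pages: &[u64], max_batch: usize) -> Vec<(u64, usize)> {
    let mut runs: Vec<(u64, usize)> = Vec::new();
    for &page in sorted_pages {
        if let Some(last) = runs.last_mut() {
            let contiguous = last.0 + last.1 as u64 == page;
            if contiguous && last.1 < max_batch {
                last.1 += 1;
                continue;
            }
        }
        runs.push((page, 1));
    }
    runs
}
```

Each run of N pages would then need one vectored write instead of N single-page writes, which matches the "M = N - 1 syscalls saved" accounting in the commit message. Calling `coalesce_runs(&pages, 512)` applies the batch limit mentioned there.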