turso

mirror of https://github.com/aljazceru/turso.git synced 2026-02-19 06:55:18 +01:00

Author	SHA1	Message	Date
Pekka Enberg	cef0195b42	core/storage: Fix BTreeCursor::record() for MVCC Respect immutable record invalidation.	2025-07-17 16:12:40 +03:00
Pekka Enberg	962987e9a1	core/mvcc: Fix MVCC cursor traversal Add an explicit rewind() to move to the beginning. Change forward() semantics so that after first forward() call, you are pointing to the first row, which matches the get_next_record() semantics in B-tree cursor.	2025-07-17 16:12:40 +03:00
Pekka Enberg	72df538a76	core/storage: Add MVCC assertion to BTreeCursor::seek_to_last()	2025-07-17 14:13:22 +03:00
Pekka Enberg	3aca9c54c7	core/storage: Fix BTreeCursor::record() with MVCC	2025-07-17 14:13:22 +03:00
Pekka Enberg	1fc6126157	core/storage: Allocate page1 lazily for MVCC transactions	2025-07-17 14:13:22 +03:00
Pekka Enberg	99cdcf5348	Merge 'core: Copy-on-write for in-memory schema' from Levy A. We were copying the schema preemptively on each `Database::connect`; now the schema is shared until a change needs to be made, by sharing a single `Arc` and mutating it via `Arc::make_mut`. This is faster and reduces memory usage. Closes #2022	2025-07-17 10:46:46 +03:00
Pekka Enberg	af182d9895	Merge 'btree: fix post-balancing seek bug in delete path' from Jussi Saurio Aftermath of seek-related refactor in #2065, which you can read for background. The change in this PR is documented pretty well inline - if we receive a `TryAdvance` seek result when seeking after balancing, we need to - well - try to advance. Closes #2116 Closes #2115	2025-07-16 20:08:15 +03:00
Levy A.	d0e26db01a	use lock for database schema	2025-07-16 13:54:39 -03:00
Levy A.	4c77d771ff	only copy schema on writes	2025-07-16 13:54:36 -03:00
Jussi Saurio	bb0c017d9f	Merge 'btree: fix trying to go upwards when we are already at the end of the entire btree' from Jussi Saurio ## What does this fix This PR fixes an issue with BTree upwards traversal logic where we would try to go up to a parent node in `next()` even though we are at the very end of the btree. This behavior can leave the cursor incorrectly positioned at an interior node when it should be at the right edge of the rightmost leaf. ## Why doesn't it cause problems on main This bug is masked on `main` by every table `insert()` (wastefully) calling `find_cell()`: - `op_new_rowid` called, let's say the current max rowid is `666`. Cursor is left pointing at `666`. - `insert()` is called with rowid `667`, cursor is currently pointing at `666`, which is incorrect. - `find_cell()` does a binary search every time, and hence somewhat accidentally positions the cursor correctly _after_ `666` so that the insert goes to the correct place ## Why was this issue found In #1988, I am removing `find_cell()` entirely in favor of always performing a seek to the correct location - and skipping `seek` when it is not required, saving us from wasting a binary search on every insert - but this change means that we need to call `next()` after `op_new_rowid` to have the cursor positioned correctly at the new insertion slot. Doing this surfaces this upwards traversal bug in that PR branch. ## Details of solution - Store `cell_count` together with `cell_idx` in pagestack, so that children can know whether their parents have reached their end without doing IO - To make this foolproof, pin pages on `PageStack` so the page cache cannot evict them during tree traversal - `cell_indices` renamed to `node_states` since it now carries more information (cell index AND count, instead of just index) Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2005	2025-07-16 19:44:21 +03:00
Diego Reis	b86674adbb	Remove cache clearing in cacheflush	2025-07-16 11:11:52 -03:00
Jussi Saurio	8558675c4c	page cache: pin pages on the stack	2025-07-16 17:09:05 +03:00
Diego Reis	817ad8d50f	Separate user-callable cacheflush from internal cacheflush logic Cacheflush should only spill pages to WAL as non-commit frames, without checkpointing or syncing. See SQLite's sqlite3PagerFlush	2025-07-16 11:08:50 -03:00
Jussi Saurio	f7b9265c26	btree: fix trying to go upwards when at end of btree	2025-07-16 16:58:42 +03:00
Jussi Saurio	e0d797aac0	btree: use node_states instead of cell_indices (tracks cell count too)	2025-07-16 16:58:41 +03:00
Jussi Saurio	f0145fef5c	btree: create BTreeNodeState struct for tracking cell idx and count	2025-07-16 16:58:11 +03:00
Jussi Saurio	ac065a79bb	btree: fix post-balancing seek bug in delete path	2025-07-16 14:23:46 +03:00
Pekka Enberg	84d8842fbe	Merge 'btree: fix interior cell replacement in btrees with depth >=3' from Jussi Saurio ## Background When a divider cell is deleted from an index interior page, the following algorithm is used: 1. Find predecessor: Move to largest key in left subtree of the current page. This is always a leaf page. 2. Create replacement: Convert this predecessor leaf cell to interior cell format, using original cell's left child page pointer 3. Replace: Drop original cell from parent page, insert replacement at same position 4. Cleanup: Delete the taken predecessor cell from the leaf page ## The faulty code leading to the bug The error in our logic was that we always expected to only traverse down one level of the btree: ```rust let parent_page = self.stack.parent_page().unwrap(); let leaf_page = self.stack.top(); ``` This meant that when the deletion happened on say, level 1, and the replacement cell was taken from level 3, we actually inserted the replacement cell into level 2 instead of level 1. ## Manifestation of the bug in issue 2106 In #2106, this manifested as the following chain of pages, going from parent to children: 3 -> 111 -> 119 - Cell to be deleted was on page 3 (whose left pointer is 111) - Going to the largest key in the left subtree meant traversing from 3 to 111 and then from 111 to 119 - a replacement cell was taken from 119 - incorrectly inserted into 111 - and its left child pointer also set as 111! - now whenever page 111 wanted to go to its left child page, it would just traverse back to itself, eventually causing a crash because we have a hard limit on the number of pages on the page stack. ## The fix The fix is quite trivial: store the page we are on before we start traversing down. Closes #2106 Closes #2108	2025-07-16 13:15:54 +03:00
Jussi Saurio	bd69af7372	btree: ensure re-entrancy of InteriorNodeReplacement	2025-07-16 10:50:22 +03:00
Jussi Saurio	47ef30b22e	btree: fix interior cell replacement in btrees with depth >=3 When a divider cell is deleted from an index interior page, the following algorithm is used: 1. Find predecessor: Move to largest key in left subtree (self.prev()) 2. Create replacement: Convert predecessor leaf cell to interior cell format, using original cell's left child pointer 3. Replace: Drop original cell from parent page, insert replacement at same position 4. Cleanup: Delete predecessor from leaf page The error in our logic was that we always expected to only traverse down one level of the btree: ```rust let parent_page = self.stack.parent_page().unwrap(); let leaf_page = self.stack.top(); ``` This meant that when the deletion happened on say, level 1, and the replacement cell was taken from level 3, we actually inserted the replacement cell into level 2 instead of level 1. In #2106, this manifested as the following chain of pages, going from parent to children: 3 -> 111 -> 119 Cell was deleted from page 3 (whose left pointer is 111), and a replacement cell was taken from 119, incorrectly inserted into 111, and its left child pointer also set as 111! The fix is quite trivial: store the page we are on before we start traversing down. Closes #2106	2025-07-16 10:12:59 +03:00
Jussi Saurio	6e5b407505	btree: add some assertions related to #2106	2025-07-16 08:02:34 +03:00
Diego Reis	0e9771ac07	refactor: Change redundant "Status" enums to IOResult Let's unify the semantics of "something is done" or "yields to I/O" into a single type	2025-07-15 20:56:18 -03:00
Diego Reis	d0af54ae77	refactor: Change CursorResult to IOResult The reasoning here is to treat I/O operations (which are either "Done" or yield to I/O) with the same generic type.	2025-07-15 20:52:25 -03:00
Jussi Saurio	927a1f158a	Merge 'btree: unify table&index seek page boundary handling' from Jussi Saurio ## Background PR #2065 fixed a bug with table btree seeks concerning boundaries of leaf pages. The issue was that if we were e.g. looking for the first key greater than (GT) 100, we always assumed the key would either be found on the left child page of a given divider (e.g. divider 102) or not at all, which is incorrect. #2065 has more discussion and documentation about this, so read that one for more context. ## This PR We already had similar handling for index btrees as #2065 introduced for table btrees, but it was baked into the `BTreeCursor` struct's seek handling itself, whereas #2065 handled this on the VDBE side. This PR unifies this handling for both table and index btrees by always doing the additional cursor advancement in the VDBE. Unfortunately, unlike table btrees, index btrees may also need to do an additional advance when they are looking for an exact match. This resulted in a bigger refactor than anticipated, since there are quite a few VDBE instructions that may perform a seek, e.g.: `IdxInsert`, `IdxDelete`, `Found`, `NotFound`, `NoConflict`. All of these can potentially end up in a similar situation where the cursor needs one more advance after the initial seek, and they were currently calling `cursor.seek()` directly and expecting the `BTreeCursor` to handle the auto-advance fallback internally. For this reason, I have 1. removed the "TryAdvance"-ish logic from the index btree internals and 2. extracted a common VDBE helper `fn seek_internal()` - heavily based on the existing `op_seek_internal()`, but decoupled from instructions and the program counter - which all the interested VDBE instructions will call to delegate their seek logic. Closes #2083 Reviewed-by: Nikita Sivukhin (@sivukhin) Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2084	2025-07-15 18:02:52 +03:00
meteorgan	d7bdfeb711	reinitialize WalFileShare when reset page size	2025-07-15 16:34:07 +08:00
meteorgan	b42a1ef272	minor improvements based on PR comments	2025-07-15 16:34:07 +08:00
meteorgan	f123c77ee8	fix set page_size in pager	2025-07-15 16:34:07 +08:00
meteorgan	bf69b86e94	fix: not all pragma need transaction	2025-07-15 16:34:07 +08:00
meteorgan	a6faab17e9	fix query page size	2025-07-15 16:34:07 +08:00
meteorgan	cf126824de	Support set page size	2025-07-15 16:34:07 +08:00
Jussi Saurio	553396e9ca	btree: unify table&index seek page boundary handling PR #2065 fixed a bug with table btree seeks concerning boundaries of leaf pages. The issue was that if we were e.g. looking for the first key greater than (GT) 100, we always assumed the key would either be found on the left child page of a given divider (e.g. divider 102) or not at all, which is incorrect. #2065 has more discussion and documentation about this, so read that one for more context. Anyway: We already had similar handling for index btrees, but it was baked into the `BTreeCursor` struct's seek handling itself, whereas #2065 handled this on the VDBE side. This PR unifies this handling for both table and index btrees by always doing the additional cursor advancement in the VDBE. Unfortunately, since indexes may also need to do an additional advance when they are looking for an exact match, this resulted in a bigger refactor than anticipated, since there are quite a few VDBE instructions that may perform a seek, e.g.: `IdxInsert`, `IdxDelete`, `Found`, `NotFound`, `NoConflict`. All of these can potentially end up in a similar situation where the cursor needs one more advance after the initial seek. For this reason, I have extracted a common VDBE helper `fn seek_internal()` which all the interested VDBE instructions will call to delegate their seek logic.	2025-07-14 16:46:43 +03:00
Pekka Enberg	55cf9c8f02	Merge 'Add async header accessor functionality' from Zaid Humayun This PR addresses https://github.com/tursodatabase/turso/issues/1828 in a phased manner. Making database header access async in one PR will be complicated. This PR adds an async API to `header_accessor.rs` and ports over some of `pager.rs` to use this API. This will allow gradual porting over of all call sites. Once all call sites are ported over, one mechanical rename will fix everything in the repo so we don't have any `<header_name>_async` functions. Also, porting header accessors over from sync to async would be a good way for first-time contributors to get introduced to the Limbo codebase. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #1966	2025-07-14 13:08:29 +03:00
Pekka Enberg	90532eabdf	Merge 'b-tree: fix bug in case when no matching rows was found in seek in the leaf page' from Nikita Sivukhin The current table B-Tree seek code relies on the invariant that if key `K` is present in an interior page then it must also be present in a leaf page. This is generally not true if data was ever deleted from the table, because the leaf row whose key was used as a divider in the interior pages can be deleted. Also, the SQLite spec says nothing about such an invariant, so the `turso-db` implementation of B-Tree should not rely on it. This PR introduces 3 options for the B-Tree `seek` result: `Found`, `NotFound`, and `TryAdvance`, which is generated when the leaf page has no match for `seek_op` but the DB doesn't know whether a neighbor page can have matching data. There is an alternative approach where we can move the cursor in the `seek` itself to the neighbor page, but I was afraid to introduce such changes because the analogous `seek` function from SQLite works exactly like the current version of the code, and I think some query planner internals (for insertion) can rely on the fact that repositioning will leave the cursor at the position of insertion: > If an exact match is not found, then the cursor is always left pointing at a leaf page which would hold the entry if it were present. The cursor might point to an entry that comes before or after the key. Also, this PR introduces new B-tree fuzz tests which generate a table B-tree from scratch and execute operations over it. This can help to reach some non-trivial states and also generate huge DBs faster (that's how this bug was discovered). Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2065	2025-07-14 12:57:09 +03:00
Pekka Enberg	1a0d618a41	Merge 'Assert I/O read and write sizes' from Pere Diaz Bou Let's assert for now that we do not read/write fewer bytes than expected. This should be fixed to retrigger several reads/writes if we couldn't read/write enough but for now let's assert. Closes #2078	2025-07-14 12:22:18 +03:00
Nikita Sivukhin	5bd3287826	add comments	2025-07-14 13:01:15 +04:00
Nikita Sivukhin	6e2ccdff20	add btree fuzz tests which generate seed file from scratch	2025-07-14 13:01:15 +04:00
Nikita Sivukhin	fc400906d5	handle case when target seek page has no matching entries	2025-07-14 13:01:15 +04:00
Nikita Sivukhin	03b2725cc7	return SeekResult from seek operation - Apart from regular states Found/NotFound seek result has TryAdvance value which tells caller to advance the cursor in necessary direction because the leaf page which would hold the entry if it was present actually has no matching entry (but neighbouring page can have match)	2025-07-14 13:01:15 +04:00
Pere Diaz Bou	340391538a	io: change comment for assert	2025-07-14 10:36:06 +02:00
Pere Diaz Bou	88ff218810	io: assert small I/O Let's assert for now that we do not read/write fewer bytes than expected. This should be fixed to retrigger several reads/writes if we couldn't read/write enough but for now let's assert.	2025-07-14 10:19:41 +02:00
Nikita Sivukhin	f61d733dd3	make new functions dependent on "json" Cargo feature	2025-07-14 11:26:51 +04:00
Nikita Sivukhin	c9e7271eaf	properly pass subtype	2025-07-14 11:20:49 +04:00
Nikita Sivukhin	81cd04dd65	add bin_record_json_object and table_columns_json_array functions	2025-07-14 11:19:45 +04:00
Krishna Vishal	ea4a4708ea	- Address some review comments - Add docs for `RecordCursor`	2025-07-14 03:28:55 +05:30
Krishna Vishal	b1f27cad94	chore: fix clippy	2025-07-14 03:28:55 +05:30
Krishna Vishal	d3368a28bc	fix merge conflicts	2025-07-14 03:28:55 +05:30
Krishna Vishal	e7e5f28c0a	chore: Clippy chill	2025-07-14 03:28:54 +05:30
Krishna Vishal	9b315d1d7e	Manually inline the record deserialization code for performance. This is done because the compiler is refusing to inline even after adding inline hint. - Get refvalues directly from registers without using `make_record`	2025-07-14 03:28:54 +05:30
Krishna Vishal	35ed279644	Clean up indexbtree_move_to	2025-07-14 03:28:54 +05:30
Krishna Vishal	f0e8e5871b	Replace compare_immutable with compare_records_generic	2025-07-14 03:28:54 +05:30
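
The copy-on-write schema change described in merge 99cdcf5348 above (share one `Arc`, mutate through `Arc::make_mut`) can be illustrated with a minimal sketch. The `Schema` and `Database` types, their fields, and the `add_table` method below are hypothetical stand-ins written for this illustration, not the actual turso types; only `Arc::clone`, `Arc::make_mut`, and `Mutex` are real standard-library APIs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an in-memory schema (not the real turso type).
#[derive(Clone, Debug, Default)]
pub struct Schema {
    /// Table name -> column names.
    pub tables: HashMap<String, Vec<String>>,
}

/// Hypothetical database handle that owns the shared schema behind a lock.
pub struct Database {
    schema: Mutex<Arc<Schema>>,
}

impl Database {
    pub fn new() -> Self {
        Database {
            schema: Mutex::new(Arc::new(Schema::default())),
        }
    }

    /// Connecting only clones the `Arc` (a reference-count bump);
    /// the schema itself is not copied.
    pub fn connect(&self) -> Arc<Schema> {
        Arc::clone(&self.schema.lock().unwrap())
    }

    /// A schema change goes through `Arc::make_mut`, which clones the
    /// schema only if another `Arc` still points at the same allocation.
    pub fn add_table(&self, name: &str, columns: &[&str]) {
        let mut guard = self.schema.lock().unwrap();
        let schema = Arc::make_mut(&mut *guard);
        schema.tables.insert(
            name.to_string(),
            columns.iter().map(|c| c.to_string()).collect(),
        );
    }
}
```

In this sketch, connections opened before a schema change keep seeing the old snapshot, while connections opened afterwards see the new one. Whether the real implementation refreshes existing connections is not stated in the commit message.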

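The `Found` / `NotFound` / `TryAdvance` seek result introduced in #2065 (commit 03b2725cc7 above) can be sketched on a toy model. Everything here is an assumption made for illustration: leaf pages are plain sorted vectors, `Cursor`, `seek_gt`, and `seek_gt_with_advance` are hypothetical names, and the toy returns `NotFound` only when it is already on the last leaf. Only the three variant names come from the commit messages.

```rust
/// Result of a seek, mirroring the three outcomes named in the commit message.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SeekResult {
    Found,
    NotFound,
    TryAdvance,
}

/// Toy cursor over sorted leaf pages (hypothetical, not turso's BTreeCursor).
/// `dividers[i]` separates `leaves[i]` from `leaves[i + 1]`; keys <= divider
/// belong to the left leaf. Assumes `dividers.len() + 1 == leaves.len()`.
pub struct Cursor<'a> {
    leaves: &'a [Vec<i64>],
    dividers: &'a [i64],
    leaf: usize,
    cell: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(leaves: &'a [Vec<i64>], dividers: &'a [i64]) -> Self {
        Cursor { leaves, dividers, leaf: 0, cell: 0 }
    }

    /// Key under the cursor, if the cursor points at a valid cell.
    pub fn current(&self) -> Option<i64> {
        self.leaves.get(self.leaf)?.get(self.cell).copied()
    }

    /// Seek to the first key strictly greater than `key`.
    pub fn seek_gt(&mut self, key: i64) -> SeekResult {
        if self.leaves.is_empty() {
            return SeekResult::NotFound;
        }
        // Descend to the leaf that would hold `key` if it were present.
        self.leaf = self
            .dividers
            .iter()
            .position(|d| key <= *d)
            .unwrap_or(self.leaves.len() - 1);
        let page = &self.leaves[self.leaf];
        self.cell = page.partition_point(|v| *v <= key);
        if self.cell < page.len() {
            SeekResult::Found
        } else if self.leaf + 1 < self.leaves.len() {
            // No match on this leaf, but a neighbouring leaf may have one.
            SeekResult::TryAdvance
        } else {
            SeekResult::NotFound
        }
    }

    /// Advance to the next cell, crossing into the next non-empty leaf.
    pub fn next(&mut self) -> bool {
        if self.cell + 1 < self.leaves[self.leaf].len() {
            self.cell += 1;
            return true;
        }
        for l in self.leaf + 1..self.leaves.len() {
            if !self.leaves[l].is_empty() {
                self.leaf = l;
                self.cell = 0;
                return true;
            }
        }
        false
    }
}

/// Caller-side handling: on `TryAdvance`, advance the cursor once.
pub fn seek_gt_with_advance(cur: &mut Cursor<'_>, key: i64) -> Option<i64> {
    match cur.seek_gt(key) {
        SeekResult::Found => cur.current(),
        SeekResult::NotFound => None,
        SeekResult::TryAdvance => {
            if cur.next() { cur.current() } else { None }
        }
    }
}
```

With leaves `[98, 99, 100]` and `[103, 104]` separated by a stale divider `102` (its leaf row having been deleted), a GT seek for `100` lands on the left leaf, finds nothing greater, and reports `TryAdvance`. The caller then advances onto `103` in the neighbouring leaf.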

854 Commits