turso

mirror of https://github.com/aljazceru/turso.git synced 2026-02-14 20:44:29 +01:00

Author	SHA1	Message	Date
Jussi Saurio	32cd01a615	fix deadlock	2025-09-15 14:48:26 +03:00
Pekka Enberg	247d4c06c6	Merge 'Fix MVCC update' from Jussi Saurio Based on #3126 Closes #3029 Closes #3030 Closes #3065 Closes #3083 Closes #3084 Closes #3085 simple reason why mvcc update didn't work: it didn't try to update. Closes #3127	2025-09-15 14:24:59 +03:00
Jussi Saurio	59f18e2dc8	fix mvcc update simple reason why mvcc update didn't work: it didn't try to update.	2025-09-15 11:27:56 +03:00
Nikita Sivukhin	3bcac441e4	reduce log level of some very frequent logs	2025-09-15 11:35:41 +04:00
Jussi Saurio	db3428a7a9	remove unused pager parameter	2025-09-14 23:44:24 +03:00
Pekka Enberg	95660535da	core/storage: Demote info logging to debug	2025-09-14 13:10:46 +03:00
PThorpe92	f6dd0bc4d6	Don't grab page cache write lock in a loop	2025-09-13 12:21:13 -04:00
Pekka Enberg	6a2f0d6061	Merge 'Add per page checksums' from Avinash Sajjanshetty This patch adds checksums to Turso DB. You may check the design here in the [RFC](https://github.com/tursodatabase/turso/issues/2178). 1. We use reserved bytes (8 bytes) to store the checksums. On every IO read, we verify that the checksum matches. 2. We use twox hash for checksums. 3. Checksum works only on 4K pages now. It's a small change to enable for all other sizes, I will send another PR. 4. Right now, it's not possible to switch to different algorithm or turn off altogether. That will be added in the future PRs. 5. Checksums can be enabled only for new dbs. For existing DBs, we will disable it. 6. To add checksums for existing DBs, we need vacuum since it would require rewrite of whole db. Closes #2840	2025-09-13 18:46:53 +03:00
Pekka Enberg	d8f07fe3da	core: Panic on fsync() error by default Retrying fsync() on error was historically not safe ("fsyncgate") and Postgres still defaults to panicking on fsync() error. Therefore, add a "data_sync_retry" pragma (disabled by default) and use it to determine whether to panic on fsync() error or not.	2025-09-13 10:21:12 +03:00
Avinash Sajjanshetty	5256f29a9c	Add checksums behind a feature flag	2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty	11030056c7	rename method to `verify_checksum`	2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty	e010c46552	use checksums when reading/writing from db file	2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty	4b59cf19e5	use checksums when reading/writing from wal	2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty	14a1307720	Set reserved space as required when allocating page1	2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty	c2c1ec2dba	Use `usable_space()` instead of hardcoding the value	2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty	15266105f7	Update IOContext to carry checksum ctx	2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty	3f72de3623	Add checksum module	2025-09-13 11:00:37 +05:30
Preston Thorpe	b1420904bb	Merge 'fix(btree): advance cursor after interior node replacement in delete' from Jussi Saurio ## Problem When a delete replaces an index interior cell, the replacement key is LT the deleted key. Currently on the main branch, after the deletion happens, the following call to BTreeCursor::next() stops at the replaced interior cell. This is incorrect - imagine the following sequence: - We are executing a query that deletes all keys WHERE key > 5 - We delete <key=6> from an interior node, and take a replacement <key=5> from the left subtree of that interior page - next() is called, and we land on the interior node again, which now has <key=5>, and we incorrectly delete it even though our WHERE condition is key > 5. ## Solution This PR: - Tracks `interior_node_was_replaced` in CheckNeedsBalancing - If no balancing is needed and a replacement occurred, advances once so the next invocation of next() will skip the replaced cell properly i.e. we prevent next() from landing on the replaced content and ensures iteration continues with the next logical record. ## Details This problem only became apparent once we started using indexes as valid iteration cursors for DELETE operations in #2981 Closes #3045 Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3049	2025-09-12 17:37:01 -04:00
Pekka Enberg	2bc8c0c850	core/storage: Remove unused import warning	2025-09-12 21:09:38 +03:00
PThorpe92	b04c364981	Fix clippy error	2025-09-12 11:43:38 -04:00
PThorpe92	7a14c7394f	Remove the header copy stored on the WalFile, fix fast_path	2025-09-12 11:29:43 -04:00
PThorpe92	25e7c719f1	Update checkpoint_seq on each checkpoint, not just when log restarts This was causing checkpoint_seq to be 0 when we had already successfully run a passive checkpoint, and causing us to use improper pages from the cache.	2025-09-12 11:29:42 -04:00
Pekka Enberg	14da283e36	Merge 'MVCC: remove reliance on BTreeCursor::has_record()' from Jussi Saurio Closes #3051 Closes #3032 Closes #3056	2025-09-12 17:31:15 +03:00
Pekka Enberg	54b4c9f30b	Merge 'Implement the balance_quick algorithm' from Jussi Saurio Fast balancing routine for the common special case where the rightmost leaf page of a given subtree overflows such that the overflowing cell would be the rightmost cell on the page -- i.e. an append. In this case we just add a new leaf page as the right sibling of that page, put the overflow cell there, and insert a new divider cell into the parent. The high level steps are: 1. Allocate a new leaf page and insert the overflow cell payload in it. 2. Create a new divider cell in the parent - it contains the page number of the old rightmost leaf, plus the largest rowid on that page. 3. Update the rightmost pointer of the parent to point to the new leaf page. 4. Continue balance from the parent page (inserting the new divider cell may have overflowed the parent). Closes #3041	2025-09-12 17:30:52 +03:00
Pere Diaz Bou	9b6d181be4	wal: add hacky update max frame for mvcc use When multiple tx writes happen concurrently in mvcc, max frame will be updated. From the point of view of the other transaction, this new max_frame makes it return busy because its current wal snapshot is outdated.	2025-09-12 13:49:14 +00:00
Jussi Saurio	305b2f55ae	MVCC: remove reliance on BTreeCursor::has_record()	2025-09-12 16:03:55 +03:00
PThorpe92	f60ca3970f	Remove old comment from wal	2025-09-12 06:39:59 -04:00
PThorpe92	faf3531a4e	Fix checkpoint fast-path, don't use cached pages w/o write lock closes #3024 Also we snapshot the page when we determine that it's eligible, and pay a memcpy instead of the read from disk, but this further prevents any in-memory changes to the page/TOCTOU issues.	2025-09-12 06:38:02 -04:00
Jussi Saurio	9f6e1a2e7c	fix(btree): advance cursor after interior node replacement in delete When a delete replaces an interior cell, the replacement key is LT the deleted key. Currently on the main branch, after the deletion happens, the following call to BTreeCursor::next() stops at the replaced interior cell. This is incorrect - imagine the following sequence: - We are executing a query that deletes all keys WHERE key > 5 - We delete <key=6> from an interior node, and take a replacement <key=5> from the left subtree of that interior page - next() is called, and we land on the interior node again, which now has <key=5>, and we incorrectly delete it even though our WHERE condition is key > 5. This PR: - Tracks `interior_node_was_replaced` in CheckNeedsBalancing - If no balancing is needed and a replacement occurred, advances once so the next invocation of next() will skip the replaced cell properly i.e. we prevent next() from landing on the replaced content and ensures iteration continues with the next logical record. Closes #3045	2025-09-12 10:49:44 +03:00
Denizhan Dakılır	70102f5f6e	add explicit usize type annotation to range iterator in test	2025-09-12 02:18:49 +03:00
Jussi Saurio	9b14c0022d	Implement the balance_quick algorithm Fast balancing routine for the common special case where the rightmost leaf page of a given subtree overflows (= an append). In this case we just add a new leaf page as the right sibling of that page, and insert a new divider cell into the parent. The high level steps are: 1. Allocate a new leaf page and insert the overflow cell payload in it. 2. Create a new divider cell in the parent - it contains the page number of the old rightmost leaf, plus the largest rowid on that page. 3. Update the rightmost pointer of the parent to point to the new leaf page. 4. Continue balance from the parent page (inserting the new divider cell may have overflowed the parent).	2025-09-12 00:42:27 +03:00
Pekka Enberg	7d8a1a0d5f	Merge 'whopper: A new DST with concurrency' from Pekka Enberg Our simulator is currently limited to concurrency of one. This introduces a much less sophisticated DST with focus on finding concurrency bugs. Closes #2985	2025-09-11 18:42:45 +03:00
Jussi Saurio	c30d320cab	Fix: read transaction cannot be allowed to start with a stale max frame If both of the following are true: 1. All read locks are already held 2. The highest readmark of any read lock is less than the committed max frame Then we must return Busy to the reader, because otherwise they would begin a transaction with a stale local max frame, and thus not see some committed changes.	2025-09-11 15:58:13 +03:00
Pekka Enberg	ca51a60b3c	core/storage: Demote restart_log() logging to debug	2025-09-11 08:35:18 +03:00
PThorpe92	b93ad749a9	Remove some traces in super hot paths in btree	2025-09-10 09:54:32 -04:00
Pekka Enberg	bb3fbb7962	Merge 'check freelist count in integrity check' from Jussi Saurio Closes #3003	2025-09-10 16:15:39 +03:00
Jussi Saurio	d7ce781a2a	Merge 'Enable the use of indexes in DELETE statements' from Jussi Saurio Closes #1714 This PR enables the use of an index as the iteration cursor for a point or range deletion operation. Main changes: - Use `Delete` opcode for the index that is iterating the rows - avoids unnecessary seeking on that index, since it's already positioned correctly - Fix delete balancing; details below: ### current state - a deletion may cause a btree rebalancing operation - to get the cursor back to the right place after a rebalancing, we must remember what the deleted key was and seek to it - right now we are using `SeekOp::LT` to move to one slot BEFORE the deleted key, so that if we delete rows in a loop, the following `Next()` call will put us back into the right place ### problem - When we delete multiple rows, we always iterate forwards. Using `SeekOp::LT` implies backwards iteration, but it works OK for table btrees since the cursor never remains on an internal node, because table internal cells do not have payloads. However: this behavior is problematic for indexes because we can effectively end up skipping visiting a page entirely. Honestly: despite spending some time debugging the _old_ code, I still don't remember what exactly causes this to happen. :) It's one of the `iter_dir` specific behaviors in `indexbtree_move_to` or `get_prev_record()`, but I'm too tired to spend more time figuring it out. I had the reason in my head before going on vacation, but it was evicted from the cache it seems... ### solution use `SeekOp::GE { eq_only: true }` instead and make the next call to `Next()` a no-op. This has the same effect as SeekOp::LT + next(), but without introducing bugs due to `LT` implying backwards iteration. Reviewed-by: Nikita Sivukhin (@sivukhin) Closes #2981	2025-09-10 16:00:54 +03:00
Jussi Saurio	e3594d0ae0	make the comment for skip_advance more accurate	2025-09-10 15:38:57 +03:00
Jussi Saurio	618f51330a	advance despite skip_advance flag if cursor not pointing at record	2025-09-10 14:54:51 +03:00
Jussi Saurio	80f8794fda	add comments	2025-09-10 14:54:51 +03:00
Jussi Saurio	36ec654631	Seek with GE after delete balancing and skip next advance	2025-09-10 14:54:51 +03:00
Jussi Saurio	df83b56083	check freelist count in integrity check	2025-09-10 14:53:28 +03:00
Pekka Enberg	2131a04b7d	core: Rename IO::run_once() to IO::step() The `run_once()` name is just a historical accident. Furthermore, it now started to appear elsewhere as well, so let's just call it IO::step() as we should have from the beginning.	2025-09-10 14:36:02 +03:00
Pekka Enberg	0b91f8a715	Merge 'IO: handle errors properly in io_uring' from Preston Thorpe Because `io_uring` may have many other I/O submission events queued (that are relevant to the operation) when we experience an error, marking our `Completion` objects as aborted is not sufficient, the kernel will still execute queued I/O, which can mutate WAL or DB state after we’ve declared failure and keep references (iovec arrays, buffers) alive and stall reuse. We need to stop those in-flight SQEs at the kernel and then drain the ring to a known-empty state before reusing any resources. The following methods were added to the `IO` trait: `cancel`: which takes a slice of `Completion` objects and has a default implementation that simply marks them as `aborted`. `drain`: which has a default noop implementation, but the `io_uring` backend implements this method to drain the ring. CC @sivukhin Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2787	2025-09-10 14:24:43 +03:00
Jussi Saurio	5c8afc5caf	pager: fix incorrect freelist page count bookkeeping	2025-09-10 14:02:17 +03:00
Jussi Saurio	11339fc941	Merge 'Fix clear_page_cache method and rollback' from Preston Thorpe Previously we were iterating over every entry in the page cache, clearing the dirty flag from each page. Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Reviewed-by: Nikita Sivukhin (@sivukhin) Closes #2988	2025-09-10 11:11:37 +03:00
PThorpe92	2f4f67efa8	Remove some unused attributes	2025-09-09 16:17:49 -04:00
PThorpe92	02bebf02a5	Remove read_entire_wal_dumb in favor of reading chunks	2025-09-09 16:06:27 -04:00
PThorpe92	cb12a1319d	Fix page cache `clear` method to not re-initialize every slot	2025-09-09 15:55:59 -04:00
PThorpe92	8cc4e7f7a0	Fix rollback method to stop using highly inefficient cache::clear_dirty	2025-09-09 13:28:17 -04:00

1433 Commits