Closes #1714
This PR enables the use of an index as the iteration cursor for a point
or range deletion operation. Main changes:
- Use `Delete` opcode for the index that is iterating the rows - avoids
unnecessary seeking on that index, since it's already positioned
correctly
- Fix delete balancing; details below:
### current state
- a deletion may cause a btree rebalancing operation
- to get the cursor back to the right place after a rebalancing, we must
remember what the deleted key was and seek to it
- right now we are using `SeekOp::LT` to move to one slot BEFORE the
deleted key, so that if we delete rows in a loop, the following `Next()`
call will put us back into the right place
### problem
- When we delete multiple rows, we always iterate forwards. Using
`SeekOp::LT` implies backwards iteration, but that works OK for table
btrees, since the cursor never remains on an internal node (table
internal cells carry no payloads). However, this behavior is
problematic for indexes, because we can effectively end up skipping
a page entirely. Honestly, despite spending some time debugging the
_old_ code, I still don't remember exactly what causes this to happen.
:) It's one of the `iter_dir`-specific behaviors in `indexbtree_move_to`
or `get_prev_record()`, but I'm too tired to spend more time figuring it
out. I had the reason in my head before going on vacation, but it seems
it was evicted from the cache...
### solution
Use `SeekOp::GE { eq_only: true }` instead, and make the next call to
`Next()` a no-op. This has the same effect as `SeekOp::LT` + `next()`,
but without introducing bugs from `LT` implying backwards iteration.
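To make the mechanics concrete, here is a minimal sketch of the two strategies; `BTreeCursor`, `seek`, and `skip_next_once` are hypothetical stand-ins for the actual cursor API:
```rust
// Minimal sketch (hypothetical API, not the actual cursor code) of the two
// strategies for restoring the cursor after a delete-triggered rebalance.
enum SeekOp {
    LT,
    GE { eq_only: bool },
}

struct BTreeCursor {
    // When set, the next call to next() is consumed as a no-op.
    skip_next_once: bool,
}

impl BTreeCursor {
    fn restore_after_delete(&mut self, deleted_key: &[u8]) {
        // Old approach: land one slot BEFORE the deleted key so the loop's
        // next() moves onto its successor. `LT` implies backwards iteration
        // internally, which can skip an index page entirely:
        //   self.seek(deleted_key, SeekOp::LT);
        // New approach: seek forward to where the deleted key was (its
        // successor now occupies that slot) and swallow the following
        // next(), so forward iteration resumes in the right place.
        self.seek(deleted_key, SeekOp::GE { eq_only: true });
        self.skip_next_once = true;
    }

    fn seek(&mut self, _key: &[u8], _op: SeekOp) {
        // actual tree traversal elided
    }
}
```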
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes #2981
The `run_once()` name is just a historical accident. Moreover, it has now
started to appear elsewhere as well, so let's call it `IO::step()`, as we
should have from the beginning.
Because `io_uring` may have many other I/O submission events queued
(that are relevant to the operation) when we experience an error,
marking our `Completion` objects as aborted is not sufficient: the
kernel will still execute the queued I/O, which can mutate WAL or DB
state after we’ve declared failure, and keeps references (iovec arrays,
buffers) alive, stalling their reuse. We need to stop those in-flight
SQEs at the kernel and then drain the ring to a known-empty state before
reusing any resources.
The following methods were added to the `IO` trait:
- `cancel`: takes a slice of `Completion` objects; the default
implementation simply marks them as `aborted`.
- `drain`: has a default no-op implementation; the `io_uring` backend
implements it to drain the ring.
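A hedged sketch of the trait additions (the real signatures and the `Completion` type may differ):
```rust
// Simplified stand-in for the real Completion type.
pub struct Completion;

impl Completion {
    pub fn abort(&self) {
        // mark this completion as aborted
    }
}

pub trait IO {
    /// Cancel in-flight operations. The default merely marks the
    /// completions as aborted; backends with kernel-side queues
    /// (io_uring) additionally cancel the corresponding SQEs.
    fn cancel(&self, completions: &[Completion]) {
        for c in completions {
            c.abort();
        }
    }

    /// Bring the backend to a known-empty state before resources are
    /// reused. Default is a no-op; the io_uring backend drains the ring.
    fn drain(&self) {}
}
```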
CC @sivukhin
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2787
Previously we were iterating over every entry in the page cache,
clearing the dirty flag from each page.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes #2988
Add handling of malformed inputs to `read_varint`, plus test cases.
```
# 9-byte varint truncated to 8 bytes
read_varint(&[0x81, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80])
before -> panic: index out of bounds: the len is 8 but the index is 8
after -> LimboError
# continuation bits set with no terminating byte
read_varint(&[0x80; 9])
before -> Ok((128, 9))
after -> LimboError
```
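For illustration, a minimal sketch of a bounds-checked decoder matching the behavior described above (`VarintError` is a hypothetical stand-in for `LimboError`; the real implementation may differ):
```rust
#[derive(Debug)]
struct VarintError;

// SQLite-style varint: up to 9 bytes; the first 8 contribute 7 bits each.
fn read_varint(buf: &[u8]) -> Result<(u64, usize), VarintError> {
    let mut v: u64 = 0;
    for i in 0..8 {
        // get() instead of indexing: truncated input becomes an error,
        // not a panic.
        let b = *buf.get(i).ok_or(VarintError)?;
        v = (v << 7) | (b & 0x7f) as u64;
        if b & 0x80 == 0 {
            return Ok((v, i + 1));
        }
    }
    let b = *buf.get(8).ok_or(VarintError)?;
    if b & 0x80 != 0 {
        // continuation bit set with no terminating byte, per the
        // [0x80; 9] example above
        return Err(VarintError);
    }
    v = (v << 8) | b as u64;
    Ok((v, 9))
}
```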
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2904
Closes #1419
When submitting a `pwritev` to flush dirty pages, in the case that
it's a commit frame, we use a new completion type that tells the
io_uring backend to set the `IO_LINK` flag on the SQE, which ensures
the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with `-ECANCELED`
2. All operations in the chain complete in order
Any ongoing `IO_LINK` chain ends at the `fsync` barrier, which ensures
everything submitted before it has completed.
For 99% of cases, the syscall that immediately follows the `pwritev` is
going to be the `fsync`, but just in case, this implementation links
everything that comes between the final commit `pwritev` and the next
`fsync`.
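As a sketch of what the linking looks like with the `io-uring` crate's flags (a hypothetical helper, not the project's actual backend code):
```rust
use io_uring::{opcode, squeue::Flags, types, IoUring};
use std::os::unix::io::RawFd;

// Link a commit-frame pwritev to the fsync that follows it: if the write
// fails, the fsync is cancelled with -ECANCELED, and the two complete in
// order.
fn submit_commit_write(
    ring: &mut IoUring,
    fd: RawFd,
    iovecs: &[libc::iovec],
    offset: u64,
) -> std::io::Result<()> {
    let write = opcode::Writev::new(types::Fd(fd), iovecs.as_ptr(), iovecs.len() as u32)
        .offset(offset)
        .build()
        .flags(Flags::IO_LINK); // next SQE starts only if this one succeeds
    let fsync = opcode::Fsync::new(types::Fd(fd)).build(); // ends the chain
    unsafe {
        let mut sq = ring.submission();
        sq.push(&write).expect("submission queue full");
        sq.push(&fsync).expect("submission queue full");
    }
    ring.submit()?;
    Ok(())
}
```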
In the event that we get a partial write on a linked write, we force a
`submit` and then issue an additional `fsync` with the `IO_DRAIN` flag
after the partial write completes. Durability is maintained, because
that `fsync` will not start until everything already in the submission
queue has completed.
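A sketch of that drain barrier, reusing the imports above:
```rust
// After a partial linked write completes: force a submit, then queue an
// fsync with IO_DRAIN, which io_uring will not start until every SQE
// submitted before it has completed.
fn fsync_drain_barrier(ring: &mut IoUring, fd: RawFd) -> std::io::Result<()> {
    ring.submit()?; // flush whatever is still sitting in the queue
    let barrier = opcode::Fsync::new(types::Fd(fd))
        .build()
        .flags(Flags::IO_DRAIN);
    unsafe {
        ring.submission().push(&barrier).expect("submission queue full");
    }
    ring.submit()?;
    Ok(())
}
```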
The other option in the event of partial writes on commit frames/linked
writes is to error out... not sure which is the right move here. I guess
it's possible that, since the fsync completion fired, the commit could be
over without us being durable on disk. So maybe it should be an assertion
instead? Thoughts?
Closes #2909