I used deepwiki to search for how SQLite implements its busy handler. SQLite
uses a callback system with exponential backoff, storing the callback in both
the pager and the database. I confess I found this slightly confusing, so I
just implemented a simple exponential backoff directly in the `Statement`
struct. I imagine SQLite does this in a more convoluted manner, as it does
not have a concept of yielding as we do.
https://deepwiki.com/search/where-is-the-code-for-the-busy_4a5ed006-4eed-479f-80c3-dd038832831b
I also fixed the Rust bindings so that we yield when we return
`StepResult::IO`, instead of just blocking the async function. To achieve
this I implemented the `Stream` trait for the `Rows` struct, which
unfortunately came with a slight change to the function signature:
`rows.next()` is now `rows.try_next()`.
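Roughly, the idea looks like this. This is a minimal sketch, not the actual bindings: all types here (`Rows`, `Row`, `StepResult`, the inner `Statement`) are simplified stand-ins. The point is that `poll_next` yields to the executor on `StepResult::IO` instead of blocking.

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures_util::stream::Stream;

pub enum StepResult { Row, Done, IO }
pub struct Row;
pub struct Statement;

impl Statement {
    // Stand-ins for the real statement API.
    fn step(&mut self) -> Result<StepResult, String> { Ok(StepResult::Done) }
    fn row(&self) -> Row { Row }
    fn run_io_once(&mut self) -> Result<(), String> { Ok(()) }
}

pub struct Rows { stmt: Statement }

impl Stream for Rows {
    type Item = Result<Row, String>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = self.get_mut();
        match this.stmt.step() {
            Ok(StepResult::Row) => Poll::Ready(Some(Ok(this.stmt.row()))),
            Ok(StepResult::Done) => Poll::Ready(None),
            Ok(StepResult::IO) => {
                // Drive pending I/O once, then yield to the executor
                // instead of spinning or blocking.
                if let Err(e) = this.stmt.run_io_once() {
                    return Poll::Ready(Some(Err(e)));
                }
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Err(e) => Poll::Ready(Some(Err(e))),
        }
    }
}
```

Once `Rows` is a stream of `Result`s, `try_next()` comes from `futures::TryStreamExt`, which is where the signature change comes from.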
EDIT:
~test `test_multiple_connections_fuzz` times out because the busy handler
now "slows" things down (this test generates a lot of busy transactions),
so the test takes a lot longer to run. Not sure if it is acceptable for us
to reduce the number of operations so the test is shorter.~
EDIT:
Adjusted the API to be more in line with
https://www.sqlite.org/c3ref/busy_timeout.html.
It sets the maximum total accumulated timeout. If the duration is None or
zero, we unset the busy handler for this Connection.
This API differs slightly from SQLite: instead of sleeping for the linear
amount of time specified by the user, we sleep in phases until the total
requested amount of time is reached. We first sleep for 1 ms; then, if we
still return Busy, we sleep for 2 ms, and repeat, up to a maximum of 100 ms
per phase or until the total timeout is reached.
Example:
1. Set duration to 5ms
2. Step through query -> returns Busy -> sleep/yield for 1 ms
3. Step through query -> returns Busy -> sleep/yield for 2 ms
4. Step through query -> returns Busy -> sleep/yield for 2 ms (totaling
5 ms of sleep)
5. Step through query -> returns Busy -> return Busy to user
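A minimal sketch of that phased backoff (names are illustrative, not the actual Turso API; doubling each phase is an assumption consistent with the example above):

```rust
use std::time::Duration;

// Phased busy backoff: sleep 1 ms, grow the phase up to a 100 ms cap,
// stop once the accumulated sleep reaches the configured total.
struct BusyTimeout {
    total: Duration, // total budget set via busy_timeout()
    slept: Duration, // accumulated sleep so far
    phase: Duration, // next phase length
}

impl BusyTimeout {
    fn new(total: Duration) -> Self {
        Self { total, slept: Duration::ZERO, phase: Duration::from_millis(1) }
    }

    /// How long to sleep/yield before retrying, or None if the budget is
    /// exhausted and Busy should be returned to the user.
    fn next_sleep(&mut self) -> Option<Duration> {
        if self.slept >= self.total {
            return None;
        }
        // Never sleep past the remaining budget.
        let sleep = self.phase.min(self.total - self.slept);
        self.slept += sleep;
        self.phase = (self.phase * 2).min(Duration::from_millis(100));
        Some(sleep)
    }
}
```

With `total = 5 ms` this yields exactly the sequence above: 1 ms, 2 ms, 2 ms, then `None` (Busy is returned to the user).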
This slight API change demonstrated better throughput in the
`perf/throughput/turso` benchmark:
```sh
cargo run -p write-throughput --release -- -t 2
Running write throughput benchmark with 2 threads, 100 batch size, 10 iterations, mode: Legacy
Database created at: write_throughput_test.db
Thread 1: 1000 inserts in 0.04s (23438.42 inserts/sec)
Thread 0: 1000 inserts in 0.08s (12385.64 inserts/sec)
=== BENCHMARK RESULTS ===
Total inserts: 2000
Total time: 0.08s
Overall throughput: 24762.60 inserts/sec
Threads: 2
Batch size: 100
Iterations per thread: 10
Database file exists: true
Database file size: 4096 bytes
```
Depends on #3102
Closes #3067
Closes #3074
1. The commit state machine was assuming that begin_write_tx() cannot
   fail, but it can fail if there is another tx that is not using
   BEGIN CONCURRENT.
2. If a brand new non-CONCURRENT transaction attempts to start an
   exclusive transaction but fails with Busy, we must end the pager
   read tx it just started, because otherwise the next time it attempts
   to do something it will panic with:
   "cannot start a new read tx without ending an existing one"
On the main branch, mvcc allows concurrent inserts from multiple txns
even without BEGIN CONCURRENT, and then always hangs whenever one of
the txns tries to commit. This commit fixes that issue.
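A sketch of the fix for point 2 above, with invented names for the pager API (only the quoted panic message is from the source):

```rust
#[derive(Debug)]
enum Error { Busy }

struct Pager { read_tx_open: bool }

impl Pager {
    fn begin_read_tx(&mut self) -> Result<(), Error> {
        assert!(
            !self.read_tx_open,
            "cannot start a new read tx without ending an existing one"
        );
        self.read_tx_open = true;
        Ok(())
    }

    fn begin_write_tx(&mut self) -> Result<(), Error> {
        // Stand-in: e.g. another non-CONCURRENT tx holds the write lock.
        Err(Error::Busy)
    }

    fn end_read_tx(&mut self) {
        self.read_tx_open = false;
    }
}

fn begin_exclusive(pager: &mut Pager) -> Result<(), Error> {
    pager.begin_read_tx()?;
    match pager.begin_write_tx() {
        Err(Error::Busy) => {
            // The fix: release the read tx we just opened, otherwise the
            // next attempt trips the assertion in begin_read_tx above.
            pager.end_read_tx();
            Err(Error::Busy)
        }
        other => other,
    }
}
```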
This patch adds checksums to Turso DB. You can check the design in the
[RFC](https://github.com/tursodatabase/turso/issues/2178).
1. We use the reserved bytes (8 bytes) to store the checksums. On every IO
   read, we verify that the checksum matches.
2. We use twox hash (xxHash) for checksums.
3. Checksums only work on 4K pages for now. It's a small change to enable
   them for all other page sizes; I will send another PR.
4. Right now, it's not possible to switch to a different algorithm or turn
   checksums off altogether. That will be added in future PRs.
5. Checksums can be enabled only for new DBs. For existing DBs, we will
   disable them.
6. To add checksums to existing DBs, we need a vacuum, since it would
   require a rewrite of the whole DB.
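A rough sketch of the scheme, under assumptions not stated here (the 8 reserved bytes sit at the end of each 4K page, seed 0, little-endian encoding; the actual layout is in the RFC). `twox_hash::XxHash64` is the crates.io xxHash implementation:

```rust
use std::hash::Hasher;
use twox_hash::XxHash64;

const PAGE_SIZE: usize = 4096;
const RESERVED: usize = 8; // reserved bytes at the end of each page

fn page_checksum(page: &[u8]) -> u64 {
    // Checksum covers everything except the reserved bytes themselves.
    let mut h = XxHash64::with_seed(0);
    h.write(&page[..PAGE_SIZE - RESERVED]);
    h.finish()
}

/// Before writing a page: stamp the checksum into the reserved bytes.
fn stamp_checksum(page: &mut [u8; PAGE_SIZE]) {
    let sum = page_checksum(&page[..]);
    page[PAGE_SIZE - RESERVED..].copy_from_slice(&sum.to_le_bytes());
}

/// On every IO read: verify the stored checksum matches the contents.
fn verify_checksum(page: &[u8; PAGE_SIZE]) -> bool {
    let stored = u64::from_le_bytes(page[PAGE_SIZE - RESERVED..].try_into().unwrap());
    stored == page_checksum(page)
}
```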
Closes #2840
Retrying fsync() on error was historically not safe ("fsyncgate"), and
Postgres still defaults to panicking on fsync() failure. Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
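The decision point, as a hedged sketch (the pragma plumbing itself is omitted, and the function name is invented):

```rust
use std::fs::File;
use std::io;

// On fsync() error: either abort (the safe default, matching Postgres),
// or surface the error to the caller when data_sync_retry is enabled.
fn sync_file(file: &File, data_sync_retry: bool) -> io::Result<()> {
    match file.sync_all() {
        Ok(()) => Ok(()),
        Err(e) if data_sync_retry => Err(e), // caller may retry
        Err(e) => panic!("fsync failed; the kernel may have dropped dirty pages: {e}"),
    }
}
```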
Currently, `io_uring` is set up to handle partial writes for `pwritev`
(`pwrite` will be added in a subsequent PR), but the unix and other IO
backends were not correctly set up for this.
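The general shape of the fix is the classic short-write loop, shown here as a sketch for the plain `pwrite` case rather than the actual backend code (std's `FileExt::write_all_at` follows the same pattern; the vectored `pwritev` path additionally has to advance its iovecs):

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

fn write_all_at(file: &File, mut buf: &[u8], mut offset: u64) -> io::Result<()> {
    while !buf.is_empty() {
        match file.write_at(buf, offset) {
            Ok(0) => return Err(io::Error::new(io::ErrorKind::WriteZero, "pwrite returned 0")),
            Ok(n) => {
                // Partial write: advance past the bytes that made it to
                // disk and retry the remainder at the new offset.
                buf = &buf[n..];
                offset += n as u64;
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```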
Closes #3073
## Problem
When a delete replaces an index interior cell, the replacement key is less
than the deleted key. Currently on the main branch, after the deletion
happens, the following call to BTreeCursor::next() stops at the replaced
interior cell.
This is incorrect; imagine the following sequence:
- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
<key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which now
has <key=5>, and we incorrectly delete it even though our WHERE
condition is key > 5.
## Solution
This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once so
the next invocation of next() will skip the replaced cell properly
I.e. we prevent next() from landing on the replaced content and ensure
iteration continues with the next logical record.
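Illustrative shape of the fix (only `CheckNeedsBalancing` and `interior_node_was_replaced` are from this PR; the cursor type and helpers are invented stand-ins):

```rust
struct CheckNeedsBalancing {
    needs_balancing: bool,
    // Set when deleting an interior cell pulled a replacement key (always
    // less than the deleted key) up from the left subtree.
    interior_node_was_replaced: bool,
}

struct BTreeCursor { pos: usize }

impl BTreeCursor {
    // Stand-in for moving the cursor to the next logical record.
    fn advance(&mut self) { self.pos += 1; }
}

fn finish_delete(cursor: &mut BTreeCursor, state: &CheckNeedsBalancing) {
    if !state.needs_balancing && state.interior_node_was_replaced {
        // Move past the replaced cell so the next call to next() does not
        // land on the smaller replacement key and delete it by mistake.
        cursor.advance();
    }
}
```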
## Details
This problem only became apparent once we started using indexes as valid
iteration cursors for DELETE operations in #2981.
Closes #3045
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #3049
The transaction upgrade logic in the Transaction opcode is total nonsense
for concurrent transactions, so just drop it.
Fixes #3061
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #3070
Fixes #1817, #2068, #1326, #1397.
The solution is very much not ideal, but it fixes all math-function-related
incompatibilities.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #3033
After this PR:
```
turso> EXPLAIN QUERY PLAN SELECT 1;
QUERY PLAN
`--SCAN CONSTANT ROW
turso> EXPLAIN QUERY PLAN SELECT 1 UNION SELECT 1;
QUERY PLAN
`--COMPOUND QUERY
|--LEFT-MOST SUBQUERY
| `--SCAN CONSTANT ROW
`--UNION USING TEMP B-TREE
`--SCAN CONSTANT ROW
turso> CREATE TABLE x(y);
turso> CREATE TABLE z(y);
turso> EXPLAIN QUERY PLAN SELECT * from x,z;
QUERY PLAN
|--SCAN x
`--SCAN z
turso> EXPLAIN QUERY PLAN SELECT * from x,z ON x.y = z.y;
QUERY PLAN
|--SCAN x
`--SEARCH z USING INDEX ephemeral_z_t2
turso>
```
Closes #3057
This was causing checkpoint_seq to be 0 when we had already successfully
run a passive checkpoint, and causing us to use improper pages from the
cache.
Fast balancing routine for the common special case where the rightmost
leaf page of a given subtree overflows such that the overflowing cell
would be the rightmost cell on the page -- i.e. an append. In this case
we just add a new leaf page as the right sibling of that page, put the
overflow cell there, and insert a new divider cell into the parent. The
high level steps are:
1. Allocate a new leaf page and insert the overflow cell payload in it.
2. Create a new divider cell in the parent - it contains the page number
of the old rightmost leaf, plus the largest rowid on that page.
3. Update the rightmost pointer of the parent to point to the new leaf
page.
4. Continue balancing from the parent page (inserting the new divider cell
may have overflowed the parent).
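A simplified sketch of those steps; all types and helpers here are stand-ins for the real btree code, not its API:

```rust
struct Cell {
    left_child_page: u32, // page number carried by a divider cell
    rowid: i64,
}

struct Page {
    page_no: u32,
    cells: Vec<Cell>,
    rightmost_pointer: u32,
}

fn allocate_page(next_free: &mut u32) -> Page {
    let no = *next_free;
    *next_free += 1;
    Page { page_no: no, cells: Vec::new(), rightmost_pointer: 0 }
}

/// Append fast path: the overflowing cell would be the rightmost cell of
/// the rightmost leaf, so grow to the right instead of redistributing.
fn balance_quick(parent: &mut Page, old_leaf: &Page, overflow: Cell, next_free: &mut u32) {
    // 1. The new leaf holds just the overflow cell.
    let mut new_leaf = allocate_page(next_free);
    new_leaf.cells.push(overflow);

    // 2. Divider cell: the old leaf's page number plus its largest rowid.
    let divider = Cell {
        left_child_page: old_leaf.page_no,
        rowid: old_leaf.cells.last().map(|c| c.rowid).unwrap_or(0),
    };
    parent.cells.push(divider);

    // 3. The parent's rightmost pointer now targets the new leaf.
    parent.rightmost_pointer = new_leaf.page_no;

    // 4. Inserting the divider may overflow the parent; in the real code,
    //    balancing continues from the parent page here.
}
```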
Closes #3041
This is considerably simpler with 1 thread, as we just try to yield
control when I/O happens, and we only run io.run_once when all
connections have tried to do some work. This allows connections to
make progress cooperatively.
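A sketch of that loop; only `step()` and `io.run_once()` follow the prose, the rest is an invented stand-in:

```rust
enum StepResult { Row, IO, Done, Busy }

struct Connection;
impl Connection {
    fn step(&mut self) -> StepResult { StepResult::Done } // stand-in
}

struct Io;
impl Io {
    fn run_once(&mut self) {} // stand-in: drive pending I/O once
}

fn run_all(connections: &mut [Connection], io: &mut Io) {
    let mut pending = vec![true; connections.len()];
    while pending.iter().any(|&p| p) {
        // Give every connection one chance to make progress; a connection
        // that hits I/O yields here instead of blocking the others.
        for (i, conn) in connections.iter_mut().enumerate() {
            if pending[i] {
                if let StepResult::Done = conn.step() {
                    pending[i] = false;
                }
            }
        }
        // Only after all connections have tried do we drive the I/O layer.
        io.run_once();
    }
}
```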
Closes #3060
Flushing mvcc changes to disk requires serialization. To do so we simply
introduce a lock for pager.end_tx, which takes ownership of flushing to
the WAL. Once this is finished we simply release the lock.
When multiple tx writes happen concurrently in mvcc, the max frame will be
updated. From the point of view of the other transactions, this new
max_frame makes them return Busy because their current WAL snapshot is
outdated.
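A sketch of the serialization point, with invented names:

```rust
use std::sync::Mutex;

struct Pager {
    wal_flush_lock: Mutex<()>,
}

impl Pager {
    fn end_tx(&self) {
        // Whoever holds the lock owns flushing the mvcc changes to the
        // WAL; concurrent committers wait here instead of interleaving.
        let _guard = self.wal_flush_lock.lock().unwrap();
        // ... flush accumulated changes to the WAL ...
        // The lock is released when _guard drops.
    }
}
```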
Closes #3059