Commit Graph

4862 Commits

Author SHA1 Message Date
pedrocarlo
3d265489dc modify semantics of busy_timeout to be more on par with sqlite 2025-09-15 02:20:32 -03:00
pedrocarlo
0586b75fbe expose function to set busy timeout duration 2025-09-15 02:20:32 -03:00
pedrocarlo
a56680f79e implement Busy Handler in Turso statements 2025-09-15 02:16:18 -03:00
Jussi Saurio
f4c15a37d3 add manual hack to mvcc test
we rollback the mvcc transaction in the VDBE, so manually roll it
back in the test
2025-09-14 23:46:38 +03:00
Jussi Saurio
db3428a7a9 remove unused pager parameter 2025-09-14 23:44:24 +03:00
Jussi Saurio
d598775e33 mvcc: properly remove mutations of rolled back tx
mvstore was not removing deletions made by a tx that rolled back.
deletions are removed by clearing the `end` mark from the row
version.
2025-09-14 23:29:14 +03:00
Jussi Saurio
dccf8b9472 mvcc: properly clear tx states when mvcc tx rolls back 2025-09-14 23:29:07 +03:00
Jussi Saurio
487b8710d9 mvcc: don't double-rollback on write-write-conflict
handle_program_error() already rolls back if this error happens.
double rollback causes a crash.
2025-09-14 23:28:21 +03:00
Jussi Saurio
2ca1640a2a not always write 2025-09-14 22:24:07 +03:00
Jussi Saurio
396091044e store tx_mode in conn.mv_tx
otherwise op_transaction works completely wrong because each separate
insert statement overrides the tx_mode to Write
2025-09-14 21:59:08 +03:00
Jussi Saurio
7fe25a1d0e mvcc: remove conn.mv_transactions
afaict this isn't needed for anything since there is already
conn.mv_tx_id
2025-09-14 21:26:58 +03:00
Jussi Saurio
5feb9ea2f0 mvcc: fix non-concurrent transaction semantics
on the main branch, mvcc allows concurrent inserts from multiple
txns even without BEGIN CONCURRENT, and then always hangs whenever
one of the txns tries to commit.

this commit fixes that issue.
2025-09-14 21:23:06 +03:00
Jussi Saurio
2ea1798d6e mvcc: end commit state machine early when write set is empty 2025-09-14 20:02:35 +03:00
Pekka Enberg
95660535da core/storage: Demote info logging to debug 2025-09-14 13:10:46 +03:00
PThorpe92
f6dd0bc4d6 Dont grab page cache write lock in a loop 2025-09-13 12:21:13 -04:00
Pekka Enberg
6a2f0d6061 Merge 'Add per page checksums' from Avinash Sajjanshetty
This patch adds checksums to Turso DB. You may check the design here in
the [RFC](https://github.com/tursodatabase/turso/issues/2178).
1. We use reserved bytes (8 bytes) to store the checksums. On every IO
read, we verify that the checksum matches.
2. We use twox hash for checksums.
3. Checksum works only on 4K pages now. It's a small change to enable
for all other sizes, I will send another PR.
4. Right now, it's not possible to switch to different algorithm or turn
off altogether. That will be added in the future PRs.
5. Checksums can be enabled only for new dbs. For existing DBs, we will
disable it.
6. To add checksums for existing DBs, we need vacuum since it would
require rewrite of whole db.

Closes #2840
2025-09-13 18:46:53 +03:00
Pekka Enberg
d8f07fe3da core: Panic on fsync() error by default
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicing on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
2025-09-13 10:21:12 +03:00
Pekka Enberg
a7e34f1551 Merge 'Handle partial writes in unix IO for pwrite and pwritev' from Preston Thorpe
currently, `io_uring` is setup to handle partial writes for `pwritev`
(will add `pwrite` in subsequent PR), but unix and other IO back-ends
were not correctly setup for this.

Closes #3073
2025-09-13 09:08:43 +03:00
Avinash Sajjanshetty
5256f29a9c Add checksums behind a feature flag 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
11030056c7 rename method to verify_checksum 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
e010c46552 use checksums when reading/writing from db file 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
4b59cf19e5 use checksums when reading/writing from wal 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
14a1307720 Set reserved space as required when allocating page1 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
3b410e4f79 set required reserved bytes while initialising the pager 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
2e6943bfdf Add helper to read reserved bytes value from disk 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
c2c1ec2dba Pass use usable_space() instead of hardcoding the value 2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty
15266105f7 Update IOContext to carry checksum ctx 2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty
3f72de3623 Add checksum module 2025-09-13 11:00:37 +05:30
TcMits
48522c1cc0 remove Stmt clone 2025-09-13 12:08:29 +07:00
PThorpe92
6098bca211 Handle partial writes in unix IO for pwrite and pwritev 2025-09-12 18:13:02 -04:00
Preston Thorpe
b1420904bb Merge 'fix(btree): advance cursor after interior node replacement in delete' from Jussi Saurio
## Problem
When a delete replaces an index interior cell, the replacement key is LT
the deleted key. Currently on the main branch, after the deletion
happens, the following call to BTreeCursor::next() stops at the replaced
interior cell.
This is incorrect - imagine the following sequence:
- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
<key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which now
has <key=5>, and we incorrectly delete it even though our WHERE
condition is key > 5.
## Solution
This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once so
the next invocation of next() will skip the replaced cell properly
i.e. we prevent next() from landing on the replaced content and ensures
iteration continues with the next logical record.
## Details
This problem only became apparent once we started using indexes as valid
iteration cursors for DELETE operations in #2981
Closes #3045

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3049
2025-09-12 17:37:01 -04:00
Pekka Enberg
ad6157028e Merge 'core/vdbe: Fix BEGIN CONCURRENT transactions' from Pekka Enberg
The transaction upgrade logic in Transaction opcode is total nonsense
for concurrent transactions so just drop it.
Fixes #3061

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3070
2025-09-12 23:11:12 +03:00
Pekka Enberg
a0921c4221 Merge 'core/storage: Remove unused import warning' from Pekka Enberg
Closes #3069
2025-09-12 23:11:05 +03:00
Pekka Enberg
5e2b1bc0d3 Merge 'Fix incompatible math functions' from Levy A.
Fixes #1817, #2068, #1326, #1397.
The solution is very much not ideal, but fixes all math function related
incompatibilities.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3033
2025-09-12 21:28:08 +03:00
Pekka Enberg
86dcdad3d0 core/vdbe: Fix BEGIN CONCURRENT transactions
The transaction upgrade logic in Transaction opcode is total nonsense
for concurrent transactions so just drop it.

Fixes #3061
2025-09-12 21:19:34 +03:00
Pekka Enberg
2bc8c0c850 core/storage: Remove unused import warning 2025-09-12 21:09:38 +03:00
Pekka Enberg
dcd43ab8fc Merge 'Handle EXPLAIN QUERY PLAN like SQLite' from Lâm Hoàng Phúc
After this PR:
```
turso> EXPLAIN QUERY PLAN SELECT 1;
QUERY PLAN
`--SCAN CONSTANT ROW
turso> EXPLAIN QUERY PLAN SELECT 1 UNION SELECT 1;
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  `--SCAN CONSTANT ROW
   `--UNION USING TEMP B-TREE
      `--SCAN CONSTANT ROW
turso> CREATE TABLE x(y);
turso> CREATE TABLE z(y);
turso> EXPLAIN QUERY PLAN SELECT * from x,z;
QUERY PLAN
|--SCAN x
`--SCAN z
turso> EXPLAIN QUERY PLAN SELECT * from x,z ON x.y = z.y;
QUERY PLAN
|--SCAN x
`--SEARCH z USING INDEX ephemeral_z_t2
turso>
```

Closes #3057
2025-09-12 20:41:23 +03:00
PThorpe92
b04c364981 Fix clippy error 2025-09-12 11:43:38 -04:00
PThorpe92
7a14c7394f Remove the header copy stored on the WalFile, fix fast_path 2025-09-12 11:29:43 -04:00
PThorpe92
25e7c719f1 Update checkpoint_seq on each checkpoint, not just when log restarts
This was causing checkpoint_seq to be 0 when we had already successfully
ran a passive checkpoint, and causing us to use improper pages from the
cache.
2025-09-12 11:29:42 -04:00
Pekka Enberg
14da283e36 Merge 'MVCC: remove reliance on BTreeCursor::has_record()' from Jussi Saurio
Closes #3051
Closes #3032

Closes #3056
2025-09-12 17:31:15 +03:00
Pekka Enberg
54b4c9f30b Merge 'Implement the balance_quick algorithm' from Jussi Saurio
Fast balancing routine for the common special case where the rightmost
leaf page of a given subtree overflows such that the overflowing cell
would be the rightmost cell on the page -- i.e. an append. In this case
we just add a new leaf page as the right sibling of that page, put the
overflow cell there, and insert a new divider cell into the parent. The
high level steps are:
1. Allocate a new leaf page and insert the overflow cell payload in it.
2. Create a new divider cell in the parent - it contains the page number
of the old rightmost leaf, plus the largest rowid on that page.
3. Update the rightmost pointer of the parent to point to the new leaf
page.
4. Continue balance from the parent page (inserting the new divider cell
may have overflowed the parent

Closes #3041
2025-09-12 17:30:52 +03:00
Pekka Enberg
443720c74a Merge 'benchmark: introduce simple 1 thread concurrent benchmark for mvcc/sq…' from Pere Diaz Bou
…lite/wal
This is considerably simpler with 1 thread as we just try to yield
control when I/O happens and we only run io.run_once when all
connections tried to do some work. This allows connections to
cooperatively progress.

Closes #3060
2025-09-12 17:27:41 +03:00
Pekka Enberg
7fdb116d41 Merge 'core/mvcc: queue mvcc txns on pager's end_tx' from Pere Diaz Bou
Flushing mvcc changes to disk requires serialization. To do so we simply
introduce a lock for pager.end_tx, which will take ownership of flushing
to WAL. Once this is finished we can simply release lock.
When multiple tx writes happen concurrently in mvcc, max frame will be
updated. This new max_frame makes is the point of view of the other
transaction return busy because his current wal snapshot is outdated.

Closes #3059
2025-09-12 17:27:17 +03:00
Pere Diaz Bou
ec2cff2026 benchmark: introduce simple 1 thread concurrent benchmark for mvcc/sqlite/wal
This is considerably simpler with 1 thread as we just try to yield
control when I/O happens and we only run io.run_once when all
connections tried to do some work. This allows connections to
cooperatively progress.
2025-09-12 14:02:57 +00:00
Pere Diaz Bou
39fb5913e0 core/mvcc: queue write txn commits in mvcc on pager end_tx
Flushing mvcc changes to disk requires serialization. To do so we simply
introduce a lock for pager.end_tx, which will take ownership of flushing
to WAL. Once this is finished we can simply release lock.
2025-09-12 14:00:02 +00:00
Pere Diaz Bou
e87226548c core/mvcc: fix concurrent tests mvcc 2025-09-12 13:49:40 +00:00
Pere Diaz Bou
9b6d181be4 wal: add hacky update max frame for mvcc use
When multiple tx writes happen concurrently in mvcc, max frame will be
updated. This new max_frame makes is the point of view of the other
transaction return busy because his current wal snapshot is outdated.
2025-09-12 13:49:14 +00:00
Pere Diaz Bou
66b5630870 vdbe/mvcc: rollback mvcc txn on vdbe error 2025-09-12 13:47:45 +00:00
Jussi Saurio
305b2f55ae MVCC: remove reliance on BTreeCursor::has_record() 2025-09-12 16:03:55 +03:00