1444 Commits

Author SHA1 Message Date
Pekka Enberg
d7158262ab Merge 'core/storage: Clean up unused import warning in encryption.rs' from Pekka Enberg
...happens when encryption feature is disabled.

Closes #3165
2025-09-17 11:19:20 +03:00
Pekka Enberg
1e90572e7a core/storage: Clean up unused import warning in encryption.rs
...happens when encryption feature is disabled.
2025-09-17 10:22:36 +03:00
Pekka Enberg
17e9f05ea4 core: Convert Rc<Pager> to Arc<Pager> 2025-09-17 09:32:49 +03:00
Jussi Saurio
104b8dd083 Merge 'Encrypt page 1' from
This PR extends the existing encryption support to include the database
header page (page 1).

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #3040
2025-09-17 09:26:06 +03:00
Jussi Saurio
cae234818b Merge 'Initial support for window functions' from Piotr Rżysko
This adds basic support for window functions. For now:
* Only existing aggregate functions can be used as window functions.
* Specialized window-specific functions (`rank`, `row_number`, etc.) are
not yet supported.
* Only the default frame definition is implemented:
`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS`.
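To illustrate what the default frame means for an aggregate used as a window function, here is a minimal sketch of a running `SUM` in plain Rust. It is not the implementation from this PR; the function name `running_sum_range` and the `(order_key, value)` row layout are assumptions made for the example. The point it shows is that with a `RANGE` frame ending at `CURRENT ROW`, all peer rows (rows with the same ORDER BY key) see the same result.

```rust
/// Running SUM over the default frame
/// `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`.
/// `rows` holds hypothetical (order_key, value) pairs, already sorted by order_key.
fn running_sum_range(rows: &[(i64, i64)]) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    let mut total = 0i64;
    let mut i = 0;
    while i < rows.len() {
        // Accumulate the whole peer group: rows sharing the current order key.
        let mut j = i;
        while j < rows.len() && rows[j].0 == rows[i].0 {
            total += rows[j].1;
            j += 1;
        }
        // Every row in the peer group gets the sum through the last peer.
        for _ in i..j {
            out.push(total);
        }
        i = j;
    }
    out
}
```

For the input `[(1, 10), (2, 20), (2, 30), (3, 40)]` the two rows with key 2 are peers, so both receive 60.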

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3079
2025-09-17 08:29:16 +03:00
rajajisai
e605aff31b Merge branch 'main' into enc-page-1 2025-09-16 10:06:00 -04:00
rajajisai
89caa868f9 Encryption support for database header page 2025-09-16 10:04:30 -04:00
Jussi Saurio
d2d1d1bc61 fix re-entrancy issue in Pager::free_page
current logic can lead to a situation where:

- we call read_page(trunk_page_id)
- we assign trunk_page in the FreePageState state machine
- the page read fails and cache marks it as !locked && !loaded
- next call to Pager::free_page() asserts that the page is loaded and panics
2025-09-15 21:41:18 +03:00
pedrocarlo
7021386f86 move divider_cell_is_overflow_cell to debug assertions so it stops appearing in release builds 2025-09-15 11:11:28 -03:00
Jussi Saurio
32cd01a615 fix deadlock 2025-09-15 14:48:26 +03:00
Pekka Enberg
247d4c06c6 Merge 'Fix MVCC update' from Jussi Saurio
Based on #3126
Closes #3029
Closes #3030
Closes #3065
Closes #3083
Closes #3084
Closes #3085
simple reason why mvcc update didn't work: it didn't try to update.

Closes #3127
2025-09-15 14:24:59 +03:00
Jussi Saurio
59f18e2dc8 fix mvcc update
simple reason why mvcc update didn't work: it didn't try to update.
2025-09-15 11:27:56 +03:00
Nikita Sivukhin
3bcac441e4 reduce log level of some very frequent logs 2025-09-15 11:35:41 +04:00
Jussi Saurio
db3428a7a9 remove unused pager parameter 2025-09-14 23:44:24 +03:00
Pekka Enberg
95660535da core/storage: Demote info logging to debug 2025-09-14 13:10:46 +03:00
PThorpe92
f6dd0bc4d6 Don't grab page cache write lock in a loop 2025-09-13 12:21:13 -04:00
Pekka Enberg
6a2f0d6061 Merge 'Add per page checksums' from Avinash Sajjanshetty
This patch adds checksums to Turso DB. You may check the design here in
the [RFC](https://github.com/tursodatabase/turso/issues/2178).
1. We use reserved bytes (8 bytes) to store the checksums. On every IO
read, we verify that the checksum matches.
2. We use twox hash for checksums.
3. Checksums work only on 4K pages for now. It's a small change to enable
them for all other sizes; I will send another PR.
4. Right now, it's not possible to switch to a different algorithm or turn
checksums off altogether. That will be added in future PRs.
5. Checksums can be enabled only for new DBs. For existing DBs, we will
disable them.
6. To add checksums to existing DBs, we need vacuum, since it would
require a rewrite of the whole DB.
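A rough sketch of the idea in point 1, assuming a 4K page whose last 8 bytes are the reserved area. This is not the code from the patch: the hash is FNV-1a, swapped in only so the example needs no external crate (the patch uses twox hash), and the names `write_checksum`, `verify_checksum`, and the little-endian byte order are assumptions.

```rust
const PAGE_SIZE: usize = 4096;
/// Reserved bytes at the end of the page that hold the checksum.
const RESERVED: usize = 8;

/// Stand-in 64-bit hash (FNV-1a). The real patch uses twox hash instead.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hash the usable part of the page and store the result in the reserved tail.
fn write_checksum(page: &mut [u8; PAGE_SIZE]) {
    let (body, tail) = page.split_at_mut(PAGE_SIZE - RESERVED);
    tail.copy_from_slice(&fnv1a64(body).to_le_bytes());
}

/// Recompute the hash on read and compare it with the stored value.
fn verify_checksum(page: &[u8; PAGE_SIZE]) -> bool {
    let (body, tail) = page.split_at(PAGE_SIZE - RESERVED);
    let mut stored = [0u8; RESERVED];
    stored.copy_from_slice(tail);
    u64::from_le_bytes(stored) == fnv1a64(body)
}
```

A page written with `write_checksum` passes `verify_checksum`; flipping any bit in the body afterwards makes verification fail.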

Closes #2840
2025-09-13 18:46:53 +03:00
Piotr Rzysko
867bef55d8 Add ResetSorter instruction
This instruction isn't used yet, but it will be needed for window
functions, since they heavily rely on ephemeral tables.
2025-09-13 10:44:56 +02:00
Piotr Rzysko
ea9599681e Add OpenDup instruction
The instruction isn’t used yet, but it’ll be needed for window functions,
since they heavily rely on ephemeral tables.
2025-09-13 10:35:33 +02:00
Pekka Enberg
d8f07fe3da core: Panic on fsync() error by default
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicking on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
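The decision described above can be sketched as follows. This is only an illustration: the `SyncOutcome` enum and the `handle_fsync_result` helper are hypothetical names, and the `data_sync_retry` flag is passed in as a plain `bool` rather than read from the pragma.

```rust
use std::io;

/// Hypothetical result of handling an fsync attempt.
#[derive(Debug, PartialEq)]
enum SyncOutcome {
    /// fsync succeeded.
    Ok,
    /// fsync failed, but the caller opted in to retrying.
    Retryable(io::ErrorKind),
}

/// Panic on fsync error unless `data_sync_retry` is enabled.
fn handle_fsync_result(result: io::Result<()>, data_sync_retry: bool) -> SyncOutcome {
    match result {
        Ok(()) => SyncOutcome::Ok,
        // Opt-in path: hand the error back so the caller may retry.
        Err(e) if data_sync_retry => SyncOutcome::Retryable(e.kind()),
        // Default path: the on-disk state is unknown, so do not continue.
        Err(e) => panic!("fsync failed: {e}"),
    }
}
```

In real code the `result` argument would come from a call such as `std::fs::File::sync_all()`.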
2025-09-13 10:21:12 +03:00
Avinash Sajjanshetty
5256f29a9c Add checksums behind a feature flag 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
11030056c7 rename method to verify_checksum 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
e010c46552 use checksums when reading/writing from db file 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
4b59cf19e5 use checksums when reading/writing from wal 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
14a1307720 Set reserved space as required when allocating page1 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
c2c1ec2dba Use usable_space() instead of hardcoding the value 2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty
15266105f7 Update IOContext to carry checksum ctx 2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty
3f72de3623 Add checksum module 2025-09-13 11:00:37 +05:30
Preston Thorpe
b1420904bb Merge 'fix(btree): advance cursor after interior node replacement in delete' from Jussi Saurio
## Problem
When a delete replaces an index interior cell, the replacement key is LT
the deleted key. Currently on the main branch, after the deletion
happens, the following call to BTreeCursor::next() stops at the replaced
interior cell.
This is incorrect - imagine the following sequence:
- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
<key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which now
has <key=5>, and we incorrectly delete it even though our WHERE
condition is key > 5.
## Solution
This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once so
the next invocation of next() will skip the replaced cell properly
i.e. we prevent next() from landing on the replaced content and ensure
iteration continues with the next logical record.
## Details
This problem only became apparent once we started using indexes as valid
iteration cursors for DELETE operations in #2981
Closes #3045

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3049
2025-09-12 17:37:01 -04:00
Pekka Enberg
2bc8c0c850 core/storage: Remove unused import warning 2025-09-12 21:09:38 +03:00
PThorpe92
b04c364981 Fix clippy error 2025-09-12 11:43:38 -04:00
PThorpe92
7a14c7394f Remove the header copy stored on the WalFile, fix fast_path 2025-09-12 11:29:43 -04:00
PThorpe92
25e7c719f1 Update checkpoint_seq on each checkpoint, not just when log restarts
This was causing checkpoint_seq to be 0 when we had already successfully
run a passive checkpoint, and causing us to use improper pages from the
cache.
2025-09-12 11:29:42 -04:00
Pekka Enberg
14da283e36 Merge 'MVCC: remove reliance on BTreeCursor::has_record()' from Jussi Saurio
Closes #3051
Closes #3032

Closes #3056
2025-09-12 17:31:15 +03:00
Pekka Enberg
54b4c9f30b Merge 'Implement the balance_quick algorithm' from Jussi Saurio
Fast balancing routine for the common special case where the rightmost
leaf page of a given subtree overflows such that the overflowing cell
would be the rightmost cell on the page -- i.e. an append. In this case
we just add a new leaf page as the right sibling of that page, put the
overflow cell there, and insert a new divider cell into the parent. The
high level steps are:
1. Allocate a new leaf page and insert the overflow cell payload in it.
2. Create a new divider cell in the parent - it contains the page number
of the old rightmost leaf, plus the largest rowid on that page.
3. Update the rightmost pointer of the parent to point to the new leaf
page.
4. Continue balance from the parent page (inserting the new divider cell
may have overflowed the parent).
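The steps above can be sketched on a simplified in-memory model. This is not the code from the PR: `Leaf`, `Interior`, and the use of `Vec` indices as page numbers are assumptions made for illustration, and real pages are byte buffers with cell pointer arrays.

```rust
/// Simplified leaf page: just the rowids it stores.
#[derive(Debug)]
struct Leaf {
    rowids: Vec<i64>,
}

/// Simplified interior page.
#[derive(Debug)]
struct Interior {
    /// Divider cells: (left child page number, largest rowid in that child).
    dividers: Vec<(usize, i64)>,
    /// Page number of the rightmost child.
    rightmost: usize,
}

/// Append fast path: `overflow_rowid` did not fit on the rightmost leaf.
fn balance_quick(pages: &mut Vec<Leaf>, parent: &mut Interior, overflow_rowid: i64) {
    let old_rightmost = parent.rightmost;
    let largest = *pages[old_rightmost]
        .rowids
        .last()
        .expect("rightmost leaf is non-empty");

    // 1. Allocate a new leaf page and put the overflow cell in it.
    pages.push(Leaf { rowids: vec![overflow_rowid] });
    let new_page = pages.len() - 1;

    // 2. New divider cell: old rightmost leaf plus its largest rowid.
    parent.dividers.push((old_rightmost, largest));

    // 3. The parent's rightmost pointer now refers to the new leaf.
    parent.rightmost = new_page;

    // 4. The caller would continue balancing from the parent (not modeled here).
}
```

Note that the old leaf is left untouched, which is what makes this path cheaper than a full rebalance.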

Closes #3041
2025-09-12 17:30:52 +03:00
Pere Diaz Bou
9b6d181be4 wal: add hacky update max frame for mvcc use
When multiple tx writes happen concurrently in mvcc, max frame will be
updated. From the point of view of the other transaction, this new max_frame
makes it return busy because its current WAL snapshot is outdated.
2025-09-12 13:49:14 +00:00
Jussi Saurio
305b2f55ae MVCC: remove reliance on BTreeCursor::has_record() 2025-09-12 16:03:55 +03:00
PThorpe92
f60ca3970f Remove old comment from wal 2025-09-12 06:39:59 -04:00
PThorpe92
faf3531a4e Fix checkpoint fast-path, don't use cached pages w/o write lock
closes #3024
Also we snapshot the page when we determine that it's eligible, and pay a
memcpy instead of the read from disk, but this further prevents any in-memory
changes to the page/TOCTOU issues.
2025-09-12 06:38:02 -04:00
Jussi Saurio
9f6e1a2e7c fix(btree): advance cursor after interior node replacement in delete
When a delete replaces an interior cell, the replacement key is LT the
deleted key. Currently on the main branch, after the deletion happens,
the following call to BTreeCursor::next() stops at the replaced interior
cell.

This is incorrect - imagine the following sequence:

- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
  <key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which
  now has <key=5>, and we incorrectly delete it even though our
  WHERE condition is key > 5.

This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once
  so the next invocation of next() will skip the replaced cell properly

i.e. we prevent next() from landing on the replaced content and ensure iteration continues with the next logical record.

Closes #3045
2025-09-12 10:49:44 +03:00
Denizhan Dakılır
70102f5f6e add explicit usize type annotation to range iterator in test 2025-09-12 02:18:49 +03:00
Jussi Saurio
9b14c0022d Implement the balance_quick algorithm
Fast balancing routine for the common special case where the rightmost leaf page of a given subtree overflows (= an append).
In this case we just add a new leaf page as the right sibling of that page, and insert a new divider cell into the parent.
The high level steps are:
1. Allocate a new leaf page and insert the overflow cell payload in it.
2. Create a new divider cell in the parent - it contains the page number of the old rightmost leaf, plus the largest rowid on that page.
3. Update the rightmost pointer of the parent to point to the new leaf page.
4. Continue balance from the parent page (inserting the new divider cell may have overflowed the parent).
2025-09-12 00:42:27 +03:00
Pekka Enberg
7d8a1a0d5f Merge 'whopper: A new DST with concurrency' from Pekka Enberg
Our simulator is currently limited to concurrency of one. This
introduces a much less sophisticated DST with focus on finding
concurrency bugs.

Closes #2985
2025-09-11 18:42:45 +03:00
Jussi Saurio
c30d320cab Fix: read transaction cannot be allowed to start with a stale max frame
If both of the following are true:

1. All read locks are already held
2. The highest readmark of any read lock is less than the committed max frame

Then we must return Busy to the reader, because otherwise they would begin a
transaction with a stale local max frame, and thus not see some committed
changes.
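A small sketch of the condition, under assumed types: read lock slots are modeled as `Option<u64>` (`Some(mark)` when held, `None` when free), and `BeginRead` and `try_begin_read` are hypothetical names, not the actual WAL API.

```rust
/// Hypothetical outcome of trying to start a read transaction.
#[derive(Debug, PartialEq)]
enum BeginRead {
    Proceed,
    Busy,
}

/// `read_marks[i]` is `Some(mark)` if read lock slot `i` is held, `None` if free.
fn try_begin_read(read_marks: &[Option<u64>], committed_max_frame: u64) -> BeginRead {
    // Condition 1: all read locks are already held.
    let all_held = read_marks.iter().all(|m| m.is_some());
    // Condition 2: the highest read mark is behind the committed max frame.
    let highest = read_marks.iter().flatten().copied().max();
    match highest {
        Some(mark) if all_held && mark < committed_max_frame => BeginRead::Busy,
        _ => BeginRead::Proceed,
    }
}
```

When a slot is free, or some held slot is already at the committed max frame, the sketch lets the reader proceed; only the combination of both conditions yields `Busy`.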
2025-09-11 15:58:13 +03:00
Pekka Enberg
ca51a60b3c core/storage: Demote restart_log() logging to debug 2025-09-11 08:35:18 +03:00
PThorpe92
b93ad749a9 Remove some traces in super hot paths in btree 2025-09-10 09:54:32 -04:00
Pekka Enberg
bb3fbb7962 Merge 'check freelist count in integrity check' from Jussi Saurio
Closes #3003
2025-09-10 16:15:39 +03:00
Jussi Saurio
d7ce781a2a Merge 'Enable the use of indexes in DELETE statements' from Jussi Saurio
Closes #1714
This PR enables the use of an index as the iteration cursor for a point
or range deletion operation. Main changes:
- Use `Delete` opcode for the index that is iterating the rows - avoids
unnecessary seeking on that index, since it's already positioned
correctly
- Fix delete balancing; details below:
### current state
- a deletion may cause a btree rebalancing operation
- to get the cursor back to the right place after a rebalancing, we must
remember what the deleted key was and seek to it
- right now we are using `SeekOp::LT` to move to one slot BEFORE the
deleted key, so that if we delete rows in a loop, the following `Next()`
call will put us back into the right place
### problem
- When we delete multiple rows, we always iterate forwards. Using
`SeekOp::LT` implies backwards iteration, but it works OK for table
btrees since the cursor never remains on an internal node, because table
internal cells do not have payloads. However: this behavior is
problematic for indexes because we can effectively end up skipping
visiting a page entirely. Honestly: despite spending some time debugging the
_old_ code, I still don't remember what exactly causes this to happen.
:) It's one of the `iter_dir` specific behaviors in `indexbtree_move_to`
or `get_prev_record()`, but I'm too tired to spend more time figuring it
out. I had the reason in my head before going on vacation, but it was
evicted from the cache it seems...
### solution
use `SeekOp::GE { eq_only: true }` instead and make the next call to
`Next()` a no-op instead. This has the same effect as SeekOp::LT +
next(), but without introducing bugs due to `LT` implying backwards
iteration.

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #2981
2025-09-10 16:00:54 +03:00
Jussi Saurio
e3594d0ae0 make the comment for skip_advance more accurate 2025-09-10 15:38:57 +03:00
Jussi Saurio
618f51330a advance despite skip_advance flag if cursor not pointing at record 2025-09-10 14:54:51 +03:00