Commit Graph

727 Commits

Author SHA1 Message Date
Pekka Enberg
3176df64a2 Merge 'Fix: return NULL for rowid() when cursor's null flag is on' from Jussi Saurio
Fixes TPC-H query 13 returning an incorrect result. In this
specific case, we were returning non-null `IdxRowid` values for the
right-hand side table even when there was no match with the left-hand
side table, meaning the join produced matches even in cases where there
shouldn't have been any.
Closes #2794

Closes #2795
2025-08-26 09:33:49 +03:00
Jussi Saurio
e52f807c7d Fix: return NULL for rowid() when cursor's null flag is on
Fixes TPC-H query 13 returning an incorrect result. In this specific
case, we were returning non-null `IdxRowid` values for the right-hand side
table even when there was no match with the left-hand side table, meaning
the join produced matches even in cases where there shouldn't have been any.
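
A minimal sketch of the intended behavior, with illustrative names rather than the actual VDBE/cursor types:

```
// Sketch only: when the cursor's null flag is on, e.g. for the unmatched
// right-hand side of a LEFT JOIN, rowid()/IdxRowid must produce NULL
// rather than a stale rowid that makes the join invent matches.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

struct Cursor {
    null_flag: bool, // set when the join probe found no matching row
    rowid: i64,
}

fn idx_rowid(cursor: &Cursor) -> Value {
    if cursor.null_flag {
        Value::Null // no match: anything else fabricates join results
    } else {
        Value::Integer(cursor.rowid)
    }
}

fn main() {
    let unmatched = Cursor { null_flag: true, rowid: 42 };
    assert_eq!(idx_rowid(&unmatched), Value::Null);
}
```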

Closes #2794
2025-08-26 09:08:48 +03:00
Pekka Enberg
114ece0375 Merge 'Make fill_cell_payload() safe for async IO and cache spilling' from Jussi Saurio
## Make fill_cell_payload() safe for async IO and cache spilling
### Problems:
1. fill_cell_payload() is not re-entrant because it can yield IO
   on allocating a new overflow page, resulting in losing some of the
   input data.
2. fill_cell_payload() in its current form is not safe for cache spilling
   because the previous overflow page in the chain of allocated overflow pages
   can be evicted by a spill caused by the next overflow page allocation,
   invalidating the page pointer and causing corruption.
3. fill_cell_payload() uses raw pointers and `unsafe` as a workaround
   from a previous time when we used to clone `WriteState`, resulting in
   hard-to-read code.
### Solutions:
1. Introduce a new substate to the fill_cell_payload state machine to handle
   re-entrancy wrt. allocating overflow pages.
2. Always pin the current overflow page so that it cannot be evicted during the
   overflow chain construction. Also pin the regular page the overflow chain is
   attached to, because it is immediately accessed after fill_cell_payload is done.
3. Remove all explicit usages of `unsafe` from `fill_cell_payload`
   (although our pager is ofc still extremely unsafe under the hood :] )

Note that solution 2 addresses a problem that arose in the development
of page cache spilling, which is not yet implemented, but will be soon.
### Miscellania:
1. Renamed a bunch of variables to be clearer
2. Added more comments about what is happening in fill_cell_payload

Closes #2737
2025-08-26 08:36:46 +03:00
Jussi Saurio
8cae10f744 Fix several issues with integrity_check
Things that were just wrong:

1. No pages other than the root page were checked, because no looping
was done. Add a loop.
2. Rightmost child page was never added to page stack. Add it.

New integrity check features:

- Add overflow pages to stack as well
- Check that no page is referenced more than once in the tree
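
A minimal sketch of the traversal these fixes add up to (illustrative types, not the actual integrity_check code):

```
// Sketch only: walk the tree with an explicit page stack, pushing child
// pages, the rightmost child, and overflow pages, and flag any page that
// is referenced more than once.
use std::collections::{HashMap, HashSet};

struct PageRef {
    id: u32,
    children: Vec<u32>,
    rightmost: Option<u32>, // previously never pushed onto the stack
    overflow: Vec<u32>,     // now checked as well
}

fn check_tree(pages: &[PageRef], root: u32) -> Result<(), String> {
    let by_id: HashMap<u32, &PageRef> = pages.iter().map(|p| (p.id, p)).collect();
    let mut stack = vec![root];
    let mut seen = HashSet::new();
    // The loop is the first fix: without it only the root page was checked.
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            return Err(format!("page {id} referenced more than once"));
        }
        if let Some(page) = by_id.get(&id) {
            stack.extend(&page.children);
            stack.extend(page.rightmost);
            stack.extend(&page.overflow);
        }
    }
    Ok(())
}
```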
2025-08-25 16:51:57 +03:00
Pekka Enberg
3f5878243f Merge 'Remove unnecessary argument from Pager::end_tx()' from Nikita Sivukhin
No need to pass the `disable` flag to the `end_tx` method, as it has that
info from the connection itself

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2777
2025-08-25 15:34:41 +03:00
Jussi Saurio
16b1ae4a9f Handle unpinning the btree page in case of an overflow page allocation error 2025-08-25 15:12:37 +03:00
Jussi Saurio
c6553d82b8 Clarify expected behavior with assertion 2025-08-25 15:05:04 +03:00
Jussi Saurio
42c8a77bb7 use existing payload_overflows() utility in local space calculation 2025-08-25 15:03:10 +03:00
Nikita Sivukhin
f7ad55b680 remove unnecessary argument 2025-08-25 12:24:39 +04:00
Jussi Saurio
dc6bcd4d41 refactor/btree: rewrite find_free_cell() 2025-08-25 10:08:39 +03:00
Jussi Saurio
4ea8cd0007 refactor/btree: rewrite the free_cell_range() function
I had a rough time reading this function and trying to understand it,
so I rewrote it in a way that, to me, is much more readable.
2025-08-25 09:41:44 +03:00
Jussi Saurio
b4ee40dd3d fix tests 2025-08-23 16:14:02 +03:00
Jussi Saurio
1d24925e21 Make fill_cell_payload() safe for async IO and cache spilling
Problems:

1. fill_cell_payload() is not re-entrant because it can yield IO
   on allocating a new overflow page, resulting in losing some of the
   input data.
2. fill_cell_payload() in its current form is not safe for cache spilling
   because the previous overflow page in the chain of allocated overflow pages
   can be evicted by a spill caused by the next overflow page allocation,
   invalidating the page pointer and causing corruption.
3. fill_cell_payload() uses raw pointers and `unsafe` as a workaround from a previous time when we used to clone `WriteState`, resulting in hard-to-read code.

Solutions:

1. Introduce a new substate to the fill_cell_payload state machine to handle
   re-entrancy wrt. allocating overflow pages.
2. Always pin the current overflow page so that it cannot be evicted during the
   overflow chain construction. Also pin the regular page the overflow chain is
   attached to, because it is immediately accessed after fill_cell_payload is done.
3. Remove all explicit usages of `unsafe` from `fill_cell_payload` (although our pager is ofc still extremely unsafe under the hood :] )

Note that solution 2 addresses a problem that arose in the development of page cache
spilling, which is not yet implemented, but will be soon.
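
A minimal sketch of both mechanisms, with hypothetical names rather than the actual pager/btree types:

```
// Sketch only: (1) a substate records progress across an IO yield while
// allocating overflow pages, so re-entry resumes instead of restarting and
// losing input data; (2) pin counts keep the pages of the overflow chain
// from being evicted by a cache spill while we still hold pointers to them.
use std::cell::Cell;

struct Page {
    pin_count: Cell<u32>,
}

impl Page {
    fn pin(&self) {
        self.pin_count.set(self.pin_count.get() + 1);
    }
    fn unpin(&self) {
        self.pin_count.set(self.pin_count.get() - 1);
    }
    fn evictable(&self) -> bool {
        self.pin_count.get() == 0 // a spill must skip pinned pages
    }
}

// Hypothetical substate of the fill-cell state machine.
enum FillCellState {
    Start,
    AllocatingOverflowPage { payload_written: usize }, // resume point after IO
    Done,
}

fn main() {
    let prev_overflow = Page { pin_count: Cell::new(0) };
    prev_overflow.pin(); // pinned while the next overflow page is allocated
    assert!(!prev_overflow.evictable());
    prev_overflow.unpin(); // chain complete and attached; safe to spill again
    assert!(prev_overflow.evictable());

    // The state would be held by the caller and passed back in on re-entry.
    let _resume = FillCellState::AllocatingOverflowPage { payload_written: 512 };
}
```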

Miscellania:

1. Renamed a bunch of variables to be clearer
2. Added more comments about what is happening in fill_cell_payload
2025-08-23 16:14:02 +03:00
Pekka Enberg
b9bb859271 Merge 'Switch to new parser in core' from Levy A.
Integrate #2381 into core. Resolves #2337.

Reviewed-by: Lâm Hoàng Phúc (@TcMits)

Closes #2650
2025-08-22 10:06:37 +03:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
pedrocarlo
6b0ed08465 read_page should return No Completion when there is a page cache hit 2025-08-21 14:39:24 -03:00
Pekka Enberg
72a5de3551 Merge 'core/mvcc: support for MVCC' from Pere Diaz Bou
This PR adds simple support for delete, with limited testing for now.
It also fixes an error in `forward`, which didn't skip deleted rows;
the bug wasn't obvious before delete support existed.
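
A minimal sketch of the `forward` fix, with a simplified tombstone flag standing in for the actual MVCC row versioning:

```
// Sketch only: a forward scan must skip rows whose tombstone marks them
// deleted; before the fix, deleted rows were returned as if still visible.
struct Row {
    key: u64,
    deleted: bool, // simplified stand-in for MVCC delete versioning
}

fn forward(rows: &[Row], after: u64) -> Option<&Row> {
    rows.iter()
        .filter(|r| r.key > after)
        .find(|r| !r.deleted) // skipping deleted rows was the missing piece
}

fn main() {
    let rows = [
        Row { key: 1, deleted: false },
        Row { key: 2, deleted: true },
        Row { key: 3, deleted: false },
    ];
    // Scanning forward from key 1 must land on 3, not the deleted 2.
    assert_eq!(forward(&rows, 1).map(|r| r.key), Some(3));
}
```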

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2672
2025-08-20 16:51:31 +03:00
Jussi Saurio
e5f04ae100 Merge 'refactor/vdbe: move insert-related seeking to VDBE from BTreeCursor' from Jussi Saurio
This gets rid of `InsertState` in `BTreeCursor` plus the `moved_before`
parameter to `BTreeCursor::insert` -- instead, seek logic is now in the
existing state machines for `op_insert` and `op_idx_insert`

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2639
2025-08-20 11:15:09 +03:00
Avinash Sajjanshetty
40a209c000 simplify feature flag usage for encryption 2025-08-20 12:49:38 +05:30
Avinash Sajjanshetty
bd9b4bbfd2 encrypt/decrypt when writing/reading from DB 2025-08-20 11:47:23 +05:30
Pere Diaz Bou
4314bc13e6 core/mvcc: delete support 2025-08-19 19:48:36 +02:00
Jussi Saurio
c2855cb0db refactor/idx_insert: move seeking to VDBE instead of BTreeCursor
Also removes `InsertState` and `moved_before` since neither are
needed anymore.
2025-08-19 19:04:42 +03:00
pedrocarlo
de1811dea7 abort completions on error 2025-08-19 10:48:21 -03:00
Jussi Saurio
9abc63d853 Add a bit of abstraction for creating EQ cursorcontexts 2025-08-18 13:13:02 +03:00
Jussi Saurio
3eb89982ba Remove obsolete FIXME 2025-08-18 12:08:40 +03:00
Jussi Saurio
50fd7ec58b Refactor: use regular save/restore context mechanism for delete balancing
- Removes special `DeleteSavepoint` and uses the existing cursor restoration
  mechanism.
- This required some restructuring of `DeleteState` to avoid cloning it, i.e.
  some negotiations with the borrow checker.
- CursorContext now takes a SeekOp as well to allow retaining the behavior
  that we use LT for seeking after a delete-induced rebalancing. This behavior
  will probably be removed as part of fixing #2004, but here I am trying to
  preserve the current semantics.
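
A minimal sketch of the shape this gives the saved context (hypothetical fields, not the actual types):

```
// Sketch only: the saved context records which SeekOp to replay on restore,
// so a delete-induced rebalancing can keep re-seeking with LT and land on
// the entry before the removed key.
#[derive(Clone, Copy, Debug)]
enum SeekOp {
    LT,
    GE,
}

struct CursorContext {
    key: u64,
    seek_op: SeekOp, // retained so restore preserves delete's LT behavior
}

fn restore_description(ctx: &CursorContext) -> String {
    match ctx.seek_op {
        SeekOp::LT => format!("seek to greatest key < {}", ctx.key),
        SeekOp::GE => format!("seek to first key >= {}", ctx.key),
    }
}

fn main() {
    let ctx = CursorContext { key: 7, seek_op: SeekOp::LT };
    assert_eq!(restore_description(&ctx), "seek to greatest key < 7");
}
```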
2025-08-18 11:58:00 +03:00
PThorpe92
2c526c4c37 Add io_yield_x macros to reduce boilerplate 2025-08-16 16:14:00 -04:00
Jussi Saurio
c75e4c1092 Fix non-4096 page sizes by making WAL header lazy 2025-08-14 12:40:58 +03:00
Jussi Saurio
f8620a9869 Use non-hardcoded size for BTreeCursor immutablerecord 2025-08-14 12:40:58 +03:00
Jussi Saurio
69d8a73028 Merge 'use virtual root page for sqlite_schema' from Mikaël Francoeur
This PR fixes a problem where `sqlite_schema` could be read before page
1 was allocated.
The fix is similar to that in SQLite. In SQLite, if `btreeCursor()` sees
that the root page is 1 and that the b-tree is empty, it sets the page to 0
([here](https://github.com/sqlite/sqlite/blob/master/src/btree.c#L4691-L4696)).
SQLite's `moveToRoot()` then uses this special value to return
`CURSOR_INVALID` with no rows
([here](https://github.com/sqlite/sqlite/blob/master/src/btree.c#L5538-L5540)).
Fixes https://github.com/tursodatabase/turso/issues/2449
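
A minimal sketch of the special case, simplified from the SQLite code linked above:

```
// Sketch only: if the cursor's root page is 1 and the b-tree is empty (page 1
// not yet allocated), substitute a virtual root of 0; the move-to-root step
// then reports an invalid cursor with zero rows instead of reading page 1.
fn effective_root(root_page: u32, btree_is_empty: bool) -> u32 {
    if root_page == 1 && btree_is_empty {
        0 // virtual root: cursor yields no rows
    } else {
        root_page
    }
}

fn main() {
    assert_eq!(effective_root(1, true), 0); // sqlite_schema before page 1 exists
    assert_eq!(effective_root(1, false), 1); // normal case once allocated
}
```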

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2551
2025-08-14 11:08:11 +03:00
Mikaël Francoeur
07ef47924c use virtual root page for sqlite_schema 2025-08-13 16:31:21 -04:00
Jussi Saurio
fd72a2ff20 Fix: do computations on usable_space as usize, not as u16
Otherwise page size 65536 will not work as casting to u16 will make
it wrap around to 0.
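
The wraparound is easy to demonstrate:

```
fn main() {
    // A page size of 65536 does not fit in u16, so the cast wraps to 0 ...
    let page_size: usize = 65536;
    assert_eq!(page_size as u16, 0);
    // ... which is why usable_space must be computed as usize throughout.
    let reserved: usize = 0;
    let usable_space: usize = page_size - reserved;
    assert_eq!(usable_space, 65536);
}
```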
2025-08-13 17:20:29 +03:00
pedrocarlo
ccc22863c6 remove return_if_locked and return_if_locked_maybe_load 2025-08-13 10:24:55 +03:00
pedrocarlo
f95625a06c bubble completions in btree 2025-08-13 10:24:55 +03:00
pedrocarlo
85e86d427b cleanups - use io.block in many functions and return_if_io 2025-08-13 08:32:38 +03:00
pedrocarlo
e94f1f9f14 refactor move_to functions to return IO on start 2025-08-12 12:28:35 -03:00
pedrocarlo
4010dc8f32 state machine for insert 2025-08-12 12:28:35 -03:00
pedrocarlo
fe0e4bcbb7 state machine for seek_end 2025-08-12 12:28:35 -03:00
pedrocarlo
fc05518192 refactor continue_payload_overflow_with_offset 2025-08-12 12:28:34 -03:00
pedrocarlo
81fbf8cb4b balance_non_root validation logic should be done in the next state 2025-08-12 12:28:34 -03:00
pedrocarlo
96a6bc5125 end_tx does not need schema_did_change variable 2025-08-11 18:59:11 -03:00
PThorpe92
d7e4ba21f8 Add explanation for using 3mb limit 2025-08-08 10:55:28 -04:00
PThorpe92
3cff47e490 Fix btree test to properly initialize pool 2025-08-08 10:55:27 -04:00
PThorpe92
9d1ca1c8ca Add ReadFixed/WriteFixed opcodes for buffers from registered arena 2025-08-08 10:55:27 -04:00
PThorpe92
4ffb273b53 Adjust IO to use new buffer pool and buffer API 2025-08-08 10:55:26 -04:00
Jussi Saurio
7fd63d8a5d btree: cache usable_space in the btreecursor constructor 2025-08-08 10:32:18 +03:00
Jussi Saurio
15c429b673 btree: remove completely unused ParseRecordState 2025-08-08 10:08:59 +03:00
Preston Thorpe
7a793b818d Merge 'perf: a few small insert optimizations' from Jussi Saurio
1. We spend a lot of time in `cell_get_raw_region` in the balancing
routine, especially calling `contents.page_type()` there a lot, so
extract a version that can take precomputed arguments; those values
don't have to be redundantly computed for successive calls where they
are going to be the same (see the sketch after the benchmark output
below).
2. Avoid calling `self.usable_space()` in a loop in
`insert_into_page()`.
3. Avoid accessing the `pages_in_frames` lock if we're not going to
modify it.

The main improvement is to the "insert 100 rows" bench, which ends up
doing a lot of balancing:
```
Insert rows in batches/limbo_insert_1_rows
                        time:   [22.856 µs 24.342 µs 27.496 µs]
                        change: [-3.3579% +15.495% +67.671%] (p = 0.62 > 0.05)
                        No change in performance detected.

Benchmarking Insert rows in batches/limbo_insert_10_rows: Collecting 100 samples in estim
Insert rows in batches/limbo_insert_10_rows
                        time:   [32.196 µs 32.604 µs 32.981 µs]
                        change: [+1.3253% +2.9177% +4.5863%] (p = 0.00 < 0.05)
                        Performance has regressed.

Insert rows in batches/limbo_insert_100_rows
                        time:   [89.425 µs 92.105 µs 96.304 µs]
                        change: [-18.317% -13.605% -9.1022%] (p = 0.00 < 0.05)
                        Performance has improved.
```
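
A minimal sketch of the hoisting pattern behind items 1 and 2 (illustrative names, not the actual btree code):

```
// Sketch only: compute loop-invariant values once, outside the loop, and
// pass precomputed arguments instead of re-deriving them per call.
#[derive(Clone, Copy)]
enum PageType {
    Leaf,
    Interior,
}

fn usable_space() -> usize {
    4096 // stand-in: the real value comes from the page/database header
}

fn cell_raw_region_len(page_type: PageType, cell: &[u8]) -> usize {
    // stand-in body: uses the precomputed page type instead of re-reading it
    match page_type {
        PageType::Leaf => cell.len(),
        PageType::Interior => cell.len() + 4,
    }
}

fn insert_into_page(cells: &[Vec<u8>]) -> usize {
    let space = usable_space(); // hoisted: not re-computed per iteration
    let page_type = PageType::Leaf; // hoisted: same for every successive call
    cells
        .iter()
        .map(|c| cell_raw_region_len(page_type, c))
        .sum::<usize>()
        .min(space)
}
```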

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2483
2025-08-07 21:33:30 -04:00
Jussi Saurio
1fe32dadf3 PageContent: make read_x/write_x methods private and add dedicated methods
Problem:

A very easy source of bugs is to mistakenly use e.g. PageContent::read_u16()
instead of PageContent::read_u16_no_offset(). The difference between the two
is that `read_u16()` adds 100 bytes to the requested byte offset if and only if
the page in question is page 1, which contains a 100-byte database header.

Case in point: see #2491.

Observation:

In all of the cases where we want to read from or write to a page "header-sensitively",
those reads/writes are to so-called "well known offsets", e.g. specific bytes in a btree
page header.

In all other cases, the "no-offset" versions, i.e. the ones taking the absolute byte offset
as parameter, should be used.

Solution:

1. Make all the offset-sensitive versions (read_u16() and friends) private methods of
`PageContent`.
2. Expose dedicated methods for things like updating rightmost pointer, updating fragmented
bytes count and so on, and use them instead of the plain read/write methods universally.
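
A minimal sketch of the hazard and the encapsulation, with an illustrative accessor rather than the actual `PageContent` API:

```
// Sketch only: the header-sensitive reader shifts offsets by the 100-byte
// database header, but only on page 1, so mixing the two variants corrupts
// reads/writes on page 1 alone. Making them private and exposing dedicated
// accessors removes the choice from callers.
struct PageContent {
    page_no: u32,
    buf: Vec<u8>,
}

impl PageContent {
    // private: only for "well known offsets" inside the btree page header
    fn read_u16(&self, pos: usize) -> u16 {
        let base = if self.page_no == 1 { 100 } else { 0 };
        self.read_u16_no_offset(base + pos)
    }
    // private: takes the absolute byte offset, correct everywhere else
    fn read_u16_no_offset(&self, pos: usize) -> u16 {
        u16::from_be_bytes([self.buf[pos], self.buf[pos + 1]])
    }
    // public, dedicated accessor: callers can no longer pick the wrong variant
    pub fn cell_count(&self) -> u16 {
        self.read_u16(3) // bytes 3-4 of the btree page header hold the cell count
    }
}
```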
2025-08-07 17:00:06 +03:00
Jussi Saurio
6cd7334afc btree/fix: use correct byte offsets for page1 in defragmentation
`defragment_page_fast()` used the version of the read/write methods on
`PageContent` that adds the 100-byte database header to the requested
byte offset, instead of the version that does not.

On page 1, this resulted in defragmentation reading the 2nd/3rd
freeblocks from the wrong offset and writing cell offsets to the wrong
location.
2025-08-07 15:42:06 +03:00