This PR addresses https://github.com/tursodatabase/turso/issues/1828 in
a phased manner.
Making database header access async in one PR will be complicated. This
PR ports adds an async API to `header_accessor.rs` and ports over some
of `pager.rs` to use this API.
This will allow gradual porting over of all call sites. Once all call
sites are ported over, one mechanical rename will fix everything in the
repo so we don't have any `<header_name>_async` functions.
Also, porting header accessors over from sync to async would be a good
way to get introduced to the Limbo codebase for first time contributors.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1966
Current table B-Tree seek code rely on the invariant that if key `K` is
present in interior page then it also must be present in the leaf page.
This is generally not true if data was ever deleted from the table
because leaf row which key was used as a divider in the interior pages
can be deleted. Also, SQLite spec says nothing about such invariant - so
`turso-db` implementation of B-Tree should not rely on it.
This PR introduce 3 options for B-Tree `seek` result: `Found` /
`NotFound` and `TryAdvance` which is generated when leaf page have no
match for `seek_op` but DB don't know if neighbor page can have matching
data.
There is an alternative approach where we can move cursor in the `seek`
itself to the neighbor page - but I was afraid to introduce such changes
because analogue `seek` function from SQLite works exactly like current
version of the code and I think some query planner internals (for
insertion) can rely on the fact that repositioning will leave cursor at
the position of insertion:
> ** If an exact match is not found, then the cursor is always
** left pointing at a leaf page which would hold the entry if it
** were present. The cursor might point to an entry that comes
** before or after the key.
Also, this PR introduces new B-tree fuzz tests which generate table
B-tree from scratch and execute opreations over it. This can help to
reach some non trivial states and also generate huge DBs faster (that's
how this bug was discovered)
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2065
Let's assert **for now** that we do not read/write less bytes than
expected. This should be fixed to retrigger several reads/writes if we
couldn't read/write enough but for now let's assert.
Closes#2078
- Apart from regular states Found/NotFound seek result has TryAdvance
value which tells caller to advance the cursor in necessary direction
because the leaf page which would hold the entry if it was present
actually has no matching entry (but neighbouring page can have match)
Let's assert **for now** that we do not read/write less bytes than
expected. This should be fixed to retrigger several reads/writes if we
couldn't read/write enough but for now let's assert.
This is done because the compiler is refusing to inline even after
adding inline hint.
- Get refvalues from directly from registers without using
`make_record`
compare_records_int
compare_records_string
compare_records_generic
comapre_records_generic will still be more efficient than compare-
_immutable because it deserializes the record column by column
Closes#2047
the validation code was assuming that:
- if the parent has overflow cells after inserting a divider cell
- the exact divider we are validating MUST be in those overflow cells
However, this is not necessarily the case. Imagine:
- First divider gets inserted at index `n`. It is too large to fit, so
it gets pushed to `parent.overflow_cells()`. Parent usable space does
not decrease.
- Second divider gets inserted at index `n+1`. It is smaller, so it
still fits in usable space.
Hence:
Provide information to the validation function about whether the
inserted cell overflowed, and use that to find the left pointer and
assert accordingly.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2050
the validation code was assuming that:
- if the parent has overflow cells after a inserting divider cell
- the exact divider we are validating MUST be in those overflow cells
However, this is not necessarily the case. Imagine:
- First divider gets inserted at index `n`. It is too large to fit,
so it gets pushed to `parent.overflow_cells()`. Parent usable space
does not decrease.
- Second divider gets inserted at index `n+1`. It is smaller, so it
still fits in usable space.
Hence:
Provide information to the validation function about whether the inserted
cell overflowed, and use that to find the left pointer and assert accordingly.
We can use `right_page_id` directly to perform the validation instead of
carrying a raw pointer around which might be invalidated by the time we
do the validation.