## Background
PR #2065 fixed a bug with table btree seeks concerning boundaries of
leaf pages.
The issue was that if we were e.g. looking for the first key greater
than (GT) 100, we always assumed the key would either be found on the
left child page of a given divider (e.g. divider 102) or not at all,
which is incorrect. #2065 has more discussion and documentation about
this, so read that one for more context.
## This PR
We already had similar handling for index btrees as #2065 introduced for
table btrees, but it was baked into the `BTreeCursor` struct's seek
handling itself, whereas #2065 handled this on the VDBE side.
This PR unifies this handling for both table and index btrees by always
doing the additional cursor advancement in the VDBE.
Unfortunately, unlike table btrees, index btrees may also need to do an
additional advance when they are looking for an exact match. This
resulted in a bigger refactor than anticipated, since there are quite a
few VDBE instructions that may perform a seek, e.g.: `IdxInsert`,
`IdxDelete`, `Found`, `NotFound`, `NoConflict`. All of these can
potentially end up in a similar situation where the cursor needs one
more advance after the initial seek, and until now they called
`cursor.seek()` directly, expecting the `BTreeCursor` to handle the
auto-advance fallback internally.
For this reason, I have 1. removed the "TryAdvance"-ish logic from the
index btree internals and 2. extracted a common VDBE helper `fn
seek_internal()` - heavily based on the existing `op_seek_internal()`,
but decoupled from instructions and the program counter - which all the
interested VDBE instructions will call to delegate their seek logic.
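The division of labor described above can be sketched roughly as follows. This is a simplified model and not the actual `seek_internal()`: `ToyCursor`, its fields, `advance_to_next_leaf()`, and the exact shape of `SeekResult` are assumptions made for illustration, with leaf pages modeled as sorted vectors and only a GT seek covered.

```rust
/// Outcome of a cursor-level seek (shape assumed for illustration).
#[derive(Debug, PartialEq)]
enum SeekResult {
    Found,
    NotFound,
    /// The leaf had no match, but the neighboring leaf might.
    TryAdvance,
}

/// Hypothetical cursor: `dividers[i]` separates `leaves[i]` and `leaves[i + 1]`.
/// Assumes at least one leaf.
struct ToyCursor {
    leaves: Vec<Vec<i64>>,
    dividers: Vec<i64>,
    pos: Option<(usize, usize)>,
}

impl ToyCursor {
    /// Seek the first key > `key`, inspecting only the leaf the dividers route to.
    fn seek_gt(&mut self, key: i64) -> SeekResult {
        let last = self.leaves.len() - 1;
        let leaf = self.dividers.iter().position(|d| *d > key).unwrap_or(last);
        let slot = self.leaves[leaf].partition_point(|k| *k <= key);
        if slot < self.leaves[leaf].len() {
            self.pos = Some((leaf, slot));
            SeekResult::Found
        } else if leaf < last {
            // No match here; the cursor no longer advances on its own.
            self.pos = Some((leaf, slot));
            SeekResult::TryAdvance
        } else {
            self.pos = None;
            SeekResult::NotFound
        }
    }

    /// Move to the first key of the next non-empty leaf, if there is one.
    fn advance_to_next_leaf(&mut self) -> bool {
        let Some((leaf, _)) = self.pos else { return false };
        for next in leaf + 1..self.leaves.len() {
            if !self.leaves[next].is_empty() {
                self.pos = Some((next, 0));
                return true;
            }
        }
        self.pos = None;
        false
    }

    fn key(&self) -> Option<i64> {
        let (leaf, slot) = self.pos?;
        self.leaves[leaf].get(slot).copied()
    }
}

/// Caller-side helper: performs the extra advance that the cursor used to do internally.
fn seek_internal(cursor: &mut ToyCursor, key: i64) -> Option<i64> {
    match cursor.seek_gt(key) {
        SeekResult::Found => cursor.key(),
        SeekResult::NotFound => None,
        SeekResult::TryAdvance => {
            if cursor.advance_to_next_leaf() { cursor.key() } else { None }
        }
    }
}
```

With leaves `[98, 100]` and `[103, 110]` separated by divider 102, a GT seek for 100 is routed to the left leaf, finds nothing there, and only reaches 103 because the helper performs the follow-up advance.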
Closes #2083
Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2084
First step toward resolving
https://github.com/tursodatabase/limbo/issues/1643.
### This PR
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.
Additionally, I fixed two bugs related to virtual tables.
### TODO (I'll handle this in a separate PR)
Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.
Adding support for column references is more nuanced for two main
reasons:
* We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the `one` table must be processed by the top-level loop, and `series` must
be nested.
* For outer joins involving TVFs, the arguments must be treated as `ON`
predicates, not `WHERE` predicates.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1727
PR #2065 fixed a bug with table btree seeks concerning boundaries
of leaf pages.
The issue was that if we were e.g. looking for the first key greater than
(GT) 100, we always assumed the key would either be found on the left child
page of a given divider (e.g. divider 102) or not at all, which is incorrect.
#2065 has more discussion and documentation about this, so read that one for
more context.
Anyway:
We already had similar handling for index btrees, but it was baked into
the `BTreeCursor` struct's seek handling itself, whereas #2065 handled this
on the VDBE side.
This PR unifies this handling for both table and index btrees by always doing
the additional cursor advancement in the VDBE.
Unfortunately, since indexes may also need to do an additional advance when they
are looking for an exact match, this resulted in a bigger refactor than anticipated,
since there are quite a few VDBE instructions that may perform a seek, e.g.:
`IdxInsert`, `IdxDelete`, `Found`, `NotFound`, `NoConflict`.
All of these can potentially end up in a similar situation where the cursor needs
one more advance after the initial seek.
For this reason, I have extracted a common VDBE helper `fn seek_internal()` which
all the interested VDBE instructions will call to delegate their seek logic.
This PR addresses https://github.com/tursodatabase/turso/issues/1828 in
a phased manner.
Making database header access async in one PR would be complicated. This
PR adds an async API to `header_accessor.rs` and ports over some
of `pager.rs` to use this API.
This will allow gradual porting over of all call sites. Once all call
sites are ported over, one mechanical rename will fix everything in the
repo so we don't have any `<header_name>_async` functions.
Also, porting header accessors over from sync to async would be a good
way to get introduced to the Limbo codebase for first time contributors.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1966
This PR adds Euclidean distance support to Limbo's vector search.
It also introduces some type abstractions, such as
`DistanceCalculator`, because I hope to unify the current
vector module in the future to make it more structured, clearer, and
more extensible.
While implementing Euclidean distance for Limbo, I discovered that many
checks could be done through the type system or ahead of time, rather than
waiting until the distance is calculated. Building these checks into
the type system or doing them ahead of time would allow us to
explore more efficient computations, such as automatic vectorization or
SIMD acceleration, which is future work.
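As a rough illustration of moving checks ahead of the distance computation, here is a minimal sketch. The trait signature, `AlignedPair`, `Euclidean`, and `VectorError` are all hypothetical; the PR only names `DistanceCalculator`, and its real shape may differ.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum VectorError {
    DimensionMismatch { left: usize, right: usize },
}

/// Two vectors whose dimensions have already been checked.
/// Holding an `AlignedPair` is the proof that the lengths match.
#[derive(Debug)]
struct AlignedPair<'a> {
    a: &'a [f32],
    b: &'a [f32],
}

impl<'a> AlignedPair<'a> {
    /// The dimension check happens once, here, not inside the hot loop.
    fn new(a: &'a [f32], b: &'a [f32]) -> Result<Self, VectorError> {
        if a.len() != b.len() {
            return Err(VectorError::DimensionMismatch { left: a.len(), right: b.len() });
        }
        Ok(Self { a, b })
    }
}

/// Assumed shape of the abstraction; infallible because the input is pre-validated.
trait DistanceCalculator {
    fn calculate(pair: &AlignedPair<'_>) -> f64;
}

struct Euclidean;

impl DistanceCalculator for Euclidean {
    fn calculate(pair: &AlignedPair<'_>) -> f64 {
        pair.a
            .iter()
            .zip(pair.b)
            .map(|(x, y)| {
                let d = f64::from(*x) - f64::from(*y);
                d * d
            })
            .sum::<f64>()
            .sqrt()
    }
}
```

Because `calculate` cannot fail and has no branches on length inside the loop, the body is a plain zip-map-sum that a compiler has a better chance of vectorizing.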
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes #1986
The current table B-tree seek code relies on the invariant that if a key `K` is
present in an interior page, then it must also be present in a leaf page.
This is generally not true if data was ever deleted from the table,
because the leaf row whose key was used as a divider in the interior pages
can be deleted. Also, the SQLite spec says nothing about such an invariant, so
the `turso-db` implementation of the B-tree should not rely on it.
This PR introduces 3 options for the B-tree `seek` result: `Found`,
`NotFound`, and `TryAdvance`, which is generated when the leaf page has no
match for `seek_op` but the DB doesn't know whether a neighboring page can
have matching data.
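The leaf-level decision between the three outcomes could look roughly like the sketch below. It is an assumption-laden model: `seek_in_leaf`, the `SeekOp` variants, and the `has_neighbor` flag (whether another leaf exists in the seek direction) are names invented here, and a leaf is modeled as a sorted slice of rowids.

```rust
#[derive(Debug, PartialEq)]
enum SeekResult {
    Found,
    NotFound,
    TryAdvance,
}

/// Assumed set of seek operations for this sketch.
#[derive(Clone, Copy)]
enum SeekOp {
    GT,
    GE,
    LT,
    LE,
}

/// Classify a seek against a single sorted leaf.
/// `has_neighbor` says whether a leaf exists in the direction of the seek
/// (to the right for GT/GE, to the left for LT/LE).
fn seek_in_leaf(keys: &[i64], op: SeekOp, target: i64, has_neighbor: bool) -> SeekResult {
    let slot = match op {
        SeekOp::GT => {
            let i = keys.partition_point(|k| *k <= target);
            (i < keys.len()).then_some(i)
        }
        SeekOp::GE => {
            let i = keys.partition_point(|k| *k < target);
            (i < keys.len()).then_some(i)
        }
        // For backward seeks the match is the last key below the boundary.
        SeekOp::LT => keys.partition_point(|k| *k < target).checked_sub(1),
        SeekOp::LE => keys.partition_point(|k| *k <= target).checked_sub(1),
    };
    match slot {
        Some(_) => SeekResult::Found,
        // This leaf has no match, but the neighboring leaf may hold one.
        None if has_neighbor => SeekResult::TryAdvance,
        None => SeekResult::NotFound,
    }
}
```

For a leaf `[98, 100]` with a right neighbor, `GT 100` yields `TryAdvance` while `GE 100` yields `Found`; the caller, not `seek`, then decides whether to step to the neighbor.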
There is an alternative approach where `seek` itself moves the cursor
to the neighboring page, but I was hesitant to introduce such a change
because the analogous `seek` function in SQLite works exactly like the current
version of the code, and I think some query planner internals (for
insertion) may rely on the fact that repositioning leaves the cursor at
the position of insertion:
> ** If an exact match is not found, then the cursor is always
> ** left pointing at a leaf page which would hold the entry if it
> ** were present. The cursor might point to an entry that comes
> ** before or after the key.
Also, this PR introduces new B-tree fuzz tests which generate a table
B-tree from scratch and execute operations over it. This can help to
reach some non-trivial states and also generate huge DBs faster (that's
how this bug was discovered).
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2065
Let's assert **for now** that we do not read/write fewer bytes than
expected. This should eventually be fixed by retriggering reads/writes when we
couldn't read/write enough, but for now let's assert.
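For reference, the eventual "retrigger" behavior usually amounts to a loop like the one below. This is a generic sketch over `std::io::Read`, not the project's I/O layer, which may look quite different; `read_full` and the `Trickle` test reader are hypothetical names. The standard library's `Read::read_exact` already behaves this way for blocking readers.

```rust
use std::io::{self, Read};

/// Keep issuing reads until `buf` is full instead of asserting on the first short read.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Zero bytes means the source is exhausted before the buffer is full.
            Ok(0) => return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short read")),
            Ok(n) => filled += n,
            // An interrupted read is safe to retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Hypothetical reader that hands out at most `chunk` bytes per call,
/// used only to simulate short reads.
struct Trickle<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl Read for Trickle<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}
```

The write side would mirror this: loop on `write` until every byte is accepted, treating a zero-length write as an error.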
Closes #2078
- Apart from the regular `Found`/`NotFound` states, the seek result has a
`TryAdvance` value which tells the caller to advance the cursor in the
necessary direction, because the leaf page which would hold the entry if it
were present actually has no matching entry (but a neighbouring page may have
a match).
- `OP_NewRowId` now generates a new rowid semi-randomly when the largest
rowid in the table is `i64::MAX`.
- Introduced new `LimboError` variant `DatabaseFull` to signify that
database might be full (SQLite behaves this way returning
`SQLITE_FULL`).
Now:
```SQL
turso> CREATE TABLE q(x INTEGER PRIMARY KEY, y);
turso> INSERT INTO q VALUES (9223372036854775807, 1);
turso> INSERT INTO q(y) VALUES (2);
turso> INSERT INTO q(y) VALUES (3);
turso> SELECT * FROM q;
┌─────────────────────┬───┐
│ x │ y │
├─────────────────────┼───┤
│ 1841427626667347484 │ 2 │
├─────────────────────┼───┤
│ 4000338366725695791 │ 3 │
├─────────────────────┼───┤
│ 9223372036854775807 │ 1 │
└─────────────────────┴───┘
```
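The rowid selection shown above can be approximated by the following sketch. It is not the actual `OP_NewRowId` code: `new_rowid`, `RowIdError`, the `BTreeSet` standing in for the table, and the tiny xorshift generator (used only to avoid external crates) are assumptions for illustration. The bound of 100 attempts follows the commit message for this change.

```rust
use std::collections::BTreeSet;

/// Stand-in for the new `DatabaseFull` error variant.
#[derive(Debug, PartialEq)]
enum RowIdError {
    DatabaseFull,
}

/// Minimal xorshift64 generator; hypothetical, not the project's RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

const MAX_ATTEMPTS: usize = 100;

/// Pick a rowid for a new row given the set of rowids already in the table.
fn new_rowid(existing: &BTreeSet<i64>, rng: &mut XorShift64) -> Result<i64, RowIdError> {
    match existing.last() {
        // Empty table: start at 1.
        None => Ok(1),
        // Common case: one past the current maximum.
        Some(&max) if max < i64::MAX => Ok(max + 1),
        // The maximum is i64::MAX, so fall back to random probing.
        Some(_) => {
            for _ in 0..MAX_ATTEMPTS {
                // Dropping the low bit leaves a non-negative value that fits in i64.
                let candidate = (rng.next_u64() >> 1) as i64;
                if candidate >= 1 && candidate < i64::MAX && !existing.contains(&candidate) {
                    return Ok(candidate);
                }
            }
            Err(RowIdError::DatabaseFull)
        }
    }
}
```

Giving up after a fixed number of attempts keeps the instruction bounded; on a nearly full key space the caller sees `DatabaseFull` rather than an unbounded search.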
Fixes: https://github.com/tursodatabase/turso/issues/1977
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1985
Simple PR to address a minor issue where `INTEGER PRIMARY KEY NOT NULL` (`NOT
NULL` is obviously redundant here) prevents the user from inserting anything
into the table, because the rowid-alias column is always set to null by
`turso-db`.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2063
`i64::MAX`. We do this by attempting to generate a random value smaller
than `i64::MAX` up to 100 times, returning a `DatabaseFull` error on
failure.
- Introduced `DatabaseFull` error variant
Fixes: https://github.com/tursodatabase/turso/issues/1977