Currently, LIMIT 0 jumps to "after the main loop", and that jump is emitted
before the ORDER BY and GROUP BY cursors have had a chance to be initialized,
which causes a panic.
The simplest fix for now is to delay the LIMIT initialization.
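The ordering problem can be illustrated with a toy Rust model. The opcodes, `emit`, and `run` below are hypothetical and far simpler than turso's real bytecode emitter. They only show why jumping past cursor initialization breaks code that runs after the main loop, and why emitting the LIMIT 0 check later avoids the problem.

```rust
// Hypothetical toy model of the bug; these opcodes are NOT turso's real instructions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    LimitZeroCheck, // jumps to "after the main loop" when LIMIT is 0
    OpenSorter,     // initializes the ORDER BY / GROUP BY cursor
    MainLoop,
    AfterMainLoop,  // reads from the sorter cursor
    Halt,
}

/// Emits the toy program; `delay_limit` places the LIMIT check after cursor setup.
fn emit(delay_limit: bool) -> Vec<Op> {
    let mut prog = Vec::new();
    if !delay_limit {
        prog.push(Op::LimitZeroCheck);
    }
    prog.push(Op::OpenSorter);
    if delay_limit {
        prog.push(Op::LimitZeroCheck);
    }
    prog.extend([Op::MainLoop, Op::AfterMainLoop, Op::Halt]);
    prog
}

/// Interprets the toy program, returning an error where the real engine would panic.
fn run(prog: &[Op], limit: u64) -> Result<(), String> {
    let target = prog
        .iter()
        .position(|op| *op == Op::AfterMainLoop)
        .expect("label exists");
    let mut sorter_open = false;
    let mut pc = 0;
    while pc < prog.len() {
        match prog[pc] {
            Op::LimitZeroCheck if limit == 0 => {
                pc = target;
                continue;
            }
            Op::OpenSorter => sorter_open = true,
            Op::AfterMainLoop if !sorter_open => {
                return Err("sorter cursor was never initialized".to_string());
            }
            Op::Halt => break,
            _ => {}
        }
        pc += 1;
    }
    Ok(())
}
```

With the early check, `LIMIT 0` skips `OpenSorter` and the post-loop code fails. With the delayed check, the cursor is always open before the jump.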
This PR adds an `index_method` trait and an implementation of a toy sparse
vector index.
To keep this PR lightweight, index methods are not yet deeply integrated
into the query planner; only the components needed for integration tests
that use the `index_method` API directly are added.
The primary changes introduced in this PR are:
1. `SymbolTable` is extended with an `index_methods` field, and the builtin
extensions are populated with 2 native indices: `backing_btree` and
`toy_vector_sparse_ivf`
2. The `Index` struct is extended with an `index_method` field, which holds
the `IndexMethodAttachment` constructed for the table with the given parameters
by the `IndexMethod` "factory" trait
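Here is a minimal sketch of how the factory/attachment split could look. The trait and field names come from the list above. Every method signature, the `Params` alias, `ToySparseIvf`, and the `create_index` helper are hypothetical simplifications, not turso's actual API.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical: index parameters from `WITH (...)`, kept as plain strings.
type Params = HashMap<String, String>;

/// Per-index object produced by the factory for a concrete table + parameters.
trait IndexMethodAttachment {
    fn definition(&self) -> String;
}

/// "Factory" trait: one registered instance per index method name.
trait IndexMethod {
    fn attach(&self, table: &str, columns: &[String], params: &Params)
        -> Box<dyn IndexMethodAttachment>;
}

struct ToySparseIvf;

struct ToySparseIvfAttachment {
    table: String,
    columns: Vec<String>,
}

impl IndexMethodAttachment for ToySparseIvfAttachment {
    fn definition(&self) -> String {
        format!("toy_vector_sparse_ivf on {}({})", self.table, self.columns.join(", "))
    }
}

impl IndexMethod for ToySparseIvf {
    fn attach(&self, table: &str, columns: &[String], _params: &Params)
        -> Box<dyn IndexMethodAttachment> {
        Box::new(ToySparseIvfAttachment { table: table.to_string(), columns: columns.to_vec() })
    }
}

/// Simplified stand-in for `SymbolTable` with the new `index_methods` field.
#[derive(Default)]
struct SymbolTable {
    index_methods: HashMap<String, Arc<dyn IndexMethod>>,
}

/// Simplified stand-in for `Index` with the new `index_method` field.
struct Index {
    name: String,
    index_method: Option<Box<dyn IndexMethodAttachment>>,
}

/// Hypothetical helper: look up the factory by name and attach it to the table.
fn create_index(
    syms: &SymbolTable,
    name: &str,
    method: &str,
    table: &str,
    columns: &[String],
    params: &Params,
) -> Option<Index> {
    let factory = syms.index_methods.get(method)?;
    Some(Index {
        name: name.to_string(),
        index_method: Some(factory.attach(table, columns, params)),
    })
}
```

In this shape, the symbol table holds one shared factory per method name. Each index gets its own attachment carrying table-specific state.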
The toy index implementation stores inverted index pairs `(dimension, rowid)`
in an auxiliary BTree index. This index uses the special
`backing_btree` index_method, which is marked as `backing_btree: true` and
treated in a special way by the db core: it is a real BTree index that
is not managed by the tursodb core and must instead be managed by the
index_method that created it (which is responsible for creating, populating,
and destroying this btree).
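The inverted-index layout can be sketched with a `std::collections::BTreeSet` standing in for the auxiliary backing btree. `SparseIvfIndex` and its methods are hypothetical. The point is that keying on `(dimension, rowid)` keeps all rowids of one dimension contiguous, so each lookup is a single range scan.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical sketch: BTreeSet stands in for the backing btree that the
// index method itself owns (creates, populates, destroys).
#[derive(Default)]
struct SparseIvfIndex {
    postings: BTreeSet<(u32, i64)>, // (dimension, rowid) pairs
}

impl SparseIvfIndex {
    /// Indexes a sparse vector given as (dimension, value) pairs; zeros are skipped.
    fn insert(&mut self, rowid: i64, vector: &[(u32, f32)]) {
        for &(dim, value) in vector {
            if value != 0.0 {
                self.postings.insert((dim, rowid));
            }
        }
    }

    /// Removes every posting of `rowid` for the given vector's dimensions.
    fn delete(&mut self, rowid: i64, vector: &[(u32, f32)]) {
        for &(dim, _) in vector {
            self.postings.remove(&(dim, rowid));
        }
    }

    /// Returns candidate rowids that share at least one non-zero dimension with
    /// `query`, mapped to how many dimensions they share.
    fn candidates(&self, query: &[(u32, f32)]) -> HashMap<i64, usize> {
        let mut hits = HashMap::new();
        for &(dim, value) in query {
            if value == 0.0 {
                continue;
            }
            // All postings for `dim` are adjacent in key order: one range scan.
            for &(_, rowid) in self.postings.range((dim, i64::MIN)..=(dim, i64::MAX)) {
                *hits.entry(rowid).or_insert(0) += 1;
            }
        }
        hits
    }
}
```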
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3846
Closes #1282
# Support for WHERE clause subqueries
This PR implements support for subqueries that appear in the WHERE
clause of SELECT statements.
## What are those lol
1. **EXISTS subqueries**: `WHERE EXISTS (SELECT ...)`
2. **Row value subqueries**: `WHERE x = (SELECT ...)` or `WHERE (x, y) =
(SELECT ...)`. The latter form is not yet supported; only the single-column
("scalar subquery") case is.
3. **IN subqueries**: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN
(SELECT ...)`
## Correlated vs Uncorrelated Subqueries
- **Uncorrelated subqueries** reference only their own tables and can be
evaluated once.
- **Correlated subqueries** reference columns from the outer query
(e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must
be re-evaluated for each row of the outer query.
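To make the difference concrete, the hypothetical function below mimics `SELECT id FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2.id > 100) AND EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)` over in-memory slices. It counts how often each subquery is evaluated. It illustrates the evaluation rule only, not turso's code.

```rust
/// Returns (matching t1 ids, uncorrelated evaluations, correlated evaluations).
fn run_query(t1: &[i64], t2: &[i64]) -> (Vec<i64>, usize, usize) {
    let (mut uncorrelated_evals, mut correlated_evals) = (0, 0);

    // Uncorrelated: EXISTS (SELECT * FROM t2 WHERE t2.id > 100) references only t2,
    // so it is evaluated once, before the outer loop over t1 starts.
    uncorrelated_evals += 1;
    let any_large = t2.iter().any(|&id| id > 100);

    let mut result = Vec::new();
    for &outer_id in t1 {
        // Correlated: EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id) reads t1.id,
        // so it is re-evaluated for every outer row.
        correlated_evals += 1;
        let has_match = t2.iter().any(|&id| id == outer_id);
        if any_large && has_match {
            result.push(outer_id);
        }
    }
    (result, uncorrelated_evals, correlated_evals)
}
```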
## Implementation
### Planning
During query planning, the WHERE clause is walked to find subquery
expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each
subquery is:
1. Assigned a unique internal ID
2. Compiled into its own `SelectPlan` with outer query tables provided
as available references
3. Replaced in the AST with an `Expr::SubqueryResult` node that
references the subquery with its internal ID
4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan`
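The rewrite step can be sketched as a recursive walk over a simplified AST. The `Expr` variant names echo the ones above. Their payloads are assumptions made for illustration, and so is the `extract_subqueries` helper. The sketch uses plain SQL strings instead of compiled `SelectPlan`s and keeps an optional `lhs` on `SubqueryResult` for IN.

```rust
// Hypothetical, heavily simplified AST; turso's real `Expr` and `SelectPlan` differ.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Exists(String),              // SQL text stands in for a parsed SELECT
    Subquery(String),
    InSelect(Box<Expr>, String),
    SubqueryResult { id: usize, lhs: Option<Box<Expr>> },
}

#[derive(Debug, PartialEq)]
enum SubqueryKind { Exists, RowValue, In }

#[derive(Debug)]
struct NonFromClauseSubquery { id: usize, kind: SubqueryKind, sql: String }

/// Walks `expr`, replacing each subquery with `SubqueryResult { id, .. }`
/// and recording the subquery in `out`.
fn extract_subqueries(expr: &mut Expr, out: &mut Vec<NonFromClauseSubquery>) {
    // Recurse into operands first so nested subqueries are rewritten too.
    match expr {
        Expr::Eq(l, r) | Expr::And(l, r) => {
            extract_subqueries(l, out);
            extract_subqueries(r, out);
            return;
        }
        Expr::InSelect(lhs, _) => extract_subqueries(lhs, out),
        _ => {}
    }
    let taken = std::mem::replace(expr, Expr::Literal(0));
    let (kind, sql, lhs) = match taken {
        Expr::Exists(sql) => (SubqueryKind::Exists, sql, None),
        Expr::Subquery(sql) => (SubqueryKind::RowValue, sql, None),
        Expr::InSelect(lhs, sql) => (SubqueryKind::In, sql, Some(lhs)),
        other => {
            *expr = other; // not a subquery: put it back unchanged
            return;
        }
    };
    let id = out.len(); // unique within this plan
    out.push(NonFromClauseSubquery { id, kind, sql });
    *expr = Expr::SubqueryResult { id, lhs };
}
```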
For IN subqueries, an ephemeral index is created to store the subquery
results; for other kinds, the results are stored in register(s).
### Translation
Before emitting bytecode, we need to determine when each subquery should
be evaluated:
- **Uncorrelated**: Evaluated once before opening any table cursors
- **Correlated**: Evaluated at the appropriate nested loop depth after
all referenced outer tables are in scope
This is calculated by examining which outer query tables the subquery
references and finding the right-most (innermost) loop that opens those
tables, using mechanisms similar to the ones we use to decide when to
evaluate other `WhereTerm`s.
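The placement rule can be sketched as a search for the innermost loop among the referenced outer tables. `EvalAt` and `eval_point` are hypothetical. The sketch assumes the join order is already known and lists tables from the outermost loop to the innermost.

```rust
use std::collections::HashSet;

/// Where a WHERE-clause subquery should be evaluated (hypothetical type).
#[derive(Debug, PartialEq)]
enum EvalAt {
    BeforeLoops,      // uncorrelated: once, before any table cursor is opened
    LoopDepth(usize), // correlated: inside the loop at this join-order index
}

/// `join_order` lists outer tables from outermost to innermost loop;
/// `outer_refs` holds the outer tables the subquery references.
fn eval_point(join_order: &[&str], outer_refs: &HashSet<&str>) -> EvalAt {
    join_order
        .iter()
        .rposition(|t| outer_refs.contains(t)) // innermost referenced table
        .map_or(EvalAt::BeforeLoops, EvalAt::LoopDepth)
}
```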
### Code Generation
- **EXISTS**: Sets a register to 1 if any row is produced, 0 otherwise.
Has new `QueryDestination::ExistsSubqueryResult` variant.
- **IN**: Results stored in an ephemeral index and the index is probed.
- **RowValue**: Results stored in a range of registers. Has new
`QueryDestination::RowValueSubqueryResult` variant.
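A toy register machine can illustrate the three destinations. `Destination` and `Vm` are assumptions for this sketch, not turso's `QueryDestination` variants, and so is the use of `BTreeSet` as the ephemeral index. The scalar case follows SQLite's semantics: take the first row, or NULL when the subquery produces no rows.

```rust
use std::collections::BTreeSet;

// Hypothetical model of the three destinations.
#[derive(Debug)]
enum Destination {
    ExistsResult { reg: usize },                       // 1 if any row, else 0
    RowValueResult { start_reg: usize, count: usize }, // values in a register range
    EphemeralIndex,                                    // IN: rows go into an index
}

struct Vm {
    regs: Vec<Option<i64>>,        // None models NULL
    ephemeral: BTreeSet<Vec<i64>>, // stands in for the ephemeral index
}

impl Vm {
    /// Stores the subquery's result rows according to `dest`.
    fn run_subquery(&mut self, rows: &[Vec<i64>], dest: &Destination) {
        match *dest {
            Destination::ExistsResult { reg } => {
                self.regs[reg] = Some(if rows.is_empty() { 0 } else { 1 });
            }
            Destination::RowValueResult { start_reg, count } => {
                // First row wins; an empty result leaves NULLs.
                for i in 0..count {
                    self.regs[start_reg + i] = rows.first().map(|r| r[i]);
                }
            }
            Destination::EphemeralIndex => {
                self.ephemeral.extend(rows.iter().cloned());
            }
        }
    }

    /// `x IN (subquery)` becomes a probe of the ephemeral index.
    fn probe_in(&self, key: &[i64]) -> bool {
        self.ephemeral.contains(key)
    }
}
```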
## Annoying details
### Which cursor to read from in a subquery?
Sometimes a query will use a covering index, i.e. skip opening the table
cursor altogether if the index contains All The Needed Stuff.
Correlated subqueries reading columns from outer tables is a bit
problematic in this regard: with our current translation code, the
subquery doesn't know whether the outer query opened a table cursor,
index cursor, or both. So, for now, we try to find a table cursor first,
then fall back to finding any index cursor for that table.
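The fallback order can be sketched as follows. `OpenCursors`, `CursorId`, and `cursor_for_outer_column` are hypothetical bookkeeping types. Like the description, the sketch accepts any index cursor as a fallback. It does not check whether that index actually contains the requested column.

```rust
use std::collections::HashMap;

// Hypothetical record of which cursors the outer query opened per table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorId {
    Table(usize),
    Index(usize),
}

#[derive(Default)]
struct OpenCursors {
    table: HashMap<String, usize>,        // table name -> table cursor
    indexes: HashMap<String, Vec<usize>>, // table name -> index cursors
}

/// Picks the cursor a correlated subquery reads an outer column from:
/// the table cursor if one is open, else the first index cursor on that table.
fn cursor_for_outer_column(open: &OpenCursors, table: &str) -> Option<CursorId> {
    if let Some(&id) = open.table.get(table) {
        return Some(CursorId::Table(id));
    }
    open.indexes.get(table)?.first().map(|&id| CursorId::Index(id))
}
```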
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #3847
Add support for an index method syntax extension (similar to PostgreSQL)
and hide it behind the `--experimental-index-method` flag for now.
```sh
$> cargo run --package turso_cli -- --experimental-index-method
turso> CREATE TABLE t(x);
turso> CREATE INDEX t_idx ON t USING index_method (x) WITH (a = 1, b = '2');
turso> SELECT * FROM sqlite_master;
┌───────┬───────┬──────────┬──────────┬──────────────────────────────────────────────────────────────────────┐
│ type  │ name  │ tbl_name │ rootpage │ sql                                                                  │
├───────┼───────┼──────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│ table │ t     │ t        │ 2        │ CREATE TABLE t (x)                                                   │
├───────┼───────┼──────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│ index │ t_idx │ t        │ 3        │ CREATE INDEX t_idx ON t USING index_method (x) WITH (a = 1, b = '2') │
└───────┴───────┴──────────┴──────────┴──────────────────────────────────────────────────────────────────────┘
```
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #3842
Unfortunately, our current translation machinery cannot know for sure
whether the outer query has opened a table cursor, an index cursor, or both
for an outer table 't1' that a subquery references.
For this reason, `TableReferences::find_table_by_internal_id()` now returns a flag
that tells the caller whether the table is an outer query reference. Later
commits will add logic to decide which cursor a subquery reads from when
referencing a table from the outer query.
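Here is a sketch of the lookup with the extra flag. The method name comes from the text above. `TableRef`, the two field names, and the `(bool, &TableRef)` return type are assumptions; turso's real signature may differ.

```rust
// Hypothetical shapes; turso's real `TableReferences` differs.
#[derive(Debug)]
struct TableRef {
    internal_id: usize,
    name: String,
}

struct TableReferences {
    joined_tables: Vec<TableRef>,    // tables in this query's FROM clause
    outer_query_refs: Vec<TableRef>, // tables visible from enclosing queries
}

impl TableReferences {
    /// Returns the table plus a flag that is `true` when it belongs to an outer query.
    fn find_table_by_internal_id(&self, id: usize) -> Option<(bool, &TableRef)> {
        self.joined_tables
            .iter()
            .find(|t| t.internal_id == id)
            .map(|t| (false, t))
            .or_else(|| {
                self.outer_query_refs
                    .iter()
                    .find(|t| t.internal_id == id)
                    .map(|t| (true, t))
            })
    }
}
```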
WHERE clause subqueries use the BeginSubrtn instruction.
The corresponding closing instruction for BeginSubrtn is Return,
but Return is also used for other purposes, so we need to track pairs of
BeginSubrtn and Return that share the same first parameter (the subroutine register),
so that the EXPLAIN output for those subroutine contents is indented properly.
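The pairing can be tracked with a stack of open subroutine registers. `Insn` and `indent_levels` are hypothetical. In this sketch, a `Return` only closes a block when its register matches the innermost open `BeginSubrtn`, and the closing `Return` is printed at the enclosing level.

```rust
// Hypothetical instruction subset; only the pairing logic matters here.
#[derive(Debug)]
enum Insn {
    BeginSubrtn { reg: usize },
    Return { reg: usize },
    Other(&'static str),
}

/// Computes an EXPLAIN indent level for each instruction.
fn indent_levels(program: &[Insn]) -> Vec<usize> {
    let mut open: Vec<usize> = Vec::new(); // registers of open BeginSubrtn blocks
    let mut levels = Vec::with_capacity(program.len());
    for insn in program {
        match insn {
            Insn::BeginSubrtn { reg } => {
                levels.push(open.len());
                open.push(*reg);
            }
            // Only a Return with the matching register closes the block;
            // Returns used for other purposes leave indentation unchanged.
            Insn::Return { reg } if open.last() == Some(reg) => {
                open.pop();
                levels.push(open.len());
            }
            _ => levels.push(open.len()),
        }
    }
    levels
}
```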
This optimization reuses an existing cursor when op_open_write() is
called on the same table/index (same root_page). This is safe because
the cursor position doesn't matter: op_rewind() is always called after
op_open_write() to position the cursor at the beginning of the
table/index before any operations are performed.
This change speeds up op_open_write() by avoiding unnecessary cursor
re-initialization.
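Here is a simplified model of the reuse check. `Program`, `BTreeCursor`, and the `cursors_created` counter are hypothetical. The sketch only shows that a cursor slot whose `root_page` already matches is left alone, relying on `op_rewind()` to reposition it.

```rust
// Hypothetical model of cursor slots; not turso's real VDBE state.
#[derive(Debug)]
struct BTreeCursor {
    root_page: u64,
    position: Option<u64>,
}

#[derive(Default)]
struct Program {
    cursors: Vec<Option<BTreeCursor>>,
    cursors_created: usize,
}

impl Program {
    fn op_open_write(&mut self, cursor_id: usize, root_page: u64) {
        // Same table/index already open in this slot: reuse it as-is.
        let reusable = matches!(
            self.cursors.get(cursor_id),
            Some(Some(c)) if c.root_page == root_page
        );
        if reusable {
            return;
        }
        if self.cursors.len() <= cursor_id {
            self.cursors.resize_with(cursor_id + 1, || None);
        }
        self.cursors[cursor_id] = Some(BTreeCursor { root_page, position: None });
        self.cursors_created += 1;
    }

    /// Positions the cursor at the start, so any stale position is irrelevant.
    fn op_rewind(&mut self, cursor_id: usize) {
        if let Some(c) = self.cursors[cursor_id].as_mut() {
            c.position = Some(0);
        }
    }
}
```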
Closes #3815
Trying to return an integer in some cases to match SQLite led to more problems
than I anticipated. The reason is that we can't *really* match SQLite's
behavior unless we know the type of *every* element in the sum. This is
not impossible, but it is very hard, for very little gain.
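This sketch assumes the "sum" here refers to the `sum()` aggregate. Under that assumption, SQLite's documented rule shows the difficulty: the result is an integer only if every non-NULL input is an integer. So the final type is unknown until the last row has been seen. The sketch covers only NULL, integer, and real inputs. It raises an error on integer overflow as SQLite does and ignores SQLite's floating-point summation details. `Value` and `sqlite_style_sum` are hypothetical names.

```rust
// Sketch of SQLite's documented sum() typing rule (NULL/integer/real inputs only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Real(f64),
}

fn sqlite_style_sum(values: &[Value]) -> Result<Value, &'static str> {
    let mut int_sum: i64 = 0;
    let mut real_sum: f64 = 0.0;
    let mut saw_real = false;
    let mut saw_any = false;
    for v in values {
        match *v {
            Value::Null => {} // NULLs are ignored
            Value::Integer(i) => {
                saw_any = true;
                real_sum += i as f64;
                // Exact integer arithmetic only matters while no real has appeared.
                if !saw_real {
                    int_sum = int_sum.checked_add(i).ok_or("integer overflow")?;
                }
            }
            Value::Real(r) => {
                saw_any = true;
                saw_real = true;
                real_sum += r;
            }
        }
    }
    // The result type depends on every input seen.
    Ok(match (saw_any, saw_real) {
        (false, _) => Value::Null,
        (true, false) => Value::Integer(int_sum),
        (true, true) => Value::Real(real_sum),
    })
}
```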
Fixes #3831
Closes #3832