Commit Graph

10578 Commits

Author SHA1 Message Date
Nikita Sivukhin
212bcfe08f integrate IndexMethod into select main loop 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
61c9279a57 properly translate column which was covered by index method 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
d9ea3be4b8 forbid usage of IndexMethod in insert/delete loops 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
d65b7eddc0 add helper for simple binding of values in the AST 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
35b96ae8d8 fix a few places which need to be hooked into new types 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
8dd2644c07 add support for new cursor type in existing opcodes and also implement new opcodes in the VM 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
e9b1ca12b6 add new access operation through IndexMethod 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
37de39e5d1 integrate IndexMethod to the insert/delete flow 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
b994e2cbd8 add new Cursor type 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
5af10e6ccb add IndexMethod specific VM instructions 2025-10-28 11:27:35 +04:00
Jussi Saurio
f593fd1a8d remove deprecated flag from TempDatabase::new_empty() usage in fuzz test 2025-10-28 09:10:05 +02:00
Jussi Saurio
dae2441dd1 Fix compilation error after incompatible merges 2025-10-28 07:05:18 +02:00
Jussi Saurio
d993ac8157 Merge 'index_method: implement basic trait and simple toy index' from Nikita Sivukhin
This PR adds the `index_method` trait and an implementation of a toy
sparse vector index.
To keep the PR lightweight, index methods are not yet deeply integrated
into the query planner; only the components needed to make integration
tests that use the `index_method` API directly work are added.
The primary changes introduced in this PR are:
1. `SymbolTable` is extended with an `index_methods` field, and the builtin
extensions are populated with 2 native indices: `backing_btree` and
`toy_vector_sparse_ivf`
2. The `Index` struct is extended with an `index_method` field, which holds
the `IndexMethodAttachment` constructed for the table with the given
parameters from the `IndexMethod` "factory" trait
The toy index implementation stores inverted index pairs `(dimension,
rowid)` in an auxiliary BTree index. This index uses the special
`backing_btree` index_method, which is marked as `backing_btree: true` and
treated in a special way by the db core: it is a real BTree index that
is not managed by the tursodb core and must be managed by the index_method
that created it (so that index_method is responsible for creating,
populating, and destroying this btree).
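
As a rough illustration of the inverted-index idea described above, here is a minimal sketch that uses an in-memory `BTreeSet` in place of the auxiliary BTree. The `ToyInvertedIndex` type and its methods are hypothetical names made up for this example; they are not the `index_method` trait or the `toy_vector_sparse_ivf` implementation from the PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the auxiliary BTree: an ordered set of
/// (dimension, rowid) pairs.
#[derive(Default)]
struct ToyInvertedIndex {
    pairs: BTreeSet<(u32, i64)>,
}

impl ToyInvertedIndex {
    /// Index a sparse vector, given as the list of its non-zero dimensions.
    fn insert(&mut self, rowid: i64, dims: &[u32]) {
        for &dim in dims {
            self.pairs.insert((dim, rowid));
        }
    }

    /// Remove the pairs that were added for this row.
    fn delete(&mut self, rowid: i64, dims: &[u32]) {
        for &dim in dims {
            self.pairs.remove(&(dim, rowid));
        }
    }

    /// Rowids whose vectors have a non-zero component in `dim`,
    /// in ascending rowid order (a range scan over the ordered pairs).
    fn rows_for_dimension(&self, dim: u32) -> Vec<i64> {
        self.pairs
            .range((dim, i64::MIN)..=(dim, i64::MAX))
            .map(|&(_, rowid)| rowid)
            .collect()
    }
}
```

Because the pairs are ordered by dimension first, all rows for one dimension are contiguous, which is what makes a BTree a natural backing store for this kind of lookup.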

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3846
2025-10-28 07:01:36 +02:00
Jussi Saurio
9c87b20cb2 Merge 'Where clause subquery support' from Jussi Saurio
Closes #1282
# Support for WHERE clause subqueries
This PR implements support for subqueries that appear in the WHERE
clause of SELECT statements.
## What are those lol
1. **EXISTS subqueries**: `WHERE EXISTS (SELECT ...)`
2. **Row value subqueries**: `WHERE x = (SELECT ...)` or `WHERE (x, y) =
(SELECT ...)`. The latter are not yet supported - only the single-column
("scalar subquery") case is.
3. **IN subqueries**: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN
(SELECT ...)`
## Correlated vs Uncorrelated Subqueries
- **Uncorrelated subqueries** reference only their own tables and can be
evaluated once.
- **Correlated subqueries** reference columns from the outer query
(e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must
be re-evaluated for each row of the outer query.
## Implementation
### Planning
During query planning, the WHERE clause is walked to find subquery
expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each
subquery is:
1. Assigned a unique internal ID
2. Compiled into its own `SelectPlan` with outer query tables provided
as available references
3. Replaced in the AST with an `Expr::SubqueryResult` node that
references the subquery with its internal ID
4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan`
For IN subqueries, an ephemeral index is created to store the subquery
results; for other kinds, the results are stored in register(s).
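
The planning steps above can be sketched with a toy expression tree. This is only an illustration under simplifying assumptions: the `Expr` enum here is a hypothetical miniature (subqueries are kept as SQL strings rather than compiled into a `SelectPlan`), and `PlannedSubquery` and `extract_subqueries` are made-up names, not the types or functions in the codebase.

```rust
/// Toy expression tree, only loosely modelled on the variants named above.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
    /// `EXISTS (SELECT ...)`; the subquery is kept as a string for brevity.
    Exists(String),
    /// Scalar subquery `(SELECT ...)`.
    Subquery(String),
    /// Placeholder that refers to a planned subquery by its internal ID.
    SubqueryResult { id: usize },
}

/// Hypothetical record of one subquery pulled out of the WHERE clause.
#[derive(Debug, PartialEq)]
struct PlannedSubquery {
    id: usize,
    sql: String,
}

/// Walk the expression, move each subquery into `out`, and leave a
/// `SubqueryResult` node with the assigned ID in its place.
fn extract_subqueries(expr: &mut Expr, out: &mut Vec<PlannedSubquery>) {
    match expr {
        Expr::And(lhs, rhs) => {
            extract_subqueries(lhs, out);
            extract_subqueries(rhs, out);
        }
        Expr::Exists(sql) | Expr::Subquery(sql) => {
            // IDs are assigned in discovery order, so they are unique.
            let id = out.len();
            out.push(PlannedSubquery { id, sql: std::mem::take(sql) });
            *expr = Expr::SubqueryResult { id };
        }
        Expr::Column(_) | Expr::SubqueryResult { .. } => {}
    }
}
```

After the walk, the WHERE clause contains only placeholders, and later stages can look up each subquery by ID in the collected list.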
### Translation
Before emitting bytecode, we need to determine when each subquery should
be evaluated:
- **Uncorrelated**: Evaluated once before opening any table cursors
- **Correlated**: Evaluated at the appropriate nested loop depth after
all referenced outer tables are in scope
This is calculated by examining which outer query tables the subquery
references and finding the right-most (innermost) loop that opens those
tables, using mechanisms similar to those we already use to decide when
to evaluate other `WhereTerm`s.
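
A minimal sketch of that calculation, assuming the join order is a plain list of table IDs (outermost loop first) and the subquery's outer references are another list of table IDs. The function name and the use of slices instead of a table mask are assumptions made for illustration only.

```rust
/// Returns `None` for an uncorrelated subquery (evaluate once, before any
/// loop is opened), or `Some(i)` where `i` is the index of the innermost
/// loop that opens a table the subquery references.
fn subquery_eval_loop(join_order: &[u32], referenced_outer_tables: &[u32]) -> Option<usize> {
    join_order
        .iter()
        .rposition(|table_id| referenced_outer_tables.contains(table_id))
}
```

For a join order of tables `[10, 20, 30]`, a subquery that references tables 10 and 20 would be evaluated inside the second loop (index 1), once both tables are in scope.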
### Code Generation
- **EXISTS**: Sets a register to 1 if any row is produced, 0 otherwise.
Adds a new `QueryDestination::ExistsSubqueryResult` variant.
- **IN**: Results are stored in an ephemeral index and the index is probed.
- **RowValue**: Results are stored in a range of registers. Adds a new
`QueryDestination::RowValueSubqueryResult` variant.
## Annoying details
### Which cursor to read from in a subquery?
Sometimes a query will use a covering index, i.e. skip opening the table
cursor at all if the index contains All The Needed Stuff.
Correlated subqueries that read columns from outer tables are a bit
problematic in this regard: with our current translation code, the
subquery doesn't know whether the outer query opened a table cursor,
index cursor, or both. So, for now, we try to find a table cursor first,
then fall back to finding any index cursor for that table.
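
The fallback rule described above could look roughly like the following sketch. `CursorKind`, `OpenCursor`, and `cursor_for_outer_table` are hypothetical names for this example and do not correspond to the actual translation code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorKind {
    Table,
    Index,
}

/// Hypothetical record of a cursor the outer query has opened.
#[derive(Debug, Clone, Copy, PartialEq)]
struct OpenCursor {
    id: usize,
    table_id: u32,
    kind: CursorKind,
}

/// Prefer a table cursor for `table_id`; if the outer query opened none
/// (for example, because it used a covering index), fall back to the first
/// index cursor found for that table.
fn cursor_for_outer_table(open: &[OpenCursor], table_id: u32) -> Option<OpenCursor> {
    open.iter()
        .find(|c| c.table_id == table_id && c.kind == CursorKind::Table)
        .or_else(|| {
            open.iter()
                .find(|c| c.table_id == table_id && c.kind == CursorKind::Index)
        })
        .copied()
}
```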

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3847
2025-10-28 06:36:55 +02:00
Preston Thorpe
ccaf39de93 Merge 'index method syntax extension' from Nikita Sivukhin
Add support for an index method syntax extension (similar to PostgreSQL)
and hide it for now behind the `--experimental-index-method` flag
```sh
$> cargo run --package turso_cli -- --experimental-index-method
turso> CREATE TABLE t(x);
turso> CREATE INDEX t_idx ON t USING index_method (x) WITH (a = 1, b = '2');
turso> SELECT * FROM sqlite_master;
┌───────┬───────┬──────────┬──────────┬───────────────────────────────────────────────────────────────────────┐
│ type  │ name  │ tbl_name │ rootpage │ sql                                                                   │
├───────┼───────┼──────────┼──────────┼───────────────────────────────────────────────────────────────────────┤
│ table │ t     │ t        │        2 │ CREATE TABLE t (x)                                                    │
├───────┼───────┼──────────┼──────────┼───────────────────────────────────────────────────────────────────────┤
│ index │ t_idx │ t        │        3 │ CREATE INDEX t_idx ON t USING index_method (x) WITH (a = 1, b = '2') │
└───────┴───────┴──────────┴──────────┴───────────────────────────────────────────────────────────────────────┘
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3842
2025-10-27 14:03:22 -04:00
Jussi Saurio
a0d6fcba23 Unignore those TPC-H tests that can be ignored 2025-10-27 16:23:38 +02:00
Jussi Saurio
0b08f006d3 Add subquery fuzz test 2025-10-27 16:23:38 +02:00
Jussi Saurio
82995b4264 Add subquery TCL tests 2025-10-27 16:10:49 +02:00
Jussi Saurio
f288dfd3d0 TableMask: take tables referenced in subqueries into account
This influences valid potential join orders.
2025-10-27 16:10:49 +02:00
Jussi Saurio
59363a1be3 Translate Expr::SubqueryResult into bytecode 2025-10-27 16:01:39 +02:00
Jussi Saurio
bc2a7c79f9 Add TODO comment about subquery positions we don't support yet 2025-10-27 16:01:39 +02:00
Jussi Saurio
8fecd82311 Emit non from clause subqueries in translation 2025-10-27 16:01:39 +02:00
Jussi Saurio
bf66999f64 Add emit_non_from_clause_subquery() method 2025-10-27 16:01:39 +02:00
Jussi Saurio
8e1987bd5d Rename emit_subqueries() to emit_from_clause_subqueries() to disambiguate 2025-10-27 16:01:39 +02:00
Jussi Saurio
58caf32fe2 Add plan_subqueries_from_where_clause() method and use it in Select planning 2025-10-27 16:01:39 +02:00
Jussi Saurio
c54988192e Add SelectPlan::is_correlated() method 2025-10-27 16:01:39 +02:00
Jussi Saurio
9b62687c41 Change unwrap_parens() to return Parenthesized as is, if it contains multiple values 2025-10-27 16:01:39 +02:00
Jussi Saurio
580333ddd3 Add NonFromClauseSubquery struct and add a Vec of them to SelectPlan 2025-10-27 16:01:39 +02:00
Jussi Saurio
609d9957c1 Add new QueryDestination variants for subquery types 2025-10-27 16:01:39 +02:00
Jussi Saurio
5bd6e033e6 Rename emit_subquery() to emit_from_clause_subquery() to disambiguate 2025-10-27 16:01:39 +02:00
Jussi Saurio
5eb74ce8e6 AST: Add Expr::SubqueryResult variant and enum SubqueryType 2025-10-27 16:01:39 +02:00
Nikita Sivukhin
e7f6b3cd4c slightly adjust test 2025-10-27 17:00:56 +04:00
Nikita Sivukhin
05f0ee6a72 add more integration in order to properly skip backing_btree index_method 2025-10-27 17:00:26 +04:00
Nikita Sivukhin
bdbfac20fb resolve index method parameters 2025-10-27 16:39:22 +04:00
Nikita Sivukhin
a151770cea add minimal support of index_methods in the query planner in order to make integration tests work 2025-10-27 16:34:49 +04:00
Nikita Sivukhin
97dcc0869e register index_methods as db builtin extensions 2025-10-27 16:31:31 +04:00
Nikita Sivukhin
cb11417883 add index_method trait and implement simple inverted index for sparse vectors 2025-10-27 16:22:52 +04:00
Nikita Sivukhin
5d81f8db13 add simple test for index_method API 2025-10-27 16:15:50 +04:00
Jussi Saurio
e7aa7ee2ff ProgramBuilder: add a few utility methods needed for correlated subqueries 2025-10-27 14:03:41 +02:00
Jussi Saurio
5c05383cc1 Implement union for ColumnUsedMask 2025-10-27 13:57:56 +02:00
Jussi Saurio
3a1d6d8879 Improve error messages in translate_expr()
The current error messages are misleading, as the user may encounter
these errors in expressions outside the WHERE clause, too.
2025-10-27 13:51:59 +02:00
Jussi Saurio
de81af29e5 find_table_by_internal_id() returns whether table is an outer query reference
Unfortunately, our current translation machinery is unable to know for sure
whether a subquery reference to an outer table 't1' has opened a table cursor,
an index cursor, or both.

For this reason, return a flag from `TableReferences::find_table_by_internal_id()`
that tells the caller whether the table is an outer query reference, and further
commits will have some additional logic to decide which cursor a subquery will
read from when referencing a table from the outer query.
2025-10-27 13:47:49 +02:00
Jussi Saurio
c0c425b5d6 EXPLAIN: indent BeginSubrtn...Return blocks properly
WHERE clause subqueries use the BeginSubrtn instruction.

The corresponding closing instruction for BeginSubrtn is Return,
but Return is also used for other purposes, so we need to track pairs of
BeginSubrtn and Return that share the same 1st parameter (the subroutine register),
so that the EXPLAIN output for those subroutine contents is indented properly.
2025-10-27 13:42:00 +02:00
Nikita Sivukhin
22fe9452ac remove unnecessary parameter from integration tests 2025-10-27 15:16:12 +04:00
Nikita Sivukhin
8a80e8b743 rename custom modules to index_method like in postgresql 2025-10-27 13:18:18 +04:00
Nikita Sivukhin
408ca235d1 small refactoring 2025-10-27 12:43:38 +04:00
Nikita Sivukhin
299533b7b6 hide custom modules syntax behind --experimental-custom-modules flag 2025-10-27 12:29:05 +04:00
Nikita Sivukhin
67e62fd6ea support USING ... WITH ... syntax for index creation 2025-10-27 12:13:43 +04:00
Nikita Sivukhin
f178daa373 update comment 2025-10-27 11:47:25 +04:00
Nikita Sivukhin
906bbdd1c4 support deep nestedness 2025-10-27 11:37:42 +04:00