This PR adds an `index_method` trait and an implementation of a toy sparse
vector index.
To keep the PR lightweight, index methods are not yet deeply integrated
into the query planner; only the components necessary to make integration
tests that use the `index_method` API directly work have been added.
The primary changes introduced in this PR are:
1. `SymbolTable` is extended with an `index_methods` field, and the builtin
extensions are populated with two native index methods: `backing_btree` and
`toy_vector_sparse_ivf`.
2. The `Index` struct is extended with an `index_method` field, which holds
the `IndexMethodAttachment` constructed for the table with the given
parameters by the `IndexMethod` "factory" trait.
The toy index implementation stores inverted index pairs `(dimension,
rowid)` in an auxiliary BTree index. That auxiliary index uses the special
`backing_btree` index method, which is marked as `backing_btree: true` and
is treated in a special way by the db core: it is a real BTree index that
is not managed by the tursodb core and must instead be managed by the
index_method that created it (which is therefore responsible for populating,
creating, and destroying this btree).
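A rough sketch of the shape this API takes, as described above (the trait
and method signatures here are illustrative assumptions, not the actual
definitions from the codebase):

    // Hypothetical sketch of the `IndexMethod` "factory" trait and the
    // attachment it produces; the real signatures in tursodb differ.
    trait IndexMethod {
        /// Name under which the method is registered in `SymbolTable`,
        /// e.g. "backing_btree" or "toy_vector_sparse_ivf".
        fn name(&self) -> &str;

        /// Whether indexes created by this method are real BTree indexes
        /// that the method itself (not the db core) must create, populate,
        /// and destroy.
        fn backing_btree(&self) -> bool {
            false
        }

        /// Construct the per-index attachment stored in `Index.index_method`
        /// for the given table and index parameters (argument types are
        /// made up for illustration).
        fn attach(
            &self,
            table_name: &str,
            params: &[(String, String)],
        ) -> Result<Box<dyn IndexMethodAttachment>, String>;
    }

    /// State held by an `Index` once the method has been attached.
    trait IndexMethodAttachment {
        fn method_name(&self) -> &str;
    }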
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3846
Closes #2470
In a query like `SELECT * FROM t LEFT JOIN s ON t.a=s.a WHERE s.a = 'foo'` we can
remove the LEFT JOIN because NULL values will never be equal to 'foo', so the
NULL-extended rows the outer join would produce are filtered out anyway. In fact,
we have this optimization already.
However, there was a dumb bug where `WhereTerm`s involving this join still retained
their `from_outer_join` state, forcing the evaluation of those terms at the
original join index. This results in completely wrong bytecode if the join
optimizer decides to reorder the join as `s JOIN t` instead: effectively,
`t.a=s.a` would be evaluated after table `s` is opened but before table `t`
is opened.
This PR fixes that issue by clearing `from_outer_join` properly from the relevant
`WhereTerm`s.
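For reference, the equivalence that justifies removing the LEFT JOIN (a
sketch of the rewrite, not the actual planner output):

    -- s.a = 'foo' can never hold for the NULL-extended rows the outer
    -- join would produce, so these two queries return the same rows:
    SELECT * FROM t LEFT JOIN s ON t.a = s.a WHERE s.a = 'foo';
    SELECT * FROM t JOIN s ON t.a = s.a WHERE s.a = 'foo';
    -- Only the inner-join form may be freely reordered as `s JOIN t`.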
- Even if the index search returns only one row, it calls next in a loop, so we can incorrectly process the same row's values multiple times.
- The following query failed with this optimization (note how row 2's `c0` was also incorrectly set to NULL):
turso> CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, k TEXT, c0 INT);
turso> CREATE UNIQUE INDEX idx_p1_0 ON t(c0);
turso> insert into t values (null, 'uu', -1);
turso> insert into t values (null, 'uu', -2);
turso> UPDATE t SET c0 = NULL WHERE c0 = -1;
turso> SELECT * FROM t
┌────┬────┬────┐
│ id │ k │ c0 │
├────┼────┼────┤
│ 1 │ uu │ │
├────┼────┼────┤
│ 2 │ uu │ │
└────┴────┴────┘
This solves an issue where an INSERT statement conflicts with
multiple indices. In that case, SQLite iterates the linked list
`pTab->pIndex` in order and handles the first conflict encountered.
The most recently parsed index is always added to the head of the list.
To be compatible with this behavior, we also need to put the most
recently parsed index definition first in our indexes list for a given
table.
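To illustrate the behavior being matched (hypothetical table and data; the
point is which conflict gets handled first):

    CREATE TABLE u (a INT UNIQUE, b INT);  -- autoindex on a
    CREATE UNIQUE INDEX idx_b ON u(b);     -- parsed later, so it sits at
                                           -- the head of pTab->pIndex
    INSERT INTO u VALUES (1, 1);
    -- This row violates both unique constraints; since idx_b is at the
    -- head of the index list, its conflict is the first one encountered
    -- and therefore the one handled.
    INSERT INTO u VALUES (1, 1);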
Currently, this is effectively a no-op because, at the optimization
stage, window function expressions are in the form
`win_func(subquery_column1, subquery_column2, ...)`.
Nevertheless, expressions are rewritten to maintain consistency with
aggregates, which also hold cloned expressions from sources like result
columns. This ensures future changes in the optimizer won’t break window
function handling.
Indexes with the naming scheme `sqlite_autoindex_<tblname>_<number>`
are automatically created when a table is created with UNIQUE or
PRIMARY KEY definitions.
These indexes must map to the table definition SQL in definition order,
i.e. `sqlite_autoindex_foo_1` must be the first instance of UNIQUE or
PRIMARY KEY and so on.
This commit fixes our autoindex creation / parsing so that this invariant
is upheld.
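Assuming the invariant above, a table like the following (hypothetical)
must yield autoindexes numbered in definition order:

    CREATE TABLE foo (
      a TEXT UNIQUE,        -- 1st UNIQUE/PRIMARY KEY -> sqlite_autoindex_foo_1
      b TEXT PRIMARY KEY,   -- 2nd                    -> sqlite_autoindex_foo_2
      c TEXT UNIQUE         -- 3rd                    -> sqlite_autoindex_foo_3
    );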
Concurrency tests are passing now.
Write tests are finally passing.
Most of the CDC tests are passing.
Autoincrement draft.
Remove shared schema code that broke transactions.
The sequence table should be reset if the table is dropped.
fmt
To be used in DBSP-based projections. This will compile an expression
to VDBE bytecode and execute it.
To do that, we need to add a new type of Expression, which we call a
Register.
This gives us a way to pass parameters to a DBSP program that are not
columns or literals, but inputs from the DBSP deltas.
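A minimal sketch of the idea (the variant shape and names are illustrative
assumptions, not the actual turso definitions):

    #[derive(Debug)]
    enum Value {
        Integer(i64),
        Text(String),
    }

    #[derive(Debug)]
    enum Expression {
        // A reference to a table column.
        Column { table: usize, column: usize },
        // A literal constant.
        Literal(Value),
        // New: the operand lives in a VDBE register that the DBSP runtime
        // fills in from a delta before the compiled program is executed.
        Register(usize),
    }

    fn main() {
        // The same compiled bytecode can then be re-run for each delta row
        // by writing that row's value into register 1 beforehand.
        let expr = Expression::Register(1);
        println!("{expr:?}");
    }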