This PR provides Euclidean distance support for limbo's vector search.
At the same time, some type abstractions are introduced, such as
`DistanceCalculator`, etc. This is because I hope to unify the current
vector module in the future to make it more structured, clearer, and
more extensible.
While practicing Euclidean distance for Limbo, I discovered that many
checks could be done using the type system or in advance, rather than
waiting until the distance is calculated. By building these checks into
the type system or doing them ahead of time, this would allow us to
explore more efficient computations, such as automatic vectorization or
SIMD acceleration, which is future work.
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes#1986
- Apart from regular states Found/NotFound seek result has TryAdvance
value which tells caller to advance the cursor in necessary direction
because the leaf page which would hold the entry if it was present
actually has no matching entry (but neighbouring page can have match)
- `OP_NewRowId` now generates new rowid semi randomly when the largest
rowid in the table is `i64::MAX`.
- Introduced new `LimboError` variant `DatabaseFull` to signify that
database might be full (SQLite behaves this way returning
`SQLITE_FULL`).
Now:
```SQL
turso> CREATE TABLE q(x INTEGER PRIMARY KEY, y);
turso> INSERT INTO q VALUES (9223372036854775807, 1);
turso> INSERT INTO q(y) VALUES (2);
turso> INSERT INTO q(y) VALUES (3);
turso> SELECT * FROM q;
┌─────────────────────┬───┐
│ x │ y │
├─────────────────────┼───┤
│ 1841427626667347484 │ 2 │
├─────────────────────┼───┤
│ 4000338366725695791 │ 3 │
├─────────────────────┼───┤
│ 9223372036854775807 │ 1 │
└─────────────────────┴───┘
```
Fixes: https://github.com/tursodatabase/turso/issues/1977
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1985
This is done because the compiler is refusing to inline even after
adding inline hint.
- Get refvalues from directly from registers without using
`make_record`
when updating last_insert_rowid we call return_if_io!(cursor.rowid())
which yields IO on large records. this causes op_insert to insert and
overwrite the same row many times. we need a state machine to ensure
that the insertion only happens once and the reading of rowid can
independently yield IO without causing a re-insert.
Separates cursor.key_exists_in_index() into a state machine. The problem with
the main branch implementation is this:
`return_if_io!(seek)`
`return_if_io!(cursor.record())`
The latter may yield on IO and cause the seek to start over, causing an infinite
loop. With an explicit state machine we can control and prevent this.