Commit Graph

912 Commits

Author SHA1 Message Date
Pekka Enberg
1653b0883a Merge 'core/vector: Euclidean distance support for vector search' from KarinaMilet
This PR provides Euclidean distance support for limbo's vector search.
At the same time, some type abstractions are introduced, such as
`DistanceCalculator`, etc. This is because I hope to unify the current
vector module in the future to make it more structured, clearer, and
more extensible.
While practicing Euclidean distance for Limbo, I discovered that many
checks could be done using the type system or in advance, rather than
waiting until the distance is calculated. By building these checks into
the type system or doing them ahead of time, this would allow us to
explore more efficient computations, such as automatic vectorization or
SIMD acceleration, which is future work.

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #1986
2025-07-14 13:07:20 +03:00
Nikita Sivukhin
413d93f041 fix after rebase 2025-07-14 13:05:20 +04:00
Nikita Sivukhin
5bd3287826 add comments 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
aceaf182b1 remove comment 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
f9cd5fad4c add small comment 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
fc400906d5 handle case when target seek page has no matching entries 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
03b2725cc7 return SeekResult from seek operation
- Apart from regular states Found/NotFound seek result has TryAdvance
  value which tells caller to advance the cursor in necessary direction
  because the leaf page which would hold the entry if it was present
  actually has no matching entry (but neighbouring page can have match)
2025-07-14 13:01:15 +04:00
Nikita Sivukhin
77bf6c287d introduce proper state machine for seek op code 2025-07-14 13:01:14 +04:00
Pekka Enberg
9285d8b83b Merge 'Fix: OP_NewRowId to generate semi random rowid when largest rowid is i64::MAX' from Krishna Vishal
- `OP_NewRowId` now generates new rowid semi randomly when the largest
rowid in the table is `i64::MAX`.
- Introduced new `LimboError` variant `DatabaseFull` to signify that
database might be full (SQLite behaves this way returning
`SQLITE_FULL`).
Now:
```SQL
turso> CREATE TABLE q(x INTEGER PRIMARY KEY, y);
turso> INSERT INTO q VALUES (9223372036854775807, 1);
turso> INSERT INTO q(y) VALUES (2);
turso> INSERT INTO q(y) VALUES (3);
turso> SELECT * FROM q;
┌─────────────────────┬───┐
│ x                   │ y │
├─────────────────────┼───┤
│ 1841427626667347484 │ 2 │
├─────────────────────┼───┤
│ 4000338366725695791 │ 3 │
├─────────────────────┼───┤
│ 9223372036854775807 │ 1 │
└─────────────────────┴───┘
```
Fixes: https://github.com/tursodatabase/turso/issues/1977

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1985
2025-07-14 11:56:09 +03:00
Krishna Vishal
12f9743443 Remove unused imports 2025-07-14 13:13:54 +05:30
Krishna Vishal
ab0cb06755 split seek and getting rowid as two separate states 2025-07-14 13:11:41 +05:30
Krishna Vishal
3e880c34d6 Make op_new_rowid re-entrant
Introduce `OpNewRowidState` state machine

remove `get_new_rowid` from vdbe/mod.rs
2025-07-14 13:11:40 +05:30
Krishna Vishal
98ca275b33 Add a way to semi randomly generate rowid when the max rowid reaches
`i64::MAX`. We do this by attempting to generate random values smaller
than `i64::MAX` for 100 times and returns `DatabaseFull` error on
failure

- Introduced `DatabaseFull` error variant

Fixes: https://github.com/tursodatabase/turso/issues/1977
2025-07-14 13:09:34 +05:30
Nikita Sivukhin
b330c6b70e fix clippy 2025-07-14 11:38:08 +04:00
Nikita Sivukhin
cc04f11bd6 remove clone 2025-07-14 11:27:51 +04:00
Nikita Sivukhin
f61d733dd3 make new functions dependend on "json" Cargo feature 2025-07-14 11:26:51 +04:00
Nikita Sivukhin
c9e7271eaf properly pass subtype 2025-07-14 11:20:49 +04:00
Nikita Sivukhin
bf25a0e3f1 fix clippy 2025-07-14 11:20:16 +04:00
Nikita Sivukhin
81cd04dd65 add bin_record_json_object and table_columns_json_array functions 2025-07-14 11:19:45 +04:00
Krishna Vishal
370d437491 Add docs for get_tie_breaker_from_idx_comp_op 2025-07-14 03:28:55 +05:30
Krishna Vishal
4c5383b0b3 chore: clippy 2025-07-14 03:28:55 +05:30
Krishna Vishal
e27b9c7e0f Address review comments 2025-07-14 03:28:55 +05:30
Krishna Vishal
a79fe458db Fix merge conflicts and adapt schema.rs to use RecordCursor 2025-07-14 03:28:55 +05:30
Krishna Vishal
d3368a28bc fix merge conflicts 2025-07-14 03:28:55 +05:30
Krishna Vishal
d78185bd62 Remove mistakenly pushed file 2025-07-14 03:28:54 +05:30
Krishna Vishal
9de3cf0c60 Remove redundant checks 2025-07-14 03:28:54 +05:30
Krishna Vishal
235e798561 Return corrupt errors. 2025-07-14 03:28:54 +05:30
Krishna Vishal
e7e5f28c0a chore: Clippy chill 2025-07-14 03:28:54 +05:30
Krishna Vishal
f3b169bf30 Fix empty blob test failure. 2025-07-14 03:28:54 +05:30
Krishna Vishal
9b315d1d7e Manually inline the record deserialization code for performance.
This is done because the compiler is refusing to inline even after
adding inline hint.
- Get refvalues from directly from registers without using
`make_record`
2025-07-14 03:28:54 +05:30
Krishna Vishal
f0e8e5871b Replace compare_immutable with compare_records_generic 2025-07-14 03:28:54 +05:30
Krishna Vishal
860de412d9 Add num_columns to BTreeCursor so we can initialize
`Vecs` inside `RecordCursor` to their appropriate to reduce
allocations.
2025-07-14 03:28:54 +05:30
Krishna Vishal
ef147181c2 Working version of the incremental column and "optimal" record
compare functions. Now we optimize them
2025-07-14 03:28:54 +05:30
Krishna Vishal
692f0413eb Stash 2025-07-14 03:28:54 +05:30
Krishna Vishal
515712b7f2 Fix sorter 2025-07-14 03:28:54 +05:30
Krishna Vishal
601540af6e Make OP_column do on demand serialization baby! 2025-07-14 03:28:54 +05:30
Jussi Saurio
a48b6d049a Another post-rebase clippy round with 1.88.0 2025-07-12 19:10:56 +03:00
Nils Koch
828d4f5016 fix clippy errors for rust 1.88.0 (auto fix) 2025-07-12 18:58:41 +03:00
Jussi Saurio
b015fabb26 vdbe: fix panic when first value added to min()/max() accumulator is null 2025-07-10 21:02:57 +03:00
Jussi Saurio
63c5698050 vdbe: remove error prints from min()/max() and simplify 2025-07-10 21:02:57 +03:00
KaguraMilet
9d6ae78786 Merge branch 'tursodatabase:main' into distance 2025-07-10 19:15:08 +08:00
Jussi Saurio
c9a6c289e0 clippy 2025-07-09 14:26:43 +03:00
Jussi Saurio
38650eee0e VDBE: fix op_insert re-entrancy
when updating last_insert_rowid we call return_if_io!(cursor.rowid())
which yields IO on large records. this causes op_insert to insert and
overwrite the same row many times. we need a state machine to ensure
that the insertion only happens once and the reading of rowid can
independently yield IO without causing a re-insert.
2025-07-09 14:26:40 +03:00
Jussi Saurio
c752058a97 VDBE: introduce state machine for op_idx_insert for more granular IO control
Separates cursor.key_exists_in_index() into a state machine. The problem with
the main branch implementation is this:

`return_if_io!(seek)`
`return_if_io!(cursor.record())`

The latter may yield on IO and cause the seek to start over, causing an infinite
loop. With an explicit state machine we can control and prevent this.
2025-07-09 11:43:18 +03:00
meteorgan
3065416bb2 cargo fmt 2025-07-08 22:57:20 +08:00
meteorgan
08be906bb1 return early if index is not found in op_idx_delete 2025-07-08 22:57:20 +08:00
meteorgan
f44d818400 cargo fmt 2025-07-08 22:57:20 +08:00
meteorgan
c6ef4898b0 fix: IdxDelete shouldn't raise error if P5 == 0 2025-07-08 22:57:20 +08:00
Pere Diaz Bou
232beddf62 vdbe: fix compilation 2025-07-08 16:15:29 +02:00
Pere Diaz Bou
8909e198ae set closed flag for connection to detect force zombies
Let's make sure we don't keep using a connection after it was dropped.
In case of executing a query that was closed we will try to rollback and
return early.
2025-07-08 15:19:20 +02:00