This PR adds Euclidean distance support to Limbo's vector search.
It also introduces some type abstractions, such as `DistanceCalculator`,
because I hope to unify the current vector module in the future to make
it more structured, clearer, and more extensible.
While implementing Euclidean distance for Limbo, I discovered that many
checks could be done through the type system or ahead of time, rather
than waiting until the distance is calculated. Moving these checks into
the type system or performing them up front would let us explore more
efficient computations, such as automatic vectorization or SIMD
acceleration, which is future work.
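To illustrate the idea of moving checks out of the distance computation, here is a minimal sketch. It is not the code from this PR: `FixedVector`, `try_from_slice`, and the shape of the `DistanceCalculator` trait shown here are assumptions made for the example. The dimension is carried in the type, so a mismatch between the two operands is rejected at compile time, and the length check happens once at construction.

```rust
use std::convert::TryFrom;

/// Hypothetical fixed-dimension vector: the dimension `N` is part of the type.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedVector<const N: usize>([f32; N]);

impl<const N: usize> FixedVector<N> {
    /// Checks the length once, at construction, instead of on every distance call.
    pub fn try_from_slice(values: &[f32]) -> Option<Self> {
        let arr = <[f32; N]>::try_from(values).ok()?;
        Some(Self(arr))
    }
}

/// Assumed trait shape; the real `DistanceCalculator` signature may differ.
pub trait DistanceCalculator {
    fn distance<const N: usize>(a: &FixedVector<N>, b: &FixedVector<N>) -> f32;
}

pub struct Euclidean;

impl DistanceCalculator for Euclidean {
    fn distance<const N: usize>(a: &FixedVector<N>, b: &FixedVector<N>) -> f32 {
        // Both operands are guaranteed to have N elements, so the loop body
        // needs no length check and stays simple enough to auto-vectorize.
        a.0.iter()
            .zip(b.0.iter())
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt()
    }
}
```

With this shape, `Euclidean::distance(&a, &b)` only compiles when `a` and `b` share the same `N`, and a slice of the wrong length is turned away by `try_from_slice` before any arithmetic runs.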
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes #1986
Simple PR to address a minor issue: `INTEGER PRIMARY KEY NOT NULL` (`NOT
NULL` is obviously redundant here) prevents the user from inserting
anything into the table, because the rowid-alias column is always set to
null by `turso-db`
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2063
…ct_plan()
For example, if we attempt to do `max(*)`, let's return the error
message from `resolve_function()` to be compatible with SQLite:
```
sqlite> CREATE TABLE test1(f1, f2);
sqlite> SELECT max(*) FROM test1;
Parse error: wrong number of arguments to function max()
SELECT max(*) FROM test1;
^--- error here
```
Spotted by SQLite TCL tests.
Closes #1990
I was running the simulator with I/O faults enabled and fixed some nasty
bugs. There are more nasty bugs still to fix. This is the command I use
to run the simulator:
`cargo run -p limbo_sim -- --minimum-tests 10 --maximum-tests 1000`
This PR mainly fixes the following bugs:
- Not decrementing the in-flight write counter when `pwrite` fails
- Not rolling back the transaction on `step` error
- Not rolling back the transaction on `run_once` error
- Some function results were just being unwrapped even though they could
fail with I/O errors
- Only change `max_frame` after WAL syncs
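As a rough illustration of the first fix in the list, the sketch below shows the pattern of undoing the in-flight increment when the write call itself fails. It is not the code from this PR: `WriteTracker`, `submit`, `complete`, and the `pwrite` closure are hypothetical stand-ins for the real pager and I/O types.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the state that tracks writes in flight.
pub struct WriteTracker {
    in_flight: Cell<usize>,
}

impl WriteTracker {
    pub fn new() -> Self {
        Self { in_flight: Cell::new(0) }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.get()
    }

    /// Submits a write. `pwrite` is a closure standing in for the real I/O call.
    pub fn submit<F>(&self, pwrite: F) -> Result<(), String>
    where
        F: FnOnce() -> Result<(), String>,
    {
        self.in_flight.set(self.in_flight.get() + 1);
        if let Err(e) = pwrite() {
            // The write was never queued, so no completion will fire:
            // the increment has to be undone here or the counter leaks.
            self.in_flight.set(self.in_flight.get() - 1);
            return Err(e);
        }
        Ok(())
    }

    /// Called from the completion callback when a submitted write finishes.
    pub fn complete(&self) {
        self.in_flight.set(self.in_flight.get() - 1);
    }
}
```

Without the decrement on the error path, anything that waits for the counter to reach zero would wait forever after a single injected fault.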
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1946
This PR adds basic CDC functionality to `turso-db`.
### Feature components
1. `unstable_capture_data_changes_conn` pragma, which allows the user to
turn CDC logging on or off for a **specific connection**
* CDC will have multiple modes, but for now only `off` and `rowid-only`
are supported
* The default CDC table is `turso_cdc`, but the user can override this
with the `PRAGMA` update syntax and use an arbitrary table for CDC
* This can be helpful in the future if turso needs to break table
format compatibility: custom tables can be a way to migrate between
different schemas
* The update syntax for the pragma accepts one string argument: either
the mode alone, or the mode followed by a custom CDC table name,
separated from the mode by a comma
```sql
turso> PRAGMA unstable_capture_data_changes_conn('rowid-only');
turso> PRAGMA unstable_capture_data_changes_conn('off');
turso> PRAGMA unstable_capture_data_changes_conn('rowid-only,custom_cdc_table');
turso> PRAGMA unstable_capture_data_changes_conn;
┌────────────┬──────────────────┐
│ mode │ table │
├────────────┼──────────────────┤
│ rowid-only │ custom_cdc_table │
└────────────┴──────────────────┘
```
2. The CDC table schema is simple right now, but it will evolve soon to
support logging of row values before/after the change:
```sql
CREATE TABLE custom_cdc_table (
    operation_id INTEGER PRIMARY KEY AUTOINCREMENT,
    operation_time INTEGER, -- unixepoch() at the moment of insert; can drift if the machine clock is not monotonic
    operation_type INTEGER, -- -1 = delete, 0 = update, 1 = insert
    table_name TEXT,
    id
)
```
* Note that `operation_id` is marked as `AUTOINCREMENT`, but `turso-db`
needs to implement
https://github.com/tursodatabase/turso/issues/1976 in order to properly
support that keyword
3. Query planner changes are made in the `INSERT`/`UPDATE`/`DELETE` plans
in order to emit updates to the CDC table for changes made to the table
* Note that a row `UPDATE` which changes the primary key generates a
`DELETE` + `INSERT` pair instead of a single `UPDATE`
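The pragma argument format described in item 1 can be sketched as a small parser. This is an illustration only, not the parser from this PR: `CdcMode`, `CdcConfig`, `parse_cdc_pragma`, the whitespace trimming, and the error strings are all assumptions; only the two modes, the comma separator, and the `turso_cdc` default come from the description above.

```rust
/// The two modes currently supported by the pragma.
#[derive(Debug, PartialEq)]
pub enum CdcMode {
    Off,
    RowidOnly,
}

/// Hypothetical parsed form of the pragma argument.
#[derive(Debug, PartialEq)]
pub struct CdcConfig {
    pub mode: CdcMode,
    pub table: String,
}

/// Default CDC table name used when the argument carries only a mode.
pub const DEFAULT_CDC_TABLE: &str = "turso_cdc";

/// Parses `"<mode>"` or `"<mode>,<table>"`.
/// Trimming whitespace around each part is an assumption of this sketch.
pub fn parse_cdc_pragma(arg: &str) -> Result<CdcConfig, String> {
    let (mode_str, table) = match arg.split_once(',') {
        Some((mode, table)) => (mode.trim(), table.trim()),
        None => (arg.trim(), DEFAULT_CDC_TABLE),
    };
    let mode = match mode_str {
        "off" => CdcMode::Off,
        "rowid-only" => CdcMode::RowidOnly,
        other => return Err(format!("unknown CDC mode: {}", other)),
    };
    if table.is_empty() {
        return Err("empty CDC table name".to_string());
    }
    Ok(CdcConfig { mode, table: table.to_string() })
}
```

Under these assumptions, `parse_cdc_pragma("rowid-only")` falls back to the default table, while `parse_cdc_pragma("rowid-only,custom_cdc_table")` selects the custom one.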
### Implementation details
- The `PRAGMA` that enables CDC is **unstable**, which means that
publicly visible side effects and the public API can change in the future
(and they will change soon in order to support richer CDC modes)
- The CDC table is just a regular table, with the benefits and downsides
that implies:
* benefits: the user can perform maintenance on that table with regular
SQL, such as `DELETE FROM turso_cdc WHERE operation_id < ?` to clean up
old CDC entries that are no longer needed
* downsides: the user can accidentally make unwanted changes to the CDC
table
- Changes to the CDC table are not logged to the table itself
* Note that different connections (e.g. `C1`, `C2`) can have different
CDC tables set (e.g. `A` and `B`), in which case changes made to CDC
table `B` through connection `C1` will be reflected in CDC table `A`
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1926
We're currently mixing different error messages, which makes
compatibility testing pretty hard. Unify on a single, SQLite-compatible
error message: "no such table".