Commit Graph

5653 Commits

Author SHA1 Message Date
Jussi Saurio
7725f336b8 Merge 'Fix incorrectly using an equality constraint twice for index seek' from Jussi Saurio
Prevents something like `WHERE x = 5 AND x = 5` from becoming a two
component index key.
Closes #3656

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #3658
2025-10-10 13:33:43 +03:00
Jussi Saurio
74e04634aa Fix incorrectly using an equality constraint twice for index seek
Prevents something like `WHERE x = 5 AND x = 5` from becoming a two
component index key.

Closes #3656
2025-10-10 13:19:19 +03:00
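The fix above can be illustrated with a minimal sketch. This is not the project's planner code; `Constraint` and `build_seek_key` are hypothetical names used only to show the idea that each index column contributes at most one equality value to the seek key, even when the WHERE clause repeats the constraint.

```python
from typing import Any, List, NamedTuple


class Constraint(NamedTuple):
    # Hypothetical representation of one WHERE term such as `x = 5`.
    column: str
    op: str
    value: Any


def build_seek_key(index_columns: List[str], constraints: List[Constraint]) -> List[Any]:
    """Build an equality seek key with at most one component per index column."""
    key = []
    for column in index_columns:
        # Take the first equality constraint on this column and ignore repeats.
        match = next(
            (c for c in constraints if c.column == column and c.op == "="),
            None,
        )
        if match is None:
            break  # The key must cover a contiguous prefix of the index.
        key.append(match.value)
    return key


# `WHERE x = 5 AND x = 5` on an index over (x) yields a one-component key.
duplicated = [Constraint("x", "=", 5), Constraint("x", "=", 5)]
print(build_seek_key(["x"], duplicated))
```

Iterating over index columns rather than over constraints is what makes the duplicate harmless in this sketch: the key length is bounded by the number of index columns.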
Pekka Enberg
e727b8e0dc Merge 'Vector improvements' from Nikita Sivukhin
This PR introduces sparse vector support and a Jaccard distance
implementation.
Also, this PR restructures the code to have all vector operations in
separate files (they grow pretty quickly as new vector representations
are added to the DB).

Closes #3647
2025-10-10 13:08:46 +03:00
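As a rough illustration of Jaccard distance over sparse vectors, here is a sketch that stores a sparse vector as a mapping from index to value. Which Jaccard variant the PR implements is not stated here, so this assumes the weighted form `1 - sum(min) / sum(max)` with non-negative values. The function name and the NaN result for two empty vectors are likewise assumptions.

```python
import math
from typing import Dict


def jaccard_distance(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Weighted Jaccard distance between two sparse vectors (assumed variant)."""
    indices = a.keys() | b.keys()
    # Missing entries are implicit zeros, so only stored indices matter.
    overlap = sum(min(a.get(i, 0.0), b.get(i, 0.0)) for i in indices)
    total = sum(max(a.get(i, 0.0), b.get(i, 0.0)) for i in indices)
    if total == 0.0:
        return math.nan  # Assumption: undefined for two all-zero vectors.
    return 1.0 - overlap / total


u = {0: 1.0, 3: 2.0}
v = {3: 2.0, 7: 1.0}
print(jaccard_distance(u, v))
```

Only the stored indices are visited, so the cost depends on the number of non-zero entries rather than on the nominal dimension.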
Pekka Enberg
77924c6c71 Merge 'Optimize sorter' from Jussi Saurio
Various little fixes to `Sorter` that reduce unnecessary work.
Makes TPC-H query 1 roughly 2x faster, which is a lot because it
originally took 30-40 seconds depending on the CI run

Closes #3645
2025-10-10 13:06:53 +03:00
Pekka Enberg
cf22819817 Merge 'Make sqlite_version() compatible with SQLite' from Glauber Costa
I found an application in the open that expects sqlite_version() to
return a specific string (higher than 3.8...).
We had tons of those issues at Scylla, and the lesson was that you tell
your kids not to lie, but when life hits, well... you lie.
We'll add a new function, turso_version, that tells the truth.

Closes #3635
2025-10-10 13:06:36 +03:00
Nikita Sivukhin
51122d3e9c fix clippy 2025-10-10 11:39:06 +04:00
Nikita Sivukhin
7e727d07af fix bugs add tests 2025-10-09 23:23:16 +04:00
Nikita Sivukhin
10c51c8da0 add test for convert operation 2025-10-09 22:14:38 +04:00
Nikita Sivukhin
e18f26a1f1 fix bug after refactoring 2025-10-09 21:28:46 +04:00
Nikita Sivukhin
ac9a25a417 fix clippy 2025-10-09 21:19:35 +04:00
Nikita Sivukhin
5336801574 add jaccard distance 2025-10-09 21:15:39 +04:00
Nikita Sivukhin
585d11b736 implement operations for sparse vectors 2025-10-09 20:52:58 +04:00
Jussi Saurio
edf40cc65b clippy 2025-10-09 19:00:40 +03:00
Jussi Saurio
27a88b86dc Reuse a single RecordCursor per PseudoCursor 2025-10-09 18:56:49 +03:00
Jussi Saurio
812709cf8e inline collation comparison functions 2025-10-09 18:56:49 +03:00
Nikita Sivukhin
84643dc4f2 implement sparse vector operations 2025-10-09 19:19:33 +04:00
Diego Reis
d2d265a06f Small nits and code clean ups 2025-10-09 12:14:20 -03:00
Diego Reis
b8f8a87007 Refactor bytecode emission
- we were redundantly translating tmp
- Make emit_constant_insn a method of ProgramBuilder
2025-10-09 11:57:16 -03:00
Diego Reis
84e8d11764 Fix bug when jump_if_true is enabled 2025-10-09 11:57:16 -03:00
Diego Reis
625403cc2a Fix register reuse when called inside a coroutine
- On each iteration we assume that the value is NULL, so we need to
  set it like so for every iteration in the list. So we force this
  NULL not to be emitted as a constant;
- Forces a copy so IN expressions work inside an aggregation
  expression. Not ideal but it works, we should work more on the query
  planner for sure.
2025-10-09 11:57:16 -03:00
Diego Reis
da323fa0c4 Some clean ups and correctly working on WHERE clauses 2025-10-09 11:57:15 -03:00
Diego Reis
79958f468d Add jump_target_null to ConditionMetadata
It kind of makes sense: conditions can evaluate to three values: false,
true and null. Now we handle that.
2025-10-09 11:56:14 -03:00
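To make the three-valued evaluation concrete, here is a small sketch of how an `IN` condition can yield true, false, or NULL, and how each outcome maps to its own jump target. Only `jump_target_null` comes from the commit title; the other field names, `sql_in`, and `pick_target` are hypothetical and do not reflect the project's actual structures.

```python
from typing import Any, NamedTuple, Optional, Sequence


class ConditionTargets(NamedTuple):
    jump_target_when_true: int   # hypothetical field name
    jump_target_when_false: int  # hypothetical field name
    jump_target_null: int        # field named in the commit title


def sql_in(lhs: Any, candidates: Sequence[Any]) -> Optional[bool]:
    """Evaluate `lhs IN (candidates)`; None stands for SQL NULL."""
    if not candidates:
        return False  # In SQLite an empty list is false, even for a NULL lhs.
    if lhs is None:
        return None
    saw_null = False
    for candidate in candidates:
        if candidate is None:
            saw_null = True
        elif candidate == lhs:
            return True
    # No match: the answer is unknown if any candidate was NULL.
    return None if saw_null else False


def pick_target(result: Optional[bool], targets: ConditionTargets) -> int:
    """Choose the jump target for a three-valued condition result."""
    if result is None:
        return targets.jump_target_null
    if result:
        return targets.jump_target_when_true
    return targets.jump_target_when_false


targets = ConditionTargets(10, 20, 30)
print(pick_target(sql_in(2, [1, None]), targets))
```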
Diego Reis
52ed0f7997 Add IN expr optimization at the parser level instead of translation.
lhs IN () and lhs NOT IN () can be translated to false and true.
2025-10-09 11:56:14 -03:00
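The rewrite described above amounts to constant folding on the syntax tree. The sketch below uses hypothetical `InList` and `Literal` node types, not the project's AST, to show an empty `IN ()` collapsing to false and an empty `NOT IN ()` collapsing to true.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Literal:
    value: Any


@dataclass
class InList:
    # Hypothetical node for `lhs [NOT] IN (rhs...)`.
    lhs: Any
    rhs: List[Any] = field(default_factory=list)
    negated: bool = False


def fold_empty_in(expr: Any) -> Any:
    """Replace `lhs IN ()` with false and `lhs NOT IN ()` with true."""
    if isinstance(expr, InList) and not expr.rhs:
        # With no candidates, the result does not depend on lhs at all.
        return Literal(expr.negated)
    return expr


print(fold_empty_in(InList("x")))
print(fold_empty_in(InList("x", negated=True)))
print(fold_empty_in(InList("x", [Literal(1)])))
```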
Diego Reis
70fc509046 First step to fix 3277
This follows SQLite's functions almost step by step, and it is indeed
correct. We still have to translate some of this logic to our current
semantics.
2025-10-09 11:56:14 -03:00
Jussi Saurio
0356a7102c remove another expensive assert 2025-10-09 17:50:15 +03:00
Jussi Saurio
a1a83c689b Don't yield if completion already succeeded 2025-10-09 17:50:06 +03:00
Jussi Saurio
1c35d5b342 avoid expensive Arc cloning 2025-10-09 17:43:28 +03:00
Jussi Saurio
1f310a4738 Remove expensive hot path assert 2025-10-09 17:29:18 +03:00
Glauber Costa
f4116eb3d4 lie about sqlite version
I found an application in the open that expects sqlite_version() to
return a specific string (higher than 3.8...).

We had tons of those issues at Scylla, and the lesson was that you
tell your kids not to lie, but when life hits, well... you lie.

We'll add a new function, turso_version, that tells the truth.
2025-10-09 07:19:35 -07:00
Nikita Sivukhin
68632cc142 rename euclidian to L2 for consistency 2025-10-09 17:26:36 +04:00
Nikita Sivukhin
1ebf2b7c8d add f32 sparse vector type 2025-10-09 17:25:40 +04:00
Nikita Sivukhin
9e68fa7f4a simplify vector_slice operation 2025-10-09 17:11:13 +04:00
Nikita Sivukhin
d7f3a450ad return NaN for cosine distance instead of error
- errors are hard to handle in case of some scan operations (something went wrong in the middle - whole query aborted)
- it will be more flexible if we return NaN and let the user handle the situation
2025-10-09 17:06:49 +04:00
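A minimal sketch of the behaviour described above, assuming the problematic input is a zero-norm vector (the commit does not say which inputs previously raised the error). The function name is illustrative.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance that yields NaN instead of failing on zero-norm input."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed trigger: the distance is undefined, so report NaN and
        # let the caller decide, rather than aborting a whole scan.
        return math.nan
    return 1.0 - dot / (norm_a * norm_b)


rows = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
print([cosine_distance(row, query) for row in rows])
```

A caller scanning many rows can then filter with `math.isnan` instead of losing the whole result to one bad row.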
Nikita Sivukhin
14e104f830 add convert operation 2025-10-09 16:56:36 +04:00
Nikita Sivukhin
8584ee18a3 refactor parsing/deserialization 2025-10-09 16:36:39 +04:00
Jussi Saurio
bcca404551 Avoid string allocation in sorter record comparison 2025-10-09 15:34:27 +03:00
Nikita Sivukhin
a2f4376bd2 move more operations to the operations/ folder 2025-10-09 16:18:53 +04:00
Nikita Sivukhin
7e9e102f20 move vector operations under operations/ folder 2025-10-09 16:02:03 +04:00
Jussi Saurio
e0461dd78a Sorter: compute values upfront instead of deserializing on every comparison 2025-10-09 15:01:47 +03:00
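The idea in this commit title can be sketched as follows. This is not the project's `Sorter`; the record layout (an 8-byte big-endian key followed by a payload) and all function names are assumptions chosen to contrast decoding on every comparison with decoding each record once.

```python
import functools
from typing import List, Tuple


def decode_key(record: bytes) -> int:
    # Assumed layout: the sort key is the first 8 bytes, big-endian, signed.
    return int.from_bytes(record[:8], "big", signed=True)


def sort_decoding_per_comparison(records: List[bytes]) -> Tuple[List[bytes], int]:
    """Sort while decoding both keys inside every comparison."""
    decodes = 0

    def compare(a: bytes, b: bytes) -> int:
        nonlocal decodes
        decodes += 2
        ka, kb = decode_key(a), decode_key(b)
        return (ka > kb) - (ka < kb)

    return sorted(records, key=functools.cmp_to_key(compare)), decodes


def sort_with_upfront_keys(records: List[bytes]) -> Tuple[List[bytes], int]:
    """Decode each key exactly once, then sort on the decoded values."""
    keyed = [(decode_key(r), r) for r in records]
    keyed.sort(key=lambda pair: pair[0])
    return [r for _, r in keyed], len(keyed)


records = [v.to_bytes(8, "big", signed=True) + b"payload" for v in (5, 1, 4, 2, 3)]
slow_sorted, slow_decodes = sort_decoding_per_comparison(records)
fast_sorted, fast_decodes = sort_with_upfront_keys(records)
print(slow_sorted == fast_sorted, slow_decodes, fast_decodes)
```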
Jussi Saurio
7948259d37 Merge 'optimizer: optimize range scans to use upper and lower bounds more efficiently' from Jussi Saurio
Made a new PR based on @sivukhin 's PR #2869 that had a lot of
conflicts. You can check out the PR description from there.
## The main idea is:
Before, if we had an index on `x` and had a query like `WHERE x > 100
and x < 200`, the plan would be something like:
```
- Seek to first row where x > 100
- Then, for every row, discard the row if x >= 200
```
This is highly wasteful in cases where there are a lot of rows where `x
>= 200`. Since our index is sorted on `x`, we know that once we hit the
_first_ row where `x >= 200`, we can stop iterating entirely.
So, the new plan is:
```
- Seek to first row where x > 100
- Then, iterate rows until x >= 200, and then stop
```
This also improves the situation for multi-column indexes. Imagine index
on `(x,y)` and a condition like `WHERE x = 100 and y > 100 and y < 200`.
Before, the plan was:
```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and discard the row if y >= 200
- Stop when x > 100
```
This also suffers from a problem where if there are a lot of rows where
`x=100` and `y >= 200`, we go through those rows unnecessarily. The new
plan is:
```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and y < 200
- Stop when either x > 100 or y >= 200
```
Which prevents us from iterating rows like `x=100, y = 666`
unnecessarily because we know the index is sorted on `(x,y)` - once we
hit any row where `x>100` OR `x=100, y >= 200`, we can stop.

Closes #3644
2025-10-09 14:47:15 +03:00
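The single-column case from the description above can be sketched with a sorted Python list standing in for the index on `x`. Both functions are illustrative only and count how many index entries they visit for `WHERE x > lo AND x < hi`.

```python
import bisect
from typing import List, Tuple


def range_scan_old(index: List[int], lo: int, hi: int) -> Tuple[List[int], int]:
    """Seek to the first x > lo, then filter every remaining row on x < hi."""
    start = bisect.bisect_right(index, lo)
    visited, rows = 0, []
    for x in index[start:]:
        visited += 1
        if x >= hi:
            continue  # Row is discarded, but the scan keeps going.
        rows.append(x)
    return rows, visited


def range_scan_new(index: List[int], lo: int, hi: int) -> Tuple[List[int], int]:
    """Seek to the first x > lo, then stop at the first row where x >= hi."""
    start = bisect.bisect_right(index, lo)
    visited, rows = 0, []
    for x in index[start:]:
        visited += 1
        if x >= hi:
            break  # The index is sorted, so no later row can match.
        rows.append(x)
    return rows, visited


index = [50, 100, 150, 199, 200, 250, 300]
print(range_scan_old(index, 100, 200))
print(range_scan_new(index, 100, 200))
```

Both scans return the same rows; the difference is only in how many entries past the upper bound are touched.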
Jussi Saurio
e726803ab4 Merge 'translate: make bind_and_rewrite_expr() reject unbound identifiers if no referenced tables exist' from Jussi Saurio
Before, we just skipped evaluating `Id`, `Qualified` and
`DoublyQualified` if `referenced_tables` was `None`, leading to shit
like #3621. Let's eagerly return `"No such column"` parse errors in
these cases instead, and punch exceptions for cases where that doesn't
cleanly work
Top tip: use `Hide whitespace` toggle when inspecting the diff of this
PR
Closes #3621

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3626
2025-10-09 12:45:16 +03:00
Jussi Saurio
ab88e7c206 Merge 'don't allow duplicate col names in create table' from Pavan Nambi
closes https://github.com/tursodatabase/turso/issues/3637

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3641
2025-10-09 12:44:28 +03:00
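A sketch of the check, assuming column names compare case-insensitively as they do in SQLite. The function name and the sort-then-compare-neighbours approach are illustrative; the later commits in this PR mention both a sort-based version and a plain loop, and this does not claim to match either.

```python
from typing import List, Optional


def find_duplicate_column(names: List[str]) -> Optional[str]:
    """Return a duplicated column name (lowercased), or None if all are unique."""
    ordered = sorted(name.lower() for name in names)
    # After sorting, any duplicates are adjacent.
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current:
            return current
    return None


print(find_duplicate_column(["id", "name", "ID"]))
print(find_duplicate_column(["id", "name"]))
```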
Nikita Sivukhin
5b6e8e4b84 Float32/Float64 -> Float32Dense/Float64Dense 2025-10-09 13:28:40 +04:00
Nikita Sivukhin
4313f57ecb Optimize range scans 2025-10-09 11:47:41 +03:00
Pavan-Nambi
414f92d0a0 go back to for loop
cleanup

clippy
2025-10-09 13:50:45 +05:30
Jussi Saurio
acb3c97fea Merge 'When pwritev fails, clear the dirty pages' from Pedro Muniz
If we don't clear the dirty pages, we will initiate a rollback. In the
rollback, we will attempt to clear the whole page cache, but it will
then panic because there will still be dirty pages from the failed
writev

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3189
2025-10-09 10:38:47 +03:00
Pavan-Nambi
f0d9ead19f add more tests
refactor and use sort_unstable_by_key
2025-10-09 08:28:59 +05:30
Pavan-Nambi
f138448da2 don't allow duplicate col names in create table 2025-10-09 08:09:31 +05:30
Pere Diaz Bou
f06ee571be Merge 'MVCC: Don't modify the row version chain on rollback' from Duy Dang
Rollback shouldn't modify the row version chain. This is crucial for
implementing a Non-blocking row version chain in #3499

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3583
2025-10-08 18:00:02 +02:00
Pekka Enberg
08cf663d7b Merge 'Add support for sqlite_version() star syntax' from Glauber Costa
SQLite surprisingly supports this:
`select sqlite_version(*);`
This gets translated at the parser level to `sqlite_version()`, and it
works for all functions that take 0 arguments.
Let's be compatible with SQLite and support the same thing.

Closes #3630
2025-10-08 17:41:27 +03:00
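The parser-level translation can be pictured with the sketch below. `FunctionCall`, `normalize_star_call`, and the set of zero-argument functions are hypothetical; the point is only that `f(*)` becomes `f()` when `f` takes no arguments, while calls such as `count(*)` are left alone.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set


@dataclass
class FunctionCall:
    # Hypothetical parse-tree node; `star` marks the `f(*)` form.
    name: str
    args: List[Any] = field(default_factory=list)
    star: bool = False


def normalize_star_call(call: FunctionCall, zero_arg_functions: Set[str]) -> FunctionCall:
    """Rewrite `f(*)` to `f()` when `f` is known to take zero arguments."""
    if call.star and call.name.lower() in zero_arg_functions:
        return FunctionCall(call.name, [], star=False)
    return call


zero_arg = {"sqlite_version"}  # illustrative subset, not the real catalogue
print(normalize_star_call(FunctionCall("sqlite_version", star=True), zero_arg))
print(normalize_star_call(FunctionCall("count", star=True), zero_arg))
```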