Commit Graph

10129 Commits

Author SHA1 Message Date
pedrocarlo
fafbdbfa9d persist files in sim memory io for integrity check 2025-10-11 15:03:22 -03:00
Jussi Saurio
7725f336b8 Merge 'Fix incorrectly using an equality constraint twice for index seek' from Jussi Saurio
Prevents something like `WHERE x = 5 AND x = 5` from becoming a two
component index key.
Closes #3656

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #3658
2025-10-10 13:33:43 +03:00
Jussi Saurio
74e04634aa Fix incorrectly using an equality constraint twice for index seek
Prevents something like `WHERE x = 5 AND x = 5` from becoming a two
component index key.

Closes #3656
2025-10-10 13:19:19 +03:00
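
The fix described above can be illustrated with a minimal Python sketch, assuming a planner that builds a seek key from equality constraints. The `Constraint` type and `build_seek_key` function are hypothetical names invented for this illustration and are not the project's actual planner code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    # Hypothetical representation of a WHERE term such as `x = 5`.
    column: str
    op: str
    value: object


def build_seek_key(index_columns, constraints):
    """Pick at most one equality constraint per index column.

    Iterating over index columns (rather than over constraints) means a
    repeated term like `x = 5 AND x = 5` contributes a single key component.
    """
    key = []
    for col in index_columns:
        match = next(
            (c for c in constraints if c.column == col and c.op == "="),
            None,
        )
        if match is None:
            break  # a seek key must cover a contiguous prefix of the index
        key.append(match.value)
    return key
```

With an index on `x` alone, the duplicated constraint yields a one-component key rather than a two-component one.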
Pekka Enberg
12783ef01e Merge 'bindings/java: Add support for publishing to Maven Central' from Kim Seon Woo
## Purpose
- Deploy `tech.turso:turso:<version>` to Maven Central so that users can
easily use the Java bindings
  - For example:
https://repo1.maven.org/maven2/io/github/seonwkim/turso/0.0.1/
## Requirements
- [x] Add the following GitHub secrets.
  - [x] MAVEN_CENTRAL_USERNAME
  - [x] MAVEN_CENTRAL_PASSWORD
  - [x] GPG_PRIVATE_KEY
  - [x] GPG_PASSPHRASE
- [ ] Namespace `tech.turso` must be registered at Maven Central
- [ ] GPG key registration to key servers
- Notes
  - Retrieve MAVEN_CENTRAL_USERNAME and MAVEN_CENTRAL_PASSWORD from
[Maven Central](https://central.sonatype.com/usertoken)
  - GPG keys should be registered. You should distribute your keys to
designated (Maven Central-supported) key servers
    - Refer to [GPG key related docs](https://central.sonatype.org/publish/requirements/gpg/#distributing-your-public-key)
    - Btw, I used the `keyserver.ubuntu.com` key server while testing
### [Maven Central Username & Password](https://central.sonatype.com/usertoken)
<img width="2878" height="1338" alt="image" src="https://github.com/user-attachments/assets/03e6f967-a7f6-46b8-aef5-d15772bd9eea" />
### [Maven Central Namespace](https://central.sonatype.com/publishing/namespaces)
<img width="1424" height="456" alt="image" src="https://github.com/user-attachments/assets/8c0f4f17-bf5a-4c6a-bc47-748d86cd1f1a" />
## Future Work
- Currently, we depend on gradle.properties to determine the version of
our dependency and it's cumbersome to always change the version
manually. Let's find a better solution.

Closes #3624
2025-10-10 13:12:01 +03:00
Pekka Enberg
e727b8e0dc Merge 'Vector improvements' from Nikita Sivukhin
This PR introduces sparse vector support and a Jaccard distance
implementation.
Also, this PR restructures the code to have all vector operations in
separate files (they grow pretty quickly as new vector representations
are added to the DB).

Closes #3647
2025-10-10 13:08:46 +03:00
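
As a rough illustration of Jaccard distance over sparse vectors, here is a minimal Python sketch. It assumes a sparse vector is a mapping from dimension index to non-zero value and uses the set-based definition over the non-zero indices. The log does not state which Jaccard variant the PR implements, so both the representation and the definition are assumptions.

```python
def jaccard_distance(a: dict, b: dict) -> float:
    """Set-based Jaccard distance over the non-zero indices of two sparse vectors.

    Assumption: vectors are dicts of {dimension index: value}; stored zeros
    are ignored so they do not count as "present" dimensions.
    """
    support_a = {i for i, v in a.items() if v != 0.0}
    support_b = {i for i, v in b.items() if v != 0.0}
    union = support_a | support_b
    if not union:
        # Both vectors are empty; returning 0.0 is an arbitrary choice
        # made for this sketch.
        return 0.0
    return 1.0 - len(support_a & support_b) / len(union)
```

Only the stored entries are touched, so the cost depends on the number of non-zero components rather than on the nominal dimension.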
Pekka Enberg
77924c6c71 Merge 'Optimize sorter' from Jussi Saurio
Various little fixes to `Sorter` that reduce unnecessary work.
Makes TPC-H query 1 roughly 2x faster, which is a lot because it
originally took 30-40 seconds depending on the CI run

Closes #3645
2025-10-10 13:06:53 +03:00
Pekka Enberg
cf22819817 Merge 'Make sqlite_version() compatible with SQLite' from Glauber Costa
I found an application in the open that expects sqlite_version() to
return a specific string (higher than 3.8...).
We had tons of those issues at Scylla, and the lesson was that you tell
your kids not to lie, but when life hits, well... you lie.
We'll add a new function, turso_version, that tells the truth.

Closes #3635
2025-10-10 13:06:36 +03:00
Nikita Sivukhin
51122d3e9c fix clippy 2025-10-10 11:39:06 +04:00
Jussi Saurio
c88061e1eb Merge 'Fix IN operator NULL handling' from Diego Reis
Closes #3277
Changed our previous behavior to match what SQLite does and how it does it.

Closes #3606
2025-10-10 10:17:57 +03:00
Jussi Saurio
2c698525de Merge 'Cleanup Simulator + Fix Column constraints in sql generation' from Pedro Muniz
- Removed a general clippy rule to allow all dead code and subsequently
removed a lot of dead code
- Fixed Column constraints in Sql Generation to accommodate all Column
constraints available to the Parser and print the constraints in other
sql queries.
- Moved Generation of simulator values to separate files
These are some of the changes I made in my Alter Table PR that I am
upstreaming here

Closes #3649
2025-10-09 23:37:22 +03:00
pedrocarlo
642ec3032d use parser's ColumnDefinition for Sql Generation Column struct 2025-10-09 17:25:04 -03:00
pedrocarlo
fb6c5ffcff move SimValue generation to separate files to facilitate generation of new types of values in the future 2025-10-09 17:25:04 -03:00
pedrocarlo
b6f94b2fa1 remove dead code in sim 2025-10-09 17:25:04 -03:00
Nikita Sivukhin
7e727d07af fix bugs add tests 2025-10-09 23:23:16 +04:00
Nikita Sivukhin
10c51c8da0 add test for convert operation 2025-10-09 22:14:38 +04:00
Nikita Sivukhin
e18f26a1f1 fix bug after refactoring 2025-10-09 21:28:46 +04:00
Nikita Sivukhin
ac9a25a417 fix clippy 2025-10-09 21:19:35 +04:00
Nikita Sivukhin
5336801574 add jaccard distance 2025-10-09 21:15:39 +04:00
Nikita Sivukhin
585d11b736 implement operations for sparse vectors 2025-10-09 20:52:58 +04:00
Jussi Saurio
edf40cc65b clippy 2025-10-09 19:00:40 +03:00
Jussi Saurio
27a88b86dc Reuse a single RecordCursor per PseudoCursor 2025-10-09 18:56:49 +03:00
Jussi Saurio
812709cf8e inline collation comparison functions 2025-10-09 18:56:49 +03:00
Nikita Sivukhin
84643dc4f2 implement sparse vector operations 2025-10-09 19:19:33 +04:00
Diego Reis
d2d265a06f Small nits and code clean ups 2025-10-09 12:14:20 -03:00
Diego Reis
b8f8a87007 Refactor bytecode emission
- we were redundantly translating tmp
- Make emit_constant_insn a method of ProgramBuilder
2025-10-09 11:57:16 -03:00
Diego Reis
84e8d11764 Fix bug when jump_if_true is enabled 2025-10-09 11:57:16 -03:00
Diego Reis
625403cc2a Fix register reuse when called inside a coroutine
- On each iteration we assume that the value is NULL, so we need to
  set it that way for every iteration over the list. So we force this
  NULL not to be emitted as a constant;
- Forces a copy so IN expressions work inside an aggregation
  expression. Not ideal but it works; we should work more on the query
  planner for sure.
2025-10-09 11:57:16 -03:00
Diego Reis
da323fa0c4 Some clean ups and correctly working on WHERE clauses 2025-10-09 11:57:15 -03:00
Diego Reis
79958f468d Add jump_target_null to ConditionMetadata
It kind of makes sense: conditions can evaluate to three values: false,
true and null. Now we handle that.
2025-10-09 11:56:14 -03:00
Diego Reis
52ed0f7997 Add IN expr optimization at the parser level instead of translation.
lhs IN () and lhs NOT IN () can be translated to false and true, respectively.
2025-10-09 11:56:14 -03:00
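
The IN semantics that these commits work toward can be sketched in Python as follows, with `None` standing in for SQL NULL. The function names `sql_in` and `sql_not_in` are hypothetical. The sketch follows SQLite's documented behavior (an empty right-hand list gives false for IN and true for NOT IN, even when the left operand is NULL) and is not the project's bytecode translation.

```python
def sql_in(lhs, items):
    """Evaluate `lhs IN (items)` with three-valued logic (None means NULL)."""
    if not items:
        # Empty list: always false, even when lhs is NULL. This is the case
        # that can be folded to a constant at parse time.
        return False
    if lhs is None:
        return None
    saw_null = False
    for item in items:
        if item is None:
            saw_null = True
        elif item == lhs:
            return True
    # No match: the result is unknown if any list element was NULL.
    return None if saw_null else False


def sql_not_in(lhs, items):
    """Evaluate `lhs NOT IN (items)`; NULL stays NULL under negation."""
    result = sql_in(lhs, items)
    return None if result is None else not result
```

The three possible results (true, false, NULL) are why a condition needs a separate jump target for the NULL case.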
Diego Reis
70fc509046 First step to fix 3277
This follows SQLite's functions almost step by step, and it is indeed
correct. But we still have to translate some of this logic to our current
semantics
2025-10-09 11:56:14 -03:00
Jussi Saurio
0356a7102c remove another expensive assert 2025-10-09 17:50:15 +03:00
Jussi Saurio
a1a83c689b Don't yield if completion already succeeded 2025-10-09 17:50:06 +03:00
Jussi Saurio
1c35d5b342 avoid expensive Arc cloning 2025-10-09 17:43:28 +03:00
Jussi Saurio
1f310a4738 Remove expensive hot path assert 2025-10-09 17:29:18 +03:00
Glauber Costa
f4116eb3d4 lie about sqlite version
I found an application in the open that expects sqlite_version() to
return a specific string (higher than 3.8...).

We had tons of those issues at Scylla, and the lesson was that you
tell your kids not to lie, but when life hits, well... you lie.

We'll add a new function, turso_version, that tells the truth.
2025-10-09 07:19:35 -07:00
Nikita Sivukhin
68632cc142 rename euclidian to L2 for consistency 2025-10-09 17:26:36 +04:00
Nikita Sivukhin
1ebf2b7c8d add f32 sparse vector type 2025-10-09 17:25:40 +04:00
Nikita Sivukhin
9e68fa7f4a simplify vector_slice operation 2025-10-09 17:11:13 +04:00
Nikita Sivukhin
d7f3a450ad return NaN for cosine distance instead of error
- errors are hard to handle in some scan operations (if something goes wrong in the middle, the whole query is aborted)
- it is more flexible to return NaN and let the user handle the situation
2025-10-09 17:06:49 +04:00
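
A minimal Python sketch of the idea, assuming the undefined case is a zero-norm input vector (the log does not say exactly which inputs previously raised an error). The function name `cosine_distance` and the dense list representation are illustrative only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length dense vectors.

    Returns NaN instead of raising when either vector has zero norm, so a
    scan over many rows is not aborted by a single degenerate row.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.nan  # undefined; the caller decides how to handle it
    return 1.0 - dot / (norm_a * norm_b)
```

A caller can then filter or order around the degenerate rows with `math.isnan` rather than losing the whole result set.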
Nikita Sivukhin
14e104f830 add convert operation 2025-10-09 16:56:36 +04:00
Nikita Sivukhin
8584ee18a3 refactor parsing/deserialization 2025-10-09 16:36:39 +04:00
Jussi Saurio
bcca404551 Avoid string allocation in sorter record comparison 2025-10-09 15:34:27 +03:00
Nikita Sivukhin
a2f4376bd2 move more operations to the operations/ folder 2025-10-09 16:18:53 +04:00
Nikita Sivukhin
7e9e102f20 move vector operations under operations/ folder 2025-10-09 16:02:03 +04:00
Jussi Saurio
e0461dd78a Sorter: compute values upfront instead of deserializing on every comparison 2025-10-09 15:01:47 +03:00
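
The general technique named in this commit title can be sketched in Python: decode each record's sort key once, then sort on the decoded keys, instead of decoding inside every comparison. The record layout (an 8-byte big-endian signed integer key followed by a payload) and the names `decode_key` and `sort_records` are assumptions made for this illustration.

```python
import struct


def decode_key(record: bytes) -> int:
    # Hypothetical layout: the sort key is the first 8 bytes of the record.
    return struct.unpack_from(">q", record, 0)[0]


def sort_records(records):
    """Sort serialized records by key, decoding each key exactly once."""
    # Decorate: n decode calls in total, rather than one or two per
    # comparison (roughly n log n comparisons in a typical sort).
    decorated = [(decode_key(r), r) for r in records]
    decorated.sort(key=lambda pair: pair[0])
    # Undecorate: return the original serialized records in sorted order.
    return [r for _, r in decorated]
```

The trade-off is extra memory for the decoded keys in exchange for cheaper comparisons.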
Jussi Saurio
7948259d37 Merge 'optimizer: optimize range scans to use upper and lower bounds more efficiently' from Jussi Saurio
Made a new PR based on @sivukhin's PR #2869 that had a lot of
conflicts. You can check out the PR description from there.
## The main idea is:
Before, if we had an index on `x` and had a query like `WHERE x > 100
and x < 200`, the plan would be something like:
```
- Seek to first row where x > 100
- Then, for every row, discard the row if x >= 200
```
This is highly wasteful in cases where there are a lot of rows where `x
>= 200`. Since our index is sorted on `x`, we know that once we hit the
_first_ row where `x >= 200`, we can stop iterating entirely.
So, the new plan is:
```
- Seek to first row where x > 100
- Then, iterate rows until x >= 200, and then stop
```
This also improves the situation for multi-column indexes. Imagine index
on `(x,y)` and a condition like `WHERE x = 100 and y > 100 and y < 200`.
Before, the plan was:
```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and discard the row if y >= 200
- Stop when x > 100
```
This also suffers from a problem where if there are a lot of rows where
`x=100` and `y >= 200`, we go through those rows unnecessarily. The new
plan is:
```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and y < 200
- Stop when either x > 100 or y >= 200
```
Which prevents us from iterating rows like `x=100, y = 666`
unnecessarily because we know the index is sorted on `(x,y)` - once we
hit any row where `x>100` OR `x=100, y >= 200`, we can stop.

Closes #3644
2025-10-09 14:47:15 +03:00
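
The single-column case from this description can be sketched in Python, modeling the index as a sorted list and counting how many rows each plan visits. The function names and the use of `bisect` for the seek are illustrative assumptions, not the engine's actual cursor API.

```python
import bisect


def range_scan_old(index, lo, hi):
    """Old plan: seek to first x > lo, then filter out rows with x >= hi."""
    start = bisect.bisect_right(index, lo)  # first position where x > lo
    rows, visited = [], 0
    for x in index[start:]:
        visited += 1
        if x >= hi:
            continue  # discard the row but keep scanning to the end
        rows.append(x)
    return rows, visited


def range_scan_new(index, lo, hi):
    """New plan: seek to first x > lo, then stop at the first x >= hi."""
    start = bisect.bisect_right(index, lo)
    rows, visited = [], 0
    for x in index[start:]:
        visited += 1
        if x >= hi:
            break  # the index is sorted, so no later row can qualify
        rows.append(x)
    return rows, visited
```

Both plans return the same rows; the difference is only in how many index entries past the upper bound are touched.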
Jussi Saurio
f9f8eda3c3 Merge 'add Calendar-based timezone conversion support in JDBC4ResultSet' from 김민석
## Summary
Implemented Calendar-based Date/Time/Timestamp getter methods in
JDBC4ResultSet to support timezone conversions.
## Changes
- Implemented `getDate(int, Calendar)` and `getDate(String, Calendar)`
- Implemented `getTime(int, Calendar)` and `getTime(String, Calendar)`
- Implemented `getTimestamp(int, Calendar)` and `getTimestamp(String,
Calendar)`
- Fixed timezone conversion logic (changed from subtraction to addition)
- Added comprehensive test cases for all implemented methods
## Test Results
- All tests passed successfully
- New tests validate timezone conversion with UTC and Seoul (UTC+9)

Reviewed-by: Kim Seon Woo (@seonWKim)

Closes #3607
2025-10-09 12:52:09 +03:00
Jussi Saurio
e726803ab4 Merge 'translate: make bind_and_rewrite_expr() reject unbound identifiers if no referenced tables exist' from Jussi Saurio
Before, we just skipped evaluating `Id`, `Qualified` and
`DoublyQualified` if `referenced_tables` was `None`, leading to shit
like #3621. Let's eagerly return `"No such column"` parse errors in
these cases instead, and punch exceptions for cases where that doesn't
cleanly work
Top tip: use `Hide whitespace` toggle when inspecting the diff of this
PR
Closes #3621

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3626
2025-10-09 12:45:16 +03:00
Jussi Saurio
ab88e7c206 Merge 'don't allow duplicate col names in create table' from Pavan Nambi
closes https://github.com/tursodatabase/turso/issues/3637

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3641
2025-10-09 12:44:28 +03:00