Commit Graph

6123 Commits

Author SHA1 Message Date
Jussi Saurio
fda92d43a2 adjust comment in header size test 2025-07-15 18:52:27 +03:00
Jussi Saurio
38183d3b3b tcl: add regression test for large text keys 2025-07-15 18:48:06 +03:00
Jussi Saurio
932536a03f compare_records: fix assumption that header size is 1 byte and serial type is 1 byte 2025-07-15 17:57:52 +03:00
Jussi Saurio
7c353095ed types: fix and unify record header size calculation 2025-07-15 17:38:02 +03:00
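The two commits above concern how a record header's size is computed. As a rough sketch only (this follows the SQLite record format as documented, and is not taken from the actual change; `varint_len` and `record_header_size` are hypothetical names), the header starts with a varint holding the total header size including that varint itself, and each serial type is also a varint that can take more than one byte:

```rust
/// Number of bytes a SQLite-style varint needs to encode `value`
/// (7 payload bits per byte for the first 8 bytes, at most 9 bytes total).
fn varint_len(value: u64) -> usize {
    let mut len = 1;
    let mut v = value >> 7;
    while v != 0 && len < 9 {
        len += 1;
        v >>= 7;
    }
    len
}

/// Total record header size in bytes, including the leading size varint.
/// `serial_types` holds one serial type code per column.
fn record_header_size(serial_types: &[u64]) -> usize {
    // Bytes taken by the serial type varints alone.
    let body: usize = serial_types.iter().map(|&t| varint_len(t)).sum();
    // The size varint counts itself, so adding it can push the total
    // across a varint boundary and require one more byte.
    let mut total = body + varint_len(body as u64);
    if varint_len(total as u64) > varint_len(body as u64) {
        total += 1;
    }
    total
}
```

A 100-byte text value has serial type 213 (13 + 2 * 100), which already needs a two-byte varint, so a "one byte per serial type" shortcut breaks for large text keys.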
Jussi Saurio
beaf393476 Merge 'Treat table-valued functions as tables' from Piotr Rżysko
First step toward resolving
https://github.com/tursodatabase/limbo/issues/1643.
### This PR
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.
Additionally, I fixed two bugs related to virtual tables.
### TODO (I'll handle this in a separate PR)
Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.
Adding support for column references is more nuanced for two main
reasons:
* We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the `one` table must be processed by the top-level loop, and `series` must
be nested.
* For outer joins involving TVFs, the arguments must be treated as `ON`
predicates, not `WHERE` predicates.
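As an illustration only, matching parenthesized arguments to hidden columns could look roughly like the sketch below. `Constraint` and `bind_tvf_args` are hypothetical names, not the PR's API, and only integer literals are handled. The hidden columns `start`, `stop`, `step` are those of `generate_series`.

```rust
/// A hypothetical equality constraint on a hidden column of a virtual table.
#[derive(Debug, PartialEq)]
struct Constraint {
    column: String,
    value: i64,
}

/// Match positional arguments to hidden columns in declaration order.
/// Surplus arguments produce an error value rather than a panic.
fn bind_tvf_args(hidden_columns: &[&str], args: &[i64]) -> Result<Vec<Constraint>, String> {
    if args.len() > hidden_columns.len() {
        return Err(format!(
            "too many arguments: expected at most {}, got {}",
            hidden_columns.len(),
            args.len()
        ));
    }
    Ok(hidden_columns
        .iter()
        .zip(args)
        .map(|(column, &value)| Constraint {
            column: column.to_string(),
            value,
        })
        .collect())
}
```

With this shape, `generate_series(5, 50)` yields the same constraints as `WHERE start = 5 AND stop = 50`.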

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1727
2025-07-15 12:23:45 +03:00
Jussi Saurio
0ab0af912c Merge 'bindings/js: fix more tests' from Mikaël Francoeur
Six more tests passing on Turso. The commits can be reviewed separately.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2085
2025-07-15 12:17:15 +03:00
Jussi Saurio
5ea35d700a Merge 'Support page_size pragma setting' from meteorgan
Closes: #1379
This PR consists of three main changes:
1. Rebuild `Pager` to set the correct page size for `buffer_pool`, `wal`
and other components when the database is uninitialized.
2. Persist the latest page size when allocating page 1.
3. Ensure all pragmas emit the correct transaction instructions,
preventing even a `page_size` read from triggering database
initialization.
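For context, the SQLite file format only permits page sizes that are powers of two between 512 and 65536, and stores the value in a 2-byte big-endian header field where 65536 is written as 1. A minimal sketch of that rule follows; the function names are hypothetical and this is not the PR's code:

```rust
/// SQLite's file format requires a power-of-two page size in 512..=65536.
fn is_valid_page_size(size: u32) -> bool {
    (512..=65536).contains(&size) && size.is_power_of_two()
}

/// Encode a page size for the 2-byte big-endian header field.
/// 65536 does not fit in 16 bits, so the format stores it as 1.
fn encode_page_size(size: u32) -> Option<[u8; 2]> {
    if !is_valid_page_size(size) {
        return None;
    }
    let raw: u16 = if size == 65536 { 1 } else { size as u16 };
    Some(raw.to_be_bytes())
}
```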

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2053
2025-07-15 12:14:52 +03:00
meteorgan
d7bdfeb711 reinitialize WalFileShare when reset page size 2025-07-15 16:34:07 +08:00
meteorgan
b42a1ef272 minor improvements based on PR comments 2025-07-15 16:34:07 +08:00
meteorgan
39d79d7420 add tests for page_size pragma 2025-07-15 16:34:07 +08:00
meteorgan
f123c77ee8 fix set page_size in pager 2025-07-15 16:34:07 +08:00
meteorgan
e2ab673624 fix self.pager.replace() panic 2025-07-15 16:34:07 +08:00
meteorgan
bf69b86e94 fix: not all pragma need transaction 2025-07-15 16:34:07 +08:00
meteorgan
a6faab17e9 fix query page size 2025-07-15 16:34:07 +08:00
meteorgan
cf126824de Support set page size 2025-07-15 16:34:07 +08:00
Pekka Enberg
b7db07cf2d Turso 0.1.2 2025-07-15 11:01:25 +03:00
Pekka Enberg
f4bc0ca77e Update CHANGELOG 2025-07-15 11:01:18 +03:00
Mikaël Francoeur
68134fa186 support named bind parameters 2025-07-14 15:36:12 -04:00
Mikaël Francoeur
093140d84c throw on empty statement 2025-07-14 15:28:07 -04:00
Mikaël Francoeur
e25064959b return info object 2025-07-14 14:35:48 -04:00
Pekka Enberg
f15fa91695 Merge 'Gopher is biologically closer to beavers than hamsters' from David Shekunts
Biologically, gophers are closer to beavers than to hamsters, so the
beaver emoji is the more accurate choice.
And yes, if you merge this MR I would be proud of my contribution to
open source.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2067
2025-07-14 21:34:26 +03:00
Mikaël Francoeur
99614a3c7c support open property 2025-07-14 14:03:57 -04:00
Pekka Enberg
363c29af2e Merge 'test/fuzz: fix rowid_seek_fuzz not being a proper fuzz test' from Jussi Saurio
The original `rowid_seek_fuzz` test had a design flaw: it inserted
contiguous non-random integers (so not really fuzzing), which prevented
issues such as the one fixed in #2065 from being discovered.
Further, the test has at some point also been neutered a bit by only
inserting 100 values which makes the btree very small, hiding
interactions between interior pages and neighboring leaf pages.
This should not be merged until #2065 is merged.
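To illustrate the difference between contiguous and random keys, here is a small self-contained sketch of the idea, not the actual `rowid_seek_fuzz` test. It inserts sparse random keys and checks every seek against a `BTreeSet` model. The `XorShift64` generator and the `seek_ge` stand-in are assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Stand-in for the structure under test: first key >= target in a sorted slice.
fn seek_ge(sorted: &[i64], target: i64) -> Option<i64> {
    let idx = sorted.partition_point(|&k| k < target);
    sorted.get(idx).copied()
}

/// Insert random, non-contiguous keys, then compare seeks against a model.
/// Returns true if every probe agrees with the model.
fn fuzz_seek(seed: u64, inserts: usize, probes: usize) -> bool {
    let mut rng = XorShift64(seed);
    let mut model = BTreeSet::new();
    for _ in 0..inserts {
        // A sparse key range means probes often land in gaps between keys.
        model.insert((rng.next() % 1_000_000) as i64);
    }
    let sorted: Vec<i64> = model.iter().copied().collect();
    (0..probes).all(|_| {
        let target = (rng.next() % 1_000_000) as i64;
        seek_ge(&sorted, target) == model.range(target..).next().copied()
    })
}
```

With contiguous keys almost every probe hits an existing key, so the "no match in this leaf" paths are rarely exercised.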

Closes #2081
2025-07-14 14:42:42 +03:00
Pekka Enberg
03d170ca05 Turso 0.1.2-pre.4 2025-07-14 13:21:41 +03:00
Pekka Enberg
d5d48db304 Merge 'build: Update cargo-dist to 0.28.6' from Pekka Enberg
Update `cargo-dist` to version 0.28.6. It should make installers more
robust to $HOME not being defined.
Refs #2073

Closes #2082
2025-07-14 13:20:54 +03:00
Pekka Enberg
fd4deda556 Merge 'Add fuzz to CI checks' from Levy A.
Closes #1869
2025-07-14 13:10:36 +03:00
Pekka Enberg
55cf9c8f02 Merge 'Add async header accessor functionality' from Zaid Humayun
This PR addresses https://github.com/tursodatabase/turso/issues/1828 in
a phased manner.
Making database header access async in one PR will be complicated. This
PR adds an async API to `header_accessor.rs` and ports over some
of `pager.rs` to use this API.
This will allow gradual porting over of all call sites. Once all call
sites are ported over, one mechanical rename will fix everything in the
repo so we don't have any `<header_name>_async` functions.
Also, porting header accessors over from sync to async would be a good
way to get introduced to the Limbo codebase for first time contributors.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1966
2025-07-14 13:08:29 +03:00
Pekka Enberg
1653b0883a Merge 'core/vector: Euclidean distance support for vector search' from KarinaMilet
This PR provides Euclidean distance support for limbo's vector search.
At the same time, some type abstractions are introduced, such as
`DistanceCalculator`. This is because I hope to unify the current
vector module in the future to make it more structured, clearer, and
more extensible.
While implementing Euclidean distance for Limbo, I discovered that many
checks could be done using the type system or in advance, rather than
waiting until the distance is calculated. Building these checks into
the type system or doing them ahead of time would allow us to
explore more efficient computations, such as automatic vectorization or
SIMD acceleration, which is future work.
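One way to picture "checks in the type system" is a sketch where the dimension is part of the vector's type, so mismatched dimensions fail to compile instead of being checked at distance time. The const-generic design and the trait signature below are assumptions for illustration; only the name `DistanceCalculator` comes from the PR description.

```rust
/// A vector whose dimension `N` is part of its type.
struct Vector<const N: usize>([f32; N]);

/// Hypothetical shape of a distance abstraction over same-dimension vectors.
trait DistanceCalculator<const N: usize> {
    fn distance(a: &Vector<N>, b: &Vector<N>) -> f32;
}

struct Euclidean;

impl<const N: usize> DistanceCalculator<N> for Euclidean {
    /// Square root of the sum of squared component differences.
    fn distance(a: &Vector<N>, b: &Vector<N>) -> f32 {
        a.0.iter()
            .zip(b.0.iter())
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt()
    }
}
```

Because both arguments share the same `N`, passing a `Vector<2>` and a `Vector<3>` is rejected by the compiler, and the fixed-length loop is a natural candidate for auto-vectorization.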

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #1986
2025-07-14 13:07:20 +03:00
Pekka Enberg
4613104044 Merge 'cli: Fail import command if table does not exists' from Pekka Enberg
SQLite creates a table if it does not exist, but we just silently
ignore the data. Let's add an error if the table does not exist until we fix
this.
Refs #2079

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2080
2025-07-14 13:05:23 +03:00
Jussi Saurio
615ccf6789 test/fuzz: fix rowid_seek_fuzz
The original `rowid_seek_fuzz` test had a design flaw: it inserted contiguous
integers, which prevented issues such as the one fixed in #2065 from being
discovered.

Further, the test has at some point also been neutered a bit by only inserting
100 values which makes the btree very small, hiding interactions between interior
pages and neighboring leaf pages.

This should not be merged until #2065 is merged.
2025-07-14 12:57:59 +03:00
Pekka Enberg
90532eabdf Merge 'b-tree: fix bug in case when no matching rows was found in seek in the leaf page' from Nikita Sivukhin
The current table B-Tree seek code relies on the invariant that if key `K` is
present in an interior page then it must also be present in a leaf page.
This is generally not true if data was ever deleted from the table,
because the leaf row whose key was used as a divider in the interior pages
can be deleted. Also, the SQLite spec says nothing about such an invariant, so
the `turso-db` implementation of B-Tree should not rely on it.
This PR introduces 3 options for the B-Tree `seek` result: `Found`,
`NotFound`, and `TryAdvance`, which is generated when the leaf page has no
match for `seek_op` but the DB doesn't know whether a neighbor page can have
matching data.
There is an alternative approach where we move the cursor to the neighbor
page in `seek` itself, but I was afraid to introduce such changes
because the analogous `seek` function from SQLite works exactly like the current
version of the code and I think some query planner internals (for
insertion) can rely on the fact that repositioning will leave the cursor at
the position of insertion:
> ** If an exact match is not found, then the cursor is always
** left pointing at a leaf page which would hold the entry if it
** were present.  The cursor might point to an entry that comes
** before or after the key.
Also, this PR introduces new B-tree fuzz tests which generate a table
B-tree from scratch and execute operations over it. This can help to
reach some non-trivial states and also generate huge DBs faster (that's
how this bug was discovered).
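The three-way result can be sketched on a toy two-level tree. This is a simplified model under stated assumptions, not the real cursor code: `Tree`, its fields, and `seek_ge` are hypothetical, and `leaves.len()` is assumed to equal `dividers.len() + 1`.

```rust
#[derive(Debug, PartialEq)]
enum SeekResult {
    Found,
    NotFound,
    TryAdvance,
}

/// Toy two-level tree: `dividers[i]` is an upper bound for keys in `leaves[i]`.
struct Tree {
    dividers: Vec<i64>,
    leaves: Vec<Vec<i64>>,
}

impl Tree {
    /// Seek the first key >= `target`, inspecting only the leaf that the
    /// dividers point at. Returns the result and the index of that leaf.
    fn seek_ge(&self, target: i64) -> (SeekResult, usize) {
        let leaf = self.dividers.partition_point(|&d| d < target);
        let has_match = self.leaves[leaf].iter().any(|&k| k >= target);
        let result = if has_match {
            SeekResult::Found
        } else if leaf + 1 < self.leaves.len() {
            // The divider promised a key that was since deleted; a
            // neighboring leaf may still hold a match.
            SeekResult::TryAdvance
        } else {
            SeekResult::NotFound
        };
        (result, leaf)
    }
}
```

For example, with divider `3` and leaves `[1, 2]` and `[4, 5]` (key 3 deleted from the first leaf), seeking `>= 3` lands on the first leaf, finds nothing there, and reports `TryAdvance` so the caller can step to the next leaf and find 4.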

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2065
2025-07-14 12:57:09 +03:00
Pekka Enberg
214831a591 build: Update cargo-dist to 0.28.6
Update `cargo-dist` to version 0.28.6. It should make installers more
robust to $HOME not being defined.

Refs #2073
2025-07-14 12:50:19 +03:00
Pekka Enberg
b13a0bb549 cli: Fail import command if table does not exists
SQLite creates a table if it does not exist, but we just silently
ignore the data. Let's add an error if the table does not exist until we fix
this.

Refs #2079
2025-07-14 12:24:58 +03:00
Pekka Enberg
1a0d618a41 Merge 'Assert I/O read and write sizes' from Pere Diaz Bou
Let's assert **for now** that we do not read/write fewer bytes than
expected. This should be fixed to retrigger several reads/writes if we
couldn't read/write enough, but for now let's assert.
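The eventual "retrigger reads" behavior could look roughly like the loop below. It is a generic sketch over `std::io::Read`, equivalent in spirit to the standard `Read::read_exact`, and not tied to the project's I/O layer. `read_full` and the `Chunked` test reader are hypothetical names.

```rust
use std::io::{self, Read};

/// Keep reading until `buf` is full, retrying after short reads.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Zero bytes means the source ended before the buffer was full.
            Ok(0) => return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short read")),
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Reader that returns at most `chunk` bytes per call to simulate short reads.
struct Chunked<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl Read for Chunked<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}
```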

Closes #2078
2025-07-14 12:22:18 +03:00
Nikita Sivukhin
413d93f041 fix after rebase 2025-07-14 13:05:20 +04:00
Nikita Sivukhin
82773d6563 fix clippy 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
5bd3287826 add comments 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
47ab260f6c use PlatformIO in the fuzz test code 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
9a347d8852 add simple tcl test 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
c4841e18f3 fix clippy 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
aceaf182b1 remove comment 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
6e2ccdff20 add btree fuzz tests which generate seed file from scratch 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
f9cd5fad4c add small comment 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
fc400906d5 handle case when target seek page has no matching entries 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
03b2725cc7 return SeekResult from seek operation
- Apart from regular states Found/NotFound seek result has TryAdvance
  value which tells caller to advance the cursor in necessary direction
  because the leaf page which would hold the entry if it was present
  actually has no matching entry (but neighbouring page can have match)
2025-07-14 13:01:15 +04:00
Nikita Sivukhin
77bf6c287d introduce proper state machine for seek op code 2025-07-14 13:01:14 +04:00
Pekka Enberg
9285d8b83b Merge 'Fix: OP_NewRowId to generate semi random rowid when largest rowid is i64::MAX' from Krishna Vishal
- `OP_NewRowId` now generates a new rowid semi-randomly when the largest
rowid in the table is `i64::MAX`.
- Introduced a new `LimboError` variant, `DatabaseFull`, to signify that the
database might be full (SQLite behaves this way, returning
`SQLITE_FULL`).
Now:
```SQL
turso> CREATE TABLE q(x INTEGER PRIMARY KEY, y);
turso> INSERT INTO q VALUES (9223372036854775807, 1);
turso> INSERT INTO q(y) VALUES (2);
turso> INSERT INTO q(y) VALUES (3);
turso> SELECT * FROM q;
┌─────────────────────┬───┐
│ x                   │ y │
├─────────────────────┼───┤
│ 1841427626667347484 │ 2 │
├─────────────────────┼───┤
│ 4000338366725695791 │ 3 │
├─────────────────────┼───┤
│ 9223372036854775807 │ 1 │
└─────────────────────┴───┘
```
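The behavior shown above can be sketched as follows. This is a simplified model, not the VDBE implementation: `new_rowid`, `RowIdError`, the injected `random` closure, and the attempt limit are all assumptions made for the example.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum RowIdError {
    DatabaseFull,
}

/// Pick a rowid for a new row given the set of existing rowids.
fn new_rowid(
    existing: &BTreeSet<i64>,
    mut random: impl FnMut() -> i64,
    max_attempts: usize,
) -> Result<i64, RowIdError> {
    match existing.iter().next_back() {
        // Empty table: start at 1.
        None => Ok(1),
        // Usual case: one past the current maximum.
        Some(&max) if max < i64::MAX => Ok(max + 1),
        // Maximum is i64::MAX: probe random positive candidates.
        Some(_) => {
            for _ in 0..max_attempts {
                // Clear the sign bit and avoid 0 so the candidate is in 1..=i64::MAX.
                let candidate = (random() & i64::MAX).max(1);
                if !existing.contains(&candidate) {
                    return Ok(candidate);
                }
            }
            Err(RowIdError::DatabaseFull)
        }
    }
}
```

Bounding the number of attempts is what makes a "database might be full" error possible: after enough collisions the function gives up instead of looping forever.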
Fixes: https://github.com/tursodatabase/turso/issues/1977

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1985
2025-07-14 11:56:09 +03:00
Pekka Enberg
0b544717a1 Merge 'do not check rowid alias for null' from Nikita Sivukhin
Simple PR to fix a minor issue: `INTEGER PRIMARY KEY NOT NULL` (`NOT
NULL` is obviously redundant here) prevents the user from inserting anything
into the table, because the rowid-alias column is always set to null by `turso-db`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2063
2025-07-14 11:55:06 +03:00
Pekka Enberg
80f9de133e Merge 'CDC functions' from Nikita Sivukhin
This PR adds a few functions to `turso-db` in order to simplify
exploration of the CDC table. Later we will also add an API to work with
changes from code, but SQL support is also useful.
So, this PR adds 2 functions:
1. `table_columns_json_array('<table-name>')` - returns the list of current
table column **names** as a single string in JSON array format
2. `bin_record_json_object('<columns-array>', x'<bin-record>')` -
converts a record in the SQLite format to a JSON object with keys from
`columns-array`
So, these functions can be used together to extract changes in
human-readable format:
```sql
turso> PRAGMA unstable_capture_data_changes_conn('full');
turso> CREATE TABLE t(a INTEGER PRIMARY KEY, b);
turso> INSERT INTO t VALUES (1, 2), (3, 4);
turso> UPDATE t SET b = 20 WHERE a = 1;
turso> UPDATE t SET a = 30, b = 40 WHERE a = 3;
turso> DELETE FROM t WHERE a = 1;
turso> SELECT
    bin_record_json_object(table_columns_json_array('t'), before) before,
    bin_record_json_object(table_columns_json_array('t'), after) after
    FROM turso_cdc;
┌─────────────────┬────────────────┐
│ before          │ after          │
├─────────────────┼────────────────┤
│                 │ {"a":1,"b":2}  │
├─────────────────┼────────────────┤
│                 │ {"a":3,"b":4}  │
├─────────────────┼────────────────┤
│ {"a":1,"b":2}   │ {"a":1,"b":20} │
├─────────────────┼────────────────┤
│ {"a":3,"b":4}   │                │
├─────────────────┼────────────────┤
│ {"a":30,"b":40} │                │
├─────────────────┼────────────────┤
│ {"a":1,"b":20}  │                │
└─────────────────┴────────────────┘
```
Initially, I thought of implementing a single function like
`bin_record_json_object('<table-name>', x'<bin-record>')`, but this design
has certain flaws:
1. In case of schema changes this function can return an incorrect result
(imagine that you dropped a column and now JSON from CDC mentions some
random subset of columns). While this feature is unstable, `turso-db`
should avoid silent incorrect behavior at all costs
2. The single-function design provides no way to deal with schema changes
3. The API is unsound, and the user can think that under the hood `turso-db`
will select the proper schema for the record (but this is actually
impossible with the current CDC implementation)
So, I decided to go with a two-function design, which covers the drawbacks
mentioned above to some extent:
1. The first concern still remains valid
2. The two-function design provides a way to deal with schema changes. For
example, the user can maintain a simple `cdc_schema_changes` table and log
the result of `table_columns_json_array` before applying breaking schema
changes.
    * Obviously, this is not ideal UX, but it suits my needs: I don't
want to design schema-change capturing, but I also don't want to block
users, so this provides a workaround for scenarios which are not
natively supported by CDC
3. Subjectively, I think the API became a bit clearer about the
machinery of these two functions, as the user sees that it extracts the
column list of the table (without any context) and then feeds it to the
`bin_record_json_object` function.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2057
2025-07-14 11:54:17 +03:00
Pere Diaz Bou
3a34f21434 io/windows: pread return bytes read 2025-07-14 10:44:56 +02:00