Commit Graph

6076 Commits

Author SHA1 Message Date
Jussi Saurio
553396e9ca btree: unify table&index seek page boundary handling
PR #2065 fixed a bug with table btree seeks concerning boundaries
of leaf pages.

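The issue was that if we were, e.g., looking for the first key greater than
(GT) 100, we always assumed the key would be found on the left child
page of a given divider (e.g. divider 102), which is incorrect: if rows were deleted,
the divider key may no longer exist in the leaf, so the left child can hold only keys
≤ 100 and the match then lives on the next leaf page. #2065 has more
discussion and documentation about this, so read that one for more context.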

Anyway:

We already had similar handling for index btrees, but it was baked into
the `BTreeCursor` struct's seek handling itself, whereas #2065 handled this
on the VDBE side.

This PR unifies this handling for both table and index btrees by always doing
the additional cursor advancement in the VDBE.

Unfortunately, indexes may also need an additional advance when they
are looking for an exact match, so this resulted in a bigger refactor than anticipated:
there are quite a few VDBE instructions that may perform a seek, e.g.:
`IdxInsert`, `IdxDelete`, `Found`, `NotFound`, `NoConflict`.

All of these can potentially end up in a similar situation where the cursor needs
one more advance after the initial seek.

For this reason, I have extracted a common VDBE helper, `fn seek_internal()`, to which
all the affected VDBE instructions delegate their seek logic.
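A minimal sketch of the shape such a helper could take, assuming a hypothetical `SeekCursor` trait and a three-way `SeekResult` (neither is turso's actual API, and the exact-match index case is left out). Every instruction calls one function, and only that function knows about the extra advance.

```rust
/// Outcome of a single B-tree seek (hypothetical mirror of the real enum).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum SeekResult {
    Found,
    NotFound,
    TryAdvance,
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum SeekOp {
    Ge,
    Gt,
    Le,
    Lt,
}

/// The minimal cursor surface the helper needs (hypothetical, not `BTreeCursor`'s API).
trait SeekCursor {
    fn seek(&mut self, key: i64, op: SeekOp) -> SeekResult;
    /// Move to the next/previous entry; false if the cursor runs off the tree.
    fn next(&mut self) -> bool;
    fn prev(&mut self) -> bool;
}

/// Shared helper: seek, then perform the one extra advance when the landing
/// leaf had no qualifying entry. Returns true if the cursor ends on an entry.
fn seek_internal<C: SeekCursor>(cursor: &mut C, key: i64, op: SeekOp) -> bool {
    match cursor.seek(key, op) {
        SeekResult::Found => true,
        SeekResult::NotFound => false,
        SeekResult::TryAdvance => match op {
            SeekOp::Ge | SeekOp::Gt => cursor.next(), // continue into the right neighbour
            SeekOp::Le | SeekOp::Lt => cursor.prev(), // continue into the left neighbour
        },
    }
}

/// Scripted stand-in cursor that records which advances the helper requested.
struct ScriptedCursor {
    result: SeekResult,
    can_move: bool,
    advances: Vec<&'static str>,
}

impl SeekCursor for ScriptedCursor {
    fn seek(&mut self, _key: i64, _op: SeekOp) -> SeekResult {
        self.result
    }
    fn next(&mut self) -> bool {
        self.advances.push("next");
        self.can_move
    }
    fn prev(&mut self) -> bool {
        self.advances.push("prev");
        self.can_move
    }
}
```

Instructions like `IdxInsert` or `NotFound` would then call `seek_internal` and act on its result, instead of each re-implementing the advance.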
2025-07-14 16:46:43 +03:00
Pekka Enberg
90532eabdf Merge 'b-tree: fix bug in case when no matching rows was found in seek in the leaf page' from Nikita Sivukhin
The current table B-Tree seek code relies on the invariant that if key `K` is
present in an interior page, then it must also be present in the leaf page.
This is generally not true if data was ever deleted from the table,
because the leaf row whose key was used as a divider in the interior pages
can be deleted. Also, the SQLite spec says nothing about such an invariant, so
the `turso-db` B-Tree implementation should not rely on it.
This PR introduces 3 options for the B-Tree `seek` result: `Found`,
`NotFound`, and `TryAdvance`; the last is returned when the leaf page has no
match for `seek_op` but the DB doesn't know whether a neighbouring page might have matching
data.
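For a table B-tree, producing those three outcomes for a single leaf could look roughly like this sketch. `seek_leaf` and `SeekOp` are hypothetical names, and the leaf is just a sorted slice of rowids.

```rust
/// Hypothetical mirror of the three outcomes described above.
#[derive(Debug, PartialEq)]
enum SeekResult {
    Found,      // cursor is on an entry satisfying the seek op
    NotFound,   // no entry anywhere can satisfy it
    TryAdvance, // this leaf has none, but a neighbouring leaf might
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum SeekOp {
    Ge,
    Gt,
    Le,
    Lt,
    Eq,
}

/// Seek within one sorted leaf. Returns the matching cell index (if any) and the outcome.
fn seek_leaf(cells: &[i64], key: i64, op: SeekOp) -> (Option<usize>, SeekResult) {
    let hit = match op {
        // forward ops want the first qualifying cell
        SeekOp::Ge => cells.iter().position(|&c| c >= key),
        SeekOp::Gt => cells.iter().position(|&c| c > key),
        SeekOp::Eq => cells.iter().position(|&c| c == key),
        // backward ops want the last qualifying cell
        SeekOp::Le => cells.iter().rposition(|&c| c <= key),
        SeekOp::Lt => cells.iter().rposition(|&c| c < key),
    };
    match (hit, op) {
        (Some(i), _) => (Some(i), SeekResult::Found),
        // a rowid lookup already landed on the only leaf that could hold the key
        (None, SeekOp::Eq) => (None, SeekResult::NotFound),
        // range ops: the neighbouring leaf may still contain a match
        (None, _) => (None, SeekResult::TryAdvance),
    }
}
```

The design keeps `seek` itself from crossing page boundaries, which matches the SQLite behaviour quoted below; the caller decides whether to advance.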
There is an alternative approach where we move the cursor to the neighbouring page
inside `seek` itself, but I was afraid to introduce such changes
because the analogous `seek` function in SQLite works exactly like the current
version of the code, and I think some query planner internals (for
insertion) may rely on the fact that repositioning leaves the cursor at
the position of insertion:
> ** If an exact match is not found, then the cursor is always
** left pointing at a leaf page which would hold the entry if it
** were present.  The cursor might point to an entry that comes
** before or after the key.
Also, this PR introduces new B-tree fuzz tests which generate a table
B-tree from scratch and execute operations over it. This can help
reach some non-trivial states and also generate huge DBs faster (that's
how this bug was discovered).

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2065
2025-07-14 12:57:09 +03:00
Pekka Enberg
1a0d618a41 Merge 'Assert I/O read and write sizes' from Pere Diaz Bou
Let's assert **for now** that we do not read/write fewer bytes than
expected. This should eventually be fixed to re-issue reads/writes when we
couldn't read/write enough, but for now let's assert.
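For reference, a sketch contrasting the assert with the eventual retry loop. It uses `std::io::Read` as a stand-in for turso's platform-specific I/O backends (which issue their own positioned reads/writes), so `read_asserting`, `read_until_full`, and `Trickle` are hypothetical names. Note that std's `Read::read_exact` already implements the loop for blocking readers, but it returns an error on EOF instead of a short count.

```rust
use std::io::{self, Read};

/// Current approach: treat a short read as a bug.
fn read_asserting<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let n = src.read(buf)?;
    assert_eq!(n, buf.len(), "short read: got {n} of {} bytes", buf.len());
    Ok(())
}

/// Eventual fix: keep re-issuing reads until the buffer is full or EOF is hit.
fn read_until_full<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match src.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF: report how much we got
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Reader that hands out at most `chunk` bytes per call, to simulate short reads.
struct Trickle {
    data: Vec<u8>,
    pos: usize,
    chunk: usize,
}

impl Read for Trickle {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.chunk).min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```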

Closes #2078
2025-07-14 12:22:18 +03:00
Nikita Sivukhin
413d93f041 fix after rebase 2025-07-14 13:05:20 +04:00
Nikita Sivukhin
82773d6563 fix clippy 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
5bd3287826 add comments 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
47ab260f6c use PlatformIO in the fuzz test code 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
9a347d8852 add simple tcl test 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
c4841e18f3 fix clippy 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
aceaf182b1 remove comment 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
6e2ccdff20 add btree fuzz tests which generate seed file from scratch 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
f9cd5fad4c add small comment 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
fc400906d5 handle case when target seek page has no matching entries 2025-07-14 13:01:15 +04:00
Nikita Sivukhin
03b2725cc7 return SeekResult from seek operation
- Apart from the regular Found/NotFound states, the seek result has a TryAdvance
  value, which tells the caller to advance the cursor in the necessary direction
  because the leaf page that would hold the entry if it were present
  actually has no matching entry (but a neighbouring page may have a match)
2025-07-14 13:01:15 +04:00
Nikita Sivukhin
77bf6c287d introduce proper state machine for seek op code 2025-07-14 13:01:14 +04:00
Pekka Enberg
9285d8b83b Merge 'Fix: OP_NewRowId to generate semi random rowid when largest rowid is i64::MAX' from Krishna Vishal
- `OP_NewRowId` now generates a new rowid semi-randomly when the largest
rowid in the table is `i64::MAX`.
- Introduced a new `LimboError` variant, `DatabaseFull`, to signal that the
database might be full (SQLite behaves the same way, returning
`SQLITE_FULL`).
Now:
```SQL
turso> CREATE TABLE q(x INTEGER PRIMARY KEY, y);
turso> INSERT INTO q VALUES (9223372036854775807, 1);
turso> INSERT INTO q(y) VALUES (2);
turso> INSERT INTO q(y) VALUES (3);
turso> SELECT * FROM q;
┌─────────────────────┬───┐
│ x                   │ y │
├─────────────────────┼───┤
│ 1841427626667347484 │ 2 │
├─────────────────────┼───┤
│ 4000338366725695791 │ 3 │
├─────────────────────┼───┤
│ 9223372036854775807 │ 1 │
└─────────────────────┴───┘
```
Fixes: https://github.com/tursodatabase/turso/issues/1977

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1985
2025-07-14 11:56:09 +03:00
Pekka Enberg
0b544717a1 Merge 'do not check rowid alias for null' from Nikita Sivukhin
Simple PR to fix a minor issue where `INTEGER PRIMARY KEY NOT NULL` (`NOT
NULL` is obviously redundant here) prevented users from inserting anything
into the table, because `turso-db` always sets the rowid-alias column to NULL.
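A rough sketch of the fix, assuming a hypothetical `Column` struct and values reduced to `Option<i64>`. The rowid-alias slot holds NULL in the record because its value lives in the rowid itself, so the NOT NULL check has to skip it.

```rust
/// Hypothetical column metadata.
struct Column {
    name: &'static str,
    not_null: bool,
    is_rowid_alias: bool,
}

/// NOT NULL check over a record's values (`None` = SQL NULL).
/// Returns the name of the first violating column.
fn check_not_null(columns: &[Column], values: &[Option<i64>]) -> Result<(), &'static str> {
    for (col, val) in columns.iter().zip(values) {
        // the rowid alias is stored as NULL in the record, so never flag it
        if col.not_null && !col.is_rowid_alias && val.is_none() {
            return Err(col.name);
        }
    }
    Ok(())
}
```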

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2063
2025-07-14 11:55:06 +03:00
Pekka Enberg
80f9de133e Merge 'CDC functions' from Nikita Sivukhin
This PR adds a few functions to `turso-db` to simplify
exploration of the CDC table. Later we will also add an API to work with
changes from code, but SQL support is also useful.
So, this PR adds 2 functions:
1. `table_columns_json_array('<table-name>')` - returns the list of the table's
current column **names** as a single string in JSON array format
2. `bin_record_json_object('<columns-array>', x'<bin-record>')` -
converts a record in the SQLite format to a JSON object with keys from
`columns-array`
So, these functions can be used together to extract changes in a
human-readable format:
```sql
turso> PRAGMA unstable_capture_data_changes_conn('full');
turso> CREATE TABLE t(a INTEGER PRIMARY KEY, b);
turso> INSERT INTO t VALUES (1, 2), (3, 4);
turso> UPDATE t SET b = 20 WHERE a = 1;
turso> UPDATE t SET a = 30, b = 40 WHERE a = 3;
turso> DELETE FROM t WHERE a = 1;
turso> SELECT
    bin_record_json_object(table_columns_json_array('t'), before) before,
    bin_record_json_object(table_columns_json_array('t'), after) after
    FROM turso_cdc;
┌─────────────────┬────────────────┐
│ before          │ after          │
├─────────────────┼────────────────┤
│                 │ {"a":1,"b":2}  │
├─────────────────┼────────────────┤
│                 │ {"a":3,"b":4}  │
├─────────────────┼────────────────┤
│ {"a":1,"b":2}   │ {"a":1,"b":20} │
├─────────────────┼────────────────┤
│ {"a":3,"b":4}   │                │
├─────────────────┼────────────────┤
│ {"a":30,"b":40} │                │
├─────────────────┼────────────────┤
│ {"a":1,"b":20}  │                │
└─────────────────┴────────────────┘
```
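To illustrate what `bin_record_json_object` has to do, here is a dependency-free sketch that decodes a record in the SQLite record format (a header-size varint, one serial-type varint per column, then the column bodies) and zips it with column names. It takes the names as a slice rather than a JSON array string and skips blobs, so it is an approximation, not turso's implementation; all function names are hypothetical.

```rust
/// Decode a SQLite varint: big-endian, 7 bits per byte with the high bit as a
/// continuation flag; a 9th byte, if reached, contributes all 8 bits.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut v: u64 = 0;
    for i in 0..9 {
        let b = *buf.get(i)?;
        if i == 8 {
            return Some(((v << 8) | b as u64, 9));
        }
        v = (v << 7) | (b & 0x7f) as u64;
        if b & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None
}

/// Minimal JSON string escaping.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Decode every column of a record into a JSON-encoded value.
fn decode_record(rec: &[u8]) -> Option<Vec<String>> {
    let (hdr_len, mut pos) = read_varint(rec)?;
    let hdr_len = hdr_len as usize;
    let mut types = Vec::new();
    while pos < hdr_len {
        let (t, n) = read_varint(rec.get(pos..)?)?;
        types.push(t);
        pos += n;
    }
    let mut body = hdr_len;
    let mut values = Vec::with_capacity(types.len());
    for t in types {
        let (json, size) = match t {
            0 => ("null".to_string(), 0),
            1..=6 => {
                let size: usize = [0, 1, 2, 3, 4, 6, 8][t as usize];
                let bytes = rec.get(body..body + size)?;
                // big-endian two's complement: start from all ones if the sign bit is set
                let mut v: i64 = if bytes[0] & 0x80 != 0 { -1 } else { 0 };
                for &b in bytes {
                    v = (v << 8) | b as i64;
                }
                (v.to_string(), size)
            }
            7 => {
                let bytes: [u8; 8] = rec.get(body..body + 8)?.try_into().ok()?;
                (f64::from_be_bytes(bytes).to_string(), 8)
            }
            8 => ("0".to_string(), 0),
            9 => ("1".to_string(), 0),
            n if n >= 13 && n % 2 == 1 => {
                let size = ((n - 13) / 2) as usize;
                let text = std::str::from_utf8(rec.get(body..body + size)?).ok()?;
                (json_string(text), size)
            }
            _ => return None, // blobs (even types >= 12) and reserved types 10/11 are skipped here
        };
        values.push(json);
        body += size;
    }
    Some(values)
}

/// Zip column names with decoded values into a JSON object.
fn record_to_json_object(columns: &[&str], rec: &[u8]) -> Option<String> {
    let values = decode_record(rec)?;
    let fields: Vec<String> = columns
        .iter()
        .zip(&values)
        .map(|(name, value)| format!("{}:{}", json_string(name), value))
        .collect();
    Some(format!("{{{}}}", fields.join(",")))
}
```

For example, the bytes `[3, 1, 1, 1, 2]` (header of 3 bytes, two 1-byte integers) decode with columns `["a", "b"]` to `{"a":1,"b":2}`, matching the first `after` value in the session above.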
Initially, I considered implementing a single function like
`bin_record_json_object('<table-name>', x'<bin-record>')`, but this design
has certain flaws:
1. In case of schema changes, this function can return an incorrect result
(imagine that you dropped a column and now the JSON from CDC mentions some
random subset of columns). Even while this feature is unstable, `turso-db`
should avoid silently incorrect behavior at all costs
2. The single-function design provides no way to deal with schema changes
3. The API is unsound: users may think that under the hood `turso-db`
will select the proper schema for the record (but this is actually
impossible with the current CDC implementation)
So, I settled on the two-function design, which covers the drawbacks
mentioned above to some extent:
1. The first concern still remains valid
2. The two-function design provides a way to deal with schema changes. For
example, users can maintain a simple `cdc_schema_changes` table and log the
result of `table_columns_json_array` before applying breaking schema
changes.
    * Obviously, this is not ideal UX, but it suits my needs: I don't
want to design schema-change capturing, but I also don't want to block
users, so this provides a workaround for scenarios which are not
natively supported by CDC
3. Subjectively, I think the API became a bit clearer about the
machinery of these two functions, as users see that one extracts the column list
of the table (without any context) and then feeds it to the
`bin_record_json_object` function.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2057
2025-07-14 11:54:17 +03:00
Pere Diaz Bou
3a34f21434 io/windows: pread return bytes read 2025-07-14 10:44:56 +02:00
Pere Diaz Bou
340391538a io: change comment for assert 2025-07-14 10:36:06 +02:00
Pere Diaz Bou
93235bc566 io/wasm: return number of bytes read 2025-07-14 10:35:55 +02:00
Pere Diaz Bou
88ff218810 io: assert small I/O
Let's assert **for now** that we do not read/write fewer bytes than
expected. This should eventually be fixed to re-issue reads/writes when we
couldn't read/write enough, but for now let's assert.
2025-07-14 10:19:41 +02:00
Nikita Sivukhin
0457567714 more clippy fixes 2025-07-14 12:09:39 +04:00
Krishna Vishal
12f9743443 Remove unused imports 2025-07-14 13:13:54 +05:30
Krishna Vishal
ab0cb06755 split seek and getting rowid as two separate states 2025-07-14 13:11:41 +05:30
Krishna Vishal
3e880c34d6 Make op_new_rowid re-entrant
Introduce `OpNewRowidState` state machine

remove `get_new_rowid` from vdbe/mod.rs
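A sketch of what a re-entrant `OpNewRowidState` machine could look like, with a hypothetical `Io` enum standing in for turso's I/O completion type and a mock cursor whose operations each need one retry. Because seeking and reading the rowid are separate states, a retry after pending I/O resumes where it left off instead of re-seeking. The `i64::MAX` fallback is omitted here.

```rust
/// Hypothetical result of an I/O-capable step: a value, or "call me again later".
enum Io<T> {
    Done(T),
    Pending,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum OpNewRowidState {
    Start,
    SeekingToLast,
    ReadingRowid,
    Done(i64),
}

/// Mock cursor whose operations return Pending once before completing.
struct MockCursor {
    last_rowid: i64,
    seek_calls: u32,
    rowid_calls: u32,
}

impl MockCursor {
    fn seek_to_last(&mut self) -> Io<()> {
        self.seek_calls += 1;
        if self.seek_calls < 2 { Io::Pending } else { Io::Done(()) }
    }
    fn rowid(&mut self) -> Io<i64> {
        self.rowid_calls += 1;
        if self.rowid_calls < 2 { Io::Pending } else { Io::Done(self.last_rowid) }
    }
}

/// Re-entrant step: returns Pending while I/O is outstanding; the caller
/// re-invokes it with the same `state` until it yields Done.
fn op_new_rowid(state: &mut OpNewRowidState, cursor: &mut MockCursor) -> Io<i64> {
    loop {
        match *state {
            OpNewRowidState::Start => *state = OpNewRowidState::SeekingToLast,
            OpNewRowidState::SeekingToLast => match cursor.seek_to_last() {
                Io::Pending => return Io::Pending,
                Io::Done(()) => *state = OpNewRowidState::ReadingRowid,
            },
            OpNewRowidState::ReadingRowid => match cursor.rowid() {
                Io::Pending => return Io::Pending,
                // the semi-random fallback for i64::MAX is not modelled in this sketch
                Io::Done(max) => {
                    *state = OpNewRowidState::Done(max.checked_add(1).expect("fallback omitted"))
                }
            },
            OpNewRowidState::Done(rowid) => return Io::Done(rowid),
        }
    }
}
```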
2025-07-14 13:11:40 +05:30
Krishna Vishal
7f2a6187fb Add regression test 2025-07-14 13:09:36 +05:30
Krishna Vishal
98ca275b33 Add a way to semi-randomly generate a rowid when the max rowid reaches
`i64::MAX`. We do this by making up to 100 attempts to generate a random value
smaller than `i64::MAX`, returning a `DatabaseFull` error if all attempts
fail
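A sketch of that fallback, with a `HashSet` standing in for the table B-tree and a small xorshift generator so the example needs no crates. The real code's RNG and error type differ; `NewRowidError` and `new_rowid` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical error mirroring the new `DatabaseFull` variant.
#[derive(Debug, PartialEq)]
enum NewRowidError {
    DatabaseFull,
}

/// Tiny xorshift64 PRNG (seed must be non-zero) so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

const MAX_ATTEMPTS: usize = 100;

/// Pick a rowid for a new row. `existing` stands in for the table's rowids.
fn new_rowid(existing: &HashSet<i64>, rng: &mut XorShift) -> Result<i64, NewRowidError> {
    let max = existing.iter().copied().max().unwrap_or(0);
    if max < i64::MAX {
        // normal case: one past the current largest rowid
        return Ok(max + 1);
    }
    for _ in 0..MAX_ATTEMPTS {
        // random candidate in 1..=i64::MAX - 1
        let candidate = (rng.next_u64() % (i64::MAX as u64 - 1)) as i64 + 1;
        if !existing.contains(&candidate) {
            return Ok(candidate);
        }
    }
    Err(NewRowidError::DatabaseFull)
}
```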

- Introduced `DatabaseFull` error variant

Fixes: https://github.com/tursodatabase/turso/issues/1977
2025-07-14 13:09:34 +05:30
Nikita Sivukhin
b330c6b70e fix clippy 2025-07-14 11:38:08 +04:00
Nikita Sivukhin
e94ebbad04 remove unwanted changes 2025-07-14 11:27:51 +04:00
Nikita Sivukhin
551c353fff fix clippy 2025-07-14 11:27:51 +04:00
Nikita Sivukhin
cc04f11bd6 remove clone 2025-07-14 11:27:51 +04:00
Nikita Sivukhin
f61d733dd3 make new functions dependent on "json" Cargo feature 2025-07-14 11:26:51 +04:00
Nikita Sivukhin
c9e7271eaf properly pass subtype 2025-07-14 11:20:49 +04:00
Nikita Sivukhin
bf25a0e3f1 fix clippy 2025-07-14 11:20:16 +04:00
Nikita Sivukhin
81cd04dd65 add bin_record_json_object and table_columns_json_array functions 2025-07-14 11:19:45 +04:00
Nikita Sivukhin
eed89993f9 fix clippy 2025-07-14 11:17:32 +04:00
Nikita Sivukhin
5409812610 properly implement generation of before/after records for new modes 2025-07-14 11:17:32 +04:00
Nikita Sivukhin
9e04102a94 add basic cdc tests for new modes 2025-07-14 11:17:31 +04:00
Nikita Sivukhin
fabb00f385 fix test 2025-07-14 11:16:06 +04:00
Nikita Sivukhin
b258c10c9a generate before/after row values in modification statements 2025-07-14 11:16:06 +04:00
Nikita Sivukhin
9129991b62 add id,before,after,full modes 2025-07-14 11:16:06 +04:00
Pekka Enberg
8f8d582b4a Merge 'Ignore double quotes around table names' from Zaid Humayun
This PR normalizes the table name identifier by removing quotes to
improve SQLite compatibility. Fixes
https://github.com/tursodatabase/turso/issues/1964
<img width="545" height="175" alt="Screenshot 2025-07-13 at 5 12 23 PM" src="https://github.com/user-attachments/assets/10952fc7-9ade-4c97-a427-385ff2dc3b44" />
<img width="384" height="110" alt="Screenshot 2025-07-13 at 5 12 32 PM" src="https://github.com/user-attachments/assets/5d87e0fe-72a4-4472-abc3-24c0d4cc8add" />
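A minimal sketch of such normalization, assuming a hypothetical `normalize_ident` that only handles double quotes. SQLite also accepts `[...]` and backtick quoting, and turso's real helper may do more (for example case folding).

```rust
/// Strip one pair of surrounding double quotes from an identifier and collapse
/// doubled inner quotes (`""` -> `"`), following SQL quoting rules.
fn normalize_ident(ident: &str) -> String {
    if ident.len() >= 2 && ident.starts_with('"') && ident.ends_with('"') {
        ident[1..ident.len() - 1].replace("\"\"", "\"")
    } else {
        ident.to_string()
    }
}
```

With this, `CREATE TABLE "users"` and `SELECT * FROM users` resolve to the same schema entry.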

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2069
2025-07-14 10:04:59 +03:00
Pekka Enberg
26ba8c1176 Merge 'Efficient Record Comparison and Incremental Record Parsing ' from Krishna Vishal
Currently we deserialize the entire record to compare records or to get a
particular column. This PR introduces efficient record operations such
as incremental column deserialization and efficient record comparison.
### Incremental Column Deserialization
- Introduced `RecordCursor` to keep track of how much of the header and
the record we have already parsed. Each `BTreeCursor` will have its own
`RecordCursor` similar to an `ImmutableRecord`.
- The `RecordCursor` gets the number of columns from the schema when the
BTreeCursor is initialized in the VDBE. This cuts down heap
allocations by reserving the correct amount of space for the underlying `Vec`s.
- `ImmutableRecord` now only carries the serialized `payload`.
- We parse the header up to the required serial type (denoted
by the column index), then calculate the offsets and deserialize only
that particular slice of the payload.
- Manually inlined most of the deserialization code into `fn op_column`
because the compiler refused to inline it even with the
`#[inline(always)]` hint, probably due to the complicated control
flow.
- Tried to follow SQLite semantics, which return `Null` when the
requested column index is beyond the number of columns in the
record, when the payload is empty, etc.
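The idea of parsing the header only as far as the requested column can be sketched like this. `RecordCursor` here is a simplified, hypothetical version that caches serial types and body offsets; it returns raw bytes instead of values, and `None` where SQLite would produce NULL.

```rust
/// Decode a SQLite varint (big-endian, 7 bits per byte; a 9th byte contributes 8 bits).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut v: u64 = 0;
    for i in 0..9 {
        let b = *buf.get(i)?;
        if i == 8 {
            return Some(((v << 8) | b as u64, 9));
        }
        v = (v << 7) | (b & 0x7f) as u64;
        if b & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None
}

/// Body size in bytes for a serial type, per the SQLite record format.
fn serial_type_size(t: u64) -> usize {
    match t {
        0 | 8 | 9 => 0,
        1..=4 => t as usize,
        5 => 6,
        6 | 7 => 8,
        n if n >= 12 => ((n - 12) / 2) as usize, // blob (even) or text (odd)
        _ => 0, // 10 and 11 are reserved
    }
}

struct RecordCursor {
    header_len: usize,
    header_pos: usize,      // next unparsed header byte
    serial_types: Vec<u64>, // serial types parsed so far
    offsets: Vec<usize>,    // body offset of each parsed column
    body_pos: usize,        // body offset just past the last parsed column
}

impl RecordCursor {
    /// `expected_columns` (e.g. from the schema) pre-sizes the vectors.
    fn new(payload: &[u8], expected_columns: usize) -> Option<Self> {
        let (header_len, n) = read_varint(payload)?;
        Some(Self {
            header_len: header_len as usize,
            header_pos: n,
            serial_types: Vec::with_capacity(expected_columns),
            offsets: Vec::with_capacity(expected_columns),
            body_pos: header_len as usize,
        })
    }

    /// Parse header entries only until column `idx` is known.
    fn ensure_parsed(&mut self, payload: &[u8], idx: usize) -> Option<()> {
        while self.serial_types.len() <= idx {
            if self.header_pos >= self.header_len {
                return None; // record has fewer columns than requested
            }
            let (t, n) = read_varint(payload.get(self.header_pos..)?)?;
            self.header_pos += n;
            self.offsets.push(self.body_pos);
            self.body_pos += serial_type_size(t);
            self.serial_types.push(t);
        }
        Some(())
    }

    /// Serial type and raw bytes of column `idx`; earlier columns are never decoded.
    fn column<'a>(&mut self, payload: &'a [u8], idx: usize) -> Option<(u64, &'a [u8])> {
        self.ensure_parsed(payload, idx)?;
        let t = self.serial_types[idx];
        let start = self.offsets[idx];
        payload.get(start..start + serial_type_size(t)).map(|b| (t, b))
    }
}
```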
### Efficient Record Comparison ops
- Three record comparison functions are introduced, for integers, strings,
and the general case, replacing `compare_immutable`. These
functions compare a serialized record with a deserialized one.
- `compare_records_int`: used when the first field is an integer, the header
is ≤63 bytes, and there are ≤13 total fields. No varint parsing; direct integer
extraction.
- `compare_records_string`: used when the first field is text with
binary collation and the header is ≤63 bytes.
- `compare_records_generic`: used in complex cases (custom
collations, large headers). Here we parse the record incrementally, field
by field, comparing each field with the one from the deserialized
record. We exit early on the first mismatch, saving on
deserialization cost.
- `find_compare`: selects the optimal comparison strategy for a given
case and dispatches the function required.
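The dispatch rules above, plus the early-exit idea behind the generic comparator, can be sketched as follows. `FirstField`, `Comparator`, and `compare_fields_lazily` are hypothetical, and values are reduced to `i64` for brevity.

```rust
use std::cmp::Ordering;

/// Which comparison routine to use (hypothetical mirror of the three functions above).
#[derive(Debug, PartialEq)]
enum Comparator {
    Int,
    Str,
    Generic,
}

/// Simplified description of the first field of the deserialized key.
#[allow(dead_code)]
enum FirstField {
    Integer,
    Text { binary_collation: bool },
    Other,
}

/// Pick a comparator using the thresholds listed above.
fn find_compare(first: &FirstField, header_len: usize, num_fields: usize) -> Comparator {
    match first {
        FirstField::Integer if header_len <= 63 && num_fields <= 13 => Comparator::Int,
        FirstField::Text { binary_collation: true } if header_len <= 63 => Comparator::Str,
        _ => Comparator::Generic,
    }
}

/// Generic-path idea: pull serialized fields one at a time and stop at the first
/// difference. If one side runs out first, this sketch simply reports Equal.
fn compare_fields_lazily<I: Iterator<Item = i64>>(serialized: I, unpacked: &[i64]) -> Ordering {
    for (a, b) in serialized.zip(unpacked) {
        match a.cmp(b) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}
```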
### Benchmarks `main` vs `incremental_column`
I've used the `testing/testing.db` for this benchmark.
| Query | Main | Incremental | % Change (Faster is +ve) |
|-------|------|-------------|--------------------------|
| SELECT first_name FROM users | 1.3579ms | 1.1452ms | 15.66 |
| SELECT age FROM users | 912.33µs | 897.97µs | 1.57 |
| SELECT email FROM users | 1.3632ms | 1.215ms | 10.87 |
| SELECT id FROM users | 1.4985ms | 1.1762ms | 21.50 |
| SELECT first_name, last_name FROM users | 1.5736ms | 1.4616ms | 7.11 |
| SELECT first_name, last_name, email FROM users | 1.7965ms | 1.754ms | 2.36 |
| SELECT id, first_name, last_name, email, age FROM users | 2.3545ms | 2.4059ms | -2.18 |
| SELECT * FROM users | 3.5731ms | 3.7587ms | -5.19 |
| SELECT * FROM users WHERE age = 30 | 87.947µs | 85.545µs | 2.73 |
| SELECT id, first_name FROM users WHERE first_name LIKE 'John%' | 1.8594ms | 1.6781ms | 9.75 |
| SELECT age FROM users LIMIT 1000 | 100.27µs | 95.418µs | 4.83 |
| SELECT first_name, age, email FROM users LIMIT 1000 | 176.04µs | 167.56µs | 4.81 |
Closes: https://github.com/tursodatabase/turso/issues/1703

Closes #1923
2025-07-14 10:04:19 +03:00
Krishna Vishal
370d437491 Add docs for get_tie_breaker_from_idx_comp_op 2025-07-14 03:28:55 +05:30
Krishna Vishal
4c5383b0b3 chore: clippy 2025-07-14 03:28:55 +05:30
Krishna Vishal
e27b9c7e0f Address review comments 2025-07-14 03:28:55 +05:30
Krishna Vishal
a79fe458db Fix merge conflicts and adapt schema.rs to use RecordCursor 2025-07-14 03:28:55 +05:30
Krishna Vishal
ea4a4708ea - Address some review comments
- Add docs for `RecordCursor`
2025-07-14 03:28:55 +05:30
Krishna Vishal
b1f27cad94 chore: fix clippy 2025-07-14 03:28:55 +05:30