Commit Graph

7172 Commits

Author SHA1 Message Date
Nikita Sivukhin
253b4933f7 more small fixes 2025-08-06 19:30:16 +04:00
Nikita Sivukhin
5b9f3816b3 fix after rebase 2025-08-06 19:27:10 +04:00
Nikita Sivukhin
1763e9bbf9 small fixes 2025-08-06 19:26:55 +04:00
Nikita Sivukhin
b612259a3a more friendly copmletely runtime agnostic turso-sync-engine crate 2025-08-06 19:26:55 +04:00
Pekka Enberg
0c9216d1cc Merge 'cdc: emit entries for schema changes' from Nikita Sivukhin
This PR emit CDC entries as changes in `sqlite_schema` table for DDL
statements: `CREATE TABLE` / `CREATE INDEX` / etc.
The logic is a bit tricky as under the hood `turso` can do some implicit
DDL operations like:
1. Creating auto-indexes in case of `CREATE TABLE`
2. Deletion of all attached indices in case of `DROP TABLE`
```
turso> PRAGMA unstable_capture_data_changes_conn('full');
turso> CREATE TABLE t(x, y, z UNIQUE, q, PRIMARY KEY (x, y));
turso> CREATE INDEX t_xy ON t(x, y);
turso> CREATE TABLE q(a, b, c);
turso> ALTER TABLE q DROP COLUMN b;

turso> SELECT
change_id, 
id,
change_type, 
table_name,
bin_record_json_object(table_columns_json_array(table_name), before) AS before,
bin_record_json_object(table_columns_json_array(table_name), after) AS after
FROM turso_cdc;
┌───────────┬────┬─────────────┬───────────────┬─────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────┐
│ change_id │ id │ change_type │ table_name    │ before                                                              │ after                                                               │
├───────────┼────┼─────────────┼───────────────┼─────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│         1 │  2 │           1 │ sqlite_schema │                                                                     │ {"type":"table","name":"t","tbl_name":"t","rootpage":3,"sql":"CREA… │
├───────────┼────┼─────────────┼───────────────┼─────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│         2 │  5 │           1 │ sqlite_schema │                                                                     │ {"type":"index","name":"t_xy","tbl_name":"t","rootpage":6,"sql":"C… │
├───────────┼────┼─────────────┼───────────────┼─────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│         3 │  6 │           1 │ sqlite_schema │                                                                     │ {"type":"table","name":"q","tbl_name":"q","rootpage":7,"sql":"CREA… │
├───────────┼────┼─────────────┼───────────────┼─────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│         4 │  6 │           0 │ sqlite_schema │ {"type":"table","name":"q","tbl_name":"q","rootpage":7,"sql":"CREA… │ {"type":"table","name":"q","tbl_name":"q","rootpage":7,"sql":"CREA… │
└───────────┴────┴─────────────┴───────────────┴─────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘
```
For now, CDC capture only all explicit operations and ignore all
implicit operations. The reasoning for that is that one use case for CDC
is to apply logical changes as is with simple SQL statements - but if
implicit operations will be logged to the CDC table too - we can have
hard times using simple SQL statement (for example, creation of
`autoindices` will always work; implicit deletion of indices for `DROP
TABLE` also can lead to some troubles and force us to is `DROP INDEX IF
EXISTS ...` statements + we will need to filter out autoindices in this
case too).
Also, to simplify PR, for now `DatabaseTape` from `turso-sync` package
just ignore all schema changes from CDC table.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2426
2025-08-06 14:48:27 +03:00
Jussi Saurio
aaa9ed1d9f Merge 'Add regexp capture' from bit-aloo
This PR adds RegExp Capture to regexp module

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2465
2025-08-06 12:13:19 +03:00
Jussi Saurio
8e4597d11b Merge 'Add load_insn macro for compiler hint in vdbe::execute hot path' from Preston Thorpe
The built-in `unreachable!` macro, believe it or not is just an alias
for `panic!` and does not actually provide the compiler with a hint that
the path is not reachable.
This provides a wrapper around the actual
`std::hint::unreachable_unchecked()`, to be used only in the very hot
path of `execute` where it is not possible to be the incorrect variant.

Closes #2459
2025-08-06 12:05:33 +03:00
Jussi Saurio
9cf8a9b533 Merge 'refactor/btree: don't clone WriteState during balancing' from Jussi Saurio
## Beef
Removes cloning of `WriteState` in `balance_non_root()` and removes the
`Clone` implementation from `WriteState`.  Not only is it unnecessary to
clone the state, but even having the `Clone` implementation available is
a real regression risk in `insert_into_page()` as it will deep clone
vectors that we require stable raw pointers to in `fill_cell_payload()`
## Details
In removing the cloning of `WriteState` in `balance_non_root()`, this
replaces the `next_write_state` logic with a plain loop, which ends up
looking like a lot of changes, but it's really not. Main things that
were changed:
- Loop and mutate state instead of create new state and return
- Make `WriteInfo` an `Option` in `WriteState` for ergonomics so it can
be `.take()`:n instead of `.clone()`:d
- Clone a `Rc<Pager>` in `balance_non_root()` so that we can call
`pager.do_allocate_page()` directly instead of `self.allocate_page()`
which is just a facade for the pager method, and would violate borrowing
rules.
- assign `let usable_space = self.usable_space()` at the beginning of
the balance loop for similar reasons.
- Make the balance debug validation methods static so they don't require
a reference to self, for similar reasons

Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2466
2025-08-06 12:03:23 +03:00
Jussi Saurio
5f3cfaac60 refactor/btree: don't clone WriteState in balance_non_root() 2025-08-06 11:30:09 +03:00
Jussi Saurio
a15d7dd2e7 refactor/btree: don't clone WriteState in balance() 2025-08-06 11:30:09 +03:00
Jussi Saurio
3d635ecd67 Merge 'refactor/btree: don't clone WriteState in insert_into_page()' from Jussi Saurio
## Problem
We currently clone `WriteState` in every loop iteration of
`insert_into_page()`, which was probably done for borrow checker
reasons, but since `WriteState` has expanded to include buffers that
must not be moved in memory or dropped, it has necessitated a really
annoying workaround of wrapping the buffers in `Arc<Mutex>>` which is
just completely wasteful.
## Fix
Do not clone `WriteState` in `insert_into_page()`, and instead work with
the borrow checker a bit more. Note that `WriteState` still _implements_
`Clone` because it's also cloned in `balance_non_root()` - that can be a
separate refactor.

Reviewed-by: Avinash Sajjanshetty (@avinassh)
Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #2464
2025-08-06 11:29:55 +03:00
Jussi Saurio
406cbb9e78 Merge 'Coll seq' from Glauber Costa
Implement the CollSeq vdbe opcode.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2454
2025-08-06 11:28:27 +03:00
Pekka Enberg
2913dc4dd4 Turso 0.1.4-pre.2 2025-08-06 10:52:32 +03:00
bit-aloo
84316d9206 fix python lint 2025-08-06 12:02:20 +05:30
Jussi Saurio
1c1f55fdfb refactor/btree: remove cloning of WriteState in insert_into_page() 2025-08-06 08:50:56 +03:00
Jussi Saurio
c3a32b63bf refactor/btree: remove unnecessary ref of self in overwrite_content() 2025-08-06 08:45:34 +03:00
Jussi Saurio
6dd08c21e4 refactor/btree: remove unnecessary mut ref of self in rowid() 2025-08-06 08:44:52 +03:00
Jussi Saurio
fc95282c57 Merge 'core/btree: fix re-entrancy bug in insert_into_page()' from Jussi Saurio
## Background
We currently clone `WriteState` on every iteration of
`insert_into_page()`,
presumably for _Borrow Checker Reasons (tm)_.
## Problem
There was a bug in `WriteState::Insert` handling where if
`fill_cell_payload()`
returned IO, the `fill_cell_payload_state` was not updated in
`write_info.state`, leading to an infinite loop of allocating new pages.
Further, the `new_payload` vector was also always deep-cloned.
## Fix
Update the state if `fill_cell_payload()` returns IO. Because of
`WriteState` cloning we also need to `Arc<Mutex>` wrap the vector so
that the underlying data buffer doesn't get cloned the next time
`WriteState` gets cloned, since `fill_cell_payload` relies on raw
pointers to work. Left a couple of prominent FIXMEs about this.
## Notes
This bug was surfaced by, but not caused by,
https://github.com/tursodatabase/turso/pull/2400.

Closes #2463
2025-08-06 08:23:49 +03:00
Jussi Saurio
839d428e36 core/btree: fix re-entrancy bug in insert_into_page()
We currently clone WriteState on every iteration of `insert_into_page()`,
presumably for Borrow Checker Reasons (tm).

There was a bug in `WriteState::Insert` handling where if `fill_cell_payload()`
returned IO, the `fill_cell_payload_state` was not updated in
`write_info.state`, leading to an infinite loop of allocating new pages.

This bug was surfaced by, but not caused by, #2400.
2025-08-06 08:01:49 +03:00
Jussi Saurio
cd3fe523a3 core/types: add IOResult::is_io() helper 2025-08-06 07:46:51 +03:00
PThorpe92
f6fb786cc9 Fix borrow method on WindowsIO 2025-08-05 22:26:19 -04:00
Preston Thorpe
cb9a74ccc6 Merge 'Remove RefCell from Buffer in IO trait methods and PageContents' from Preston Thorpe
We are doing unsafe mut borrowing in `PageContent` currently, and when
new BufferPool is merged, it will be all pointers into an arena anyway.
We just need to be sure to clone the Arc and reference the buffers in
the completion callbacks so they are ensured to live for async IO.
naturally in true spirit of all my previous PR's, I needed to introduce
one while another is open that causes an absurd amount of conflicts.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2456
2025-08-05 21:30:27 -04:00
PThorpe92
4010c7bf0b Make clippy happy 2025-08-05 21:24:54 -04:00
PThorpe92
00a3c7eb52 Apply PR comments, fix syntax 2025-08-05 21:17:23 -04:00
PThorpe92
d5f9e60dfc Add assert_insn for compiler hint in execute hot path 2025-08-05 18:32:27 -04:00
Nikita Sivukhin
d504f9b1ac fix turso-sync tests 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
c0d5c55d5c fix tests and clippy 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
9c4147e8d6 add simple integration tests 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
c6a87d61c7 emit CDC entries if necessary for schema changes 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
0b4c1ac802 refactor code a little bit 2025-08-06 01:03:48 +04:00
PThorpe92
53a0524050 Fix clippy warning 2025-08-05 16:24:50 -04:00
PThorpe92
f6a68cffc2 Remove RefCell from IO and Page apis 2025-08-05 16:24:49 -04:00
Preston Thorpe
01531be9f4 Merge 'Remove Clone impl for Buffer and PageContent' from Preston Thorpe
After #2258, we no longer need to clone/deep copy `PageContent` or
`Buffer`.
We can remove the implementation to make any copying of page data
explicit.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2453
2025-08-05 16:17:13 -04:00
Glauber Costa
d1be7ad0bb implement the collseq bytecode instruction
SQLite generates those in aggregations like min / max with collation
information either in the table definition or in the column expression.

We currently generate the wrong result here, and properly generating the
bytecode instruction fixes it.
2025-08-05 13:49:04 -05:00
Glauber Costa
6a66053ca8 make sure value comparisons for min and max are collation aware
They currently aren't, which isn't right.
2025-08-05 13:39:38 -05:00
PThorpe92
914c10e095 Remove Clone impl for Buffer and PageContent 2025-08-05 14:26:53 -04:00
bit-aloo
eff096273c update compat.md 2025-08-05 20:54:55 +05:30
bit-aloo
e92fdfca9f refactir regexp replace 2025-08-05 20:53:24 +05:30
bit-aloo
db17a195f3 refactor regexp capture 2025-08-05 20:48:47 +05:30
bit-aloo
b26a58f652 update extension.py with regexp_replace test 2025-08-05 20:36:09 +05:30
bit-aloo
20d4da7054 register regexp_capture with register_extension 2025-08-05 20:19:48 +05:30
bit-aloo
0fcb302d8f add regexp_capture 2025-08-05 20:18:14 +05:30
Pekka Enberg
9492a29d47 Merge 'Fix performance regression' from Jussi Saurio
Closes #2440
## Fix 1
Do not start a read transaction when a SELECT is not going to access the
database, which means we can avoid checking whether the schema has
changed.
## Fix 2
Add a field `accesses_db` to `Program` and `Statement` so we can avoid
even checking for `SchemaUpdated` errors when it's not possible to get
one.
## Fix 3
Avoid doing any work in `commit_txn` when not in a transaction. This
optimization is only enabled when `mv_store.is_none()`, because MVCC has
its own logic and this doesn't work with MVCC enabled, and honestly I'm
too tired to find out why. Left an inline comment about it, though.
```sql
Execute `SELECT 1`/limbo_execute_select_1
                        time:   [21.440 ns 21.513 ns 21.586 ns]
                        change: [-60.766% -60.616% -60.453%] (p = 0.00 < 0.05)
                        Performance has improved.
```
Effect is even more dramatic in CI where the latency is down over 80%

Closes #2441
2025-08-05 16:30:18 +03:00
Jussi Saurio
cde8567b1d Merge 'More state machine + Return IO in places where completions are created' from Pedro Muniz
In preparation for tracking IO Completions, we need to start to return
IO in places where completions are created. Doing some more plumbing now
to avoid bigger PRs for the future

Closes #2438
2025-08-05 15:47:51 +03:00
Jussi Saurio
d13a1a0eec Merge 'test/fuzz: add ALTER TABLE column ops to tx isolation fuzz test' from Jussi Saurio
## Beef
Adds `AddColumn`, `DropColumn`, `RenameColumn`
## Details
- Previously the test was hardcoded to assume there's always 2 named
columns, so changed a bunch of things for this reason
- Still assumes the primary key column is always `id` and is never
renamed or dropped etc.

Closes #2434
2025-08-05 15:42:31 +03:00
Pekka Enberg
49123db6e8 Merge 'core/mvcc: implement exists' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2446
2025-08-05 15:34:23 +03:00
Jussi Saurio
9f820ae4c7 Merge 'cleanup: remove unused page uptodate flag' from Jussi Saurio
Closes #2445
2025-08-05 15:26:41 +03:00
Jussi Saurio
1feb5ba2d3 perf/vdbe: avoid doing work in commit_txn if not in txn 2025-08-05 15:25:28 +03:00
Jussi Saurio
3f633247f7 perf/stmt: avoid checking for SchemaUpdated errors if it's impossible 2025-08-05 15:10:55 +03:00
Jussi Saurio
c498196c7b fix/perf: fix regression in SELECT 1 benchmark
Do not start a read transaction when a SELECT is not going to access
the database, which means we can avoid checking whether the schema has
changed.
2025-08-05 15:10:55 +03:00