In #2521, I messed up and introduced an improper calculation of the current
checkpoint's max safe frame (mostly due to incorrect comments I had left on
the method).
The confusion partially stems from our lack of `Busy` handling at the
moment, but essentially: when determining the max safe frame for all
readers in passive mode, we cannot simply `break` out of the loop when we
find a reader with a lower read mark than ours, because _another_ reader
might have an even _lower_ read mark, and we would wrongly proceed with
the first mark < `shared_max`.
And for !passive modes, we still attempt to backfill up to the same lower
frame; we just return `Busy` at the end, after backfilling what we can
(we just don't reset the log for restart/truncate).
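To make the intended logic concrete, here is a minimal sketch (names
invented; not the actual method) of scanning every reader instead of
breaking early:
```
/// Illustrative only: the max safe frame is the *minimum* over all reader
/// marks below `shared_max`, so we must scan every reader rather than
/// `break` on the first one.
fn max_safe_frame(shared_max: u64, read_marks: &[u64]) -> u64 {
    let mut max_safe = shared_max;
    for &mark in read_marks {
        // Keep scanning: a later reader may hold an even lower mark.
        if mark < max_safe {
            max_safe = mark;
        }
    }
    max_safe
}
```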
Most of the changes in this PR are just renaming the fields of the
checkpoint result, because the old names were confusing.
Closes#2560
This PR fixes a few issues in the turso-sync-engine implementation:
1. One step of the `pull` implementation works like this:
a. Start write WAL session
b. Revert local changes in WAL
c. Replay WAL frames from remote DB
d. Replay WAL frames produced by local changes applied to the remote
DB copy (`synced`)
My initial thinking was that by executing step (d) we would get the same
schema as before (with the same schema cookie) and everything would be
fine. Thinking about it more deeply, it's not fine: after step (d), tables
can end up with different root pages (if they were created locally, for
example), and the DB will have a "broken" schema.
2. The sync engine executes a few SQL statements without running them to
completion, which basically creates "orphaned" locks.
To fix (1), I decided to introduce another `conn_raw_api` extension which
allows reading and writing the schema cookie directly within the
transaction. So, the process described above gains a step (e) which sets
the schema cookie to a value strictly greater than its previous value.
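As a sketch of step (e), assuming a hypothetical `conn_raw_api`-shaped
surface (the real extension's names may differ):
```
/// Hypothetical shape of the `conn_raw_api` schema-cookie extension,
/// invented for this sketch.
trait RawSchemaCookie {
    fn read_schema_cookie(&self) -> u32;
    fn write_schema_cookie(&mut self, value: u32);
}

/// Step (e): force the cookie strictly past its pre-pull value so that
/// cached schemas (with possibly stale root pages) are re-read.
fn bump_schema_cookie<C: RawSchemaCookie>(conn: &mut C, cookie_before_pull: u32) {
    let current = conn.read_schema_cookie();
    conn.write_schema_cookie(current.max(cookie_before_pull) + 1);
}
```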
To fix (2), I just fixed all the places where statements were dropped
before running to completion.
These fixes are merged together because I uncovered them while fixing one
new test: `test_sync_single_db_many_pulls_big_payloads`
Closes#2561
We have to update the transaction state before checking the schema cookie
so that we can correctly roll back the transaction later on.
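A minimal sketch of the ordering (types invented for illustration):
```
/// Illustrative only: the fix is an ordering change, not new logic.
enum TxState { None, Read }
struct SchemaChanged;

fn enter_tx(state: &mut TxState, on_disk_cookie: u32, cached_cookie: u32) -> Result<(), SchemaChanged> {
    // Transition first: if the cookie check fails, the rollback path sees
    // an open transaction and can unwind it correctly.
    *state = TxState::Read;
    if on_disk_cookie != cached_cookie {
        return Err(SchemaChanged);
    }
    Ok(())
}
```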
Closes#2535
Closes#2549
A small contribution to my current work on making checkpointing efficient.
We hold a write lock, and especially here on main there is no reason to
mark the pages as dirty in the cache, so we can do away with that
`Vec<u64>` and just track whether the checkpoint is `Done`.
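Schematically (field names invented), the tracking collapses from a list
of pages to a flag:
```
/// Before (sketch): every backfilled page was collected so it could be
/// re-marked dirty in the cache afterwards.
struct CheckpointProgressBefore {
    pages_to_mark_dirty: Vec<u64>,
}

/// After (sketch): under the write lock nobody else touches the cache,
/// so completion is all we need to track.
enum CheckpointProgressAfter {
    InProgress,
    Done,
}
```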
Closes#2545
Mainly, the performance impact here comes from removing some unnecessary
checks and inlining `read_integer_fast()` directly into `op_column()`,
but I also added some fiddly nano-optimizations for fun.
On main, we are roughly 3.4x slower than sqlite on `SELECT * FROM users
LIMIT 100`, and here we are roughly 3.2x slower, which ain't much, but
it's honest work.
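For flavor, here is the kind of decode such a fast path performs, keyed on
SQLite's integer serial types (a generic sketch, not the actual
`read_integer_fast()` body):
```
/// Illustrative only: decode SQLite integer serial types 1..=6, which map
/// to big-endian integers of 1, 2, 3, 4, 6 and 8 bytes respectively.
fn read_integer(serial_type: u64, buf: &[u8]) -> i64 {
    let len = match serial_type {
        1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 6, 6 => 8,
        _ => return 0, // non-integer serial types are out of scope here
    };
    // Sign-extend from the first byte, then shift in the rest.
    let mut value = (buf[0] as i8) as i64;
    for &byte in &buf[1..len] {
        value = (value << 8) | (byte as i64);
    }
    value
}
```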
A more impactful optimization, but a much more annoying refactor, would
be #2304.
Closes#2516
Add support for schema changes and granular updates in the
`DatabaseTape` and turso-sync-engine
Now, schema changes made locally will be replicated to the remote too.
Also, `UPDATE`s made locally will touch only the changed columns (before,
we did `DELETE` + `INSERT`, which could overwrite non-conflicting changes
from another device to the same row).
Note that schema-change replication can be pretty dangerous for now, as
we can't extract the proper schema at a given moment in time from
turso_cdc and always use the latest schema's columns. This means it's
better to avoid executing `ALTER TABLE ...` locally, but basic DDL like
`CREATE TABLE` / `CREATE INDEX` / `DROP TABLE` / `DROP INDEX` will work
fine (as columns only appear/disappear from the schema in this case).
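As a rough sketch of the column-level replay (the helper is invented, and
real code would bind parameters rather than splice strings):
```
/// Illustrative only: replay an update by touching just the changed
/// columns, instead of DELETE + INSERT over the whole row.
fn build_update_sql(table: &str, changed: &[(&str, &str)], rowid: i64) -> String {
    let assignments = changed
        .iter()
        .map(|(column, value)| format!("{column} = {value}"))
        .collect::<Vec<_>>()
        .join(", ");
    format!("UPDATE {table} SET {assignments} WHERE rowid = {rowid}")
}
```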
Closes#2540
This PR adds a new `updates` column to the CDC table. This column holds
the updated fields of the row in the following format:
```
[C boolean values, where true is set for changed columns]
[C values with the updates, where NULL is set for unchanged columns]
```
For example:
```
turso> UPDATE t SET y = 'turso', q = 'db' WHERE rowid = 1;
turso> SELECT bin_record_json_object('["x","y","z","q","x","y","z","q"]', updates) as updates FROM turso_cdc;
┌──────────────────────────────────────────────────────────────────┐
│ updates │
├──────────────────────────────────────────────────────────────────┤
│ {"x":0,"y":1,"z":0,"q":1,"x":null,"y":"turso","z":null,"q":"db"} │
└──────────────────────────────────────────────────────────────────┘
```
Also, this column works differently for `ALTER TABLE` statements, where
the update value for `sql` is the original `ALTER TABLE` statement:
```
turso> ALTER TABLE t ADD COLUMN t;
turso> SELECT bin_record_json_object('["type","name","tbl_name","rootpage","sql","type","name","tbl_name","rootpage","sql"]', updates) as updates FROM turso_cdc WHERE rowid = 2;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ updates │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {"type":0,"name":0,"tbl_name":0,"rootpage":0,"sql":1,"type":null,"name":null,"tbl_name":null,"rootpage":null,"sql":"ALTER TABLE t ADD COLUMN t;"} │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
This will help turso-db implement logical replication that supports both
column-level updates and schema changes.
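To show how a consumer might use this layout, here is a small sketch
(types invented) that splits the decoded 2·C record back into per-column
changes:
```
/// Illustrative only: the record holds C "changed?" flags followed by
/// C values, so splitting at C recovers (column, new value) pairs.
fn changed_columns<V: Clone>(
    columns: &[&str],
    record: &[V], // decoded record of length 2 * columns.len()
    is_changed: impl Fn(&V) -> bool,
) -> Vec<(String, V)> {
    let (flags, values) = record.split_at(columns.len());
    columns
        .iter()
        .zip(flags.iter().zip(values))
        .filter(|&(_, (flag, _))| is_changed(flag))
        .map(|(column, (_, value))| (column.to_string(), value.clone()))
        .collect()
}
```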
Closes#2538
1. Introduce state machines for insert and delete to make sure IO is
handled properly (see the sketch after this list).
2. When the `rowid` is changed in an `UPDATE`, it is handled as a
combination of delete and insert, so `op_insert` doesn't need to update
the incremental view with the deleted old values, since the preceding
`op_delete` instruction should already have done that.
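A minimal sketch of the state-machine shape referenced in (1), with
invented states (the real machines follow the actual insert/delete paths):
```
/// Illustrative only: an insert split into resumable states, so pending IO
/// can yield to the caller and re-enter exactly where it left off.
enum InsertState {
    SeekToKey,
    WriteCell,
    UpdateView, // skipped when a preceding op_delete already updated it
    Done,
}

enum Step {
    PendingIo(InsertState), // IO outstanding: resume later in the same state
    Next(InsertState),
}

fn advance(state: InsertState, io_ready: bool) -> Step {
    if !io_ready {
        return Step::PendingIo(state);
    }
    match state {
        InsertState::SeekToKey => Step::Next(InsertState::WriteCell),
        InsertState::WriteCell => Step::Next(InsertState::UpdateView),
        InsertState::UpdateView | InsertState::Done => Step::Next(InsertState::Done),
    }
}
```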
Closes#2542
`populate` now has its own code path for applying changes to the view.
That was okay until now, because all we do is filter. But now that we are
also applying aggregations, we would end up with two disjoint code paths.
A better approach is to just turn the results of our select into a delta
set and apply that.
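Roughly (all names invented), the unified shape looks like:
```
/// Illustrative only: populate expressed as "run the view's select, wrap
/// the rows in a delta, and push it through the same apply path that
/// incremental updates use", so filters and aggregations share one path.
struct Delta(Vec<(i64, Vec<String>)>); // (rowid, row values), sketch-level

trait View {
    fn run_select(&self) -> Vec<(i64, Vec<String>)>;
    fn apply_delta(&mut self, delta: Delta);
}

fn populate<V: View>(view: &mut V) {
    let rows = view.run_select();
    view.apply_delta(Delta(rows));
}
```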
Improves the clarity of the README's "Getting Started" → "Command Line"
section by adding an explicit `$ tursodb` command example, so users know
exactly what to type.
Closes#2534
I'm not sure how much this will clash with @TcMits's parser rewrite,
hopefully not too much. If it does and we eventually have to remove it,
at least we'll have two new regression tests.
Closes https://github.com/tursodatabase/turso/issues/2484
Closes#2499
Use the same rusqlite version in every crate, and use a bundled,
up-to-date sqlite version.
(the impetus for this PR is still me trying to figure out why sqlite in
the insert benchmark doesn't seem to be fsyncing, even when instructed)
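For reference, this is the shape of the change (the version number below
is illustrative, not the exact pin; rusqlite's `bundled` feature compiles
a vendored, recent sqlite):
```
# Root Cargo.toml: one shared rusqlite for every workspace member.
[workspace.dependencies]
rusqlite = { version = "0.32", features = ["bundled"] }

# In each member crate's Cargo.toml:
# rusqlite = { workspace = true }
```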
Closes#2507