If we don't clear the dirty pages, we will initiate a rollback. In the
rollback, we will attempt to clear the whole page cache, but it will
then panic because there will still be dirty pages from the failed
writev.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3189
This PR adds a proper program abort for the case where a statement is
reset or interrupted before it has finished.
Also, this PR makes the rollback methods non-failing, because otherwise
the outcome is unclear to their callers (if rollback failed, what is the
state of the statement/connection/transaction?)
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #3591
Introduces a completion group abstraction that allows grouping multiple
I/O completions together for coordinated tracking and error handling.
This enables:
- Tracking completion status of multiple I/O operations as a group
- Detecting when all operations in a group have finished
- Aborting all operations in a group atomically
- Retrieving errors from any completion in the group
The implementation uses intrusive linked lists for efficient membership
tracking and atomic counters for outstanding operation counts. Each
completion can be linked to a group using the new .link() method.
This lays the groundwork for batch I/O operations and coordinated
transaction handling in the storage layer.
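The grouping idea above can be sketched as follows. This is a minimal illustration, not the code from the PR: it swaps the intrusive linked list for a shared `Arc` state to stay short, and every name except `.link()` (`GroupState`, `CompletionGroup`, `Completion`, `finished`, `abort`, `error`) is a hypothetical stand-in.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// State shared by every completion in one group (hypothetical sketch).
#[derive(Default)]
pub struct GroupState {
    outstanding: AtomicUsize,           // operations linked but not yet finished
    aborted: AtomicBool,                // one flag that all members observe
    first_error: Mutex<Option<String>>, // first error reported by any member
}

#[derive(Clone, Default)]
pub struct CompletionGroup {
    state: Arc<GroupState>,
}

#[derive(Default)]
pub struct Completion {
    group: Option<Arc<GroupState>>,
    done: bool,
}

impl Completion {
    pub fn new() -> Self {
        Self::default()
    }

    /// Attach this completion to a group; it counts as one outstanding operation.
    pub fn link(&mut self, group: &CompletionGroup) {
        if self.group.is_some() {
            return; // already linked; do not count it twice
        }
        group.state.outstanding.fetch_add(1, Ordering::AcqRel);
        self.group = Some(Arc::clone(&group.state));
    }

    /// Mark the operation as finished and record its error, if any.
    pub fn complete(&mut self, result: Result<(), String>) {
        if self.done {
            return;
        }
        self.done = true;
        if let Some(state) = &self.group {
            if let Err(e) = result {
                let mut slot = state.first_error.lock().unwrap();
                if slot.is_none() {
                    *slot = Some(e);
                }
            }
            state.outstanding.fetch_sub(1, Ordering::AcqRel);
        }
    }

    /// True if the group this completion belongs to has been aborted.
    pub fn is_aborted(&self) -> bool {
        self.group
            .as_ref()
            .map_or(false, |s| s.aborted.load(Ordering::Acquire))
    }
}

impl CompletionGroup {
    pub fn new() -> Self {
        Self::default()
    }

    /// True once every linked completion has finished.
    pub fn finished(&self) -> bool {
        self.state.outstanding.load(Ordering::Acquire) == 0
    }

    /// Flip the shared abort flag; all members see it at once.
    pub fn abort(&self) {
        self.state.aborted.store(true, Ordering::Release);
    }

    /// First error reported by any completion in the group.
    pub fn error(&self) -> Option<String> {
        self.state.first_error.lock().unwrap().clone()
    }
}
```

A caller would link each completion before submitting its I/O, then poll `finished()` and check `error()` once the count drops to zero.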
This PR makes the sync client completely autonomous, since it can now
defer the initial sync.
This opens up the possibility of creating the DB in Turso Cloud
asynchronously while letting the user interact with the local DB straight
away.
Closes #3531
MVCC bootstrap connection got stuck into an infinite statement reparsing
loop because the bootstrap procedure happened before the on-disk schema
was deserialized.
Closes #3518
Closes #3522
The VDBE step() function was taking Arc<MvStore> by value, causing it to
be cloned on every single step of query execution. This resulted in
thousands of atomic reference count increments/decrements per query,
showing up as a major hotspot in profiling.
Changed step() and related functions to take Option<&Arc<MvStore>>
instead, passing a reference rather than cloning the Arc. This eliminates
the unnecessary atomic operations while maintaining the same semantics.
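A small sketch of the difference, using a placeholder `MvStore` and two hypothetical functions (`step_by_value`, `step_by_ref`) rather than the real `step()` signature. Each function returns the strong count it observes, which makes the extra clone visible.

```rust
use std::sync::Arc;

/// Placeholder for the real MVCC store type (fields omitted; assumption).
pub struct MvStore;

/// Old shape: the argument is owned, so the caller has to clone the Arc
/// (one atomic increment) and the callee drops it (one atomic decrement).
pub fn step_by_value(mv_store: Option<Arc<MvStore>>) -> usize {
    mv_store.as_ref().map_or(0, |s| Arc::strong_count(s))
}

/// New shape: the argument is borrowed, so the reference count is untouched.
pub fn step_by_ref(mv_store: Option<&Arc<MvStore>>) -> usize {
    mv_store.map_or(0, |s| Arc::strong_count(s))
}
```

With a single owner, `step_by_value(Some(Arc::clone(&store)))` sees a count of 2 for the duration of the call, while `step_by_ref(Some(&store))` sees 1.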
SQLite has a crazy easter egg where, at the 1 GiB file offset, it creates
a `PENDING_BYTE_PAGE` that is used only by the VFS layer and is never
read or written.
To properly test this, I took inspiration from SQLite's testing framework
and defined a helper method that is conditionally compiled when the
`test_helper` feature is enabled.
https://github.com/sqlite/sqlite/blob/7e38287da43ea3b661da3d8c1f431aa907d648c9/src/main.c#L4327
As the `PENDING_BYTE` is normally at the 1 GiB mark, I created a
function that modifies the static `PENDING_BYTE` atomic to whatever
value we want. This means we can test this unusual behaviour at any DB
file size we want.
`fuzz_pending_byte_database` is the test that fuzzes different pending
byte offsets and does an integrity check at the end to confirm that we
are compatible with SQLite.
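The override can be pictured roughly like this. It is a sketch under assumptions: only the `PENDING_BYTE` static is named in the description, while `set_pending_byte`, `pending_byte_page`, and `next_allocatable_page` are hypothetical names, and the `u32` width is a guess. The page formula follows SQLite, where the pending byte page is `PENDING_BYTE / page_size + 1`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// SQLite's default pending byte offset: 0x40000000, i.e. 1 GiB.
pub const DEFAULT_PENDING_BYTE: u32 = 0x4000_0000;

/// Process-wide offset; an atomic so that tests can move it.
static PENDING_BYTE: AtomicU32 = AtomicU32::new(DEFAULT_PENDING_BYTE);

/// Test-only override (hypothetical name). In the real crate this would sit
/// behind `#[cfg(feature = "test_helper")]`; the attribute is left out here
/// so the sketch compiles on its own.
pub fn set_pending_byte(offset: u32) {
    PENDING_BYTE.store(offset, Ordering::SeqCst);
}

/// 1-based number of the page that contains the pending byte.
pub fn pending_byte_page(page_size: u32) -> u32 {
    PENDING_BYTE.load(Ordering::SeqCst) / page_size + 1
}

/// The pager must never hand out the pending byte page, so skip over it.
pub fn next_allocatable_page(candidate: u32, page_size: u32) -> u32 {
    if candidate == pending_byte_page(page_size) {
        candidate + 1
    } else {
        candidate
    }
}
```

Moving the offset down to, say, 64 KiB puts the reserved page at page 17 for 4096-byte pages, so a fuzz test hits the skip logic with a tiny database instead of a 1 GiB one.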
Closes #2749
<img width="1100" height="740" alt="image" src="https://github.com/user-attachments/assets/06eb258f-b4b4-47bf-85f9-df1cf411e1df" />
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3431
**Handle table ID / rootpages properly for both checkpointed and non-checkpointed tables**
Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.
**We cannot use root page as the MVCC table ID directly because:**
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
**Hence:**
- MVCC table ids are always negative
- sqlite_schema rows will have a negative rootpage column if the
table has not been checkpointed yet.
- on checkpoint when the table is allocated a real root page, we update
the row in sqlite_schema and in MV store's internal mapping
**On recovery:**
- All sqlite_schema tables are read directly from disk and assigned
`table_id = -1 * root_page` -- root_page on disk must be positive
- Logical log is deserialized and inserted into MV store
- Schema changes from logical_log are captured into the DB's global
schema
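The ID scheme above can be sketched as a small mapping. Everything here is illustrative: `TableIdMap` and its methods are hypothetical names, the choice to allocate new IDs below the lowest existing one is an assumption made to avoid collisions with recovered IDs, and storing the table ID itself as the negative `rootpage` value is likewise an assumption.

```rust
use std::collections::HashMap;

/// Maps MVCC table IDs (always negative) to B-tree root pages.
#[derive(Default)]
pub struct TableIdMap {
    /// `None` means the table has not been checkpointed yet.
    root_pages: HashMap<i64, Option<i64>>,
    /// Most negative ID handed out so far (0 when empty).
    lowest_id: i64,
}

impl TableIdMap {
    /// Recovery: a table read from disk gets `table_id = -1 * root_page`.
    pub fn recover_from_disk(&mut self, root_page: i64) -> i64 {
        assert!(root_page > 0, "root_page on disk must be positive");
        let id = -root_page;
        self.root_pages.insert(id, Some(root_page));
        self.lowest_id = self.lowest_id.min(id);
        id
    }

    /// MVCC commit: assign an ID before any root page exists.
    pub fn allocate(&mut self) -> i64 {
        self.lowest_id -= 1;
        self.root_pages.insert(self.lowest_id, None);
        self.lowest_id
    }

    /// Checkpoint: the table now has a real root page. Returns false for an unknown ID.
    pub fn checkpoint(&mut self, id: i64, root_page: i64) -> bool {
        assert!(root_page > 0, "a real root page is positive");
        match self.root_pages.get_mut(&id) {
            Some(slot) => {
                *slot = Some(root_page);
                true
            }
            None => false,
        }
    }

    /// Value for the `rootpage` column of sqlite_schema: the real root page
    /// once checkpointed, otherwise the negative table ID.
    pub fn schema_rootpage(&self, id: i64) -> Option<i64> {
        self.root_pages.get(&id).map(|rp| rp.unwrap_or(id))
    }
}
```

Using `i64` throughout lets a single column hold both real root pages and the negative placeholders.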
**Note about recovery:**
I changed MVCC recovery to happen on DB initialization which should
prevent any races, so no need for `recover_lock`, right @pereman2 ?
Closes #3419
Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.
We cannot use root page as the MVCC table ID directly because:
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
Hence, we:
- store the mapping between table id and btree rootpage
- sqlite_schema rows will have a negative rootpage column if the
table has not been checkpointed yet.
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
MVCC does not currently support indexes. Therefore:
- Fail if a database with indexes is opened with MVCC
- Disallow `CREATE INDEX` when MVCC is enabled
Fixes: #3108
If we open a database and the logical log is not empty, we need to
recover from it. We also make sure that only a single recovery runs at a
time and other connections just wait for it to finish.
I also changed the fuzz tester to use `restart` instead of calling
`load_logical_log` manually to test this behaviour.
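One way to get the "single recovery, everyone else waits" behaviour is a mutex around a `recovered` flag, sketched below. The `Database` struct, `ensure_recovered`, and `open_connections` are hypothetical names for illustration; the replay itself is replaced by a counter so the effect can be observed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

pub struct Database {
    /// Guards recovery: holds `true` once the logical log has been replayed.
    recovered: Mutex<bool>,
    /// Counts replays so that a test can check recovery ran exactly once.
    replay_runs: AtomicUsize,
}

impl Database {
    pub fn new() -> Self {
        Database {
            recovered: Mutex::new(false),
            replay_runs: AtomicUsize::new(0),
        }
    }

    /// Called by every connection when it opens. The first caller replays the
    /// log while holding the lock; the others block on the lock, then see the
    /// flag set and return without doing any work.
    pub fn ensure_recovered(&self) {
        let mut recovered = self.recovered.lock().unwrap();
        if !*recovered {
            self.replay_logical_log();
            *recovered = true;
        }
    }

    /// Stand-in for reading the logical log and inserting rows into the MV store.
    fn replay_logical_log(&self) {
        self.replay_runs.fetch_add(1, Ordering::SeqCst);
    }

    pub fn replay_runs(&self) -> usize {
        self.replay_runs.load(Ordering::SeqCst)
    }
}

/// Opens `n` connections concurrently and reports how many replays happened.
pub fn open_connections(db: &Arc<Database>, n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let db = Arc::clone(db);
            thread::spawn(move || db.ensure_recovered())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    db.replay_runs()
}
```

Holding the lock for the whole replay is what makes late arrivals wait rather than start a second recovery.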
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3359
Fixes #1976 and #1605
```zsh
turso> DROP TABLE IF EXISTS t;
CREATE TABLE t (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 1 │
└──────┴─────┘
turso> DROP TABLE IF EXISTS t;
CREATE TABLE t (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 1 │
└──────┴─────┘
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 2 │
└──────┴─────┘
turso>
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #2983