Commit Graph

1552 Commits

Author SHA1 Message Date
pedrocarlo
5a7390735d rename Completion functions 2025-10-06 11:07:06 -03:00
Pekka Enberg
be6f3d09ea core/storage: Switch checkpoint_inner() to completion group 2025-10-06 07:33:31 +03:00
pedrocarlo
f3dc0bef5d remove some explicit Arc<dyn File> references 2025-10-03 16:39:57 -03:00
pedrocarlo
e93add6c80 remove dyn DatabaseStorage and replace it with DatabaseFile 2025-10-03 14:14:15 -03:00
Pere Diaz Bou
9c9d4d147e core/btree: fuzz tests force page 1 allocation with a transaction 2025-10-03 13:28:28 +02:00
Pere Diaz Bou
8f103f7c35 core/wal: introduce transaction_count, same as iChange in sqlite 2025-10-03 13:02:47 +02:00
Pekka Enberg
c98bf9b593 Merge 'core/wal: check index header on begin_write_tx' from Pere Diaz Bou
Fixes a page cache staleness issue where connections could incorrectly
believe the database hasn't changed after checkpointing. This can happen
when writes following a checkpoint resulted in the same `max_frame
value`, causing connections to miss updates since they only checked
`max_frame` to detect changes.

Closes #3502
2025-10-03 13:51:22 +03:00
Pere Diaz Bou
b5a969933c core/wal: remove dbg! 2025-10-03 12:17:35 +02:00
Pekka Enberg
b11246278f Merge 'Enable encryption properly in Rust bindings, whopper, and throughput tests' from Avinash Sajjanshetty
This is a follow up from PR - #3457 which requires users to opt in to
enable encryption. This patch
- Makes appropriate changes to Whopper and Encryption throughput tests
- Updated Rust bindings to pass the encryption options properly
- Added a test for rust bindings
To use encryption in Rust bindings, one needs to do:
```rust
let opts = EncryptionOpts {
    hexkey: "b1bbfda...02a5669fc76327".to_string(),
    cipher: "aegis256".to_string(),
};

let builder = Builder::new_local(&db_file).experimental_encryption(true).with_encryption(opts.clone());
let db = builder.build().await.unwrap();
```
We will remove the `experimental_encryption` once the feature is stable.

Closes #3532
2025-10-02 18:32:06 +03:00
Avinash Sajjanshetty
3653c1a853 clear page cache when the encryption context is set 2025-10-02 19:50:12 +05:30
Avinash Sajjanshetty
09ba4615ba return appropriate error if checksum was not compiled 2025-10-02 16:11:18 +05:30
Avinash Sajjanshetty
6d7dc6d183 enable checksums only if its opted in via feature flag 2025-10-02 16:01:56 +05:30
Jussi Saurio
a9d782e319 Merge 'Add encryption internals docs' from Avinash Sajjanshetty
preview - https://github.com/tursodatabase/turso/blob/8d2ef700c9b087a7e2
904c25052e4365395b33b3/docs/manual.md#encryption-1

Closes #3461
2025-10-02 07:04:16 +03:00
Avinash Sajjanshetty
ca0d738f4d Add encryption internals docs 2025-10-02 00:14:28 +05:30
Charly Delaroche
5856dc8733 core/storage: Apple platforms support 2025-10-01 09:59:22 -07:00
Pere Diaz Bou
fe29fcbb09 core/wal: update checkpoint_seq and last_checkpoint on begin_read_tx 2025-10-01 16:17:40 +02:00
Pere Diaz Bou
e84f960516 core/wal: check index header on begin_write_tx 2025-10-01 16:03:17 +02:00
Pekka Enberg
02023ce821 Merge 'core/storage: Switch page cache queue to linked list' from Pekka Enberg
The page cache implementation uses a pre-allocated vector (`entries`)
with fixed capacity, along with a custom hash map and freelist. This
design requires expensive upfront allocation when creating a new
connection, which severely impacted performance in workloads that open
many short-lived connections (e.g., our concurrent write benchmarks that
create a new connection per transaction).
Therefore, replace the pre-allocated vector with an intrusive doubly-
linked list. This eliminates the page cache initialization overhead from
connection establishment, but also reduces memory usage to entries that
are actually used. Furthermore, the approach allows us to grow the page
cache with much less overhead.
The patch improves concurrent write throughput benchmark by 4x for
single-threaded performance.
Before:
```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 3.82s (26173.63 inserts/sec)
```
After:
```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 0.90s (110848.46 inserts/sec)
```

Closes #3456
2025-10-01 16:39:47 +03:00
Pekka Enberg
2b168cf7b0 core/storage: Switch page cache queue to linked list
The page cache implementation uses a pre-allocated vector (`entries`)
with fixed capacity, along with a custom hash map and freelist. This
design requires expensive upfront allocation when creating a new
connection, which severely impacted performance in workloads that open
many short-lived connections (e.g., our concurrent write benchmarks that
create a new connection per transaction).

Therefore, replace the pre-allocated vector with an intrusive
doubly-linked list. This eliminates the page cache initialization
overhead from connection establishment, but also reduces memory usage to
entries that are actually used. Furthermore, the approach allows us to
grow the page cache with much less overhead.

The patch improves concurrent write throughput benchmark by 4x for
single-threaded performance.

Before:

```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 3.82s (26173.63 inserts/sec)
```

After:

```
$ write-throughput --threads 1 --batch-size 100 -i 1000 --mode concurrent
Running write throughput benchmark with 1 threads, 100 batch size, 1000 iterations, mode: Concurrent
Database created at: write_throughput_test.db
Thread 0: 100000 inserts in 0.90s (110848.46 inserts/sec)
```
2025-10-01 14:41:35 +03:00
Jussi Saurio
8a08f085e8 Merge 'Fix SQLite database file pending byte page' from Pedro Muniz
Sqlite has a crazy easter egg where a 1 Gib file offset, it creates a
`PENDING_BYTE_PAGE` that is used only by the VFS layer, and is never
read or written into.
To properly test this, I took inspiration from SQLITE testing framework,
and defined a helper method, that is conditionally compiled with the
`test_helper` feature enabled.
https://github.com/sqlite/sqlite/blob/7e38287da43ea3b661da3d8c1f431aa907
d648c9/src/main.c#L4327
As the `PENDING_BYTE` is normally at the 1 Gib mark, I created a
function that modifies the static `PENDING_BYTE` atomic to whatever
value we want. This means we can test this unusual behaviours at any DB
file size we want.
`fuzz_pending_byte_database` is the test that fuzzes different pending
byte offsets and does an integrity check at the end to confirm, we are
compatible with SQLITE
Closes #2749
<img width="1100" height="740" alt="image" src="https://github.com/user-
attachments/assets/06eb258f-b4b4-47bf-85f9-df1cf411e1df" />

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3431
2025-10-01 08:55:44 +03:00
Jussi Saurio
65abe3efdc Merge 'MVCC: Handle table ID / rootpages properly for both checkpointed and non-checkpointed tables' from Jussi Saurio
**Handle table ID / rootpages properly for both checkpointed and non-
checkpointed tables**
Table ID is an opaque identifier that is only meaningful to the MV
store.
Each checkpointed MVCC table corresponds to a single B-tree on the
pager,
which naturally has a root page.
**We cannot use root page as the MVCC table ID directly because:**
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
**Hence:**
- MVCC table ids are always negative
- sqlite_schema rows will have a negative rootpage column if the
  table has not been checkpointed yet.
- on checkpoint when the table is allocated a real root page, we update
the row in sqlite_schema and in MV store's internal mapping
**On recovery:**
- All sqlite_schema tables are read directly from disk and assigned
`table_id = -1 * root_page` -- root_page on disk must be positive
- Logical log is deserialized and inserted into MV store
- Schema changes from logical_log are captured into the DB's global
schema
**Note about recovery:**
I changed MVCC recovery to happen on DB initialization which should
prevent any races, so no need for `recover_lock`, right @pereman2 ?

Closes #3419
2025-10-01 08:55:10 +03:00
pedrocarlo
65cd4d998d page_size can be 0 when it is not initialized, so account for that 2025-09-30 15:58:38 -03:00
pedrocarlo
aa5055e563 fuzz tests for pending_byte 2025-09-30 13:52:40 -03:00
pedrocarlo
3d5978c718 add special hipp pending page that is supposed to be ignored 2025-09-30 13:43:10 -03:00
Jussi Saurio
a52dbb7842 Handle table ID / rootpages properly for both checkpointed and non-checkpointed tables
Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.

We cannot use root page as the MVCC table ID directly because:
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.

Hence, we:

- store the mapping between table id and btree rootpage
- sqlite_schema rows will have a negative rootpage column if the
  table has not been checkpointed yet.
2025-09-30 16:53:12 +03:00
Avinash Sajjanshetty
c8111f9555 Put encryption behind an opt in (runtime) flag 2025-09-30 18:29:18 +05:30
Jussi Saurio
35b584f050 Merge 'core: change root_page to i64' from Pere Diaz Bou
Closes #3454
2025-09-30 12:50:23 +03:00
Pere Diaz Bou
2fff6bb119 core: page id to usize 2025-09-30 11:35:06 +02:00
Pekka Enberg
f8a9bb1158 core/storage: Remove unused import from encryption.rs 2025-09-30 11:13:35 +03:00
Pere Diaz Bou
af98067ff1 fmt 2025-09-29 18:40:17 +02:00
Pere Diaz Bou
0f631101df core: change page idx type from usize to i64
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
2025-09-29 18:38:43 +02:00
Preston Thorpe
8665d76c2e Merge 'Improve encryption module' from Avinash Sajjanshetty
This patch improves the encryption module:
1. Previously, we did not use the first 100 bytes in encryption. This
patch uses that portion as associated data, for protection against
tampering and corruption
2. Once the page 1 encrypted, on disk we store a special Turso header
(the first 16 bytes). During decryption we replace this with standard
SQLite's header (`"SQLite format 3\000"`). So that the upper layers (B
Tree or in Sync APIs) operate on the existing SQLite page expectations.
The format is:
```
///                    Turso Header (16 bytes)
///        ┌─────────┬───────┬────────┬──────────────────┐
///        │         │       │        │                  │
///        │  Turso  │Version│ Cipher │     Unused       │
///        │  (5)    │ (1)   │  (1)   │    (9 bytes)     │
///        │         │       │        │                  │
///        └─────────┴───────┴────────┴──────────────────┘
///         0-4      5       6        7-15
///
///        Standard SQLite Header: "SQLite format 3\0" (16 bytes)
///                            ↓
///        Turso Encrypted Header: "Turso" + Version + Cipher ID + Unused
```

Reviewed-by: Nikita Sivukhin (@sivukhin)
Reviewed-by: bit-aloo (@Shourya742)

Closes #3358
2025-09-29 11:04:31 -04:00
Pekka Enberg
f247b1a2bb core/storage: Wrap Pager::commit_info with RwLock
Also remove RefCells from CommitInfo because they're not only redundant,
but cause CommitInfo not to be Send.
2025-09-29 13:54:28 +03:00
Avinash Sajjanshetty
ec1bf8888c refactor to adress review comments 2025-09-28 22:03:47 +05:30
Pekka Enberg
d3abeb6281 core/storage: Wrap WalFile::{max,min}_frame with AtomicU64 2025-09-28 16:47:54 +03:00
Pekka Enberg
aba596441c core/storage: Wrap WalFile::max_frame_read_lock_index with AtomicUsize 2025-09-28 13:42:32 +03:00
Jussi Saurio
959165eed1 Merge 'core/storage: Mark Page as Send and Sync' from Pekka Enberg
Closes #3399
2025-09-28 08:08:46 +03:00
Avinash Sajjanshetty
c2453046fa clippy fixes 2025-09-27 18:16:51 +05:30
Avinash Sajjanshetty
a2df313ad5 Add documentation for the encryption module 2025-09-27 18:11:27 +05:30
Pekka Enberg
ce76aa11b2 core/storage: Mark Page as Send and Sync 2025-09-27 15:16:38 +03:00
Avinash Sajjanshetty
dc3d1fa36d Use the SQLite header as associated data for protection
against tampering and corruption.

Previously, we did not use the first 100 bytes in encryption
machinery. This patch changes that and uses that data as
associated data. So in case the header is corrupted or
tampered with, the decryption will fail.
2025-09-27 17:34:51 +05:30
Pekka Enberg
8d9d2dad1d core/storage: Wrap WalFile::syncing with AtomicBool 2025-09-27 14:07:26 +03:00
Pekka Enberg
931cf2658e core/storage: Display page category for rowid integrity check failure
Let's add more hints to hunt down the reason for #2896.
2025-09-26 18:25:49 +03:00
Pekka Enberg
60e9d1a1c4 core: Wrap Connection::is_nested_stmt in AtomicBool 2025-09-24 19:30:31 +03:00
Pekka Enberg
042a8dd031 core: Wrap Connection::wal_auto_checkpoint_disabled with AtomicBool 2025-09-24 09:12:46 +03:00
Pekka Enberg
aa95cb24ea core: Wrap Connection::page_size with AtomicU16 2025-09-24 09:12:46 +03:00
Pekka Enberg
f5d3962459 core: Wrap Connection::transaction_state with RwLock 2025-09-23 14:01:31 +03:00
Pekka Enberg
b94aa22499 core: Wrap Connection::schema in RwLock 2025-09-23 10:31:20 +03:00
Pekka Enberg
b857f94fe4 Merge 'core: Wrap Connection::pager in RwLock' from Pekka Enberg
Closes #3247
2025-09-23 07:29:09 +03:00
Pekka Enberg
aa454a6637 core: Wrap Connection::pager in RwLock 2025-09-22 17:02:08 +03:00