Commit Graph

4917 Commits

Author SHA1 Message Date
Pekka Enberg
6472a71ae7 Merge 'core: Wrap symbol table with RwLock' from Pekka Enberg
Make it Send.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3168
2025-09-17 11:47:40 +03:00
Pekka Enberg
602c247f87 Merge 'core/ext: Switch vtab_modules from Rc to Arc' from Pekka Enberg
Closes #3166
2025-09-17 11:19:27 +03:00
Pekka Enberg
d7158262ab Merge 'core/storage: Clean up unused import warning in encryption.rs' from Pekka Enberg
...which happens when the encryption feature is disabled.

Closes #3165
2025-09-17 11:19:20 +03:00
Pekka Enberg
50653258cf core: Wrap symbol table with RwLock
Make it Send.
2025-09-17 10:58:32 +03:00
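The commit message only says "Make it Send". As a minimal sketch, assuming a hypothetical `SymbolTable` that maps function names to arities, wrapping shared state in `Arc<RwLock<_>>` instead of a single-threaded cell makes it `Send`, so it can be handed to other threads. Writers take an exclusive lock; readers share one. The type, field, and function names here are illustrative and are not Turso's actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a connection's symbol table.
#[derive(Default)]
pub struct SymbolTable {
    /// function name -> number of arguments (illustrative payload)
    pub functions: HashMap<String, usize>,
}

/// Compiles only if `T` can be moved to another thread.
pub fn assert_send<T: Send>() {}

/// Registers a function under the write lock, then looks it up
/// from `n` threads under shared read locks.
pub fn lookup_from_threads(n: usize) -> Vec<Option<usize>> {
    // An Rc<RefCell<SymbolTable>> would fail this check; Arc<RwLock<_>> passes.
    assert_send::<Arc<RwLock<SymbolTable>>>();

    let syms = Arc::new(RwLock::new(SymbolTable::default()));
    syms.write().unwrap().functions.insert("my_func".to_string(), 2);

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let syms = Arc::clone(&syms);
            thread::spawn(move || {
                // The read guard is a temporary released at the end of this statement.
                let arity = syms.read().unwrap().functions.get("my_func").copied();
                arity
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

`RwLock` fits a symbol table because registrations are rare while lookups are frequent and can proceed concurrently.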
Pekka Enberg
06d869ea5e core/ext: Switch vtab_modules from Rc to Arc 2025-09-17 10:36:12 +03:00
Pekka Enberg
1e90572e7a core/storage: Clean up unused import warning in encryption.rs
...which happens when the encryption feature is disabled.
2025-09-17 10:22:36 +03:00
Pekka Enberg
17e9f05ea4 core: Convert Rc<Pager> to Arc<Pager> 2025-09-17 09:32:49 +03:00
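This commit and the `vtab_modules` change above follow the same pattern: `Rc` uses a non-atomic reference count and is not `Send`, so `thread::spawn` rejects it, while `Arc` counts atomically and can be shared across threads. The sketch below assumes a hypothetical `Pager` with an atomic read counter in place of real I/O; it is not the Turso pager.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical pager that only counts page reads.
#[derive(Default)]
pub struct Pager {
    pages_read: AtomicUsize,
}

impl Pager {
    pub fn read_page(&self, _page_id: usize) {
        // Real I/O omitted; just record that a read happened.
        self.pages_read.fetch_add(1, Ordering::Relaxed);
    }
}

/// Shares one pager between `threads` workers that each read `reads` pages.
/// Returns (total reads, strong count after all workers finished).
pub fn shared_reads(threads: usize, reads: usize) -> (usize, usize) {
    // With Rc::new here, the thread::spawn call below would not compile.
    let pager = Arc::new(Pager::default());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let pager = Arc::clone(&pager);
            thread::spawn(move || {
                for id in 1..=reads {
                    pager.read_page(id);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Each worker's clone was dropped when its closure returned.
    (pager.pages_read.load(Ordering::Relaxed), Arc::strong_count(&pager))
}
```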
Jussi Saurio
104b8dd083 Merge 'Encrypt page 1' from
This PR extends the existing encryption support to include the database
header page (page 1).

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #3040
2025-09-17 09:26:06 +03:00
Jussi Saurio
fad8d0c8b8 fix build 2025-09-17 08:45:13 +03:00
Jussi Saurio
cae234818b Merge 'Initial support for window functions' from Piotr Rżysko
This adds basic support for window functions. For now:
* Only existing aggregate functions can be used as window functions.
* Specialized window-specific functions (`rank`, `row_number`, etc.) are
not yet supported.
* Only the default frame definition is implemented:
`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3079
2025-09-17 08:29:16 +03:00
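To make the default frame concrete: with `ORDER BY`, `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` covers every row up to and including the current row's *peers* (rows with an equal ordering key), so peers share one aggregate value. The in-memory sketch below shows what `SUM(value) OVER (ORDER BY key)` computes under that frame; the function name is hypothetical and this is not the VDBE implementation from the PR.

```rust
/// Evaluates SUM(value) OVER (ORDER BY key) with the default RANGE frame.
/// `rows` are (key, value) pairs already sorted by key.
pub fn running_sum_range(rows: &[(i64, i64)]) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    let mut acc = 0;
    let mut i = 0;
    while i < rows.len() {
        // Accumulate the whole peer group (equal keys) before emitting.
        let mut j = i;
        while j < rows.len() && rows[j].0 == rows[i].0 {
            acc += rows[j].1;
            j += 1;
        }
        // Every row in the peer group sees the same frame, hence the same sum.
        out.extend(std::iter::repeat(acc).take(j - i));
        i = j;
    }
    out
}
```

For keys `1, 2, 2, 3` with values `10, 20, 5, 1`, this yields `10, 35, 35, 36`; a `ROWS` frame would instead give `10, 30, 35, 36` because it stops at the current physical row.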
rajajisai
e605aff31b Merge branch 'main' into enc-page-1 2025-09-16 10:06:00 -04:00
rajajisai
89caa868f9 Encryption support for database header page 2025-09-16 10:04:30 -04:00
Pekka Enberg
ae25a0f088 Merge 'Implement Min/Max aggregators' from Glauber Costa
We have not implemented them before because they require the raw
elements to be kept. It is easy to see why in the following example:
```
current_min = 3;
insert(2) => current_min = 2 // can be done without state
delete(2) => needs to look at the state to determine new min!
```
The aggregator state was a very simple key-value structure. To
accommodate min/max, we will turn it into a more complex table, where
we can encode a richer structure.
The key insight is that we can use a primary key composed of:
```
1) storage_id
2) zset_id
3) element
```
The storage_id and zset_id are our previous key, except they are now
exploded to support a larger range of storage_id. With more bits
available in the storage_id, we can encode information about which
column we are storing. For aggregations in multiple columns, we will
need to keep a different list of values for min/max!
The element is just the values of the columns.
Because this is a primary key, the data will be sorted in the btree. We
can then just do a prefix search in the first two components of the key
and easily find the min/max when needed.
This new format is also adequate for joins. Joins will just have a new
storage_id which encodes two "columns" (left side, right side).

Closes #3143
2025-09-16 16:19:59 +03:00
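The composite-key idea can be sketched in memory with `std::collections::BTreeMap` standing in for the btree table. Assumptions: the element is simplified to a single `i64`, and each key maps to a multiplicity so that duplicate values survive a single delete (the message does not say how duplicates are tracked). Because keys sort by `(storage_id, zset_id, element)`, a range scan over the first two components finds the min at the front and the max at the back. All names are hypothetical.

```rust
use std::collections::btree_map::Range;
use std::collections::BTreeMap;

/// Illustrative stand-in for the aggregator table:
/// primary key (storage_id, zset_id, element) -> multiplicity.
#[derive(Default)]
pub struct MinMaxState {
    rows: BTreeMap<(u64, u64, i64), u64>,
}

impl MinMaxState {
    pub fn insert(&mut self, storage_id: u64, zset_id: u64, element: i64) {
        *self.rows.entry((storage_id, zset_id, element)).or_insert(0) += 1;
    }

    pub fn delete(&mut self, storage_id: u64, zset_id: u64, element: i64) {
        let key = (storage_id, zset_id, element);
        if let Some(count) = self.rows.get_mut(&key) {
            *count -= 1;
            if *count == 0 {
                self.rows.remove(&key);
            }
        }
    }

    /// Keys are sorted, so the first entry of the prefix range is the min.
    pub fn min(&self, storage_id: u64, zset_id: u64) -> Option<i64> {
        self.prefix(storage_id, zset_id).next().map(|(&(_, _, e), _)| e)
    }

    /// ...and the last entry is the max.
    pub fn max(&self, storage_id: u64, zset_id: u64) -> Option<i64> {
        self.prefix(storage_id, zset_id).next_back().map(|(&(_, _, e), _)| e)
    }

    /// Prefix search on the first two key components.
    fn prefix(&self, s: u64, z: u64) -> Range<'_, (u64, u64, i64), u64> {
        self.rows.range((s, z, i64::MIN)..=(s, z, i64::MAX))
    }
}
```

This replays the example from the message: after `insert(3)`, `insert(2)`, `delete(2)`, the min falls back to `3` because the remaining elements are still stored.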
Jussi Saurio
d9e7b7f0e1 mvcc: starting a pager read tx can fail with busy 2025-09-16 15:19:49 +03:00
Jussi Saurio
e012768549 mvcc: dont allow CONCURRENT transaction to overwrite others changes
We start a pager read transaction at the beginning of the MV transaction, because
any reads we do from the database file and WAL must uphold snapshot isolation.
However, we must end and immediately restart the read transaction before committing.
This is because other transactions may have committed writes to the DB file or WAL,
and our pager must read in those changes when applying our writes; otherwise we would overwrite
the changes from the previous committed transactions.

Note that this would be incredibly unsafe in the regular transaction model, but in MVCC we trust
the MV-store to uphold the guarantee that no write-write conflicts happened.
2025-09-16 15:03:26 +03:00
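A toy model of why the read transaction is restarted before commit, under loudly simplified assumptions: the "database file" is a single page mapping rowid to value, and committing writes the whole page back, as a pager would. If the committing transaction applies its writes on top of its stale begin-time snapshot, a row committed by another transaction in the meantime is silently lost. The names `Page`, `apply`, and `commit` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical single-page "database file": rowid -> value.
pub type Page = BTreeMap<u64, &'static str>;

/// Applies `writes` on top of `base` and returns the page to write back.
fn apply(base: &Page, writes: &[(u64, &'static str)]) -> Page {
    let mut page = base.clone();
    for &(rowid, value) in writes {
        page.insert(rowid, value);
    }
    page
}

/// Simulates tx A committing after tx B; `restart_before_commit` controls
/// whether A refreshes its read snapshot before applying its writes.
pub fn commit(restart_before_commit: bool) -> Page {
    let mut db: Page = BTreeMap::new();
    // tx A begins: its pager read tx gives it a snapshot of the file.
    let snapshot_a = db.clone();
    // tx B commits a different row (the MV store has already ruled out
    // write-write conflicts between A and B).
    db = apply(&db, &[(1, "from B")]);
    // tx A commits: either on its stale snapshot or on a fresh read.
    let base = if restart_before_commit { db.clone() } else { snapshot_a };
    db = apply(&base, &[(2, "from A")]);
    db
}
```

With `commit(false)` only A's row survives; with `commit(true)` both rows are present.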
Jussi Saurio
b4fba69fe2 mvcc: fix logic bug in CommitState::WriteRow iteration order
We must iterate the row versions in reverse order because the
versions are in order of oldest to newest, and we must commit
the newest version applied by the active transaction.
2025-09-16 12:56:17 +03:00
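The fix can be illustrated with a hypothetical `RowVersion` that records only the creating transaction and a payload (the real type and `CommitState::WriteRow` logic are not shown in the message). Since versions are stored oldest to newest, searching from the back returns the newest version written by the committing transaction, while a forward search would return a stale one.

```rust
/// Hypothetical row version: creating transaction and payload.
#[derive(Debug, Clone, PartialEq)]
pub struct RowVersion {
    pub tx_id: u64,
    pub data: &'static str,
}

/// `versions` is ordered oldest -> newest; reverse iteration finds the
/// newest version applied by `tx_id`, which is the one to commit.
pub fn version_to_commit(versions: &[RowVersion], tx_id: u64) -> Option<&RowVersion> {
    versions.iter().rev().find(|v| v.tx_id == tx_id)
}
```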
Jussi Saurio
139ce39a00 mvcc: fix logic bug in MvStore::insert_version_raw()
In insert_version_raw(), we correctly iterate the versions backwards
because we want to find the newest version that is still older than
the one we are inserting.

However, the order of `.enumerate()` and `.rev()` was wrong, so the
insertion position was calculated based on the position in the
_reversed_ iterator, not the original iterator.
2025-09-16 12:56:17 +03:00
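The `.enumerate()`/`.rev()` ordering bug is easy to reproduce in isolation. The sketch assumes versions are compared by a begin timestamp and that the new version goes right after the newest existing version older than it; both function names are hypothetical. Calling `.enumerate()` before `.rev()` keeps the original indices; the reverse order numbers items from the end of the slice.

```rust
/// Index at which to insert a version with timestamp `ts` into
/// `begin_ts` (oldest first): just after the newest older version.
pub fn insert_position(begin_ts: &[u64], ts: u64) -> usize {
    // Correct: enumerate first, so `i` is the index in the original slice.
    begin_ts
        .iter()
        .enumerate()
        .rev()
        .find(|&(_, &b)| b < ts)
        .map(|(i, _)| i + 1)
        .unwrap_or(0)
}

/// Buggy variant: reversing first makes `i` count from the end.
pub fn insert_position_buggy(begin_ts: &[u64], ts: u64) -> usize {
    begin_ts
        .iter()
        .rev()
        .enumerate()
        .find(|&(_, &b)| b < ts)
        .map(|(i, _)| i + 1)
        .unwrap_or(0)
}
```

For `[10, 20, 30, 40]` and `ts = 35`, the correct position is `3` (after `30`), while the buggy variant returns `2`.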
Jussi Saurio
847e413c34 mvcc: assert that DeleteRowStateMachine must find the row it is deleting 2025-09-16 12:56:17 +03:00
Pekka Enberg
74331898a3 Merge 'Add quoted identifier test cases for ALTER TABLE' from Levy A.
Resolves #2093
There is a small incompatibility in how we quote the added column in the
final schema, but it doesn't change any behavior.

Closes #2943
2025-09-16 11:46:12 +03:00
Pekka Enberg
3c62352bcb core/mvcc: Specify level for tracing
...otherwise we perform tracing on every step(), dropping write
throughput by 40%.
2025-09-16 09:51:08 +03:00
Pekka Enberg
950cb8a818 Merge 'Move common dependencies to workspace' from Pedro Muniz
This removes 4 crates from `cargo build` and helps ensure that in the
future we avoid pulling in the same crate at different versions.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3141
2025-09-16 08:30:06 +03:00
Glauber Costa
6bee6bb785 implement min/max
We have not implemented them before because they require the raw
elements to be kept. It is easy to see why in the following example:

current_min = 3;
insert(2) => current_min = 2 // can be done without state
delete(2) => needs to look at the state to determine new min!

The aggregator state was a very simple key-value structure. To
accommodate min/max, we will turn it into a more complex table, where
we can encode a richer structure.

The key insight is that we can use a primary key composed of:

1) storage_id
2) zset_id
3) element

The storage_id and zset_id are our previous key, except they are now
exploded to support a larger range of storage_id. With more bits
available in the storage_id, we can encode information about which
column we are storing. For aggregations in multiple columns, we will
need to keep a different list of values for min/max!

The element is just the values of the columns.

Because this is a primary key, the data will be sorted in the btree.
We can then just do a prefix search in the first two components of
the key and easily find the min/max when needed.

This new format is also adequate for joins. Joins will just have
a new storage_id which encodes two "columns" (left side, right side).
2025-09-15 22:30:48 -05:00
Glauber Costa
3565e7978a Add an index to the dbsp internal table
And also change the schema of the main table. I have come to see the
current key-value schema as inadequate for non-aggregate operators.
Calculating Min/Max, for example, doesn't fit in this schema because
we have to be able to track existing values and index them.

Another alternative is to keep one table per operator type, but this
quickly leads to an explosion of tables.
2025-09-15 22:30:48 -05:00
pedrocarlo
3c91ae206b move as many dependencies as possible to workspace to avoid multiple versions of the same dependency 2025-09-15 17:19:36 -03:00
Jussi Saurio
d2d1d1bc61 fix re-entrancy issue in Pager::free_page
current logic can lead to a situation where:

- we call read_page(trunk_page_id)
- we assign trunk_page in the FreePageState state machine
- the page read fails and cache marks it as !locked && !loaded
- next call to Pager::free_page() asserts that the page is loaded and panics
2025-09-15 21:41:18 +03:00
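The commit message describes the failure but not the fix. One defensive pattern for a re-entrant state machine like this is to treat a stored page handle that is neither locked nor loaded as "read failed" and re-issue the read instead of asserting. The sketch below assumes hypothetical `PageFlags`, `Step`, and `FreePage` types, with the page state exposed as a field so a caller can simulate I/O outcomes; it is not `Pager::free_page` itself.

```rust
/// Hypothetical page handle states.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageFlags {
    Locked,   // read in flight
    Loaded,   // read completed successfully
    Unloaded, // neither: read failed (or was never completed)
}

#[derive(Debug, PartialEq)]
pub enum Step {
    Io,   // caller must drive I/O and call step() again
    Done, // trunk page is available; freeing can proceed
}

/// Hypothetical re-entrant state machine for freeing a page.
pub struct FreePage {
    /// Flags of the stored trunk page handle, if a read was issued.
    pub trunk: Option<PageFlags>,
    pub reads_issued: usize,
}

impl FreePage {
    pub fn step(&mut self) -> Step {
        match self.trunk {
            // Read still in flight: yield back to the I/O loop.
            Some(PageFlags::Locked) => Step::Io,
            // Page is in memory: safe to update the trunk page.
            Some(PageFlags::Loaded) => Step::Done,
            // No read yet, or the previous read failed: (re)issue it
            // rather than asserting that the page is loaded.
            None | Some(PageFlags::Unloaded) => {
                self.reads_issued += 1;
                self.trunk = Some(PageFlags::Locked);
                Step::Io
            }
        }
    }
}
```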
pedrocarlo
7021386f86 move divider_cell_is_overflow_cell to debug assertions so it stops appearing in release builds 2025-09-15 11:11:28 -03:00
Jussi Saurio
32cd01a615 fix deadlock 2025-09-15 14:48:26 +03:00
Jussi Saurio
d493a72cc0 dont unwrap begin_tx 2025-09-15 14:48:26 +03:00
Pekka Enberg
247d4c06c6 Merge 'Fix MVCC update' from Jussi Saurio
Based on #3126
Closes #3029
Closes #3030
Closes #3065
Closes #3083
Closes #3084
Closes #3085
simple reason why mvcc update didn't work: it didn't try to update.

Closes #3127
2025-09-15 14:24:59 +03:00
Pekka Enberg
a5eac9b700 Merge 'avoid unnecessary cloning when formatting Txn for Display' from Avinash Sajjanshetty
Closes #3109
2025-09-15 14:24:32 +03:00
Pekka Enberg
244458199f Merge 'Various fixes to sync' from Nikita Sivukhin
This PR fixes incorrect path registration for sync in the browser, adds
tests, and also exposes the revision string in the `stats()` method of
the synced database.

Closes #3124
2025-09-15 14:24:02 +03:00
Pekka Enberg
380b27f58a Merge 'Busy handler' from Pedro Muniz
I used DeepWiki to look up how SQLite implements its busy handler. It
uses a callback system with exponential backoff, storing the callback
in the pager and in the database. I confess I found this slightly
confusing, so I implemented a simple exponential backoff directly in
the `Statement` struct instead. I suspect SQLite does this in a more
convoluted manner because it does not have a concept of yielding as we do.
https://deepwiki.com/search/where-is-the-code-for-the-busy_4a5ed006-4eed-479f-80c3-dd038832831b
I also fixed the Rust bindings so that they yield when we return
`StepResult::IO` instead of blocking the async function. To achieve
this I implemented the `Stream` trait for the `Rows` struct, which
unfortunately required a slight signature change: `rows.next()` becomes
`rows.try_next()`.
EDIT:
~test `test_multiple_connections_fuzz` times out because the busy
handler now "slows" things down (this test generates a lot of busy
transactions), so the test takes much longer to run. Not sure if it
is acceptable for us to reduce the number of operations so the test
is shorter.~
EDIT:
Adjusted the API to be more in line with
https://www.sqlite.org/c3ref/busy_timeout.html.
Sets the maximum total accumulated timeout. If the duration is None or
zero, we unset the busy handler for this Connection.
This API differs slightly from SQLite: instead of sleeping for the linear
amount of time specified by the user, we sleep in phases until the total
amount of time requested is reached. We first sleep for 1 ms; if we
still get Busy, we sleep for 2 ms, and so on, up to a maximum of 100 ms
per phase, until we reach the total timeout.
Example:
1. Set duration to 5ms
2. Step through query -> returns Busy -> sleep/yield for 1 ms
3. Step through query -> returns Busy -> sleep/yield for 2 ms
4. Step through query -> returns Busy -> sleep/yield for 2 ms (totaling
5 ms of sleep)
5. Step through query -> returns Busy -> return Busy to user
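The phase arithmetic in the example above can be sketched as a pure function. Assumption: each phase doubles the previous one (consistent with the "exponential backoff" described above and the 1 ms, 2 ms steps in the example); the 100 ms per-phase cap and the trimming of the last phase to the remaining budget come from the description. The function name is hypothetical and not the Turso API.

```rust
use std::time::Duration;

/// Sleep phases for a busy timeout: 1 ms, then doubling (assumed),
/// capped at 100 ms per phase, with the last phase trimmed so the
/// total never exceeds `timeout`. A zero timeout yields no phases.
pub fn busy_sleep_schedule(timeout: Duration) -> Vec<Duration> {
    const MAX_PHASE: Duration = Duration::from_millis(100);
    let mut phases = Vec::new();
    let mut remaining = timeout;
    let mut next = Duration::from_millis(1);
    while !remaining.is_zero() {
        // Never sleep past the total budget.
        let sleep = next.min(remaining);
        phases.push(sleep);
        remaining -= sleep;
        next = (next * 2).min(MAX_PHASE);
    }
    phases
}
```

For a 5 ms timeout this yields phases of 1, 2, and 2 ms, matching the example; after the last phase the statement returns Busy to the caller.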
This slight API change showed better throughput in the
`perf/throughput/turso` benchmark:
```sh
cargo run -p write-throughput --release -- -t 2

Running write throughput benchmark with 2 threads, 100 batch size, 10 iterations, mode: Legacy
Database created at: write_throughput_test.db
Thread 1: 1000 inserts in 0.04s (23438.42 inserts/sec)
Thread 0: 1000 inserts in 0.08s (12385.64 inserts/sec)

=== BENCHMARK RESULTS ===
Total inserts: 2000
Total time: 0.08s
Overall throughput: 24762.60 inserts/sec
Threads: 2
Batch size: 100
Iterations per thread: 10
Database file exists: true
Database file size: 4096 bytes
```
Depends on #3102
Closes #3067

Closes #3074
2025-09-15 13:52:49 +03:00
Jussi Saurio
59f18e2dc8 fix mvcc update
simple reason why mvcc update didn't work: it didn't try to update.
2025-09-15 11:27:56 +03:00
Jussi Saurio
aa7a853cd2 mvcc: fix hang when CONCURRENT tx tries to commit and non-CONCURRENT tx is active 2025-09-15 11:09:19 +03:00
Jussi Saurio
9234ef86ae mvcc: fix two sources of panic
1. commit state machine was assuming that begin_write_tx() cannot
fail, but it can fail if there is another tx that is not using
BEGIN CONCURRENT.

2. if a brand new non-CONCURRENT transaction attempts to start an
exclusive transaction but fails with Busy, we must end the pager
read tx it just started, because otherwise the next time
it attempts to do something it will panic with:

"cannot start a new read tx without ending an existing one"
2025-09-15 10:59:44 +03:00
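The second fix can be sketched with a hypothetical `MockPager` that tracks whether a read transaction is open and mirrors the quoted panic as an assertion. If `begin_write_tx()` fails with Busy, the read transaction opened just before it is ended, so a later retry does not trip the assertion. All names are illustrative, not the actual pager API.

```rust
#[derive(Debug, PartialEq)]
pub enum TxError {
    Busy,
}

/// Hypothetical pager that only tracks transaction state.
pub struct MockPager {
    pub read_tx_open: bool,
    /// Another non-CONCURRENT transaction holds the write lock.
    pub writer_active: bool,
}

impl MockPager {
    pub fn begin_read_tx(&mut self) {
        assert!(
            !self.read_tx_open,
            "cannot start a new read tx without ending an existing one"
        );
        self.read_tx_open = true;
    }

    pub fn end_read_tx(&mut self) {
        self.read_tx_open = false;
    }

    pub fn begin_write_tx(&mut self) -> Result<(), TxError> {
        if self.writer_active {
            Err(TxError::Busy)
        } else {
            Ok(())
        }
    }
}

/// Starts an exclusive transaction; on Busy, ends the read tx it just
/// opened before propagating the error.
pub fn begin_exclusive(pager: &mut MockPager) -> Result<(), TxError> {
    pager.begin_read_tx();
    if let Err(e) = pager.begin_write_tx() {
        pager.end_read_tx();
        return Err(e);
    }
    Ok(())
}
```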
Nikita Sivukhin
3bcac441e4 reduce log level of some very frequent logs 2025-09-15 11:35:41 +04:00
Jussi Saurio
8f43741513 fix mvcc rollback
executing ROLLBACK did not rollback the mv-store transaction
2025-09-15 09:29:08 +03:00
pedrocarlo
3d265489dc modify semantics of busy_timeout to be more on par with sqlite 2025-09-15 02:20:32 -03:00
pedrocarlo
0586b75fbe expose function to set busy timeout duration 2025-09-15 02:20:32 -03:00
pedrocarlo
a56680f79e implement Busy Handler in Turso statements 2025-09-15 02:16:18 -03:00
Jussi Saurio
f4c15a37d3 add manual hack to mvcc test
we roll back the mvcc transaction in the VDBE, so manually roll it
back in the test
2025-09-14 23:46:38 +03:00
Jussi Saurio
db3428a7a9 remove unused pager parameter 2025-09-14 23:44:24 +03:00
Jussi Saurio
d598775e33 mvcc: properly remove mutations of rolled back tx
mvstore was not removing deletions made by a tx that rolled back.
deletions are removed by clearing the `end` mark from the row
version.
2025-09-14 23:29:14 +03:00
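A minimal sketch of the rollback rule, assuming a hypothetical `RowVersion` whose `begin` and `end` hold plain transaction ids (the real MV store's types differ). Deletions by the rolled-back transaction are undone by clearing the `end` mark, as the message describes; dropping versions the transaction inserted is an added assumption so the sketch covers both kinds of mutation.

```rust
/// Hypothetical row version: created by tx `begin`, deleted by tx `end`.
#[derive(Debug, Clone, PartialEq)]
pub struct RowVersion {
    pub begin: u64,
    pub end: Option<u64>,
}

/// Undoes the mutations of `tx` in one row's version chain.
pub fn rollback(versions: &mut Vec<RowVersion>, tx: u64) {
    // Assumed: versions created by the rolled-back tx are discarded.
    versions.retain(|v| v.begin != tx);
    // Deletions are undone by clearing the `end` mark.
    for v in versions.iter_mut() {
        if v.end == Some(tx) {
            v.end = None;
        }
    }
}
```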
Jussi Saurio
dccf8b9472 mvcc: properly clear tx states when mvcc tx rolls back 2025-09-14 23:29:07 +03:00
Jussi Saurio
487b8710d9 mvcc: don't double-rollback on write-write-conflict
handle_program_error() already rolls back if this error happens.
double rollback causes a crash.
2025-09-14 23:28:21 +03:00
Jussi Saurio
2ca1640a2a not always write 2025-09-14 22:24:07 +03:00
Jussi Saurio
396091044e store tx_mode in conn.mv_tx
otherwise op_transaction works completely wrong because each separate
insert statement overrides the tx_mode to Write
2025-09-14 21:59:08 +03:00
Jussi Saurio
7fe25a1d0e mvcc: remove conn.mv_transactions
afaict this isn't needed for anything since there is already
conn.mv_tx_id
2025-09-14 21:26:58 +03:00
Jussi Saurio
5feb9ea2f0 mvcc: fix non-concurrent transaction semantics
on the main branch, mvcc allows concurrent inserts from multiple
txns even without BEGIN CONCURRENT, and then always hangs whenever
one of the txns tries to commit.

this commit fixes that issue.
2025-09-14 21:23:06 +03:00
Avinash Sajjanshetty
25d4070d3b avoid unnecessary cloning when formatting Txn for Display 2025-09-14 23:14:47 +05:30
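The commit gives no detail beyond its title. A common way to avoid cloning in a `Display` impl is to borrow the fields and stream them straight into the `Formatter`, rather than cloning collections into temporaries first. The `Txn` fields and the output format below are hypothetical.

```rust
use std::fmt;

/// Hypothetical transaction record; fields are illustrative.
pub struct Txn {
    pub tx_id: u64,
    pub begin_ts: u64,
    pub write_set: Vec<(u64, u64)>, // (table_id, rowid)
    pub read_set: Vec<(u64, u64)>,
}

/// Writes "t:r, t:r, ..." by borrowing each pair; nothing is cloned.
fn write_pairs(f: &mut fmt::Formatter<'_>, set: &[(u64, u64)]) -> fmt::Result {
    for (i, (table, row)) in set.iter().enumerate() {
        if i > 0 {
            f.write_str(", ")?;
        }
        write!(f, "{table}:{row}")?;
    }
    Ok(())
}

impl fmt::Display for Txn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{{ id: {}, begin_ts: {}, write_set: [", self.tx_id, self.begin_ts)?;
        write_pairs(f, &self.write_set)?;
        f.write_str("], read_set: [")?;
        write_pairs(f, &self.read_set)?;
        f.write_str("] }")
    }
}
```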