Commit Graph

10850 Commits

Author SHA1 Message Date
Pavan-Nambi
8d2b06e6bf remove stupid files, clippy and tcl-syntax 2025-11-14 08:24:01 +05:30
Pavan-Nambi
eaa8edb6f7 don't overwrite col mappings 2025-11-14 07:41:41 +05:30
Pekka Enberg
82d7a6ff27 Merge 'Nyrkiö: Set all comment-on to false' from Henrik Ingo
Closes #3943
2025-11-13 16:38:48 +02:00
Pekka Enberg
1f79fbc22c Merge 'Partial sync basic' from Nikita Sivukhin
This PR implements basic support for partial sync. Right now the scope
is limited to `:memory:` IO only; it will later be expanded to
file-based IO.
The main addition is `PartialDatabaseStorage`, which makes requests to
the remote server for missing local pages on demand.
The main change is that the tursodatabase JS bindings now accept an
optional "external" IO event loop which, in the case of sync, will drive
the `ProtocolIo` internal work associated with remote page fetching tasks.
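The on-demand page fetching described above can be sketched as follows. This is a minimal illustration of the idea, not the actual `PartialDatabaseStorage` API; the struct, field, and method names here are hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical sketch of partial sync: serve pages from local storage
// when present, otherwise fetch them from the remote server on demand
// and cache them locally.
struct PartialStorage {
    local: HashMap<u32, Vec<u8>>, // page number -> page bytes
    remote_fetches: usize,        // how many network round-trips happened
}

impl PartialStorage {
    fn read_page(&mut self, pgno: u32) -> Vec<u8> {
        if let Some(page) = self.local.get(&pgno) {
            return page.clone(); // page already present locally
        }
        // Missing locally: fetch from the remote server and cache it.
        let page = self.fetch_remote(pgno);
        self.local.insert(pgno, page.clone());
        page
    }

    fn fetch_remote(&mut self, pgno: u32) -> Vec<u8> {
        self.remote_fetches += 1;
        vec![pgno as u8; 8] // stand-in for a network request
    }
}

fn main() {
    let mut storage = PartialStorage { local: HashMap::new(), remote_fetches: 0 };
    storage.local.insert(1, vec![1; 8]);
    storage.read_page(1); // served locally, no network
    storage.read_page(2); // triggers a remote fetch
    storage.read_page(2); // now cached locally
    assert_eq!(storage.remote_fetches, 1);
}
```

In the real PR the remote fetch is asynchronous work driven by the external IO event loop, rather than a synchronous call as sketched here.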

Closes #3931
2025-11-13 16:38:04 +02:00
Henrik Ingo
dd25599529 Set all comment-on to false 2025-11-13 01:16:09 +02:00
Pere Diaz Bou
7c96b6d9f9 Merge 'Fix: Drop internal DBSP table when dropping materialized view' from Martin Mauch
# Fix: Clean up DBSP state table when dropping materialized views
## Problem
When dropping a materialized view, the internal DBSP state table (e.g.,
`__turso_internal_dbsp_state_v1_view_name`) and its automatic primary
key index were not being properly cleaned up. This caused two issues:
1. **Persistent schema entries**: The DBSP table and index entries
remained in `sqlite_schema` after dropping the view
2. **In-memory schema inconsistency**: The DBSP table remained in the
in-memory schema's `tables` HashMap, causing "table already exists"
errors when trying to recreate a materialized view with the same name
## Root Cause
The issue had two parts:
1. **Missing sqlite_schema cleanup**: The `translate_drop_view` function
deleted the view entry from `sqlite_schema` but didn't delete the
associated DBSP state table and index entries
2. **Missing in-memory schema cleanup**: The `remove_view` function
removed the materialized view from the in-memory schema but didn't
remove the DBSP state table and its indexes
## Solution
### Changes in `core/translate/view.rs`
- Added a second pass loop in `translate_drop_view` to scan
`sqlite_schema` and delete DBSP table and index entries
- The loop checks for entries matching the DBSP table name pattern
(`__turso_internal_dbsp_state_v{version}_{view_name}`) and the automatic
index name pattern
(`sqlite_autoindex___turso_internal_dbsp_state_v{version}_{view_name}_1`)
- Registers for comparison values are allocated outside the loop for
efficiency
- Column registers are reused across loop iterations
### Changes in `core/schema.rs`
- Updated `remove_view` to also remove the DBSP state table and its
indexes from the in-memory schema's `tables` HashMap and `indexes`
collection
- This ensures consistency between the persistent schema
(`sqlite_schema`) and the in-memory schema
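The in-memory cleanup can be sketched as below. This is a simplified stand-in, assuming a schema with `tables` and `indexes` maps as described above; the real types and method signatures in `core/schema.rs` may differ:

```rust
use std::collections::HashMap;

// Simplified stand-in for the in-memory schema described above.
struct Schema {
    tables: HashMap<String, String>,       // table name -> SQL (stand-in)
    indexes: HashMap<String, Vec<String>>, // table name -> index names
}

const DBSP_VERSION: u32 = 1;

// Name pattern from the PR: __turso_internal_dbsp_state_v{version}_{view_name}
fn dbsp_table_name(view: &str) -> String {
    format!("__turso_internal_dbsp_state_v{DBSP_VERSION}_{view}")
}

impl Schema {
    /// Remove a materialized view together with its DBSP state table and
    /// that table's indexes, keeping the in-memory schema consistent.
    fn remove_view(&mut self, view: &str) {
        self.tables.remove(view);
        let dbsp = dbsp_table_name(view);
        self.tables.remove(&dbsp);
        self.indexes.remove(&dbsp);
    }
}

fn main() {
    let mut schema = Schema { tables: HashMap::new(), indexes: HashMap::new() };
    schema.tables.insert("v".into(), "CREATE MATERIALIZED VIEW ...".into());
    schema.tables.insert(dbsp_table_name("v"), "CREATE TABLE ...".into());
    schema.indexes.insert(dbsp_table_name("v"), vec!["sqlite_autoindex_...".into()]);
    schema.remove_view("v");
    // Nothing lingers, so the view can be recreated without conflicts.
    assert!(schema.tables.is_empty());
    assert!(schema.indexes.is_empty());
}
```

Without the last two `remove` calls, recreating the view would hit the "table already exists" error described in the problem statement.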
### Tests Added
Added two new test cases in `testing/materialized_views.test`:
1. **`matview-drop-cleans-up-dbsp-table`**: Explicitly verifies that
after dropping a materialized view:
   - The view entry is removed from `sqlite_schema`
   - The DBSP state table entry is removed from `sqlite_schema`
   - The DBSP state index entry is removed from `sqlite_schema`
2. **`matview-recreate-after-drop`**: Verifies that a materialized view
can be successfully recreated after being dropped, which implicitly
tests that all underlying resources (including DBSP tables) are properly
cleaned up
## Testing
- All existing materialized view tests pass
- New tests specifically verify the cleanup behavior
- Manual testing confirms that materialized views can be dropped and
recreated without errors
## Related
This fix ensures that materialized views can be safely dropped and
recreated, resolving issues where the DBSP state table would persist and
cause conflicts.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3928
2025-11-12 17:16:04 +01:00
Nikita Sivukhin
740ff2b4a6 fix clippy 2025-11-12 16:46:15 +04:00
Nikita Sivukhin
15dafd46c1 replace turso_assert -> assert 2025-11-12 16:40:38 +04:00
Nikita Sivukhin
3d14092679 fix 2025-11-12 16:38:04 +04:00
Nikita Sivukhin
72089d2682 adjust compilation 2025-11-12 16:30:50 +04:00
Nikita Sivukhin
41d7d5af49 adjust tests 2025-11-12 16:15:54 +04:00
Nikita Sivukhin
54cb7758ef fix formatting 2025-11-12 16:14:26 +04:00
Nikita Sivukhin
aa65cfd55d update Cargo.toml 2025-11-12 16:14:14 +04:00
Jussi Saurio
16097e7355 Merge 'Add RowSet<Add/Read/Test> instructions and rowset implementation' from Jussi Saurio
## What
Rowsets are used in SQLite for two purposes:
1. for membership tests on a set of `i64`s,
2. for in-order iteration of a set of `i64`s,
Both in cases where we can just use rowids (which are `i64`) instead of
building an entire ephemeral btree from a table's contents.
For example, in cases where a `DELETE FROM tbl WHERE ...` is performed
on a table that has any `BEFORE DELETE` triggers, SQLite collects the
table's rowids into a RowSet before actually performing the deletion.
This is similar to how an UPDATE that modifies rowids (or the index used
to iterate the UPDATE loop) will first collect the rows into an
ephemeral index, and same with `INSERT INTO ... SELECT`.
## Details
RowSet uses a "batch" concept: insertions within a given batch must be
guaranteed by the caller to contain no duplicates, and are pushed onto a
vector in O(1). When a new batch is started, the previous batch is
folded into a `BTreeSet` so that membership tests can be performed in
O(log n). As far as I can tell, the "in-order iteration" use case doesn't
use this batch logic at all.
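The batch idea above can be sketched roughly as follows. This is only an illustration; the real RowSet in this PR carries more state (e.g. the `RowSetMode` enum mentioned below) and different method names:

```rust
use std::collections::BTreeSet;

// Minimal sketch of the batch concept: O(1) inserts into the current
// batch, folded into a BTreeSet when a new batch starts so membership
// tests run in O(log n).
struct RowSet {
    folded: BTreeSet<i64>, // previous batches
    batch: Vec<i64>,       // current batch: caller guarantees no duplicates
}

impl RowSet {
    fn new() -> Self {
        Self { folded: BTreeSet::new(), batch: Vec::new() }
    }

    /// O(1) push into the current batch.
    fn insert(&mut self, rowid: i64) {
        self.batch.push(rowid);
    }

    /// Starting a new batch folds the previous one into the BTreeSet.
    fn start_new_batch(&mut self) {
        self.folded.extend(self.batch.drain(..));
    }

    /// O(log n) membership test against previously folded batches.
    fn contains(&self, rowid: i64) -> bool {
        self.folded.contains(&rowid)
    }
}

fn main() {
    let mut rs = RowSet::new();
    rs.insert(3);
    rs.insert(1);
    rs.start_new_batch();
    assert!(rs.contains(1));
    assert!(!rs.contains(2));
}
```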
## AI disclosure
This entire PR description was written by me - no AIs were harmed in the
production of it. However, the code itself was mostly vibecoded using
two agents in Cursor:
- Composer 1: given the SQLite opcode documentation and rowset.c source
code, and asked to implement the VDBE instructions and the RowSet
module.
- GPT-5: given the same SQLite docs and source code, and asked to review
Composer 1's work and write feedback into a separate markdown file.
This loop was run for roughly 4-5 iterations, where each time GPT-5's
feedback was given to Composer 1, until GPT-5 found nothing to comment
anymore.
After this, I instructed Composer 1 to improve the documentation to be
less stupid.
After that, I made a manual editing pass over the runtime code to e.g.
change boolean flags to a `RowSetMode` enum to make it clearer that the
rowset has two distinct, mutually exclusive purposes (membership tests
and in-order iteration), plus cleaned up some other dumb shit and added
comments.
I am still not sure if this saved time or not.

Closes #3938
2025-11-12 13:02:00 +02:00
Jussi Saurio
933c3112f9 Merge 'Use AsValueRef in more functions' from Pedro Muniz
Depends on #3932
Converting more functions to use `AsValueRef`

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3934
2025-11-12 12:54:39 +02:00
Jussi Saurio
65a7dd40b3 Merge 'Change Value::Text and ValueRef::Text to use Cow<'static, str> and &str to avoid allocations' from Pedro Muniz
When building text values, we could not pass ownership of newly created
strings, which meant that much of the time we were cloning strings twice:
once to transform, and once to build the Value
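The benefit of `Cow<'static, str>` can be sketched in miniature. This is a simplified stand-in for the `Value` type, showing only the `Text` arm; the real enum has more variants:

```rust
use std::borrow::Cow;

// Simplified stand-in for Value, showing only the Text arm.
#[derive(Debug, PartialEq)]
enum Value {
    Text(Cow<'static, str>),
}

// With Cow, a freshly built String is moved into the Value (no second
// clone), while 'static literals are borrowed with no allocation at all.
fn upper_value(input: &str) -> Value {
    let transformed = input.to_uppercase(); // the one necessary allocation
    Value::Text(Cow::Owned(transformed))    // ownership moves in, no clone
}

fn main() {
    assert_eq!(upper_value("abc"), Value::Text(Cow::Borrowed("ABC")));
    let borrowed = Value::Text(Cow::Borrowed("static text")); // zero-copy
    assert_eq!(borrowed, Value::Text(Cow::Borrowed("static text")));
}
```

With the previous `Vec<u8>`-backed representation, `transformed` would have had to be copied a second time into the `Value`.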

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3932
2025-11-12 12:54:16 +02:00
Nikita Sivukhin
2d517f9fd7 use sparse io for partial sync when a file is used 2025-11-12 14:20:26 +04:00
Jussi Saurio
a63e12f793 Merge 'treat parameters as "constant" within a query' from Nikita Sivukhin
Right now tursodb treats parameters/variables as non-constant. But
actually they are constant in the sense that parameters/variables have a
fixed value during query execution which never changes.
This PR makes tursodb treat parameters as constant and evaluate
expressions involving them only once.
One real-world scenario where this can be helpful is vector search
query:
```sql
    SELECT id, vector_distance_jaccard(embedding, vector32_sparse(?)) as distance
    FROM vectors
    ORDER BY distance ASC
    LIMIT ?
```
Without constant optimization, the `vector32_sparse` function will be
executed for every row - which is very inefficient; the query can be 100x
slower because of it (but there is no need to evaluate this function for
every row, as we can transform the text representation to binary just once)
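The once-per-query evaluation can be illustrated with a small cache. This is a sketch of the idea only, not the tursodb internals; the names are hypothetical:

```rust
// Sketch: an expression whose only inputs are bound parameters can be
// evaluated once per statement and its result reused for every row.
struct CachedExpr<T> {
    cached: Option<T>,
    evals: usize, // how many times the expensive computation ran
}

impl<T: Clone> CachedExpr<T> {
    fn new() -> Self {
        Self { cached: None, evals: 0 }
    }

    /// `compute` stands in for parsing/transforming the bound parameter
    /// (e.g. vector32_sparse(?)). Because the parameter cannot change
    /// mid-query, the result is safe to reuse across rows.
    fn get(&mut self, compute: impl FnOnce() -> T) -> T {
        if self.cached.is_none() {
            self.evals += 1;
            self.cached = Some(compute());
        }
        self.cached.clone().unwrap()
    }
}

fn main() {
    let mut expr = CachedExpr::new();
    // Simulate a scan over many rows, each needing the parsed vector.
    for _ in 0..1000 {
        let _v = expr.get(|| vec![0.5f32, 0.25]); // stand-in for vector32_sparse(?)
    }
    assert_eq!(expr.evals, 1); // evaluated once per query, not per row
}
```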

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3936
2025-11-12 11:46:10 +02:00
Jussi Saurio
cdf2f0d3c5 Fix comment about DELETE ... RETURNING 2025-11-12 11:43:06 +02:00
Jussi Saurio
da92982f41 Add RowSet<Add/Read/Test> instructions and rowset implementation
Rowsets are used in SQLite for two purposes:

1. for membership tests on a set of `i64`s,
2. for in-order iteration of a set of `i64`s,

Both in cases where we can just use rowids (which are `i64`) instead of building an entire ephemeral btree from a table's contents.

For example, in cases where a `DELETE FROM tbl WHERE ...` is performed on a table that has any `BEFORE DELETE` triggers, SQLite collects the table's rowids into a RowSet before actually performing the deletion. This is similar to how an UPDATE that modifies rowids (or the index used to iterate the UPDATE loop) will first collect the rows into an ephemeral index, and same with `INSERT INTO ... SELECT`.

This entire PR description was written by me - no AIs were harmed in the production of it. However, the code itself was mostly vibecoded using two agents in Cursor:

- Composer 1: given the SQLite opcode documentation and rowset.c source code, and asked to implement the VDBE instructions and the RowSet module.
- GPT-5: given the same SQLite docs and source code, and asked to review Composer 1's work and write feedback into a separate markdown file.

This loop was run for roughly 4-5 iterations, where each time GPT-5's feedback was given to Composer 1, until GPT-5 found nothing to comment anymore.

After this, I instructed Composer 1 to improve the documentation to be less stupid.

After that, I made a manual editing pass over the runtime code to e.g. change boolean flags to a `RowSetMode` enum to make it clearer that the rowset has two distinct, mutually exclusive purposes (membership tests and in-order iteration), plus cleaned up some other dumb shit and added comments.

I am still not sure if this saved time or not.
2025-11-12 11:39:40 +02:00
Nikita Sivukhin
a25e3e76eb wip 2025-11-12 13:21:34 +04:00
Nikita Sivukhin
6f7edcaddd agent review fixes 2025-11-12 12:32:45 +04:00
Nikita Sivukhin
be12ca01aa add is_hole / punch_hole optional methods to IO trait and remove is_hole method from Database trait 2025-11-12 12:04:42 +04:00
Nikita Sivukhin
b73ff13b88 add simple implementation of Sparse IO 2025-11-12 12:04:12 +04:00
Nikita Sivukhin
d519945098 make ArenaBuffer unsafe Send + Sync 2025-11-12 10:54:40 +04:00
Nikita Sivukhin
33375697d1 add partial database storage implementation 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
a855a657aa report network stats 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
02275a6fa1 fix js bindings 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
98db727a99 integrate extra io stepping logic to the JS bindings 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
f3dc19cb00 UNSAFE: make Completion to be Send + Sync 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
d42b5c7bcc wip 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
95f31067fa add has_hole API in the DatabaseStorage trait 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
34f1072071 add hooks to plug partial sync in the sync engine 2025-11-12 10:53:25 +04:00
Nikita Sivukhin
e1f77d8776 do not treat registers as constant 2025-11-12 10:51:51 +04:00
Preston Thorpe
dad7feffca Merge 'Completion: make it Send + Sync' from Nikita Sivukhin
This PR makes Completion `Send` and also forces the internal callbacks
to be `Send`.
The reasons are as follows:
1. `io_uring` right now can execute a completion at any moment, potentially
on an arbitrary thread, so we already implicitly rely on that property of
`Completion` and its callbacks
2. In case of partial sync
(https://github.com/tursodatabase/turso/pull/3931), there will be an
additional requirement for Completion to be Send, as it will be put in
a separate queue associated with `DatabaseStorage` (which is Send +
Sync) processed in parallel with the main IO
3. Generally, it sounds pretty natural in the context of async io to
have a `Send` Completion so it can be safely transferred between threads
The approach in the PR is hacky, as `Completion` is made `Send` in a pretty
unsafe way. The main reason why Rust can't derive `Send` automatically
is the following:
1. Many completions hold an `Arc<Buffer>` internally, which needs to be
marked with the unsafe traits explicitly as it holds `ptr: NonNull<u8>`
2. `Completion` holds `CompletionInner` as an `Arc`, which internally holds
the completion callback as a `Box<XXXComplete>`, but because it's guarded by
`Arc` - Rust forces the completion callback to also be Sync (not only Send),
and as we usually move the Completion into the callback - we get a cycle here,
and with the current code Send for Completion implies Sync for Completion.
So, in order to fix this, the PR marks `ArenaBuffer` as Send + Sync and
forces completion callbacks to be Send + Sync too. It seems like the
`Sync` requirement is theoretically unnecessary and `Send` should be
enough - but with the current code organization Send + Sync looks like the
simplest approach.
Making `ArenaBuffer` Sync sounds almost correct, although I am worried
about read/write access to it, as internally `ArenaBuffer` does not
introduce any synchronization of its reads/writes - so potentially we
can already hit some multi-threading bugs with io_uring due to
`ArenaBuffer` being used from different threads (or maybe there are some
implicit memory barriers in other parts of the code which can
guarantee that we will use `ArenaBuffer` properly - but this sounds
like pure luck)
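The mechanics described above can be sketched in a few lines. The types here are simplified stand-ins (the real `ArenaBuffer`, `Completion`, and `CompletionInner` differ); the safety comment restates the PR's argument, including its caveats, rather than a proven invariant:

```rust
use std::ptr::NonNull;

// A raw NonNull<u8> field makes the type !Send and !Sync by default,
// so the traits must be asserted manually with `unsafe impl`.
struct ArenaBuffer {
    ptr: NonNull<u8>,
    len: usize,
}

// SAFETY (per the PR's reasoning, with the caveats noted above): the
// buffer is assumed not to be read and written concurrently from
// different threads. Rust cannot verify this, hence `unsafe impl`.
unsafe impl Send for ArenaBuffer {}
unsafe impl Sync for ArenaBuffer {}

// Because the callback sits behind an Arc, it must be Send + Sync for
// the whole Completion to cross threads (e.g. io_uring invoking it on
// a different thread).
struct Completion {
    complete: Box<dyn Fn(i32) + Send + Sync>,
}

// Compile-time check that a type is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<ArenaBuffer>();
    assert_send_sync::<Completion>();
    let mut data = vec![0u8; 16];
    let buf = ArenaBuffer {
        ptr: NonNull::new(data.as_mut_ptr()).unwrap(),
        len: data.len(),
    };
    assert_eq!(buf.len, 16);
    let c = Completion { complete: Box::new(|_code| {}) };
    (c.complete)(0);
}
```

Removing either `Send` or `Sync` from the `Box<dyn Fn(...)>` bound makes the `assert_send_sync::<Completion>()` call fail to compile, which mirrors the constraint the PR runs into.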

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3935
2025-11-11 20:10:52 -05:00
Nikita Sivukhin
b3380bc398 treat parameters as "constant" within a query 2025-11-12 02:30:15 +04:00
pedrocarlo
bc06bb0415 have RecordCursor::get_values return an Iterator for actual lazy deserialization. Unfortunately we won't see much improvement yet as we do not store the RecordCursor when calling ImmutableRecord::get_values 2025-11-11 16:11:46 -03:00
pedrocarlo
60db10cc02 consolidate Value PartialEq and PartialOrd to use the same implementation as ValueRef 2025-11-11 16:11:46 -03:00
pedrocarlo
e1d36a2221 clippy fix 2025-11-11 16:11:46 -03:00
pedrocarlo
4a94ce89e3 Change ValueRef::Text to use a &str instead of &[u8] 2025-11-11 16:11:46 -03:00
pedrocarlo
84268c155b convert json functions to use AsValueRef 2025-11-11 16:11:46 -03:00
pedrocarlo
1db13889e3 Change Value::Text to use a Cow<'static, str> instead of Vec<u8> 2025-11-11 16:11:46 -03:00
pedrocarlo
98d268cdc6 change datetime functions to accept AsValueRef and not registers 2025-11-11 16:11:46 -03:00
pedrocarlo
505a6ba5ea convert vector functions to use AsValueRef 2025-11-11 16:11:46 -03:00
Nikita Sivukhin
78b6eeae80 cargo fmt 2025-11-11 22:47:25 +04:00
Nikita Sivukhin
5e09c4f0c0 make completion send + sync 2025-11-11 22:42:20 +04:00
Nikita Sivukhin
9a9aacaf32 fix compilation 2025-11-11 22:22:34 +04:00
Nikita Sivukhin
6e3b364bb5 make completion callbacks Send
- io_uring already requires this because it can invoke the callback on another thread
2025-11-11 21:44:12 +04:00
Pere Diaz Bou
c4d89662a8 Merge 'core/mvcc: use btree cursor to navigate rows' from Pere Diaz Bou
The current implementation is simple: we have a pointer called
`CursorPosition::Loaded` that points to a rowid and records whether it
points to the btree or to mvcc.
Moving with `next` will `peek` both the btree and mvcc to ensure we load the
correct next value. This introduces some inefficiencies for now, as we could
simply skip one or the other in different cases.
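The peek-both merge can be sketched as a two-way merge of sorted rowid streams. This is an illustration of the idea only, not the real `core/mvcc/cursor.rs` types, and the equal-rowid handling (MVCC version shadowing the on-disk row) is an assumption of this sketch:

```rust
use std::iter::Peekable;

// Sketch: `next` peeks both the btree side and the mvcc side and yields
// whichever rowid is smaller, consuming both on a tie.
struct HybridCursor<B: Iterator<Item = i64>, M: Iterator<Item = i64>> {
    btree: Peekable<B>,
    mvcc: Peekable<M>,
}

impl<B: Iterator<Item = i64>, M: Iterator<Item = i64>> Iterator for HybridCursor<B, M> {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        match (self.btree.peek().copied(), self.mvcc.peek().copied()) {
            (Some(b), Some(m)) => {
                if b < m {
                    self.btree.next()
                } else if m < b {
                    self.mvcc.next()
                } else {
                    // Same rowid on both sides: assume the MVCC version
                    // shadows the on-disk row, so consume both.
                    self.btree.next();
                    self.mvcc.next()
                }
            }
            (Some(_), None) => self.btree.next(),
            (None, Some(_)) => self.mvcc.next(),
            (None, None) => None,
        }
    }
}

fn main() {
    let cur = HybridCursor {
        btree: vec![1, 3, 5].into_iter().peekable(),
        mvcc: vec![2, 3, 6].into_iter().peekable(),
    };
    let rows: Vec<i64> = cur.collect();
    assert_eq!(rows, vec![1, 2, 3, 5, 6]); // merged in rowid order, deduped
}
```

The inefficiency mentioned above is visible here too: both sides are peeked on every step even when one side is known to be exhausted or far ahead.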
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Combine MVCC index with a BTree-backed lazy cursor (including rootpage
mapping) and add row-version state checks, updating VDBE open paths and
tests.
>
> - **MVCC Cursor (`core/mvcc/cursor.rs`)**:
>   - Introduce hybrid cursor that merges MVCC index with `BTreeCursor`;
enhanced `CursorPosition` (tracks `in_btree`/`btree_consumed`).
>   - Implement state machine for `next`, coordinating MVCC/BTree
iteration and filtering via `RowVersionState`.
>   - `current_row()` now yields immutable records from BTree or MVCC;
add `read_mvcc_current_row`.
>   - Update `rowid`, `seek`, `rewind`, `last`, `seek_to_last`,
`exists`, `insert` to honor hybrid positioning.
> - **MVCC Store (`core/mvcc/database/mod.rs`)**:
>   - Add `RowVersionState` and `find_row_last_version_state`.
>   - Remove eager table initialization/scan helpers and `loaded_tables`
tracking.
>   - Add `get_real_table_id` for mapping negative IDs to physical root
pages.
> - **VDBE (`core/vdbe/execute.rs`)**:
>   - Route BTree cursor creation through
`maybe_transform_root_page_to_positive` and promote to `MvCursor`
without pager arg.
>   - Apply mapping in `OpenRead`, `OpenWrite`, `OpenDup`, and index
open paths.
> - **Tests (`core/mvcc/database/tests.rs`)**:
>   - Adjust to new cursor API; add coverage for BTree+MVCC iteration
and gaps after checkpoint/restart.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
b581519be4.</sup>
<!-- /CURSOR_SUMMARY -->

Closes #3829
2025-11-11 17:53:17 +01:00
Pere Diaz Bou
b581519be4 more clippy 2025-11-10 17:20:15 +01:00