Commit Graph

3962 Commits

Author SHA1 Message Date
Nikita Sivukhin
c0d5c55d5c fix tests and clippy 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
c6a87d61c7 emit CDC entries if necessary for schema changes 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
0b4c1ac802 refactor code a little bit 2025-08-06 01:03:48 +04:00
PThorpe92
914c10e095 Remove Clone impl for Buffer and PageContent 2025-08-05 14:26:53 -04:00
Pekka Enberg
9492a29d47 Merge 'Fix performance regression' from Jussi Saurio
Closes #2440
## Fix 1
Do not start a read transaction when a SELECT is not going to access the
database, which means we can avoid checking whether the schema has
changed.
## Fix 2
Add a field `accesses_db` to `Program` and `Statement` so we can avoid
even checking for `SchemaUpdated` errors when it's not possible to get
one.
## Fix 3
Avoid doing any work in `commit_txn` when not in a transaction. This
optimization is only enabled when `mv_store.is_none()`, because MVCC has
its own logic and this doesn't work with MVCC enabled, and honestly I'm
too tired to find out why. Left an inline comment about it, though.
```sql
Execute `SELECT 1`/limbo_execute_select_1
                        time:   [21.440 ns 21.513 ns 21.586 ns]
                        change: [-60.766% -60.616% -60.453%] (p = 0.00 < 0.05)
                        Performance has improved.
```
Effect is even more dramatic in CI where the latency is down over 80%

Closes #2441
2025-08-05 16:30:18 +03:00
Jussi Saurio
cde8567b1d Merge 'More state machine + Return IO in places where completions are created' from Pedro Muniz
In preparation for tracking IO Completions, we need to start to return
IO in places where completions are created. Doing some more plumbing now
to avoid bigger PRs for the future

Closes #2438
2025-08-05 15:47:51 +03:00
Pekka Enberg
49123db6e8 Merge 'core/mvcc: implement exists' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2446
2025-08-05 15:34:23 +03:00
Jussi Saurio
1feb5ba2d3 perf/vdbe: avoid doing work in commit_txn if not in txn 2025-08-05 15:25:28 +03:00
Jussi Saurio
3f633247f7 perf/stmt: avoid checking for SchemaUpdated errors if it's impossible 2025-08-05 15:10:55 +03:00
Jussi Saurio
c498196c7b fix/perf: fix regression in SELECT 1 benchmark
Do not start a read transaction when a SELECT is not going to access
the database, which means we can avoid checking whether the schema has
changed.
2025-08-05 15:10:55 +03:00
Pere Diaz Bou
474f0d8bbc core/mvcc: implement exists 2025-08-05 13:34:51 +02:00
Jussi Saurio
a28e64bfdd cleanup: remove unused page uptodate flag 2025-08-05 14:25:42 +03:00
Pekka Enberg
d2fea25fef Merge 'perf/btree: implement fast algorithm for defragment_page' from Jussi Saurio
Implement sqlite's fast path defragment algorithm. This path is taken
when:
1. There are 1-2 freeblocks
2. There are at most `max_frag_bytes` fragmented free bytes (-1..=4)
Instead of reconstructing the entire page, it merges the two freeblocks
and then moves the merged freeblock to the left, effectively turning it
into free space in the unallocated region, instead of a freeblock.
`max_frag_bytes` is particularly important when jnserting a new cell,
because if the page contains (in total) ~just enough space for the new
cell, then there can be hardly any fragmented free space because
otherwise, merging the 1-2 freeblocks won't produce enough contiguous
free space to fit the cell.
## Benchmark
```sql
Insert rows in batches/limbo_insert_1_rows
                        time:   [26.692 µs 27.153 µs 27.695 µs]
                        change: [-9.9033% -2.9097% +1.6336%] (p = 0.55 > 0.05)
                        No change in performance detected.
Insert rows in batches/limbo_insert_10_rows
                        time:   [38.618 µs 40.022 µs 42.201 µs]
                        change: [-8.9137% -6.6405% -4.2299%] (p = 0.00 < 0.05)
                        Performance has improved.
Insert rows in batches/limbo_insert_100_rows
                        time:   [168.94 µs 169.58 µs 170.31 µs]
                        change: [-22.520% -17.669% -12.790%] (p = 0.00 < 0.05)
                        Performance has improved.
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2411
2025-08-05 12:44:48 +03:00
Pekka Enberg
aa20c2f1ba Merge 'Relax I/O configuration attribute to cover all Unixes' from Pedro Muniz
hopefully fixes #2268.

Closes #2435
2025-08-05 12:44:34 +03:00
Pekka Enberg
e355fc4c65 Merge 'core/mvcc: implement seeking operations with rowid' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2429
2025-08-05 12:40:48 +03:00
Jussi Saurio
ad35cf07eb Add extra illustrative doodle for pere 2025-08-05 11:24:15 +03:00
Jussi Saurio
a5330aa6fb perf/btree: implement fast algorithm for defragment_page 2025-08-05 11:24:14 +03:00
Jussi Saurio
5b84ad6b0f Merge 'Update defragment page to defragment in-place' from João Severo
Change original code from doing a full copy of the original buffer to
modify the buffer in-place using a temporary vector with offsets.

Closes #2258
2025-08-05 11:22:22 +03:00
Jussi Saurio
c9c5565867 Merge 'Integrate virtual tables with optimizer' from Piotr Rżysko
This PR integrates virtual tables into the query optimizer. It is a
follow-up to https://github.com/tursodatabase/turso/pull/1727.
The most immediate improvement is better support for inner joins
involving TVFs, particularly when TVF arguments are column references.
### Example
The following two queries are semantically equivalent, but require
different join orders to be valid:
```sql
-- TVF depends on `t.id`, so `t` must be evaluated in outer loop
SELECT t.id, series.value
FROM target t, generate_series(t.id, 3) series;

-- Equivalent query, but with reversed table order in the FROM clause
SELECT t.id, series.value
FROM generate_series(t.id, 3) series, target t;
```
Without optimizer integration, the second query would fail because the
planner would attempt to evaluate `generate_series` before `t`. With
this change, the optimizer detects column dependencies and produces the
correct join order in both cases.
### TODO
Support for outer joins with TVFs is still missing and will be addressed
in a follow-up PR.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2439
2025-08-05 09:22:08 +03:00
pedrocarlo
aa8d17cbf1 state machine for ptrmap_get 2025-08-05 01:38:42 -03:00
Piotr Rzysko
59ec2d3949 Replace ConstraintInfo::plan_info with ConstraintInfo::index
The side of the binary expression no longer needs to be stored in
`ConstraintInfo`, since the optimizer now guarantees that it is always
on the right. As a result, only the index of the corresponding constraint
needs to be preserved.
2025-08-05 05:48:29 +02:00
Piotr Rzysko
8fb4fbf8af Make WhereTerm::consumed a plain bool
Now that virtual tables are integrated into the optimizer, this field no
longer needs to be wrapped in Cell<bool>.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
99f87c07c1 Support column references in table-valued function arguments
This change extends table-valued function support by allowing arguments
to be column references, not only literals.

Virtual tables can now reject a plan by returning an error from
best_index (e.g., when a TVF argument references a table that appears
later in the join order). The planner using this information excludes
invalid plans during join order search.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
82491ceb6a Integrate virtual tables with optimizer
This change connects virtual tables with the query optimizer.
The optimizer now considers virtual tables during join order search
and invokes their best_index callbacks to determine feasible access
paths.

Currently, this is not a visible change, since none of the existing
extensions return information indicating that a plan is invalid.
2025-08-05 05:48:28 +02:00
pedrocarlo
0ac040cc87 return IO in some other functions in Pager 2025-08-04 23:28:57 -03:00
pedrocarlo
a4a2425ffd return IO in places where completions are created 2025-08-04 23:28:57 -03:00
Jussi Saurio
a66b56678d Merge 'Reprepare Statements when Schema changes' from Pedro Muniz
Closes #1967
To support this I had to change how we did `epilogue` similarly to how
SQLite does it. SQLIte first declares a `beginWriteOperation` when some
statement is going to necessitate a Write Transaction. And as we now
need to pass the current schema cookie to `epilogue` it was easier to
call epilogue only in one location (like we do with prologue), and just
have each statement declare their intentions separately. This allows us
to not have to pass the Schema around just to do the epilogue. I believe
this is something that @jussisaurio would be interested in.
~Also had to disable the MVCC test, as it was extremely buggy for me.~
Just disabled reprepare statements for MVCC

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2214
2025-08-05 00:01:14 +03:00
Jussi Saurio
1e59165ea6 Merge 'More State Machines in preparation for tracking IO Completions' from Pedro Muniz
More changes. I want to avoid big PRs, so doing these changes in small
increments. I think in like 2 PRs after this one, I will be able make
the change effectively.

Closes #2400
2025-08-05 00:00:09 +03:00
Jussi Saurio
4cf02dfd14 Merge 'coalesce any adjacent buffers from writev calls into fewer iovecs' from Preston Thorpe
In `io_uring` and `unix` IO backends, we can check if our buffers are
sequential in memory and reduce the number of iovecs per call. Although
this is highly unlikely to actually happen at the moment due to our
buffer pool implementation.
Later on, when #2419 is merged, we will be able to specifically request
runs of contiguous buffers, so that our `writev` calls will (in the
ideal case) be coalesced into a single `pwrite` or preferrably
`WriteFixed` operation on the io_uring backend.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2436
2025-08-04 23:54:57 +03:00
Jussi Saurio
13219dbf87 Merge 'extend raw WAL API with few more methods' from Nikita Sivukhin
This PR extends raw WAL API with few methods which will be helpful for
offline-sync:
1. `try_wal_watermark_read_page` - try to read page from the DB with
given WAL watermark value\
    * Usually, WAL max_frame is set automatically to the latest value
(`shared.max_frame`) when transaction is started and then this
"watermark" is preserved throughout whole transaction
    * New method allows to simulate "read from the past" by controlling
frame watermark explicitly
    * There is an alternative to implement some API like
`start_read_session(frame_watermark: u64)` - but I decided to expose
just single method to simplify the logic and reduce "surface" of actions
which can be executed in this "controllable" manner
    * Also, for simplicity, now `try_wal_watermark_read_page` always
read data from disk and bypass any cached values (and also do not
populate the cache)
2. `wal_changed_pages_after` - return set of unique pages changed after
watermark WAL position in the current WAL session
With these 2 methods we can implement `REVERT frame_watermark` logic
which will just fetch all changed pages first, and then revert them to
the previous value by using `try_wal_watermark_read_page` and
`wal_insert_frame` methods (see `test_wal_api_revert_pages` test).
Note, that if there were schema changes - than `REVERT` logic described
above can bring connection to the inconsistent state, as it will
preserve schema information in memory and will still think that table
exist (while it can be reverted). This should be considered by any
consumer of this new methods.

Closes #2433
2025-08-04 23:53:46 +03:00
PThorpe92
2a3fa0955f Attempt to coalesce contiguous iovecs during pwritev operation for unix IO 2025-08-04 16:18:19 -04:00
PThorpe92
b76ef20f4c Attempt to coalesce contiguous iovecs during pwritev operation for io_uring 2025-08-04 16:18:05 -04:00
PThorpe92
6cbc8ff868 Replace values with constants 2025-08-04 15:14:06 -04:00
PThorpe92
73d1fdef14 Fix and change bitmap, apply suggestions and add some optimizations 2025-08-04 14:58:58 -04:00
PThorpe92
f4197f1eb5 change debug assertions to turso asserts 2025-08-04 14:55:48 -04:00
PThorpe92
54696d2f0d Add additional test for edge cases 2025-08-04 14:55:48 -04:00
PThorpe92
5378195ad6 Add page bitmap to storage mod.rs 2025-08-04 14:55:48 -04:00
PThorpe92
3e30335ea5 Add tests for PageBitmap 2025-08-04 14:55:48 -04:00
PThorpe92
7b1f908c00 Add PageBitmap for use with arena page allocator 2025-08-04 14:55:48 -04:00
pedrocarlo
ebe6aa0d28 adjust cfg for unix and linux IO 2025-08-04 15:49:52 -03:00
pedrocarlo
f2d84a534c adjust clear_overflow_pages 2025-08-04 15:28:06 -03:00
Piotr Rzysko
521eb2368e Return error when no valid plan exists
Replace panics with proper errors when a valid plan does not exist.
Currently, this never happens because a naive plan is always available.
However, once virtual tables are integrated into the planner, it may
occur—for example, when table-valued function arguments are column
references, and the function cannot be placed in the join order so that
its arguments can be evaluated.

Although this change is effectively a no-op for now, it is extracted
into a separate commit to avoid polluting the one that introduces
virtual table integration with the planner.
2025-08-04 20:27:23 +02:00
Piotr Rzysko
c80cd370cb Remove cost_upper_bound_ordered
It was redundant, as it was always equal to cost_upper_bound.
2025-08-04 20:27:23 +02:00
Piotr Rzysko
718598eab8 Introduce scan type
Different scan parameters are required for different table types.
Currently, index and iteration direction are only used by B-tree tables,
while the remaining table types don’t require any parameters. Planning
access to virtual tables, however, will require passing additional
information from the planner, such as the virtual table index (distinct
from a B-tree index) and the constraints that must be forwarded to the
`filter` method.
2025-08-04 20:27:22 +02:00
Piotr Rzysko
9167b30c7c Introduce AccessMethodParams
Previously, AccessMethod stored fields like `iter_dir`, `index`, and
`constraint_refs` directly, but these only applied to BTree tables.
Other table types (virtual tables, subqueries) either ignored these
fields or required different parameters entirely.

This change prepares the planner to handle virtual table access methods
with their own specialized parameters.
2025-08-04 20:23:44 +02:00
Piotr Rzysko
4166735953 Return error when start argument is missing for generate_series
This matches SQLite’s behavior and will help in the future to
differentiate between an invalid function invocation (missing argument,
not provided by the user) and an invalid combination of constraints
proposed by the planner.

No new integration tests are added, since this case was already covered
by the `filter` method. With the ability to return result codes from
`best_index`, we can now detect this error earlier.
2025-08-04 20:18:44 +02:00
Piotr Rzysko
61234eeb19 Add ResultCode to best_index result
The `best_index` implementation now returns a ResultCode along with the
IndexInfo. This allows it to signal specific outcomes, such as errors or
constraint violations. This change aligns better with SQLite’s xBestIndex
contract, where cases like missing constraints or invalid combinations of
constraints must not result in a valid plan.
2025-08-04 20:18:44 +02:00
Piotr Rzysko
6a4cf02a90 Fix computation of argv_index in best_index
The `filter` methods for extensions affected by this fix expect arguments
to be passed in a specific order. For example, `generate_series` assumes
that if the `start` argument exists, it is always passed to `filter`
first. If `start` does not exist, then `stop` is passed first — but
`stop` must never come before `start`.

Previously, this was not guaranteed: `best_index` relied on constraints
being passed in the order matching `filter`'s expectations.
2025-08-04 19:38:45 +02:00
Piotr Rzysko
c465ce6e7b Clarify semantics of argv_index
Extend the documentation of `argv_index` and add validations enforcing
the requirements it must meet.
2025-08-04 19:31:18 +02:00
Piotr Rzysko
b0460a589f Ensure argv_index is either None or >= 1
Previously, there were two ways to indicate that a constraint should not
be passed to the filter function: setting `argv_index` to `None` or to
a value less than 1. This was redundant, so now only `None` is used.
2025-08-04 19:27:53 +02:00