Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes#1614
Found while fuzzing nested subqueries. Since subqueries result in nested
plans, it quickly revealed that there can be multiple `DeferredSeek`
instructions issued for different cursors, but our `ProgramState` only
supported one at a time.
Closes#1610
**Beef:** we need to distinguish between references to tables in the
current query scope (CTEs, FROM clause) and references to tables from
outer query scopes (inside subqueries, or inside CTEs that refer to
previous CTEs). We don't want to consider 'tables from outside' in the
join order of a subquery, but we want the subquery to be able to
_reference_ those tables in e.g. its WHERE clause.
This PR -- or at least some sort of equivalent of it -- is a requirement
for #1595.
---
This PR replaces the `Vec<TableReference>` we use with new data
structures:
- TableReferences struct, which holds both:
- joined_tables, and
- outer_query_refs
- JoinedTable:
- this is just a rename of the previous TableReference struct
- OuterQueryReference
- this is to distinguish from JoinedTable those cases where
e.g. a subquery refers to an outer query's table, or a CTE
refers to a previous CTE.
Both JoinedTable and OuterQueryReference can be referred to by
expressions,
but only JoinedTables are considered for join ordering optimization and
so
forth.
These data structures are then used everywhere, which resulted in a lot
of changes.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#1580
Add simple stress test run with 10k iteration to test `Delete`, `Update`
and `Insert` together.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1585
- TableReferences struct, which holds both:
- joined_tables, and
- outer_query_refs
- JoinedTable:
- this is just a rename of the previous TableReference struct
- OuterQueryReference
- this is to distinguish from JoinedTable those cases where
e.g. a subquery refers to an outer query's table, or a CTE
refers to a previous CTE.
Both JoinedTable and OuterQueryReference can be referred to by expressions,
but only JoinedTables are considered for join ordering optimization and so
forth.
This commit does not compile.
Currently we have this:
`program.alloc_cursor_id(Option<String>, CursorType)`
where the `String` is the table's name or alias ('users' or 'u' in the
query).
This is problematic because this can happen:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
There are two cursors, both with identifier 't'. This causes a bug where
the program will use the same cursor for both the main query and the
subquery, since they are keyed by 't'.
Instead introduce `CursorKey`, which is a combination of:
1. `TableInternalId`, and
2. index name (`Option<Arc<Index>>` -- in case of index cursors.)
This should provide key uniqueness for cursors:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
---
These `CursorKey`s are only required when the program needs to retrieve
the cursor ID later:
`program.resolve_cursor_id(key)`
So, there are now two methods for allocating cursors:
`program.alloc_cursor_id_keyed(key, cursor_type); // needs to be
retrieved later with same key`
`program.alloc_cursor_id(cursor_type); // does not need to be retrieved
later`
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes#1604
Currently we have this:
program.alloc_cursor_id(Option<String>, CursorType)`
where the String is the table's name or alias ('users' or 'u' in
the query).
This is problematic because this can happen:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.
Instead introduce `CursorKey`, which is a combination of:
1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.
This should provide key uniqueness for cursors:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
This is the first step towards rollback, since we still don't spill
pages with WAL, we can simply invalidate page cache in case of failure.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1599
Closes: https://github.com/tursodatabase/limbo/issues/1602
Fix unreachable panic when calling serialize on Value::Integer(0)
by handling `SerialTypeKind` `ConstInt0` and `ConstInt1` in
`Record::serialize()`
Changed `test_serialize_integers` to reflect this change.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#1603
overwrite_cell() requires that the cursor state is pointing to a valid
record, but this was not currently set properly.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#1598
With Blacksmith runners, it errors out as follows:
```
npm notice Publishing to https://registry.npmjs.org/ with tag next and public access
npm notice publish Signed provenance statement with source and build information from GitHub Actions
npm notice publish Provenance statement published to transparency log: https://search.sigstore.dev/?logIndex=221582002
npm error code E422
npm error 422 Unprocessable Entity - PUT https://registry.npmjs.org/@tursodatabase%2flimbo - Error verifying sigstore provenance bundle: Unsupported GitHub Actions runner environment: "self-hosted". Only "github-hosted" runners are supported when publishing with provenance.
npm error A complete log of this run can be found in: /home/runner/.npm/_logs/2025-05-28T08_42_29_830Z-debug-0.log
Error: Process completed with exit code 1.
```
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a `&mut Vec<WhereTerm>` around anymore,
except for the fact that virtual table constraint resolution is done ad-
hoc in `init_loop()`.
Even there, the only thing we mutate is `WhereTerm::consumed` which is a
boolean indicating that the term has been "used up" by the optimizer and
shouldn't be evaluated as a normal where clause condition anymore.
In the upcoming branch for WHERE clause subqueries, I want to store
immutable references to WHERE clause expressions in `Resolver`, but this
is unfortunately not possible if we still use the aforementioned mutable
references.
Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>`
which allows us to pass an immutable reference to `init_loop()`, and the
`Cell` can be removed once the virtual table constraint resolution is
moved to an earlier part of the query processing pipeline.
Closes#1597
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a &mut Vec<WhereTerm> around anymore, except
for the fact that virtual table constraint resolution is done ad-hoc in
`init_loop()`. Even there, the only thing we mutate is `WhereTerm::consumed`
which is a boolean indicating that the term has been "used up" by the optimizer
and shouldn't be evaluated as a normal where clause condition anymore.
In the upcoming branch for WHERE clause subqueries, I want to store immutable
references to WHERE clause expressions in `Resolver`, but this is unfortunately
not possible if we still use the aforementioned mutable references.
Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>` which allows
us to pass an immutable reference to `init_loop()`, and the `Cell` can be removed
once the virtual table constraint resolution is moved to an earlier part of the
query processing pipeline.
Currently we have some usages of LIMIT where the actual limit counter is
initialized next to the DecrJumpZero instruction, and then
`program.mark_last_insn_constant()` is used to hoist the counter
initialization to the beginning of the program.
This is very fragile, and already FROM clause subquery handling works
around this with a hack (removed in this PR), and (upcoming) WHERE
clause subqueries would also run into problems because of this, because
the LIMIT might need to be initialized once for every iteration of the
subquery.
This PR removes those usages for LIMIT, and LIMIT processing is now more
intuitive:
- limit counter is now initialized at the start of the query processing
- a function init_limit() is extracted to do this for
select/update/delete
Closes#1591
Currently we have some usages of LIMIT where the actual limit counter
is initialized next to the DecrJumpZero instruction, and then
`program.mark_last_insn_constant()` is used to hoist the counter
initialization to the beginning of the program.
This is very fragile, and already FROM clause subquery handling works
around this with a hack (removed in this PR), and (upcoming) WHERE clause
subqueries would also run into problems because of this, because the LIMIT
might need to be initialized once for every iteration of the subquery.
This PR removes those usages for LIMIT, and LIMIT processing is now more
intuitive:
- limit counter is now initialized at the start of the query processing
- a function init_limit() is extracted to do this for select/update/delete