Commit Graph

134 Commits

Author SHA1 Message Date
Jussi Saurio
a99c8a8ca0 Simplify ORDER BY sorter column remapping
In case an ORDER BY column exactly matches a result column in the SELECT,
the insertion of the result column into the ORDER BY sorter can be skipped
because it's already necessarily inserted as a sorting column.

For this reason we have a mapping to know what index a given result column
has in the order by sorter.

This commit makes that mapping much simpler.
2025-08-15 15:48:41 +03:00
Piotr Rzysko
375b9047e2 Evaluate WHERE conditions after LEFT JOIN
Previously, the query from the added test would not filter out rows
where `products.price` was NULL.
2025-08-08 06:26:30 +02:00
Piotr Rzysko
92ba25e44d Extract loop emitting conditions into a method
No functional changes — this is just preparation for reusing this code
and avoiding polluting future commits with trivial refactoring.
2025-08-08 06:21:08 +02:00
Piotr Rzysko
8986266394 Emit conditions in open_loop in one place
The loop emitting conditions is independent of the operation type.
2025-08-07 19:26:32 +02:00
Nikita Sivukhin
c6a87d61c7 emit CDC entries if necessary for schema changes 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
0b4c1ac802 refactor code a little bit 2025-08-06 01:03:48 +04:00
Piotr Rzysko
82491ceb6a Integrate virtual tables with optimizer
This change connects virtual tables with the query optimizer.
The optimizer now considers virtual tables during join order search
and invokes their best_index callbacks to determine feasible access
paths.

Currently, this is not a visible change, since none of the existing
extensions return information indicating that a plan is invalid.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
718598eab8 Introduce scan type
Different scan parameters are required for different table types.
Currently, index and iteration direction are only used by B-tree tables,
while the remaining table types don’t require any parameters. Planning
access to virtual tables, however, will require passing additional
information from the planner, such as the virtual table index (distinct
from a B-tree index) and the constraints that must be forwarded to the
`filter` method.
2025-08-04 20:27:22 +02:00
Piotr Rzysko
61234eeb19 Add ResultCode to best_index result
The `best_index` implementation now returns a ResultCode along with the
IndexInfo. This allows it to signal specific outcomes, such as errors or
constraint violations. This change aligns better with SQLite’s xBestIndex
contract, where cases like missing constraints or invalid combinations of
constraints must not result in a valid plan.
2025-08-04 20:18:44 +02:00
Piotr Rzysko
c465ce6e7b Clarify semantics of argv_index
Extend the documentation of `argv_index` and add validations enforcing
the requirements it must meet.
2025-08-04 19:31:18 +02:00
Piotr Rzysko
b0460a589f Ensure argv_index is either None or >= 1
Previously, there were two ways to indicate that a constraint should not
be passed to the filter function: setting `argv_index` to `None` or to
a value less than 1. This was redundant, so now only `None` is used.
2025-08-04 19:27:53 +02:00
Piotr Rzysko
c6f398122d Add validation for constraint usage length returned by best_index
Additional changes:
- Update IndexInfo documentation to clarify that constraint_usages must
  have exact 1:1 correspondence with input ConstraintInfo array. The code
  translating constraints into VFilter arguments heavily relies on this.
- Fix best_index implementation in test extension to comply with new
  validation requirements by returning usage entry for each constraint
2025-08-04 19:25:10 +02:00
Glauber Costa
988b16f962 Support ATTACH (read only)
Support for attaching databases. The main difference from SQLite is that
we support an arbitrary number of attached databases, and we are not
bound to just 100ish.

We for now only support read-only databases. We open them as read-only,
but also, to keep things simple, we don't patch any of the insert
machinery to resolve foreign tables.  So if an insert is tried on an
attached database, it will just fail with a "no such table" error - this
is perfect for now.

The code in core/translate/attach.rs is written by Claude, who also
played a key part in the boilerplate for stuff like the .databases
command and extending the pragma database_list, and also aided me in
the test cases.
2025-07-24 19:19:48 -05:00
PThorpe92
0871a8c7f3 Bail early when we detect a readonly virtual table 2025-07-23 16:57:30 -04:00
Glauber Costa
57a1113460 make readonly a property of the database
There's no such thing as a read-only connection.
In a normal connection, you can have many attached databases. Some
r/o, some r/w.

To properly fix that, we also need to fix the OpenWrite opcode. Right
now we are passing a name, which is the name of the table. That
parameter is not used anywhere. That is also not what the SQLite opcode
specifies. Same as OpenRead, the p3 register should be the database
index.

With that change, we can - for now - pass the index 0, which is all
we support anyway, and then use that to test if we are r/o.
2025-07-22 09:41:32 -05:00
Glauber Costa
65312baee6 fix opcodes missing a database register
Two of the opcodes we implement (OpenRead and Transaction) should have
an opcode specifying the database to use, but they don't.

Add it, and for now always use 0 (the main database).
2025-07-20 12:27:26 -05:00
Piotr Rzysko
30ae6538ee Treat table-valued functions as tables
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.

Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.

Adding support for column references is more nuanced for two main
reasons:
- We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the one table must be processed by the top-level loop, and series must
be nested.
- For outer joins involving TVFs, the arguments must be treated as ON
predicates, not WHERE predicates.
2025-07-14 07:16:53 +02:00
Piotr Rzysko
44b1b1852a Fix referencing virtual table predicates
We need to enumerate first and filter afterward — not the other way
around — because we later use the indexes produced by `enumerate` to
access the original `predicates` slice.
2025-07-14 07:16:53 +02:00
Nikita Sivukhin
32fa2ac3ee avoid capturing changes in cdc table 2025-07-06 22:24:35 +04:00
Nikita Sivukhin
a988bbaffe allow to specify table in the capture_data_changes PRAGMA 2025-07-06 22:19:32 +04:00
Nikita Sivukhin
40769618c1 small refactoring 2025-07-06 21:16:58 +04:00
Nikita Sivukhin
04f2efeaa4 small renames 2025-07-06 21:16:57 +04:00
Nikita Sivukhin
a82529f55a emit cdc changes for UPDATE / DELETE statements 2025-07-06 21:16:25 +04:00
Levy A.
ffd6844b5b refactor: remove PseudoTable from Table
the only reason for `PseudoTable` to exist, is to provide column
information for `PseudoCursor` creation. this should not be part of the
schema.
2025-06-30 14:31:58 -03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Pekka Enberg
eb0de4066b Rename limbo_ext crate to turso_ext 2025-06-29 12:14:08 +03:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
pedrocarlo
e53a290a48 move ephemeral table logic to update plan and reuse select logic for ephemeral index 2025-06-20 16:30:21 -03:00
pedrocarlo
b3351dc709 tests + adjustment to halt error message 2025-06-20 16:29:10 -03:00
pedrocarlo
9048ad398b modify loop functions to accomodate for ephemeral tables 2025-06-20 16:29:10 -03:00
pedrocarlo
74beac5ea8 ephemeral table for update when rowid is being update 2025-06-20 16:28:10 -03:00
Jussi Saurio
f396528d53 Merge 'Fix DELETE not emitting constant WhereTerms' from Pedro Muniz
Fixes DELETE not emitting conditional jumps at all if the associated
WhereTerm is a constant, e.g.
```sql
limbo> create table t(x);
limbo> explain DELETE FROM t WHERE 5-5;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     7     0                    0   Start at 7
1     OpenWrite          0     2     0                    0   root=2; t
2     Rewind             0     6     0                    0   Rewind table t
3       RowId            0     1     0                    0   r[1]=t.rowid
4       Delete           0     0     0                    0
5     Next               0     3     0                    0
6     Halt               0     0     0                    0
7     Transaction        0     1     0                    0   write=true
8     Goto               0     1     0                    0
```
I was adding more stuff to the simulator in a Branch of mine, and I
caught this error with delete. Upstreaming the fix here. As we do with
Update, I added the translation step for the `WhereTerms` of the query.
Edit: Closes #1732. Closes #1733. Closes #1734. Closes #1735. Closes
#1736. Closes #1738. Closes #1739. Closes #1740.
Edit: Also pushes constant where term translation to `init_loop` for
Update and Select as well.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1746
2025-06-20 22:00:32 +03:00
Piotr Rzysko
08c1767ba7 Collect non-aggregate columns in one place
Previously, the logic for collecting non-aggregate columns was duplicated
across multiple locations and implemented inconsistently. This caused a
bug that was revealed by the refactoring in this commit (see the added
test).
2025-06-20 06:17:14 +02:00
pedrocarlo
fcff306f98 emit constant where terms in init_loop 2025-06-19 13:50:38 -03:00
Levy A.
15e0cab8d8 refactor+fix: precompute default values from schema 2025-06-11 14:18:39 -03:00
Levy A.
7638b0dab7 fix: use default value on empty columns added via ALTER TABLE 2025-06-11 14:18:19 -03:00
krishvishal
5837f7329f clean up 2025-06-11 00:33:47 +05:30
krishvishal
6c04c18f87 Add affinity flag to comparison opcodes 2025-06-11 00:33:47 +05:30
krishvishal
9130b25111 Add jump_if_null flag for rowid alias based seeks 2025-06-11 00:33:05 +05:30
Jussi Saurio
2bac140d73 Remove SeekOp::EQ and encode eq_only in LE&GE - needed for iteration direction aware equality seeks 2025-06-10 14:16:26 +03:00
Jussi Saurio
31b37332d5 all index cursors must be opened when DELETE does an index seek too 2025-06-03 15:18:45 +03:00
Jussi Saurio
06626f72eb Fix cursors not being opened for indexes in DELETE 2025-06-03 14:45:01 +03:00
Jussi Saurio
819a6138d0 Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio
Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1614
2025-05-30 09:39:37 +03:00
Jussi Saurio
f8257df77b Fix: aggregate regs must be initialized as NULL at the start 2025-05-29 18:44:53 +03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
Jussi Saurio
73e806ad84 Make WhereTerm::consumed a Cell<bool>
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a &mut Vec<WhereTerm> around anymore, except
for the fact that virtual table constraint resolution is done ad-hoc in
`init_loop()`. Even there, the only thing we mutate is `WhereTerm::consumed`
which is a boolean indicating that the term has been "used up" by the optimizer
and shouldn't be evaluated as a normal where clause condition anymore.

In the upcoming branch for WHERE clause subqueries, I want to store immutable
references to WHERE clause expressions in `Resolver`, but this is unfortunately
not possible if we still use the aforementioned mutable references.

Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>` which allows
us to pass an immutable reference to `init_loop()`, and the `Cell` can be removed
once the virtual table constraint resolution is moved to an earlier part of the
query processing pipeline.
2025-05-28 11:02:39 +03:00
Jussi Saurio
4e9d9a2470 Fix LIMIT handling
Currently we have some usages of LIMIT where the actual limit counter
is initialized next to the DecrJumpZero instruction, and then
`program.mark_last_insn_constant()` is used to hoist the counter
initialization to the beginning of the program.

This is very fragile, and already FROM clause subquery handling works
around this with a hack (removed in this PR), and (upcoming) WHERE clause
subqueries would also run into problems because of this, because the LIMIT
might need to be initialized once for every iteration of the subquery.

This PR removes those usages for LIMIT, and LIMIT processing is now more
intuitive:

- limit counter is now initialized at the start of the query processing
- a function init_limit() is extracted to do this for select/update/delete
2025-05-27 21:12:22 +03:00
Jussi Saurio
07fa3a9668 Rename SelectQueryType to QueryDestination 2025-05-25 21:23:04 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00