Commit Graph

132 Commits

Author SHA1 Message Date
Piotr Rzysko
000d70f1f3 Propagate info about hidden columns 2025-07-14 07:16:53 +02:00
Nils Koch
828d4f5016 fix clippy errors for rust 1.88.0 (auto fix) 2025-07-12 18:58:41 +03:00
meteorgan
4a516ab414 Support except operator for compound select 2025-07-08 22:57:20 +08:00
Levy A.
ffd6844b5b refactor: remove PseudoTable from Table
the only reason for `PseudoTable` to exist, is to provide column
information for `PseudoCursor` creation. this should not be part of the
schema.
2025-06-30 14:31:58 -03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Pekka Enberg
eb0de4066b Rename limbo_ext crate to turso_ext 2025-06-29 12:14:08 +03:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
pedrocarlo
6596ee28a8 introduce EphemeralTable query destination 2025-06-20 16:30:21 -03:00
pedrocarlo
e53a290a48 move ephemeral table logic to update plan and reuse select logic for ephemeral index 2025-06-20 16:30:21 -03:00
pedrocarlo
9048ad398b modify loop functions to accomodate for ephemeral tables 2025-06-20 16:29:10 -03:00
Piotr Rzysko
08c1767ba7 Collect non-aggregate columns in one place
Previously, the logic for collecting non-aggregate columns was duplicated
across multiple locations and implemented inconsistently. This caused a
bug that was revealed by the refactoring in this commit (see the added
test).
2025-06-20 06:17:14 +02:00
meteorgan
6179d8de23 refactor compound select 2025-06-13 10:39:32 +03:00
Levy A.
de2ac89ad2 feat: complete ALTER TABLE implementation 2025-06-11 14:17:36 -03:00
pedrocarlo
bfc8cb6d4c move display and to_sql_string impls to separate modules for plan 2025-06-04 12:06:43 -03:00
pedrocarlo
fa0dff9843 Fix rebase changes 2025-06-04 12:06:43 -03:00
pedrocarlo
a96577529e impl ToSqlString for Update Plan 2025-06-04 12:06:43 -03:00
pedrocarlo
d243d1015c impl ToSqlString for Delete Plan 2025-06-04 12:06:43 -03:00
pedrocarlo
ff5aa17769 impl ToSqlString for CompoundSelect Plan 2025-06-04 12:06:43 -03:00
pedrocarlo
51014d01c3 impl ToSqlString for SelectPlan 2025-06-04 12:06:43 -03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
124b38a262 plan.rs: add new datastructures
- TableReferences struct, which holds both:
     - joined_tables, and
     - outer_query_refs

- JoinedTable:
     - this is just a rename of the previous TableReference struct

- OuterQueryReference
     - this is to distinguish from JoinedTable those cases where
       e.g. a subquery refers to an outer query's table, or a CTE
       refers to a previous CTE.

Both JoinedTable and OuterQueryReference can be referred to by expressions,
but only JoinedTables are considered for join ordering optimization and so
forth.

This commit does not compile.
2025-05-29 11:03:09 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
Jussi Saurio
73e806ad84 Make WhereTerm::consumed a Cell<bool>
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a &mut Vec<WhereTerm> around anymore, except
for the fact that virtual table constraint resolution is done ad-hoc in
`init_loop()`. Even there, the only thing we mutate is `WhereTerm::consumed`
which is a boolean indicating that the term has been "used up" by the optimizer
and shouldn't be evaluated as a normal where clause condition anymore.

In the upcoming branch for WHERE clause subqueries, I want to store immutable
references to WHERE clause expressions in `Resolver`, but this is unfortunately
not possible if we still use the aforementioned mutable references.

Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>` which allows
us to pass an immutable reference to `init_loop()`, and the `Cell` can be removed
once the virtual table constraint resolution is moved to an earlier part of the
query processing pipeline.
2025-05-28 11:02:39 +03:00
Jussi Saurio
07fa3a9668 Rename SelectQueryType to QueryDestination 2025-05-25 21:23:04 +03:00
Jussi Saurio
d893a55c55 UNION 2025-05-25 21:23:04 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
Jussi Saurio
08bda9cc58 UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
c18c6a00fa refactor: use walk_expr() in resolving vtab constraints 2025-05-23 16:28:56 +03:00
Jussi Saurio
597020bc0c Merge 'Support values statement and values in select' from meteorgan
Close: #866
**limbo output**:
```
limbo> explain values(1, 2);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     5     0                    0   Start at 5
1     Integer            1     1     0                    0   r[1]=1
2     Integer            2     2     0                    0   r[2]=2
3     ResultRow          1     2     0                    0   output=r[1..2]
4     Halt               0     0     0                    0
5     Goto               0     1     0                    0

limbo> explain values(1, 2), (3, 4);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     InitCoroutine      1     9     2                    0
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2
4     Yield              1     0     0                    0
5     Integer            3     2     0                    0   r[2]=3
6     Integer            4     3     0                    0   r[3]=4
7     Yield              1     0     0                    0
8     EndCoroutine       1     0     0                    0
9     InitCoroutine      1     0     2                    0
10    Yield              1     15    0                    0
11    Copy               2     4     0                    0   r[4]=r[2]
12    Copy               3     5     0                    0   r[5]=r[3]
13    ResultRow          4     2     0                    0   output=r[4..5]
14    Goto               0     10    0                    0
15    Halt               0     0     0                    0
16    Goto               0     1     0                    0

limbo> explain select * from (values(1, 2), (3, 4));
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     InitCoroutine      1     9     2                    0
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2
4     Yield              1     0     0                    0
5     Integer            3     2     0                    0   r[2]=3
6     Integer            4     3     0                    0   r[3]=4
7     Yield              1     0     0                    0
8     EndCoroutine       1     0     0                    0
9     InitCoroutine      1     0     2                    0
10    Yield              1     15    0                    0
11    Copy               2     4     0                    0   r[4]=r[2]
12    Copy               3     5     0                    0   r[5]=r[3]
13    ResultRow          4     2     0                    0   output=r[4..5]
14    Goto               0     10    0                    0
15    Halt               0     0     0                    0
16    Transaction        0     0     0                    0   write=false
17    Goto               0     1     0                    0
```
**sqlite output**:
```
sqlite> explain values(1, 2);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     5     0                    0   Start at 5
1     Integer        1     1     0                    0   r[1]=1
2     Integer        2     2     0                    0   r[2]=2
3     ResultRow      1     2     0                    0   output=r[1..2]
4     Halt           0     0     0                    0
5     Goto           0     1     0                    0
sqlite> explain values(1, 2), (3, 4);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     InitCoroutine  1     9     2                    0
2     Integer        1     4     0                    0   r[4]=1
3     Integer        2     5     0                    0   r[5]=2
4     Yield          1     0     0                    0
5     Integer        3     4     0                    0   r[4]=3
6     Integer        4     5     0                    0   r[5]=4
7     Yield          1     0     0                    0
8     EndCoroutine   1     0     0                    0
9     InitCoroutine  1     0     2                    0
10      Yield          1     15    0                    0   next row of 2-ROW VALUES CLAUSE
11      Copy           4     8     0                    2   r[8]=r[4]
12      Copy           5     9     0                    2   r[9]=r[5]
13      ResultRow      8     2     0                    0   output=r[8..9]
14    Goto           0     10    0                    0
15    Halt           0     0     0                    0
16    Goto           0     1     0                    0
sqlite>  explain select * from (values(1, 2), (3, 4));
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     InitCoroutine  1     9     2                    0
2     Integer        1     4     0                    0   r[4]=1
3     Integer        2     5     0                    0   r[5]=2
4     Yield          1     0     0                    0
5     Integer        3     4     0                    0   r[4]=3
6     Integer        4     5     0                    0   r[5]=4
7     Yield          1     0     0                    0
8     EndCoroutine   1     0     0                    0
9     InitCoroutine  1     0     2                    0
10      Yield          1     15    0                    0   next row of 2-ROW VALUES CLAUSE
11      Copy           4     8     0                    2   r[8]=r[4]
12      Copy           5     9     0                    2   r[9]=r[5]
13      ResultRow      8     2     0                    0   output=r[8..9]
14    Goto           0     10    0                    0
15    Halt           0     0     0                    0
16    Goto           0     1     0                    0
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1549
2025-05-23 13:56:31 +03:00
Jussi Saurio
128a406f8c TableReference: fix stale comment 2025-05-23 10:06:24 +03:00
meteorgan
0467d7e11b Support values statement and values in select 2025-05-23 00:29:54 +08:00
Jussi Saurio
6ed5412bde extract method 2025-05-22 16:51:03 +03:00
Jussi Saurio
76227ec274 Rename to Distinctness + add distinctness information to SelectPlan 2025-05-22 16:51:03 +03:00
Jussi Saurio
14058357ad Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio
Previously the Operation enum consisted of:
- Operation::Scan
- Operation::Search
- Operation::Subquery
Which was always a dumb hack because what we really are doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.
Hence, refactor the relevant data structures so that the Table enum now
contains a new variant:
Table::FromClauseSubquery
And the Operation enum only consists of Scan and Search.
```
SELECT * FROM (SELECT ...) sub;

-- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo,
-- with a lot of special handling for Operation::Subquery in different code paths
-- now it's an Operation::Scan on a Table::FromClauseSubquery
```
No functional changes (intended, at least!)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1529
2025-05-20 14:31:42 +03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found

Closes #1507
2025-05-20 13:58:57 +03:00
Jussi Saurio
3121c6cdd3 Replace Operation::Subquery with Table::FromClauseSubquery
Previously the Operation enum consisted of:

- Operation::Scan
- Operation::Search
- Operation::Subquery

Which was always a dumb hack because what we really are doing is
an Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.

Hence, refactor the relevant data structures so that the Table enum
now contains a new variant:

Table::FromClauseSubquery

And the Operation enum only consists of Scan and Search.

No functional changes (intended, at least!)
2025-05-20 12:56:30 +03:00
pedrocarlo
f8854f180a Added collation to create table columns 2025-05-19 15:22:14 -03:00
Jussi Saurio
d584a1879b Mark WHERE terms as consumed instead of deleting them
We've run into trouble in multiple places due to the fact that
we delete terms from the where clause (e.g. when a constant condition
is removed, or the term becomes part of an index seek key).

A simpler solution is to add a flag indicating that the term is
consumed (used), so that it is not translated in the main loop
anymore when WHERE clause terms are evaluated.
2025-05-17 15:44:12 +03:00
Jussi Saurio
368c45e025 Add distinctness information to Aggregate struct 2025-05-17 15:33:55 +03:00
Pekka Enberg
e3f71259d8 Rename OwnedValue -> Value
We have not had enough merge conflicts for a while so let's do a
tree-wide rename.
2025-05-15 09:59:46 +03:00
pedrocarlo
bb158a5433 add unique field to Column 2025-05-14 11:34:11 -03:00
Jussi Saurio
625cf005fd Add some utilities to constraint related structs 2025-05-14 09:42:26 +03:00
Jussi Saurio
c02d3f8bcd Do groupby/orderby sort elimination based on optimizer decision 2025-05-14 09:41:13 +03:00
PThorpe92
0593a99f0e Remove insertCtx from parameters and replace fix with expr rewriting 2025-05-13 12:49:16 -04:00
pedrocarlo
9f726dbe62 simplify simple count detection 2025-05-10 22:36:43 -03:00
pedrocarlo
e9b1631d3c fix is_simple_count detection 2025-05-10 22:23:01 -03:00
Pekka Enberg
97ad25c506 Merge 'Initial implementation of ALTER TABLE RENAME' from Levy A.
- [x] `ALTER TABLE _ RENAME TO _`

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1456
2025-05-10 07:57:42 +03:00
Levy A.
023a116b0d feat: initial implementation of ALTER TABLE
only supporting renaming tables
2025-05-08 09:24:56 -03:00
Jussi Saurio
37097e01ae GROUP BY: refactor logic to support cases where no sorting is needed 2025-05-08 12:39:26 +03:00
Jussi Saurio
330fedbc2f Add notion of join ordering to plan + make determining where to eval expr dynamic always 2025-05-03 15:32:06 +03:00