Commit Graph

64 Commits

Author SHA1 Message Date
Jussi Saurio
a99c8a8ca0 Simplify ORDER BY sorter column remapping
In case an ORDER BY column exactly matches a result column in the SELECT,
the insertion of the result column into the ORDER BY sorter can be skipped
because it's already necessarily inserted as a sorting column.

For this reason we have a mapping to know what index a given result column
has in the order by sorter.

This commit makes that mapping much simpler.
2025-08-15 15:48:41 +03:00
Levy A.
ffd6844b5b refactor: remove PseudoTable from Table
the only reason for `PseudoTable` to exist, is to provide column
information for `PseudoCursor` creation. this should not be part of the
schema.
2025-06-30 14:31:58 -03:00
Levy A.
3907b387b3 cargo fix 2025-06-30 14:01:47 -03:00
Levy A.
afc55b27f0 refactor: remove unnecessary column definitions for PseudoTable
the only information that matters in the amount of column
2025-06-30 13:54:29 -03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
Jussi Saurio
a549f2971d Merge 'Ephemeral Table in Update' from Pedro Muniz
Closes #1713. Adds ephemeral table when a rowid_alias is being updated.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1726
2025-06-21 19:07:32 +03:00
Piotr Rzysko
64b83a45e8 Fix infinite aggregation loop when sorting is not required
Previously, with the `index_experimental` feature enabled, the query in
the added test would enter an infinite loop. This happened because
`label_grouping_agg_step` pointed to a constant argument that was moved
to the end of the program. As a result, the aggregation loop would jump
to the constant, then return to the start of the main loop, rewind the
index, and re-enter the aggregation loop—causing it to repeat
indefinitely.
2025-06-21 10:03:10 +02:00
pedrocarlo
9ae4f6ec40 fix merge conflict problems 2025-06-20 16:38:10 -03:00
pedrocarlo
e53a290a48 move ephemeral table logic to update plan and reuse select logic for ephemeral index 2025-06-20 16:30:21 -03:00
Piotr Rzysko
64a0333119 Fix missing column references in non-aggregate expressions
Previously, queries like:
```
SELECT
    CASE WHEN c0 != 'x' THEN group_concat(c1, ',') ELSE 'x' END
FROM t0
GROUP BY c0;
```

would return incorrect results because c0 was not copied during the
aggregation loop into a register accessible to the logic processing the
grouped results (e.g., the CASE WHEN expression in this example).

The same issue applied to expressions in the HAVING and ORDER BY clauses.
2025-06-20 06:19:16 +02:00
Piotr Rzysko
08c1767ba7 Collect non-aggregate columns in one place
Previously, the logic for collecting non-aggregate columns was duplicated
across multiple locations and implemented inconsistently. This caused a
bug that was revealed by the refactoring in this commit (see the added
test).
2025-06-20 06:17:14 +02:00
Levy A.
15e0cab8d8 refactor+fix: precompute default values from schema 2025-06-11 14:18:39 -03:00
Jussi Saurio
18dd87eff1 Fix incorrect handling of OR clauses in HAVING 2025-06-10 18:02:14 +03:00
Jussi Saurio
819a6138d0 Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio
Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1614
2025-05-30 09:39:37 +03:00
Jussi Saurio
f8257df77b Fix: aggregate regs must be initialized as NULL at the start 2025-05-29 18:44:53 +03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
Jussi Saurio
4e9d9a2470 Fix LIMIT handling
Currently we have some usages of LIMIT where the actual limit counter
is initialized next to the DecrJumpZero instruction, and then
`program.mark_last_insn_constant()` is used to hoist the counter
initialization to the beginning of the program.

This is very fragile, and already FROM clause subquery handling works
around this with a hack (removed in this PR), and (upcoming) WHERE clause
subqueries would also run into problems because of this, because the LIMIT
might need to be initialized once for every iteration of the subquery.

This PR removes those usages for LIMIT, and LIMIT processing is now more
intuitive:

- limit counter is now initialized at the start of the query processing
- a function init_limit() is extracted to do this for select/update/delete
2025-05-27 21:12:22 +03:00
Jussi Saurio
70965f4b28 Insn::Return: add possibility to fallthrough on non-integer values as per sqlite spec 2025-05-27 19:09:10 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
Jussi Saurio
f6443ae742 Support LIMIT with UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
0c4c451d2a rename 2025-05-22 16:51:03 +03:00
Jussi Saurio
f3ea9a603a add support for SELECT DISTINCT 2025-05-22 16:51:03 +03:00
Jussi Saurio
76227ec274 Rename to Distinctness + add distinctness information to SelectPlan 2025-05-22 16:51:03 +03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found

Closes #1507
2025-05-20 13:58:57 +03:00
pedrocarlo
4a3119786e refactor BtreeCursor and Sorter to accept Vec of collations 2025-05-19 15:22:55 -03:00
pedrocarlo
bf1fe9e0b3 Actually fixed group by and order by collation 2025-05-19 15:22:15 -03:00
pedrocarlo
0df6c87f07 Fixed Group By collation 2025-05-19 15:22:14 -03:00
pedrocarlo
f8854f180a Added collation to create table columns 2025-05-19 15:22:14 -03:00
Jussi Saurio
51c75c6014 Support distinct aggregates in GROUP BY 2025-05-17 15:33:55 +03:00
pedrocarlo
bb158a5433 add unique field to Column 2025-05-14 11:34:11 -03:00
Jussi Saurio
37097e01ae GROUP BY: refactor logic to support cases where no sorting is needed 2025-05-08 12:39:26 +03:00
Jussi Saurio
306e097950 Merge 'Fix bug: we cant remove order by terms from the head of the list' from Jussi Saurio
we had an incorrect optimization in `eliminate_orderby_like_groupby()`
where it could remove e.g. the first term of the ORDER BY if it matched
the first GROUP BY term and the result set was naturally ordered by that
term. this is invalid. see e.g.:
```sql
main branch - BAD: removes the `ORDER BY id` term because the results are naturally ordered by id.
However, this results in sorting the entire thing by last name only!

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌──────┬───────────┬───────────┐
│ id   │ last_name │ count (1) │
├──────┼───────────┼───────────┤
│ 6235 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│ 8043 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│  944 │ Zimmerman │         1 │
└──────┴───────────┴───────────┘

after fix - GOOD:

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌────┬───────────┬───────────┐
│ id │ last_name │ count (1) │
├────┼───────────┼───────────┤
│  1 │ Foster    │         1 │
├────┼───────────┼───────────┤
│  2 │ Salazar   │         1 │
├────┼───────────┼───────────┤
│  3 │ Perry     │         1 │
└────┴───────────┴───────────┘

I also refactored sorters to always use the ast `SortOrder` instead of boolean vectors, and use the `compare_immutable()` utility we use inside btrees too.

Closes #1365
2025-05-03 12:48:08 +03:00
Jussi Saurio
029e5eddde Fix existing resolve_label() calls to work with new system 2025-04-24 11:05:21 +03:00
Jussi Saurio
3798b4aa8b use SortOrder in sorters always 2025-04-24 10:34:06 +03:00
Ihor Andrianov
0c9464e3fc reduce vec allocations, add comments for magic ifs 2025-04-05 15:15:10 +03:00
Ihor Andrianov
d4b8fa17f8 fix tests 2025-04-03 22:28:14 +03:00
Ihor Andrianov
34a132fcd3 fix output when group by is not part of resulting set 2025-04-03 22:28:13 +03:00
Ihor Andrianov
91ceab1626 improve naming and add comments for context 2025-04-03 22:28:13 +03:00
Ihor Andrianov
816cbacc9c some smartie optimizations 2025-04-03 22:28:12 +03:00
Ihor Andrianov
2bcdd4e404 non group by cols are displayed in group by agg statements 2025-04-03 22:28:12 +03:00
Ihor Andrianov
40bb867d54 clippy 2025-03-30 19:01:16 +03:00
Ihor Andrianov
db5e364210 made json an optional module again 2025-03-30 19:01:03 +03:00
Ihor Andrianov
101dd51d7c add jsonb_group_object and array 2025-03-30 18:58:39 +03:00
Ihor Andrianov
a983c979c6 jsonb_merge, json_group_array, json_group_object 2025-03-30 18:47:33 +03:00
Jussi Saurio
89e48a16db Add affinity() function to Column 2025-02-18 10:56:30 +02:00
Pekka Enberg
ac54c35f92 Switch to workspace dependencies
...makes it easier to specify a version, which is needed for `cargo publish`.
2025-02-12 17:28:04 +02:00
Pekka Enberg
f3902ef9b6 core: Rename OwnedRecord to Record
We only have one record type so let's call it `Record`.
2025-02-06 13:40:34 +02:00
Jussi Saurio
795576b2ec dont eagerly allocate result column name strings 2025-02-05 17:53:23 +02:00