Commit Graph

112 Commits

Author SHA1 Message Date
Piotr Rzysko
30ae6538ee Treat table-valued functions as tables
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.

Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.

Adding support for column references is more nuanced for two main
reasons:
- We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the one table must be processed by the top-level loop, and series must
be nested.
- For outer joins involving TVFs, the arguments must be treated as ON
predicates, not WHERE predicates.
2025-07-14 07:16:53 +02:00
Piotr Rzysko
9102f4a2f4 Extract table parsing into separate method
This is in preparation for reusing the method when parsing `TableCall`s.
2025-07-14 07:16:53 +02:00
Piotr Rzysko
000d70f1f3 Propagate info about hidden columns 2025-07-14 07:16:53 +02:00
Nils Koch
828d4f5016 fix clippy errors for rust 1.88.0 (auto fix) 2025-07-12 18:58:41 +03:00
Pekka Enberg
3f10427f52 core: Fix resolve_function() error messages
We need to return the original function name, not normalized one to be
compatible with SQLite.

Spotted by SQLite TCL tests.
2025-07-09 15:30:57 +03:00
Pekka Enberg
7f91768ff6 core/translate: Unify no such table error messages
We're now mixing different error messages, which makes compatibility
testing pretty hard. Unify on a single, SQLite compatible error message
"no such table".
2025-07-07 11:10:46 +03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Pekka Enberg
2fc5c0ce5c Switch to runtime flag for enabling indexes
Makes it easier to test the feature:

```
$ cargo run --  --experimental-indexes
Limbo v0.0.22
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE t(x);
limbo> CREATE INDEX t_idx ON t(x);
limbo> DROP INDEX t_idx;
```
2025-06-26 10:07:28 +03:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
Piotr Rzysko
64a0333119 Fix missing column references in non-aggregate expressions
Previously, queries like:
```
SELECT
    CASE WHEN c0 != 'x' THEN group_concat(c1, ',') ELSE 'x' END
FROM t0
GROUP BY c0;
```

would return incorrect results because c0 was not copied during the
aggregation loop into a register accessible to the logic processing the
grouped results (e.g., the CASE WHEN expression in this example).

The same issue applied to expressions in the HAVING and ORDER BY clauses.
2025-06-20 06:19:16 +02:00
Pere Diaz Bou
dde93e8deb disable distinct without index_experimental
distinct uses indexes, therefore we need to disable them
2025-06-17 19:33:23 +02:00
Piotr Rzysko
4d35e36b77 Introduce virtual table types 2025-06-01 07:45:57 +02:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
73e806ad84 Make WhereTerm::consumed a Cell<bool>
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a &mut Vec<WhereTerm> around anymore, except
for the fact that virtual table constraint resolution is done ad-hoc in
`init_loop()`. Even there, the only thing we mutate is `WhereTerm::consumed`
which is a boolean indicating that the term has been "used up" by the optimizer
and shouldn't be evaluated as a normal where clause condition anymore.

In the upcoming branch for WHERE clause subqueries, I want to store immutable
references to WHERE clause expressions in `Resolver`, but this is unfortunately
not possible if we still use the aforementioned mutable references.

Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>` which allows
us to pass an immutable reference to `init_loop()`, and the `Cell` can be removed
once the virtual table constraint resolution is moved to an earlier part of the
query processing pipeline.
2025-05-28 11:02:39 +03:00
pedrocarlo
bb7da39c72 remove assumption that translate_select is always called from a top-level context + adjust insert to use translate_select when needed 2025-05-25 19:12:30 -03:00
Jussi Saurio
07fa3a9668 Rename SelectQueryType to QueryDestination 2025-05-25 21:23:04 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
Jussi Saurio
8ed5334ca7 tests/fuzz: add compound_select_fuzz() 2025-05-24 13:12:41 +03:00
Jussi Saurio
362347c474 refactor: use walk_expr() in determine_where_to_eval_expr() 2025-05-23 16:27:28 +03:00
Jussi Saurio
9ec84e3905 refactor: use walk_expr() in table_mask_from_expr() 2025-05-23 16:27:28 +03:00
Jussi Saurio
3835a29f47 refactor: use walk_expr() in resolve_aggregates() 2025-05-23 16:27:23 +03:00
Jussi Saurio
2ab5c5f6a9 refactor: use walk_expr_mut() in bind_column_references() 2025-05-23 15:56:49 +03:00
Jussi Saurio
76227ec274 Rename to Distinctness + add distinctness information to SelectPlan 2025-05-22 16:51:03 +03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found

Closes #1507
2025-05-20 13:58:57 +03:00
Jussi Saurio
d584a1879b Mark WHERE terms as consumed instead of deleting them
We've run into trouble in multiple places due to the fact that
we delete terms from the where clause (e.g. when a constant condition
is removed, or the term becomes part of an index seek key).

A simpler solution is to add a flag indicating that the term is
consumed (used), so that it is not translated in the main loop
anymore when WHERE clause terms are evaluated.
2025-05-17 15:44:12 +03:00
Jussi Saurio
368c45e025 Add distinctness information to Aggregate struct 2025-05-17 15:33:55 +03:00
Jussi Saurio
a90358f669 TableMask: comments 2025-05-14 09:42:26 +03:00
Jussi Saurio
3b1aef4a9e Do Less Work (tm) - everything works except ephemeral 2025-05-14 09:42:01 +03:00
Jussi Saurio
1e46f1d9de Feature: join reordering optimizer 2025-05-14 09:40:48 +03:00
Jussi Saurio
330fedbc2f Add notion of join ordering to plan + make determining where to eval expr dynamic always 2025-05-03 15:32:06 +03:00
Jussi Saurio
5a1cfb7d15 Add ColumnUsedMask struct to TableReference to track columns referenced in query 2025-04-15 15:13:31 +03:00
Jussi Saurio
c6bea835f9 Fix trying to use index when both sides of comparison refer to same table 2025-04-12 11:13:33 +03:00
Jussi Saurio
a706b7160a planner: support index backwards seeks and iteration 2025-04-09 10:14:29 +03:00
PThorpe92
13e084351d Change parse_limit function to accept reference value to ast::Limit 2025-04-04 12:38:18 -04:00
Pekka Enberg
7f2525ac27 Merge 'Implement create virtual table using vtab modules, more work on virtual tables' from Preston Thorpe
This PR started out as one to improve the API of extensions but I ended
up building on top of this quite a bit and it just kept going. Sorry
this one is so large but there wasn't really a good stopping point, as
it kept leaving stuff in broken states.
**VCreate**: Support for `CREATE VIRTUAL TABLE t USING vtab_module`
**VUpdate**: Support for `INSERT` and `DELETE` methods on virtual
tables.
Sqlite uses `xUpdate` function with the `VUpdate` opcode to handle all
insert/update/delete functionality in virtual tables..
have to just document that:
```
if args[0] == NULL:  INSERT args[1] the values in args[2..]

if args[1] == NULL: DELETE args[0]

if args[0] != NULL && len(args) > 2: Update values=args[2..]  rowid=args[0]
```
I know I asked @jussisaurio on discord about this already, but it just
sucked so bad that I added some internal translation so we could expose
a [nice API](https://github.com/tursodatabase/limbo/pull/996/files#diff-
3e8f8a660b11786745b48b528222d11671e9f19fa00a032a4eefb5412e8200d1R54) and
handle the logic ourselves while keeping with sqlite's opcodes.
I'll change it back if I have to, I just thought it was genuinely awful
to have to rely on comments to explain all that to extension authors.
The included extension is not meant to be a legitimately useful one, it
is there for testing purposes. I did something similar in #960 using a
test extension, so I figure when they are both merged, I will go back
and combine them into one since you can do many kinds at once, and that
way it will reduce the amount of crates and therefore compile time.
1. Remaining opcodes.
2. `UPDATE` (when we support the syntax)
3. `xConnect` - expose API for a DB connection to a vtab so it can
perform arbitrary queries.

Closes #996
2025-02-25 15:31:12 +02:00
PThorpe92
9b742e1a76 Remove clone in vtab from_args 2025-02-17 22:37:17 -05:00
PThorpe92
8b5772fe1c Implement VUpdate (insert/delete for virtual tables 2025-02-17 20:44:44 -05:00
PThorpe92
9c8083231c Implement create virtual table and VUpdate opcode 2025-02-17 20:44:44 -05:00
Jussi Saurio
8e5499e5ed Fix not evaling constant conditions when no tables in query
We were not evaluating constant conditions (e.g '1 IS NULL')
when there were no tables referenced in the query, because
our WHERE term evaluation was based on "during which loop"
to evaluate them. However, when there are no tables, there are
no loops, so they were never evaluated.
2025-02-17 13:10:27 +02:00
Jussi Saurio
a7300a4e0c select: fix bug with referring to a mixed-case alias 2025-02-14 16:40:58 +02:00
Pekka Enberg
ac54c35f92 Switch to workspace dependencies
...makes it easier to specify a version, which is needed for `cargo publish`.
2025-02-12 17:28:04 +02:00
Jussi Saurio
9e70e8fe02 Add basic CTE support 2025-02-08 14:50:05 +02:00
Jussi Saurio
338c27dad6 introduce Scope and Cte structs 2025-02-08 14:49:46 +02:00
PThorpe92
ae88d51e6f Remove TableReferenceType enum to clean up planner 2025-02-06 09:15:39 -05:00
PThorpe92
d4c06545e1 Refactor vtable impl and remove Rc Refcell from module 2025-02-06 09:15:39 -05:00
Jussi Saurio
f5f77c0bd1 Initial virtual table implementation 2025-02-06 07:51:50 -05:00
Jussi Saurio
795576b2ec dont eagerly allocate result column name strings 2025-02-05 17:53:23 +02:00
Jussi Saurio
1e5501650a Support column aliases in GROUP BY, ORDER BY and HAVING 2025-02-03 10:44:05 +02:00
Pekka Enberg
662d629666 Rename JoinAwareConditionExpr to WhereTerm
We transform all JOIN conditions into WHERE clause terms in the query
planner. The JoinAwareConditionExpr name tries to make that point, but I
think it makes things more confusing. Let's call it WhereTerm (suggested
by Jussi).
2025-02-03 07:46:51 +02:00
Jussi Saurio
16a97d3b98 planner.rs: refactor from/join + where parsing logic
- use new TableReference and JoinAwareConditionExpr
- add utilities for determining at which loop depth a
  WHERE condition should be evaluated, now that "operators"
  do not carry condition expressions inside them anymore.
2025-02-02 10:18:13 +02:00