Commit Graph

459 Commits

Author SHA1 Message Date
Pekka Enberg
f24e254ec6 core/translate: Fix "misuse of aggregate function" error message
```
sqlite> CREATE TABLE test1(f1, f2);
sqlite> SELECT SUM(min(f1)) FROM test1;
Parse error: misuse of aggregate function min()
  SELECT SUM(min(f1)) FROM test1;
             ^--- error here
```

Spotted by SQLite TCL tests.
2025-07-10 14:29:59 +03:00
KaguraMilet
9d6ae78786 Merge branch 'tursodatabase:main' into distance 2025-07-10 19:15:08 +08:00
Pekka Enberg
3f10427f52 core: Fix resolve_function() error messages
We need to return the original function name, not normalized one to be
compatible with SQLite.

Spotted by SQLite TCL tests.
2025-07-09 15:30:57 +03:00
pedrocarlo
b85687658d change instrumentation level to INFO 2025-07-07 11:53:45 -03:00
pedrocarlo
5559c45011 more instrumentation + write counter should decrement if pwrite fails 2025-07-07 11:50:21 -03:00
pedrocarlo
897426a662 add error tracing to relevant functions + rollback transaction in step_end_write_txn + make move_to_root return result 2025-07-07 11:50:21 -03:00
KaguraMilet
ac95758f76 feat(vector): integrate euclidean distance into limbo 2025-07-07 21:11:51 +08:00
Levy A.
ffd6844b5b refactor: remove PseudoTable from Table
the only reason for `PseudoTable` to exist, is to provide column
information for `PseudoCursor` creation. this should not be part of the
schema.
2025-06-30 14:31:58 -03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Piotr Rzysko
116df2ec86 Fix evaluation of ISNULL/NOTNULL in OR expressions
Previously, the `jump_if_condition_is_true` flag was not respected. As a
result, for expressions like <`ISNULL`/`NOTNULL`> `OR` <rhs>, the <rhs>
expression was evaluated even when the left-hand side was true, and its
value was incorrectly used as the final result.
2025-06-27 08:21:40 +02:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
Piotr Rzysko
64a0333119 Fix missing column references in non-aggregate expressions
Previously, queries like:
```
SELECT
    CASE WHEN c0 != 'x' THEN group_concat(c1, ',') ELSE 'x' END
FROM t0
GROUP BY c0;
```

would return incorrect results because c0 was not copied during the
aggregation loop into a register accessible to the logic processing the
grouped results (e.g., the CASE WHEN expression in this example).

The same issue applied to expressions in the HAVING and ORDER BY clauses.
2025-06-20 06:19:16 +02:00
Levy A.
b88cb99ff0 fix warnings and some refactoring 2025-06-11 14:19:06 -03:00
Levy A.
49a6ddad97 wip 2025-06-11 14:19:04 -03:00
Levy A.
15e0cab8d8 refactor+fix: precompute default values from schema 2025-06-11 14:18:39 -03:00
Krishna Vishal
0d5cbc4f1d Add affinity check as a function as ast::Operator impl 2025-06-11 00:33:48 +05:30
Krishna Vishal
712c94537c Add affinity flags to IS and IS NOT opeartors 2025-06-11 00:33:48 +05:30
krishvishal
5837f7329f clean up 2025-06-11 00:33:47 +05:30
krishvishal
7bd1589615 Added affinity inference and conversion for comparison ops.
Added affinity helper function for `CmpInsFlags`
2025-06-11 00:33:44 +05:30
pedrocarlo
80c480517a incorrect placeholder label in where clause translation 2025-06-10 12:00:19 -03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
Jussi Saurio
51605ad2a4 Use lifetimes in walk_expr() to guarantee that child expr has same lifetime as parent expr 2025-05-28 10:56:30 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
Jussi Saurio
40a4d162bc Introduce walker expressions for ast::Expr 2025-05-23 15:56:27 +03:00
Jussi Saurio
c4548b51f1 Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
This allows Limbo's optimizer to 1. recognize `p_partkey=l_partkey` as
an index constraint on `part`, and 2. filter out `lineitem` rows before
joining. With this optimization, Limbo completes TPC-H `19.sql` nearly
as fast as SQLite on my machine. Without it, Limbo takes forever.
This branch: `939ms`
Main: `uh, i started running it a few minutes ago and it hasnt finished,
and i dont feel like waiting i guess`

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1520
2025-05-20 14:33:49 +03:00
Jussi Saurio
6790b7479c Optimization: lift common subexpressions from OR terms
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
2025-05-20 14:25:15 +03:00
Jussi Saurio
3121c6cdd3 Replace Operation::Subquery with Table::FromClauseSubquery
Previously the Operation enum consisted of:

- Operation::Scan
- Operation::Search
- Operation::Subquery

Which was always a dumb hack because what we really are doing is
an Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.

Hence, refactor the relevant data structures so that the Table enum
now contains a new variant:

Table::FromClauseSubquery

And the Operation enum only consists of Scan and Search.

No functional changes (intended, at least!)
2025-05-20 12:56:30 +03:00
pedrocarlo
5bd47d7462 post rebase adjustments to accomodate new instructions that were created before the merge conflicts 2025-05-19 15:22:15 -03:00
pedrocarlo
bba9689674 Fixed matching bug for defining collation context to use 2025-05-19 15:22:14 -03:00
pedrocarlo
a818b6924c Removed repeated binary expression translation. Adjusted the set_collation to capture additional context of whether it was set by a Collate expression or not. Added some tests to prove those modifications were necessary. 2025-05-19 15:22:14 -03:00
pedrocarlo
f8854f180a Added collation to create table columns 2025-05-19 15:22:14 -03:00
pedrocarlo
d0a63429a6 Naive implementation of collate for queries. Not implemented for column constraints 2025-05-19 15:22:14 -03:00
pedrocarlo
b5b1010e7c set binary collation as default 2025-05-19 15:22:14 -03:00
Pekka Enberg
e3f71259d8 Rename OwnedValue -> Value
We have not had enough merge conflicts for a while so let's do a
tree-wide rename.
2025-05-15 09:59:46 +03:00
Jussi Saurio
5386859b44 as_binary-components: simplify 2025-05-14 09:42:26 +03:00
Jussi Saurio
bd875e3876 optimizer module split 2025-05-14 09:42:26 +03:00
Jussi Saurio
501e95637a Merge 'Support isnull and notnull expr' from meteorgan
Limbo `isnull` output:
```
limbo> explain select 1 isnull;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     3     0                    0   Start at 3
1     ResultRow          1     1     0                    0   output=r[1]
2     Halt               0     0     0                    0
3     Integer            1     2     0                    0   r[2]=1
4     Integer            1     1     0                    0   r[1]=1
5     IsNull             2     7     0                    0   if (r[2]==NULL) goto 7
6     Integer            0     1     0                    0   r[1]=0
7     Goto               0     1     0                    0
```
Sqlite `isnull` output:
```
sqlite>  explain select 1 isnull;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     6     0                    0   Start at 6
1     Integer        1     1     0                    0   r[1]=1
2     IsNull         2     4     0                    0   if r[2]==NULL goto 4
3     Integer        0     1     0                    0   r[1]=0
4     ResultRow      1     1     0                    0   output=r[1]
5     Halt           0     0     0                    0
6     Integer        1     2     0                    0   r[2]=1
7     Goto           0     1     0                    0
```
------------------------------------------------------------------------
-------------------
Limbo `notnull` output:
```
limbo> explain select 1 notnull;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     3     0                    0   Start at 3
1     ResultRow          1     1     0                    0   output=r[1]
2     Halt               0     0     0                    0
3     Integer            1     2     0                    0   r[2]=1
4     Integer            1     1     0                    0   r[1]=1
5     NotNull            2     7     0                    0   r[2]!=NULL -> goto 7
6     Integer            0     1     0                    0   r[1]=0
7     Goto               0     1     0                    0
```
Sqlite `notnull` output:
```
sqlite> explain select 1 notnull;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     6     0                    0   Start at 6
1     Integer        1     1     0                    0   r[1]=1
2     NotNull        2     4     0                    0   if r[2]!=NULL goto 4
3     Integer        0     1     0                    0   r[1]=0
4     ResultRow      1     1     0                    0   output=r[1]
5     Halt           0     0     0                    0
6     Integer        1     2     0                    0   r[2]=1
7     Goto           0     1     0                    0
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1468
2025-05-12 10:06:35 +03:00
PThorpe92
ab23f2a24f Add comments and reorganize fix of ordering parameters for insert statements 2025-05-11 14:20:57 -04:00
meteorgan
5185f4bf9e Support isnull and notnull expr 2025-05-11 23:47:30 +08:00
PThorpe92
1c7a50de96 Update comments and correct vtab insert behavior 2025-05-10 10:03:00 -04:00
PThorpe92
0d73fe0fe7 Fix parameter position on insert by handling before vdbe layer 2025-05-10 07:46:29 -04:00
PThorpe92
50f2621c12 Add several more rust tests for parameter binding 2025-05-10 07:46:29 -04:00
PThorpe92
c4aee50b58 Fix unclear comments in translator 2025-05-10 07:46:29 -04:00
PThorpe92
d908e78729 Use positional offsets in translate::expr to remap parameters to their correct offsets 2025-05-10 07:46:27 -04:00
meteorgan
ef3f004e30 refactor numeric literal 2025-05-08 18:37:17 +08:00
meteorgan
51d43074f3 Support literal-value current_time, current_date and current_timestamp 2025-04-29 22:35:26 +08:00
Jussi Saurio
e557503091 expr.rs: use constant spans to optimize constant expressions 2025-04-24 11:05:21 +03:00
pedrocarlo
1928dcfa10 Correct docs regarding between 2025-04-21 23:05:01 -03:00
Jussi Saurio
6c73db6fd3 feat: use covering indexes whenever possible 2025-04-18 15:13:09 +03:00