Commit Graph

62 Commits

Author SHA1 Message Date
Jussi Saurio
5da76c9125 Allow index in UPDATE for point queries (i.e. max 1 row affected) 2025-08-14 15:58:01 +03:00
Jussi Saurio
cd3b4bccd3 Fix UPDATE: Do not use an index for iteration if that index is going to be updated
Closes #2598
2025-08-14 15:35:00 +03:00
Mikaël Francoeur
2cf4e4fe96 handle single, double and unquoted strings in values clause 2025-08-08 09:03:38 -04:00
Piotr Rzysko
59ec2d3949 Replace ConstraintInfo::plan_info with ConstraintInfo::index
The side of the binary expression no longer needs to be stored in
`ConstraintInfo`, since the optimizer now guarantees that it is always
on the right. As a result, only the index of the corresponding constraint
needs to be preserved.
2025-08-05 05:48:29 +02:00
Piotr Rzysko
8fb4fbf8af Make WhereTerm::consumed a plain bool
Now that virtual tables are integrated into the optimizer, this field no
longer needs to be wrapped in Cell<bool>.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
82491ceb6a Integrate virtual tables with optimizer
This change connects virtual tables with the query optimizer.
The optimizer now considers virtual tables during join order search
and invokes their best_index callbacks to determine feasible access
paths.

Currently, this is not a visible change, since none of the existing
extensions return information indicating that a plan is invalid.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
718598eab8 Introduce scan type
Different scan parameters are required for different table types.
Currently, index and iteration direction are only used by B-tree tables,
while the remaining table types don’t require any parameters. Planning
access to virtual tables, however, will require passing additional
information from the planner, such as the virtual table index (distinct
from a B-tree index) and the constraints that must be forwarded to the
`filter` method.
2025-08-04 20:27:22 +02:00
Piotr Rzysko
9167b30c7c Introduce AccessMethodParams
Previously, AccessMethod stored fields like `iter_dir`, `index`, and
`constraint_refs` directly, but these only applied to BTree tables.
Other table types (virtual tables, subqueries) either ignored these
fields or required different parameters entirely.

This change prepares the planner to handle virtual table access methods
with their own specialized parameters.
2025-08-04 20:23:44 +02:00
bit-aloo
9a54ef214e parser: Distinguish quoted identifiers and unify Id into Name enum
This commit replaces the `Name(pub String)` struct with a `Name` enum that
explicitly models how the name appeared in the source either as an
unquoted identifier (`Ident`) or a quoted string (`Quoted`).

In the process, the separate `Id` wrapper type has been coalesced into the
`Name` enum, simplifying the AST and reducing duplication in identifier
handling logic.

While this increases the size of some AST nodes (notably `yyStackEntry`),
it improves correctness and makes source structure more explicit for
later phases.
2025-07-24 14:40:19 +05:30
Glauber Costa
cbdd5c5fc7 improve handling of double quotes
I ended up hitting #1974 today and wanted to fix it. I worked with
Claude to generate a more comprehensive set of queries that could fail
aside from just the insert query described in the issue. He got most of
them right - lots of cases were indeed failing. The ones that were
gibberish, he told me I was absolutely right for pointing out they were
bad.

But alas. With the test cases generated, we can work on fixing it. The
place where the assertion was hit, all we need to do there is return
true (but we assert that this is indeed a string literal, it shouldn't
be anything else at this point).

There are then just a couple of places where we need to make sure we
handle double quotes correctly. We already tested for single quotes in a
couple of places, but never for double quotes.

There is one funny corner case where you can just select "col" from tbl,
and if there is no column "col" on the table, that is treated as a
string literal. We handle that too.

Fixes #1974
2025-07-18 10:39:02 -05:00
Levy A.
6fe2505425 add more ToTokens impls 2025-07-16 12:16:31 -03:00
Piotr Rzysko
319cdbe3af Don't use search for virtual tables
Previously, the test queries added in this commit would fail with:
thread 'main' panicked at core/schema.rs:129:34:
not implemented
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
   2: core::panicking::panic
             at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:148:5
   3: limbo_core::schema::Table::get_root_page
             at ./core/schema.rs:129:34
   4: limbo_core::translate::main_loop::init_loop
             at ./core/translate/main_loop.rs:260:44
   5: limbo_core::translate::emitter::emit_query
             at ./core/translate/emitter.rs:568:5
   6: limbo_core::translate::emitter::emit_program_for_select
             at ./core/translate/emitter.rs:496:5
   7: limbo_core::translate::emitter::emit_program
             at ./core/translate/emitter.rs:187:31
   8: limbo_core::translate::select::translate_select
             at ./core/translate/select.rs:82:5
   9: limbo_core::translate::translate_inner
             at ./core/translate/mod.rs:241:13
  10: limbo_core::translate::translate
             at ./core/translate/mod.rs:95:17
  11: limbo_core::Connection::run_cmd
             at ./core/lib.rs:416:31
  12: <limbo_core::QueryRunner as core::iter::traits::iterator::Iterator>::next
             at ./core/lib.rs:916:22
  13: limbo::app::Limbo::run_query
             at ./cli/app.rs:442:27
  14: limbo::app::Limbo::handle_input_line
             at ./cli/app.rs:544:13
  15: limbo::main
             at ./cli/main.rs:51:31
  16: core::ops::function::FnOnce::call_once
2025-07-14 07:16:53 +02:00
Nils Koch
828d4f5016 fix clippy errors for rust 1.88.0 (auto fix) 2025-07-12 18:58:41 +03:00
Pekka Enberg
b87ce6d178 Merge 'Fix deleting previous rowid when rowid is in the Set Clause' from Pedro Muniz
Closes #1888 . This PR fixes UPDATE translation by not emitting an
ephemeral plan when we are doing a `RowIdEq` search. Also, we should
delete the previous rowid when the rowid is in the set clause.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1891
2025-06-30 11:58:05 +03:00
Pekka Enberg
9c1b7897ac Fix URLs to point to github.com/tursodatabase/turso 2025-06-30 11:23:53 +03:00
pedrocarlo
738e2cc06c do not emit ephemeral plan when doing a SeekRowId + emit Delete instruction when rowid in set clause 2025-06-29 17:12:24 -03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Pekka Enberg
2fc5c0ce5c Switch to runtime flag for enabling indexes
Makes it easier to test the feature:

```
$ cargo run --  --experimental-indexes
Limbo v0.0.22
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE t(x);
limbo> CREATE INDEX t_idx ON t(x);
limbo> DROP INDEX t_idx;
```
2025-06-26 10:07:28 +03:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
pedrocarlo
e53a290a48 move ephemeral table logic to update plan and reuse select logic for ephemeral index 2025-06-20 16:30:21 -03:00
pedrocarlo
9048ad398b modify loop functions to accomodate for ephemeral tables 2025-06-20 16:29:10 -03:00
Pere Diaz Bou
9ae4563bcd index_experimental flag to enable index usages
Currently indexes are the bulk of the problem with `UPDATE` and
`DELETE`, while we work on fixing those it makes sense to disable
indexing since they are not stable. We want to try to make everything
else stable before we continue with indexing.
2025-06-17 19:33:23 +02:00
meteorgan
6179d8de23 refactor compound select 2025-06-13 10:39:32 +03:00
Levy A.
7638b0dab7 fix: use default value on empty columns added via ALTER TABLE 2025-06-11 14:18:19 -03:00
Jussi Saurio
e9d1f0823b Disable index usage in DELETE because it does not work safely 2025-06-11 12:15:20 +03:00
Jussi Saurio
85972fd744 Merge 'Fix rowid to_sql_string' from Pedro Muniz
Addresses the panic encountered here:
https://github.com/tursodatabase/limbo/pull/1690 . Sorry about that.

Closes #1693
2025-06-10 18:11:51 +03:00
pedrocarlo
36f60e4dd1 Fix rowid to_sql_string printing 2025-06-10 10:48:05 -03:00
Jussi Saurio
844461d20b update and delete fixes 2025-06-10 14:16:26 +03:00
Jussi Saurio
2bac140d73 Remove SeekOp::EQ and encode eq_only in LE&GE - needed for iteration direction aware equality seeks 2025-06-10 14:16:26 +03:00
Jussi Saurio
18e6987904 Remove plan.to_sql_string() from optimize_plan() as it panics on TODOs 2025-06-09 09:45:06 +03:00
pedrocarlo
3c1b984b78 use table_references for PlanContext 2025-06-04 12:06:43 -03:00
pedrocarlo
bfc8cb6d4c move display and to_sql_string impls to separate modules for plan 2025-06-04 12:06:43 -03:00
pedrocarlo
f90bebbfbc small fix and remove dbg 2025-06-04 12:06:43 -03:00
pedrocarlo
fa0dff9843 Fix rebase changes 2025-06-04 12:06:43 -03:00
pedrocarlo
a96577529e impl ToSqlString for Update Plan 2025-06-04 12:06:43 -03:00
pedrocarlo
d243d1015c impl ToSqlString for Delete Plan 2025-06-04 12:06:43 -03:00
pedrocarlo
ff5aa17769 impl ToSqlString for CompoundSelect Plan 2025-06-04 12:06:43 -03:00
pedrocarlo
51014d01c3 impl ToSqlString for SelectPlan 2025-06-04 12:06:43 -03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
73e806ad84 Make WhereTerm::consumed a Cell<bool>
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a &mut Vec<WhereTerm> around anymore, except
for the fact that virtual table constraint resolution is done ad-hoc in
`init_loop()`. Even there, the only thing we mutate is `WhereTerm::consumed`
which is a boolean indicating that the term has been "used up" by the optimizer
and shouldn't be evaluated as a normal where clause condition anymore.

In the upcoming branch for WHERE clause subqueries, I want to store immutable
references to WHERE clause expressions in `Resolver`, but this is unfortunately
not possible if we still use the aforementioned mutable references.

Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>` which allows
us to pass an immutable reference to `init_loop()`, and the `Cell` can be removed
once the virtual table constraint resolution is moved to an earlier part of the
query processing pipeline.
2025-05-28 11:02:39 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
Jussi Saurio
08bda9cc58 UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
fbfd2b2c38 refactor: use walk_expr_mut() in rewrite_expr() 2025-05-23 16:27:28 +03:00
Jussi Saurio
696c98877c Merge 'btree: Remove assumption that all btrees have a rowid' from Jussi Saurio
For example, implementing `SELECT DISTINCT` (#1517) and `UNION` (#1545)
require that we are able to create indexes without a rowid column
present. Similarly, `WITHOUT ROWID` tables require this.
I implemented this by replacing the `rowid` and `empty_record`
properties in `BtreeCursor` with
```rust
/// Whether the cursor is currently pointing to a record.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorHasRecord {
    Yes {
        rowid: Option<u64>, // not all indexes and btrees have rowids, so this is optional.
    },
    No,
}
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1518
2025-05-21 14:53:00 +03:00
Jussi Saurio
c4548b51f1 Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
This allows Limbo's optimizer to 1. recognize `p_partkey=l_partkey` as
an index constraint on `part`, and 2. filter out `lineitem` rows before
joining. With this optimization, Limbo completes TPC-H `19.sql` nearly
as fast as SQLite on my machine. Without it, Limbo takes forever.
This branch: `939ms`
Main: `uh, i started running it a few minutes ago and it hasnt finished,
and i dont feel like waiting i guess`

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1520
2025-05-20 14:33:49 +03:00
Jussi Saurio
14058357ad Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio
Previously the Operation enum consisted of:
- Operation::Scan
- Operation::Search
- Operation::Subquery
Which was always a dumb hack because what we really are doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.
Hence, refactor the relevant data structures so that the Table enum now
contains a new variant:
Table::FromClauseSubquery
And the Operation enum only consists of Scan and Search.
```
SELECT * FROM (SELECT ...) sub;

-- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo,
-- with a lot of special handling for Operation::Subquery in different code paths
-- now it's an Operation::Scan on a Table::FromClauseSubquery
```
No functional changes (intended, at least!)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1529
2025-05-20 14:31:42 +03:00
Jussi Saurio
6790b7479c Optimization: lift common subexpressions from OR terms
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
2025-05-20 14:25:15 +03:00
Jussi Saurio
a7b33b1509 schema: add Index::has_rowid 2025-05-20 14:22:17 +03:00
Jussi Saurio
3121c6cdd3 Replace Operation::Subquery with Table::FromClauseSubquery
Previously the Operation enum consisted of:

- Operation::Scan
- Operation::Search
- Operation::Subquery

Which was always a dumb hack because what we really are doing is
an Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.

Hence, refactor the relevant data structures so that the Table enum
now contains a new variant:

Table::FromClauseSubquery

And the Operation enum only consists of Scan and Search.

No functional changes (intended, at least!)
2025-05-20 12:56:30 +03:00
Jussi Saurio
9c710b5292 Add collation column to Index struct 2025-05-20 12:52:54 +03:00