Commit Graph

955 Commits

Author SHA1 Message Date
PThorpe92
a2f8b2dfea Fix add check for invalid argv index for vtab constraints in main loop 2025-05-24 14:49:58 -04:00
Jussi Saurio
8ed5334ca7 tests/fuzz: add compound_select_fuzz() 2025-05-24 13:12:41 +03:00
Jussi Saurio
f6443ae742 Support LIMIT with UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
08bda9cc58 UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
0b2c3298aa Merge 'refactor: introduce walk_expr() and walk_expr_mut() to reduce repetitive pattern matching' from Jussi Saurio
We do a lot of
```rust
match expr {
  ...
}
```
just to find some specific case like `Expr::Column` deep inside an
expression tree. This PR introduces new helpers `walk_expr()` and
`walk_expr_mut()` that handle the tree-walking part, and the business
logic functions where this tree traversal is used can focus on the exact
enum variants of `Expr` that they are interested in.

Closes #1564
2025-05-23 22:00:20 +03:00
Jussi Saurio
2e095e6d03 Merge 'Add some comments for values statement' from meteorgan
follow up: #1549
simultaneously, address a warning in CI as the package `sqlite3-parser`
has been renamed to `limbo_sqlite3_parser`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1565
2025-05-23 17:30:30 +03:00
meteorgan
3bf0ce7fb3 Add some comments for values statement 2025-05-23 22:11:34 +08:00
Jussi Saurio
1a937462b3 Merge 'core/pragma: Add support for update user_version' from Diego Reis
It also changes the type from u32 to i32 since
sqlite supports negative values

Closes #1559
2025-05-23 17:00:55 +03:00
Jussi Saurio
c18c6a00fa refactor: use walk_expr() in resolving vtab constraints 2025-05-23 16:28:56 +03:00
Jussi Saurio
fbfd2b2c38 refactor: use walk_expr_mut() in rewrite_expr() 2025-05-23 16:27:28 +03:00
Jussi Saurio
362347c474 refactor: use walk_expr() in determine_where_to_eval_expr() 2025-05-23 16:27:28 +03:00
Jussi Saurio
9ec84e3905 refactor: use walk_expr() in table_mask_from_expr() 2025-05-23 16:27:28 +03:00
Jussi Saurio
3835a29f47 refactor: use walk_expr() in resolve_aggregates() 2025-05-23 16:27:23 +03:00
Jussi Saurio
2ab5c5f6a9 refactor: use walk_expr_mut() in bind_column_references() 2025-05-23 15:56:49 +03:00
Jussi Saurio
40a4d162bc Introduce walker expressions for ast::Expr 2025-05-23 15:56:27 +03:00
Diego Reis
2d6405b3e9 core/pragma: Remove unnecessary clone in user_version and cache_size 2025-05-23 08:43:07 -03:00
Jussi Saurio
597020bc0c Merge 'Support values statement and values in select' from meteorgan
Close: #866
**limbo output**:
```
limbo> explain values(1, 2);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     5     0                    0   Start at 5
1     Integer            1     1     0                    0   r[1]=1
2     Integer            2     2     0                    0   r[2]=2
3     ResultRow          1     2     0                    0   output=r[1..2]
4     Halt               0     0     0                    0
5     Goto               0     1     0                    0

limbo> explain values(1, 2), (3, 4);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     InitCoroutine      1     9     2                    0
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2
4     Yield              1     0     0                    0
5     Integer            3     2     0                    0   r[2]=3
6     Integer            4     3     0                    0   r[3]=4
7     Yield              1     0     0                    0
8     EndCoroutine       1     0     0                    0
9     InitCoroutine      1     0     2                    0
10    Yield              1     15    0                    0
11    Copy               2     4     0                    0   r[4]=r[2]
12    Copy               3     5     0                    0   r[5]=r[3]
13    ResultRow          4     2     0                    0   output=r[4..5]
14    Goto               0     10    0                    0
15    Halt               0     0     0                    0
16    Goto               0     1     0                    0

limbo> explain select * from (values(1, 2), (3, 4));
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     InitCoroutine      1     9     2                    0
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2
4     Yield              1     0     0                    0
5     Integer            3     2     0                    0   r[2]=3
6     Integer            4     3     0                    0   r[3]=4
7     Yield              1     0     0                    0
8     EndCoroutine       1     0     0                    0
9     InitCoroutine      1     0     2                    0
10    Yield              1     15    0                    0
11    Copy               2     4     0                    0   r[4]=r[2]
12    Copy               3     5     0                    0   r[5]=r[3]
13    ResultRow          4     2     0                    0   output=r[4..5]
14    Goto               0     10    0                    0
15    Halt               0     0     0                    0
16    Transaction        0     0     0                    0   write=false
17    Goto               0     1     0                    0
```
**sqlite output**:
```
sqlite> explain values(1, 2);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     5     0                    0   Start at 5
1     Integer        1     1     0                    0   r[1]=1
2     Integer        2     2     0                    0   r[2]=2
3     ResultRow      1     2     0                    0   output=r[1..2]
4     Halt           0     0     0                    0
5     Goto           0     1     0                    0
sqlite> explain values(1, 2), (3, 4);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     InitCoroutine  1     9     2                    0
2     Integer        1     4     0                    0   r[4]=1
3     Integer        2     5     0                    0   r[5]=2
4     Yield          1     0     0                    0
5     Integer        3     4     0                    0   r[4]=3
6     Integer        4     5     0                    0   r[5]=4
7     Yield          1     0     0                    0
8     EndCoroutine   1     0     0                    0
9     InitCoroutine  1     0     2                    0
10      Yield          1     15    0                    0   next row of 2-ROW VALUES CLAUSE
11      Copy           4     8     0                    2   r[8]=r[4]
12      Copy           5     9     0                    2   r[9]=r[5]
13      ResultRow      8     2     0                    0   output=r[8..9]
14    Goto           0     10    0                    0
15    Halt           0     0     0                    0
16    Goto           0     1     0                    0
sqlite>  explain select * from (values(1, 2), (3, 4));
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     InitCoroutine  1     9     2                    0
2     Integer        1     4     0                    0   r[4]=1
3     Integer        2     5     0                    0   r[5]=2
4     Yield          1     0     0                    0
5     Integer        3     4     0                    0   r[4]=3
6     Integer        4     5     0                    0   r[5]=4
7     Yield          1     0     0                    0
8     EndCoroutine   1     0     0                    0
9     InitCoroutine  1     0     2                    0
10      Yield          1     15    0                    0   next row of 2-ROW VALUES CLAUSE
11      Copy           4     8     0                    2   r[8]=r[4]
12      Copy           5     9     0                    2   r[9]=r[5]
13      ResultRow      8     2     0                    0   output=r[8..9]
14    Goto           0     10    0                    0
15    Halt           0     0     0                    0
16    Goto           0     1     0                    0
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1549
2025-05-23 13:56:31 +03:00
meteorgan
01c8a4ca63 simpify values when it's subquery 2025-05-23 17:45:56 +08:00
Jussi Saurio
128a406f8c TableReference: fix stale comment 2025-05-23 10:06:24 +03:00
Zaid Humayun
bc5d93c18a Addresses comment https://github.com/tursodatabase/limbo/pull/1548#discussion_r2103333264 by @jussisaurio
this commit uses more descriptive names for registers
2025-05-23 10:19:08 +05:30
Diego Reis
2f8042da22 core/pragma: Add support for update user_version
It also changes the type from u32 to i32 since
sqlite supports negative values
2025-05-22 20:38:27 -03:00
Zaid Humayun
780d0cb5a8 addresses https://github.com/tursodatabase/limbo/pull/1548#discussion_r2102599487, https://github.com/tursodatabase/limbo/pull/1548#discussion_r2102601189 & https://github.com/tursodatabase/limbo/pull/1548#discussion_r2102602774 by @jussisaurio
this commit addresses comments regarding using decsriptive variable names for the loops and loop labels. Also, adds documentation for instructions that cause jumps in both loops
2025-05-23 00:31:19 +05:30
Zaid Humayun
3a8b18c481 addressed comment https://github.com/tursodatabase/limbo/pull/1548#discussion_r2102595246 by @jussisaurio
this commit addresses the comment regarding inline code documentation
2025-05-23 00:12:27 +05:30
Zaid Humayun
1999eac891 Fixes test Testing: can drop kv_store vtable
Earlier this test broke because the code to translate the drop table was not checking to see if a table was not a virtual table.
SQLite does this with a macro called IsVirtualTable. Here, I check with the existing methods on the BTree struct
2025-05-22 23:53:48 +05:30
meteorgan
34e05ef974 make values work in subquery 2025-05-23 00:30:04 +08:00
meteorgan
0467d7e11b Support values statement and values in select 2025-05-23 00:29:54 +08:00
Zaid Humayun
4072a41c9c Drop Table now uses an ephemeral table as a scratch table
Now when dropping a table, an ephemeral table is created as a scratch table. If a root page of some other table is moved into the page occupied by the root page of the table being dropped, that row is first written into an ephemeral table. Then on a next pass, it is deleted from the schema table and then re-inserted with the new root page.

This happens during AUTOVACUUM when deleting a root page will force the last root page to move into the slot being vacated by the root page of the table being deleted
2025-05-22 19:39:46 +05:30
Jussi Saurio
0c4c451d2a rename 2025-05-22 16:51:03 +03:00
Jussi Saurio
6ed5412bde extract method 2025-05-22 16:51:03 +03:00
Jussi Saurio
df8a19767f Fixes to account for collation 2025-05-22 16:51:03 +03:00
Jussi Saurio
f3ea9a603a add support for SELECT DISTINCT 2025-05-22 16:51:03 +03:00
Jussi Saurio
b0c3483e94 Allocate ephemeral index for SELECT DISTINCT 2025-05-22 16:51:03 +03:00
Jussi Saurio
76227ec274 Rename to Distinctness + add distinctness information to SelectPlan 2025-05-22 16:51:03 +03:00
Jussi Saurio
8bec75d804 Merge 'Initial Support for Nested Translation' from Pedro Muniz
This PR introduces some modifications to the Program Builder to allow us
to use nested parsing. By focusing the emission of Init and the last
Goto (prologue and epilogue), inside the ProgramBuilder, we can just not
emit them if we are parsing/translating in a nested context. For this
PR, I only migrated insert to use these functions as I need them to
support Insert statements that use `SELECT FROM` syntax. Nested parsing
overall enables code reuse for us and arguably is one of the only ways
to parse deeply nested queries without a lot of code duplication.
#1528

Closes #1543
2025-05-22 10:52:00 +03:00
Jussi Saurio
c7f984c5c8 Merge 'Page cache fixes' from Pere Diaz Bou
This PR builds on top of
https://github.com/tursodatabase/limbo/pull/1368 and adds few things
like allowing inserting pages with the same page key, fix fuzz tests by
adding transactions and some minor improvements to cacheflush.

Closes #1523
2025-05-22 10:12:56 +03:00
Jussi Saurio
fc150b12c9 Merge 'CSV virtual table extension' from Piotr Rżysko
This PR adds a port of [SQLite's CSV virtual table
extension](https://www.sqlite.org/csv.html).
Planned follow-ups:
* Pass detailed error messages from `VTabModule::create`, not just
`ResultCode`s.
* Address the TODO in `VTabModuleImpl::create_schema`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1544
2025-05-22 09:48:53 +03:00
pedrocarlo
53bf5d5ef5 adjust translate functions to take a program instead of Option<ProgramBuilder> + remove any Init emission in traslate functions + use epilogue in all places necessary 2025-05-21 16:41:10 -03:00
pedrocarlo
1c12535d9f push prologue to top-level translate function 2025-05-21 15:50:43 -03:00
pedrocarlo
8084d54c26 lift pragma statement handling as it cannot be created in a nested context 2025-05-21 14:13:28 -03:00
pedrocarlo
d21229d4a3 create inner translate function to enable calling it from a nested context 2025-05-21 14:08:02 -03:00
pedrocarlo
3090dd91fa push translate_ctx creation outside of prologue 2025-05-21 13:06:25 -03:00
pedrocarlo
fc08f786fc use prologue and epilogue in insert 2025-05-21 12:47:51 -03:00
pedrocarlo
f5d6d11d16 extract prologue and epilogue to program builder 2025-05-21 12:47:51 -03:00
pedrocarlo
517c7c81cd refactor to include optional program builder argument 2025-05-21 12:47:51 -03:00
Pere Diaz Bou
67e260ff71 allow delete of dirty page in cacheflush
Dirty pages can be deleted in `cacheflush`. Furthermore, there could be
multiple live references in the stack of a cursor so let's allow them to
exist while deleting.
2025-05-21 14:09:39 +02:00
Jussi Saurio
696c98877c Merge 'btree: Remove assumption that all btrees have a rowid' from Jussi Saurio
For example, implementing `SELECT DISTINCT` (#1517) and `UNION` (#1545)
require that we are able to create indexes without a rowid column
present. Similarly, `WITHOUT ROWID` tables require this.
I implemented this by replacing the `rowid` and `empty_record`
properties in `BtreeCursor` with
```rust
/// Whether the cursor is currently pointing to a record.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorHasRecord {
    Yes {
        rowid: Option<u64>, // not all indexes and btrees have rowids, so this is optional.
    },
    No,
}
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1518
2025-05-21 14:53:00 +03:00
Piotr Rzysko
9c1dca72db Introduce VTable
This allows storing table arguments parsed in the VTabModule::create
method.
2025-05-21 08:33:17 +02:00
Jussi Saurio
c4548b51f1 Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
This allows Limbo's optimizer to 1. recognize `p_partkey=l_partkey` as
an index constraint on `part`, and 2. filter out `lineitem` rows before
joining. With this optimization, Limbo completes TPC-H `19.sql` nearly
as fast as SQLite on my machine. Without it, Limbo takes forever.
This branch: `939ms`
Main: `uh, i started running it a few minutes ago and it hasnt finished,
and i dont feel like waiting i guess`

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1520
2025-05-20 14:33:49 +03:00
Jussi Saurio
14058357ad Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio
Previously the Operation enum consisted of:
- Operation::Scan
- Operation::Search
- Operation::Subquery
Which was always a dumb hack because what we really are doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.
Hence, refactor the relevant data structures so that the Table enum now
contains a new variant:
Table::FromClauseSubquery
And the Operation enum only consists of Scan and Search.
```
SELECT * FROM (SELECT ...) sub;

-- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo,
-- with a lot of special handling for Operation::Subquery in different code paths
-- now it's an Operation::Scan on a Table::FromClauseSubquery
```
No functional changes (intended, at least!)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1529
2025-05-20 14:31:42 +03:00
Jussi Saurio
63457bda14 Adjust logic not to delete WhereTerms, since 'consumed' property was introduced 2025-05-20 14:28:05 +03:00