Commit Graph

997 Commits

Author SHA1 Message Date
Jussi Saurio
2087393d22 Merge 'Write database header via normal pager route' from meteorgan
Closes: #1613

Closes #1634
2025-06-04 09:39:14 +03:00
Pekka Enberg
c6ef19396d Merge 'Add support for pragma table-valued functions' from Piotr Rżysko
This PR adds support for table-valued functions for PRAGMAs (see the
[PRAGMA functions section](https://www.sqlite.org/pragma.html)).
Additionally, it introduces built-in table-valued functions. I
considered using extensions for this, but there are several reasons in
favor of a dedicated mechanism:
* It simplifies the use of internal functions, structs, etc. For
example, when implementing `json_each` and `json_tree`, direct access to
internals was necessary:
https://github.com/tursodatabase/limbo/pull/1088
* It avoids FFI overhead. [Benchmarks](https://github.com/piotrrzysko/li
mbo/blob/pragma_vtabs_bench/core/benches/pragma_benchmarks.rs) on my
hardware show that `pragma_table_info()` implemented as an extension is
2.5× slower than the built-in version.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1642
2025-06-04 09:08:10 +03:00
meteorgan
f2bf6251cd write database header via normal pager route 2025-06-03 22:06:08 +08:00
Jussi Saurio
31b37332d5 all index cursors must be opened when DELETE does an index seek too 2025-06-03 15:18:45 +03:00
Jussi Saurio
06626f72eb Fix cursors not being opened for indexes in DELETE 2025-06-03 14:45:01 +03:00
Jussi Saurio
ea301de726 Merge 'Pass input string to translate function' from Pedro Muniz
In preparation for `CREATE VIEW`, we need to have the original sql query
that was used to create the view. I'm using the scanner's offset to
slice into the original input, trimming the newlines, and passing it to
the translate function.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1621
2025-06-02 17:43:11 +03:00
pedrocarlo
9dc6638313 cleaner approach for opening indexes 2025-06-02 01:13:14 -03:00
Piotr Rzysko
4d35e36b77 Introduce virtual table types 2025-06-01 07:45:57 +02:00
pedrocarlo
bc563266b3 add instrumentation to more functions for debugging + adjust how cursors are opened 2025-05-30 20:35:50 -03:00
pedrocarlo
b73200de86 pass input string to translate function 2025-05-30 11:20:36 -03:00
Jussi Saurio
819a6138d0 Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio
Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1614
2025-05-30 09:39:37 +03:00
Jussi Saurio
f8257df77b Fix: aggregate regs must be initialized as NULL at the start 2025-05-29 18:44:53 +03:00
Jussi Saurio
211b511189 Fix join optimizer tests 2025-05-29 11:44:56 +03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
124b38a262 plan.rs: add new datastructures
- TableReferences struct, which holds both:
     - joined_tables, and
     - outer_query_refs

- JoinedTable:
     - this is just a rename of the previous TableReference struct

- OuterQueryReference
     - this is to distinguish from JoinedTable those cases where
       e.g. a subquery refers to an outer query's table, or a CTE
       refers to a previous CTE.

Both JoinedTable and OuterQueryReference can be referred to by expressions,
but only JoinedTables are considered for join ordering optimization and so
forth.

This commit does not compile.
2025-05-29 11:03:09 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
Jussi Saurio
7ab243dc4e Merge 'Make WhereTerm::consumed a Cell<bool>' from Jussi Saurio
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a `&mut Vec<WhereTerm>` around anymore,
except for the fact that virtual table constraint resolution is done ad-
hoc in `init_loop()`.
Even there, the only thing we mutate is `WhereTerm::consumed` which is a
boolean indicating that the term has been "used up" by the optimizer and
shouldn't be evaluated as a normal where clause condition anymore.
In the upcoming branch for WHERE clause subqueries, I want to store
immutable references to WHERE clause expressions in `Resolver`, but this
is unfortunately not possible if we still use the aforementioned mutable
references.
Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>`
which allows us to pass an immutable reference to `init_loop()`, and the
`Cell` can be removed once the virtual table constraint resolution is
moved to an earlier part of the query processing pipeline.

Closes #1597
2025-05-28 11:14:40 +03:00
Jussi Saurio
73e806ad84 Make WhereTerm::consumed a Cell<bool>
Currently in the main translation logic after planning and optimization,
we don't _really_ need to pass a &mut Vec<WhereTerm> around anymore, except
for the fact that virtual table constraint resolution is done ad-hoc in
`init_loop()`. Even there, the only thing we mutate is `WhereTerm::consumed`
which is a boolean indicating that the term has been "used up" by the optimizer
and shouldn't be evaluated as a normal where clause condition anymore.

In the upcoming branch for WHERE clause subqueries, I want to store immutable
references to WHERE clause expressions in `Resolver`, but this is unfortunately
not possible if we still use the aforementioned mutable references.

Hence, we can temporarily make `WhereTerm::consumed` a `Cell<bool>` which allows
us to pass an immutable reference to `init_loop()`, and the `Cell` can be removed
once the virtual table constraint resolution is moved to an earlier part of the
query processing pipeline.
2025-05-28 11:02:39 +03:00
Jussi Saurio
51605ad2a4 Use lifetimes in walk_expr() to guarantee that child expr has same lifetime as parent expr 2025-05-28 10:56:30 +03:00
Jussi Saurio
a9ae1af75c Fix: init_limit() in wrong place for Delete 2025-05-27 21:26:31 +03:00
Jussi Saurio
3c587b91b5 Add comment on init_limit() 2025-05-27 21:19:28 +03:00
Jussi Saurio
4e9d9a2470 Fix LIMIT handling
Currently we have some usages of LIMIT where the actual limit counter
is initialized next to the DecrJumpZero instruction, and then
`program.mark_last_insn_constant()` is used to hoist the counter
initialization to the beginning of the program.

This is very fragile, and already FROM clause subquery handling works
around this with a hack (removed in this PR), and (upcoming) WHERE clause
subqueries would also run into problems because of this, because the LIMIT
might need to be initialized once for every iteration of the subquery.

This PR removes those usages for LIMIT, and LIMIT processing is now more
intuitive:

- limit counter is now initialized at the start of the query processing
- a function init_limit() is extracted to do this for select/update/delete
2025-05-27 21:12:22 +03:00
Jussi Saurio
8abe5efe99 Merge 'Add Schema reference to Resolver - needed for adhoc subquery planning' from Jussi Saurio
enabler for WHERE clause subquery ad-hoc planning&translation

Closes #1589
2025-05-27 20:19:45 +03:00
Jussi Saurio
ad0f2bb399 Merge 'Small VDBE insn tweaks' from Jussi Saurio
1. allow calling op_null with Insn::BeginSubrtn
    - BeginSubrtn is identical to Null, but named differently so that
its use in context is clearer
2. Insn::Return: add possibility to fallthrough on non-integer values as
per sqlite spec

Closes #1588
2025-05-27 20:19:31 +03:00
meteorgan
2f82762ca2 add function parse_signed_number 2025-05-28 00:33:41 +08:00
meteorgan
d9d3a5ecbb Use the SetCookie opcode to implement user_version pragma 2025-05-28 00:31:11 +08:00
Jussi Saurio
d2a287f67f Add Schema reference to Resolver - needed for adhoc subquery planning 2025-05-27 19:12:47 +03:00
Jussi Saurio
70965f4b28 Insn::Return: add possibility to fallthrough on non-integer values as per sqlite spec 2025-05-27 19:09:10 +03:00
pedrocarlo
1410e57112 correct union result_row or yield emission + test 2025-05-26 01:06:26 -03:00
pedrocarlo
ee93316c46 fix num_values detection + emitting correct column for temp_table + tests 2025-05-25 19:15:28 -03:00
pedrocarlo
e3fd1e589e support using a INSERT SELECT that references the same table in both statements 2025-05-25 19:15:28 -03:00
pedrocarlo
90e3c8483d tests with compound select 2025-05-25 19:15:28 -03:00
pedrocarlo
72c1f2f582 fix rebase issues and make code compile by cloning query type. Adjust the compound select behavior with insert 2025-05-25 19:13:40 -03:00
pedrocarlo
c8144340a0 adjust proper ordering for value insert 2025-05-25 19:12:30 -03:00
pedrocarlo
810211b3d1 passing incorrect number of values to virtual table insert 2025-05-25 19:12:30 -03:00
pedrocarlo
4bcfc8ca60 create separate function to populate multiple columns in a multi-row VALUES clause or in an INSERT INTO <table> SELECT. Virtual Table insert is broken, need to fix it still 2025-05-25 19:12:30 -03:00
pedrocarlo
bb7da39c72 remove assumption that translate_select is always called from a top-level context + adjust insert to use translate_select when needed 2025-05-25 19:12:30 -03:00
pedrocarlo
fd9e0db5cc pass the owned ast to translate_insert + remove assumption of a list of values in populate_columns_insert 2025-05-25 19:02:17 -03:00
pedrocarlo
15ffdd3e51 modify translate_select to return number of result columns 2025-05-25 19:02:17 -03:00
Jussi Saurio
07fa3a9668 Rename SelectQueryType to QueryDestination 2025-05-25 21:23:04 +03:00
Jussi Saurio
d893a55c55 UNION 2025-05-25 21:23:04 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
PThorpe92
a2f8b2dfea Fix add check for invalid argv index for vtab constraints in main loop 2025-05-24 14:49:58 -04:00
Jussi Saurio
8ed5334ca7 tests/fuzz: add compound_select_fuzz() 2025-05-24 13:12:41 +03:00
Jussi Saurio
f6443ae742 Support LIMIT with UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
08bda9cc58 UNION ALL 2025-05-24 13:12:41 +03:00
Jussi Saurio
0b2c3298aa Merge 'refactor: introduce walk_expr() and walk_expr_mut() to reduce repetitive pattern matching' from Jussi Saurio
We do a lot of
```rust
match expr {
  ...
}
```
just to find some specific case like `Expr::Column` deep inside an
expression tree. This PR introduces new helpers `walk_expr()` and
`walk_expr_mut()` that handle the tree-walking part, and the business
logic functions where this tree traversal is used can focus on the exact
enum variants of `Expr` that they are interested in.

Closes #1564
2025-05-23 22:00:20 +03:00
Jussi Saurio
2e095e6d03 Merge 'Add some comments for values statement' from meteorgan
follow up: #1549
simultaneously, address a warning in CI as the package `sqlite3-parser`
has been renamed to `limbo_sqlite3_parser`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1565
2025-05-23 17:30:30 +03:00
meteorgan
3bf0ce7fb3 Add some comments for values statement 2025-05-23 22:11:34 +08:00
Jussi Saurio
1a937462b3 Merge 'core/pragma: Add support for update user_version' from Diego Reis
It also changes the type from u32 to i32 since
sqlite supports negative values

Closes #1559
2025-05-23 17:00:55 +03:00