Commit Graph

4756 Commits

Author SHA1 Message Date
pedrocarlo
fd9e0db5cc pass the owned ast to translate_insert + remove assumption of a list of values in populate_columns_insert 2025-05-25 19:02:17 -03:00
pedrocarlo
15ffdd3e51 modify translate_select to return number of result columns 2025-05-25 19:02:17 -03:00
Jussi Saurio
be89809335 Merge 'Add PThorpe92 to codeowners file for extensions + go bindings' from Preston Thorpe
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1573
2025-05-25 21:43:00 +03:00
Jussi Saurio
41dfb2fd5e Merge 'UNION' from Jussi Saurio
```sql

limbo> select * from products where id between 1 and 3
UNION ALL
select * from products where id between 2 and 4;
┌───┬─────────┬──────┐
│ 1 │ hat     │ 79.0 │
├───┼─────────┼──────┤
│ 2 │ cap     │ 82.0 │
├───┼─────────┼──────┤
│ 3 │ shirt   │ 18.0 │
├───┼─────────┼──────┤
│ 2 │ cap     │ 82.0 │
├───┼─────────┼──────┤
│ 3 │ shirt   │ 18.0 │
├───┼─────────┼──────┤
│ 4 │ sweater │ 25.0 │
└───┴─────────┴──────┘
limbo> select * from products where id between 1 and 3
UNION
select * from products where id between 2 and 4;
┌───┬─────────┬──────┐
│ 1 │ hat     │ 79.0 │
├───┼─────────┼──────┤
│ 2 │ cap     │ 82.0 │
├───┼─────────┼──────┤
│ 3 │ shirt   │ 18.0 │
├───┼─────────┼──────┤
│ 4 │ sweater │ 25.0 │
└───┴─────────┴──────┘
limbo>
```
Like UNION ALL (#1541), UNION supports LIMIT but not OFFSET or ORDER BY.
Augments `compound_select_fuzz()` to work with both UNION and UNION ALL.
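
The difference between the two result sets above can be illustrated with a minimal Rust sketch. This is not Limbo's implementation: `union_all`, `union_distinct`, `products_between`, and the `Row` alias are hypothetical names, rows are reduced to `(id, name)` pairs, and a `HashSet` stands in for whatever deduplication structure the engine actually uses.

```rust
use std::collections::HashSet;

// Hypothetical row shape: (id, name). The price column is omitted for brevity.
type Row = (u32, &'static str);

/// UNION ALL: concatenate both inputs and keep duplicates.
fn union_all(left: &[Row], right: &[Row]) -> Vec<Row> {
    left.iter().chain(right.iter()).copied().collect()
}

/// UNION: concatenate both inputs, but emit each distinct row only once.
/// Rows come out in first-seen order here, which is a choice made by this sketch.
fn union_distinct(left: &[Row], right: &[Row]) -> Vec<Row> {
    let mut seen = HashSet::new();
    left.iter()
        .chain(right.iter())
        .copied()
        .filter(|row| seen.insert(*row))
        .collect()
}

/// Stand-in for `select * from products where id between lo and hi`.
fn products_between(lo: u32, hi: u32) -> Vec<Row> {
    const PRODUCTS: [Row; 4] = [(1, "hat"), (2, "cap"), (3, "shirt"), (4, "sweater")];
    PRODUCTS
        .iter()
        .copied()
        .filter(|(id, _)| (lo..=hi).contains(id))
        .collect()
}
```

With the inputs `products_between(1, 3)` and `products_between(2, 4)`, `union_all` yields six rows and `union_distinct` yields four, matching the shape of the two outputs shown in the session above.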

Closes #1545
2025-05-25 21:40:35 +03:00
Jussi Saurio
d5623e752b make compound_select_fuzz() test more likely to generate duplicate rows 2025-05-25 21:27:45 +03:00
Jussi Saurio
07fa3a9668 Rename SelectQueryType to QueryDestination 2025-05-25 21:23:04 +03:00
Jussi Saurio
d893a55c55 UNION 2025-05-25 21:23:04 +03:00
Jussi Saurio
75cab791b7 Merge 'Refactor: add stable internal_id property to TableReference' from Jussi Saurio
Closes #1557
Currently our "table id"/"table no"/"table idx" references always use
the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:
```rust
Expr::Column { table: 0, column: 3, .. }
```
refers to the 0'th table in the `table_references` list.
This is a fragile approach because it assumes the table_references list
is stable for the lifetime of the query processing. This has so far been
the case, but there exist certain query transformations, e.g. subquery
unnesting, that may fold new table references from a subquery (which has
its own table ref list) into the table reference list of the parent.
If such a transformation is made, then potentially all of the
Expr::Column references to tables will become invalid. Consider this
example:
```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```
We could of course traverse every expression in the subquery and rewrite the
table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.
Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformation can happen with less pain. I used a separate newtype
struct for `TableInternalId` because it made the refactor a lot easier
due to not having to spend time thinking which `usize` is what.
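
A minimal sketch of the idea, under stated assumptions: the struct shapes below are simplified stand-ins and not the actual definitions in the codebase, and `ColumnRef`, `resolve`, and `fold_subquery_tables` are hypothetical names used only for illustration.

```rust
/// Simplified stand-in for the newtype described above. Wrapping the `usize`
/// keeps the compiler from confusing an id with a positional index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TableInternalId(usize);

/// Simplified stand-in for a table reference in a plan.
#[derive(Debug)]
struct TableReference {
    internal_id: TableInternalId,
    name: &'static str,
}

/// Hypothetical column reference that names its table by stable id,
/// not by position in the table reference list.
#[derive(Debug, Clone, Copy)]
struct ColumnRef {
    table: TableInternalId,
    column: usize,
}

/// Find the table's current position in the list, wherever it ended up.
fn resolve(tables: &[TableReference], col: ColumnRef) -> Option<usize> {
    tables.iter().position(|t| t.internal_id == col.table)
}

/// Fold a subquery's tables into the parent list, as unnesting might do.
/// No column reference needs rewriting because the ids do not change.
fn fold_subquery_tables(parent: &mut Vec<TableReference>, subquery: Vec<TableReference>) {
    parent.extend(subquery);
}
```

In this sketch, a reference to `orders` created inside the subquery still resolves after `fold_subquery_tables` moves `orders` to position 1 of the parent list, which is the situation the SQL example above gets wrong with positional indexes.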
---
Potential follow-up: `join_order` can be removed from `SelectPlan`.
The `table_references` vec can simply be sorted after join reordering,
since the `Expr::Column` references will remain valid.
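
The follow-up above could look roughly like the sketch below. It is only an illustration: `apply_join_order` and `table_name` are hypothetical helpers, the structs are simplified stand-ins, and representing the chosen join order as a slice of ids is an assumption.

```rust
/// Simplified stand-in for the stable table id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TableInternalId(usize);

/// Simplified stand-in for a table reference in a plan.
struct TableReference {
    internal_id: TableInternalId,
    name: &'static str,
}

/// Sort `tables` in place so they follow `join_order`.
/// Tables missing from `join_order` are kept at the end in their original order.
fn apply_join_order(tables: &mut [TableReference], join_order: &[TableInternalId]) {
    tables.sort_by_key(|t| {
        join_order
            .iter()
            .position(|id| *id == t.internal_id)
            .unwrap_or(usize::MAX)
    });
}

/// Look a table up by id; the result does not depend on the list order.
fn table_name(tables: &[TableReference], id: TableInternalId) -> Option<&'static str> {
    tables.iter().find(|t| t.internal_id == id).map(|t| t.name)
}
```

Because lookups go through the id, sorting the list changes only the iteration order, so a separate `join_order` field would carry no extra information in this model.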

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1561
2025-05-25 21:21:22 +03:00
PThorpe92
72d82abb80 Add PThorpe92 to codeowners file for extensions + go bindings 2025-05-25 13:29:05 -04:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could of course traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformation can happen with less pain.
2025-05-25 20:26:17 +03:00
Jussi Saurio
b7d2173e99 Merge 'Fix off-by-one error in max_frame after WAL load' from Jussi Saurio
🤦

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1572
2025-05-25 20:25:50 +03:00
Jussi Saurio
b5ac095716 Fix off-by-one error in max_frame after WAL load 2025-05-25 19:34:51 +03:00
Jussi Saurio
f388bc571e Merge 'xConnect for virtual tables to query core db connection' from Preston Thorpe
Re-opening #1076 because it had bit-rotted past the point of no return.
It has improved, however: it now uses `Weak` references and no longer
increments `Rc` strong counts.
This also includes a better test extension that returns info about the
other tables in the schema.
![image](https://github.com/user-attachments/assets/4292dc9c-121e-4ba2-8a51-4533bbcf2afd)
(theme doesn't show rows column)
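
The point about weak references can be sketched with the standard library alone. `Connection` and `ExtConnection` below are hypothetical stand-ins and do not reflect the real extension API; the sketch only shows that `Rc::downgrade` leaves the strong count untouched, so the extension-side handle does not keep the core connection alive.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for the core database connection.
struct Connection {
    name: String,
}

/// Hypothetical extension-side handle that holds only a weak reference.
struct ExtConnection {
    core: Weak<Connection>,
}

impl ExtConnection {
    fn new(core: &Rc<Connection>) -> Self {
        // `Rc::downgrade` bumps the weak count only; the strong count is unchanged.
        Self { core: Rc::downgrade(core) }
    }

    /// Returns the connection name while the core connection is still alive,
    /// and `None` once it has been dropped.
    fn name(&self) -> Option<String> {
        self.core.upgrade().map(|conn| conn.name.clone())
    }
}
```

After creating an `ExtConnection`, `Rc::strong_count` on the core handle is still 1, and once the core handle is dropped `name()` returns `None` instead of extending the connection's lifetime.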

Closes #1366
2025-05-25 14:37:38 +03:00
Jussi Saurio
621ae60ab5 Merge 'Reconstruct WAL frame cache when WAL is opened' from Jussi Saurio
Fixes #1567
Probably also fixes #1485
Currently we are simply unable to read any WAL frames from disk once a
fresh process w/ Limbo is opened, since we never try to read anything
from disk unless we already have it in our in-memory frame cache.
This commit implements a crude way of reading the entire WAL into memory as
a single buffer and reconstructing the frame cache.
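
A rough sketch of what such a reconstruction could look like, not the code in this PR. It relies on the documented SQLite WAL layout (a 32-byte WAL header with the page size at byte offset 8, then frames made of a 24-byte header starting with the page number, followed by one page of data). The name `rebuild_frame_cache` and the map from page number to frame numbers are assumptions about the frame cache's shape, and salt and checksum validation as well as commit-frame handling are deliberately left out.

```rust
use std::collections::HashMap;

const WAL_HEADER_SIZE: usize = 32;
const FRAME_HEADER_SIZE: usize = 24;

/// Read a big-endian u32 at `offset`, if the buffer is long enough.
fn read_u32_be(buf: &[u8], offset: usize) -> Option<u32> {
    let b = buf.get(offset..offset + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Rebuild a page-number -> frame-numbers map from a WAL held in one buffer.
/// Returns the map and the highest frame number seen (frames are 1-based).
/// Assumption: every complete frame in the buffer is valid; a real reader
/// must also check salts and checksums and stop at the last commit frame.
fn rebuild_frame_cache(wal: &[u8]) -> Option<(HashMap<u32, Vec<u64>>, u64)> {
    let page_size = read_u32_be(wal, 8)? as usize;
    let frame_size = FRAME_HEADER_SIZE + page_size;
    let mut cache: HashMap<u32, Vec<u64>> = HashMap::new();
    let mut max_frame = 0u64;
    let mut offset = WAL_HEADER_SIZE;
    // Walk the buffer one whole frame at a time; a trailing partial frame is ignored.
    while offset + frame_size <= wal.len() {
        let page_no = read_u32_be(wal, offset)?;
        max_frame += 1;
        cache.entry(page_no).or_default().push(max_frame);
        offset += frame_size;
    }
    Some((cache, max_frame))
}
```

Since frame numbers are 1-based in this sketch, `max_frame` equals the number of frames read, not the index of the last one.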

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1570
2025-05-25 14:35:47 +03:00
Jussi Saurio
385c0d8987 clippy stfu part 2: electric boogaloo 2025-05-25 10:32:23 +03:00
Jussi Saurio
2df01f0b6f clippy stfu 2025-05-25 10:29:30 +03:00
Jussi Saurio
6254246541 use tempfile in test 2025-05-25 10:25:52 +03:00
Jussi Saurio
64ef3f1343 simplify condition 2025-05-25 10:22:46 +03:00
Jussi Saurio
20e65c0125 bump max_loops to 100k 2025-05-25 10:21:41 +03:00
Pere Diaz Bou
a91f8aee78 Merge 'set non-shared cache by default' from Pere Diaz Bou
Shared cache requires more locking mechanisms. We still have
multithreading issues not related to shared cache, so it is wise to fix
those first; once they are fixed, we can incrementally add shared
cache back with locking in place.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1568
2025-05-25 08:50:21 +02:00
PThorpe92
954df3f837 Fix csv test assertion 2025-05-24 17:33:01 -04:00
PThorpe92
98e1c0ddd4 Remove unused method for converting from ext type without ownership 2025-05-24 17:22:12 -04:00
PThorpe92
e53e30e06d Fix tests in csv extension to adapt to new API 2025-05-24 17:16:45 -04:00
PThorpe92
ef28906be3 Update extensions README with example for xConnect 2025-05-24 17:10:26 -04:00
PThorpe92
cf163f2dc0 Prevent double free in ext connection 2025-05-24 16:49:52 -04:00
PThorpe92
1cacbf1f0d Close statements in extension tests, and use mut pointers for stmt 2025-05-24 16:45:25 -04:00
PThorpe92
d63f9d8cff Make sure all resources are cleaned up properly in xconnect 2025-05-24 16:38:33 -04:00
PThorpe92
a4ed464ec4 Add some traces for errors in xconnect 2025-05-24 15:44:06 -04:00
PThorpe92
0decafbbc1 Use transparent struct in public api wrapper for vtab connect 2025-05-24 15:32:14 -04:00
PThorpe92
2e4343402e Add null checks to prevent double frees in vtab connections 2025-05-24 15:20:09 -04:00
PThorpe92
999205f896 Add documentation for connection api 2025-05-24 15:19:37 -04:00
Jussi Saurio
09df7e0c48 Merge 'TPC-H with criterion and nyrkio' from Pedro Muniz
Added a benchmark to bench TPC-H with criterion and have it uploaded to
Nyrkio. I did not delete the other tpc-h job because maybe someone uses
it and I'm not aware of it.

Closes #1560
2025-05-24 21:54:48 +03:00
PThorpe92
c5d364064a Add python tests for xConnect behavior and testing extension 2025-05-24 14:49:59 -04:00
PThorpe92
687edefcdf Add option to py tests to create temporary db with clone of testing.db 2025-05-24 14:49:59 -04:00
PThorpe92
faa12987b4 Add test case to table stats extension 2025-05-24 14:49:59 -04:00
PThorpe92
4142d813c0 Change method name to bind_at to better reflect args in ext Statement 2025-05-24 14:49:58 -04:00
PThorpe92
a2f8b2dfea Fix add check for invalid argv index for vtab constraints in main loop 2025-05-24 14:49:58 -04:00
PThorpe92
58e1d5a4f8 Add additional test vtable extension for querying core 2025-05-24 14:49:58 -04:00
PThorpe92
d11ef6b9c5 Add execute method to xConnect db interface for vtables 2025-05-24 14:49:58 -04:00
PThorpe92
c2ec6caae1 Finish integrating xConnect into vtable open api 2025-05-24 14:49:58 -04:00
PThorpe92
cbd7245677 Update Vtable open method to accept core db connection 2025-05-24 14:49:58 -04:00
PThorpe92
2c784070f1 Impl Default for ext Value 2025-05-24 14:49:58 -04:00
PThorpe92
f61ccc78e8 Add from_ffi_ptr method to create OwnedValue from Ext type without taking ownership 2025-05-24 14:49:58 -04:00
PThorpe92
d51614a4fd Create extern functions to support vtab xConnect in core/ext 2025-05-24 14:49:57 -04:00
Jussi Saurio
208639c5ee clippy 2025-05-24 21:01:13 +03:00
Jussi Saurio
67359dc17b Add another persistence test and also assert that the data was in the WAL, not the main db 2025-05-24 20:44:47 +03:00
Jussi Saurio
1baa9c7038 Add regression test for being able to read WAL from disk 2025-05-24 18:35:53 +03:00
Jussi Saurio
fc45e0ec0d Reconstruct WAL frame cache when WAL is opened
Currently we are simply unable to read any WAL frames from disk
once a fresh process w/ Limbo is opened, since we never try to read
anything from disk unless we already have it in our in-memory
frame cache.

This commit implements a crude way of reading the entire WAL into memory
as a single buffer and reconstructing the frame cache.
2025-05-24 18:29:44 +03:00
Jussi Saurio
02e7726249 Merge 'UNION ALL' from Jussi Saurio
Adds support for `UNION ALL` and introduces `Plan::CompoundSelect` so
that it can be extended to support `UNION/EXCEPT/INTERSECT` as well.
```sql
do_execsql_test_on_specific_db {:memory:} select-union-all-1 {
  CREATE TABLE t1(x INTEGER);
  CREATE TABLE t2(x INTEGER);
  CREATE TABLE t3(x INTEGER);

  INSERT INTO t1 VALUES(1),(2),(3);
  INSERT INTO t2 VALUES(4),(5),(6);
  INSERT INTO t3 VALUES(7),(8),(9);

  SELECT x FROM t1
  UNION ALL
  SELECT x FROM t2
  UNION ALL
  SELECT x FROM t3;
} {1
2
3
4
5
6
7
8
9}

do_execsql_test_on_specific_db {:memory:} select-union-all-with-filters {
  CREATE TABLE t4(x INTEGER);
  CREATE TABLE t5(x INTEGER);
  CREATE TABLE t6(x INTEGER);

  INSERT INTO t4 VALUES(1),(2),(3),(4);
  INSERT INTO t5 VALUES(5),(6),(7),(8);
  INSERT INTO t6 VALUES(9),(10),(11),(12);

  SELECT x FROM t4 WHERE x > 2
  UNION ALL
  SELECT x FROM t5 WHERE x < 7
  UNION ALL
  SELECT x FROM t6 WHERE x = 10;
} {3
4
5
6
10}
```
Supports `LIMIT`. Does not currently support `WITH()`, `OFFSET` or
`ORDER BY`, and explicitly returns a parse error if any of those are present.
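
The behavior described above can be modelled with a small sketch. Everything here is hypothetical: `CompoundSelectClauses`, `validate`, `union_all_with_limit`, and the error strings are illustrative and are not taken from the parser or planner.

```rust
/// Hypothetical, simplified view of the clauses attached to a compound SELECT.
#[derive(Default)]
struct CompoundSelectClauses {
    has_with: bool,
    has_order_by: bool,
    limit: Option<usize>,
    offset: Option<usize>,
}

/// Reject the clauses listed above as unsupported; LIMIT alone is accepted.
/// The error messages are made up for this sketch.
fn validate(clauses: &CompoundSelectClauses) -> Result<(), String> {
    if clauses.has_with {
        return Err("WITH is not supported in compound SELECTs".to_string());
    }
    if clauses.offset.is_some() {
        return Err("OFFSET is not supported in compound SELECTs".to_string());
    }
    if clauses.has_order_by {
        return Err("ORDER BY is not supported in compound SELECTs".to_string());
    }
    Ok(())
}

/// UNION ALL with an optional LIMIT: concatenate the inputs in order and
/// stop once `limit` rows have been produced.
fn union_all_with_limit(inputs: &[Vec<i64>], limit: Option<usize>) -> Vec<i64> {
    let rows = inputs.iter().flatten().copied();
    match limit {
        Some(n) => rows.take(n).collect(),
        None => rows.collect(),
    }
}
```

Fed the three filtered inputs from the `select-union-all-with-filters` test (`[3, 4]`, `[5, 6]`, `[10]`), the sketch returns all five rows without a limit and only the first three with `Some(3)`.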

Closes #1541
2025-05-24 13:41:57 +03:00
Jussi Saurio
8ed5334ca7 tests/fuzz: add compound_select_fuzz() 2025-05-24 13:12:41 +03:00