Commit Graph

1530 Commits

Author SHA1 Message Date
Nikita Sivukhin
c63c820bb7 add busy_timeout pragma 2025-09-19 16:48:12 +04:00
Preston Thorpe
6b273af7e9 Merge 'translate/optimize: centralize AST/expr traversal' from Preston Thorpe
Previously we were rewriting/traversing the AST in a couple different
places, each of these added kinda ad-hoc as we needed them. This
attempts to do the binding of column references as well as the rewriting
of anonymous `Expr::Variable` -> `__param_N` that we use to maintain the
order of bound variables, also normalizes the Qualified Name's.
Also we previously weren't accepting Variable (or at least they wouldn't
work) in places like `LIMIT ? OFFSET ?`, which this PR adds.
I kinda want to keep refactoring translation a bit, and try to break
plan building up into more easy-to-digest chunks.. but I will resist the
urge right now as it's definitely not high priority pre-beta

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3210
2025-09-19 08:03:39 -04:00
Preston Thorpe
20493441e0 Merge 'prevent alter table with materialized views' from Glauber Costa
I don't want to even think about the complexity involved in making sure
that materialized views are still sane after the base table(s) are
altered.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3223
2025-09-19 08:01:58 -04:00
Glauber Costa
8300d0390e prevent alter table with materialized views
I don't want to even think about the complexity involved in making sure
that materialized views are still sane after the base table(s) are
altered.
2025-09-19 05:59:46 -05:00
PThorpe92
e1ed12b284 rm claude comment 2025-09-19 05:20:20 -04:00
Glauber Costa
f149b40e75 Implement JOINs in the DBSP circuit
This PR improves the DBSP circuit so that it handles the JOIN operator.
The JOIN operator exposes a weakness of our current model: we usually
pass a list of columns between operators, and find the right column by
name when needed.

But with JOINs, many tables can have the same columns. The operators
will then find the wrong column (same name, different table), and
produce incorrect results.

To fix this, we must do two things:
1) Change the Logical Plan. It needs to track table provenance.
2) Fix the aggregators: it needs to operate on indexes, not names.

For the aggregators, note that table provenance is the wrong
abstraction. The aggregator is likely working with a logical table that
is the result of previous nodes in the circuit. So we just need to be
able to tell it which index in the column array it should use.
2025-09-19 03:59:28 -05:00
Glauber Costa
2e7a45559b add joins to the logical plan 2025-09-19 03:57:11 -05:00
Glauber Costa
0b3317d449 extract columns from all tables in case of joins.
Our code for view needs to extract the list of columns used in the view.
We currently extract only from "the base table", but once we have joins,
we need a more complex structure, that keeps the mapping of
(tables, columns).

This actually affects both views and materialized views: for views, the
queries with joins work just fine, because views are just aliases for
a query. But the list of columns returned by pragma table_info on the
view is incorrect. We add a test to make sure it is fixed.

For materialized views, we add extensive tests to make sure that the
columns are extracted correctly.
2025-09-19 03:57:11 -05:00
PThorpe92
b86f321eca Add comments to bind_and_rewrite_expr 2025-09-18 19:15:14 -04:00
PThorpe92
1a3a41997c Clippy warning, fix needless mut refs and remove import 2025-09-18 19:04:13 -04:00
PThorpe92
6f446aaf48 remove bind_column_references method and its last usages 2025-09-18 18:59:28 -04:00
PThorpe92
38096ffc9e Rewrite true/false to 0/1 even tho its also done in the parser now 2025-09-18 18:44:35 -04:00
PThorpe92
ffd1f87682 Centralize most of the AST traversal by binding columns and rewriting exprs together 2025-09-18 18:38:03 -04:00
PThorpe92
c941955444 Fix issue with result columns being inappropriate for inserting multiple rows 2025-09-18 14:35:12 -04:00
Jussi Saurio
1d2b461a2a Merge 'Compat: Translate the 2nd argument of group_concat / string_agg' from Iaroslav Zeigerman
Fixes #3140

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3155
2025-09-18 09:23:05 +03:00
Jussi Saurio
0d37ac2519 Merge 'translate: couple fixes from testing with Gorm' from Preston Thorpe
Ongoing tests for [turso-go](https://github.com/tursodatabase/turso-go)
have unearthed a couple more issues
closes #3187
### Number 1:
We were getting something like:
```sql
sqlite_autoindex_`databases`_2
```
when creating autoindex for table in Gorm (gorm is notorious for
backticks everywhere), because of not normalizing the column name when
creating autoindex.
### Number 2:
When creating table with `PRIMARY KEY AUTOINCREMENT`, we were still
creating the index, but it wasn't properly handled in
`populate_indices`, because we are doing the following:
```rust
                if column.primary_key && unique_set.is_primary_key {
                    if column.is_rowid_alias {
                        // rowid alias, no index needed
                        continue; // continues, but doesn't consume it..
                    }
```
So if we created such an index entry for the AUTOINCREMENT... we would
trip this:
```rust
assert!(automatic_indexes.is_empty(), "all automatic indexes parsed from sqlite_schema should have been consumed, but {} remain", automatic_indexes.len());
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3186
2025-09-18 09:21:41 +03:00
Jussi Saurio
498293658e Merge 'Reduce allocations needed for break_predicate_at_and_boundaries' from Lâm Hoàng Phúc
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3188
2025-09-18 09:21:24 +03:00
TcMits
88119888d0 reduce allocation needed for break_predicate_at_and_boundaries 2025-09-18 10:52:29 +07:00
PThorpe92
5aa07eb826 Use normalized table name for autoindex creation 2025-09-17 20:51:22 -04:00
PThorpe92
45fcadbf20 Fix formatting of autoindex 2025-09-17 16:07:11 -04:00
PThorpe92
c57545d504 Avoid panicking when we create autoindex for AUTOINCREMENT primary key 2025-09-17 15:52:42 -04:00
PThorpe92
dde8a49f4e normalize identifier for creating autoindex to prevent e.g. sqlite_autoindextable_2 2025-09-17 13:25:33 -04:00
PThorpe92
4e71524e42 normalize identifier for ID::Name in upsert expr rewriting 2025-09-17 13:24:06 -04:00
Preston Thorpe
8c53d7f024 Merge 'translation: rewrite expressions and properly handle quoted identifiers in UPSERT' from Preston Thorpe
This PR fixes bugs found in the [turso-
go](https://github.com/tursodatabase/turso-go) driver with UPSERT clause
earlier, where `Gorm` will (obviously) use Expr::Variable's as well as
use quotes for `Expr::Qualified` in the tail end of an UPSERT statement.
Example:
```sql
INSERT INTO users (a,b,c) VALUES (?,?,?) ON CONFLICT (`users`.`a`) DO UPDATE SET b = `excluded`.`b`, a = ?;
```
and previously we were not properly calling `rewrite_expr`, which was
not properly setting the anonymous `Expr::Variable` to `__param_N` named
parameter, so it would ignore it completely, then return the wrong # of
parameters.
Also, we didn't handle quoted "`excluded`.`x`", so it would panic in the
optimizer that Qualified should have been rewritten earlier.

Closes #3157
2025-09-17 11:25:13 -04:00
Iaroslav Zeigerman
29e0cabf2a Compat: Translate the 2nd argument of group_concat / string_agg 2025-09-17 07:42:07 -07:00
Preston Thorpe
bcafb288ad Merge 'Fix is_nonnull returns true on 1 / 0' from Lâm Hoàng Phúc
turso:
```sh
turso> CREATE TABLE t (x PRIMARY KEY, y, z);
turso> INSERT INTO t VALUES (37, -70, -196792117);
turso> SELECT * FROM t WHERE  (1 / 0) >= -3289742039 < t.x;
┌────┬─────┬────────────┐
│ x  │ y   │ z          │
├────┼─────┼────────────┤
│ 37 │ -70 │ -196792117 │
└────┴─────┴────────────┘
turso>
```
sqlite:
```sh
sqlite> CREATE TABLE t (x PRIMARY KEY, y, z);
sqlite> INSERT INTO t VALUES (37, -70, -196792117);
sqlite> SELECT * FROM t WHERE  (1 / 0) >= -3289742039 < t.x;
sqlite>
```
related: https://github.com/tursodatabase/turso/actions/runs/17765571409
/job/50488042583?pr=3147#step:8:855

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3167
2025-09-17 06:55:55 -04:00
PThorpe92
97c11898fe Minor refactor in translate/insert 2025-09-17 06:44:10 -04:00
PThorpe92
5dd466941e Handle upsert even in inserting_multiple_rows case 2025-09-17 06:44:09 -04:00
PThorpe92
85eee42bf1 Support quoted qualified identifiers in UPSERT excluded.x clauses 2025-09-17 06:44:08 -04:00
PThorpe92
d2cd833b86 Rewrite exprs in set + where clause for UPSERT 2025-09-17 06:38:25 -04:00
TcMits
6606bf12d3 is_nonnull returns true on 1 / 0 2025-09-17 14:50:15 +07:00
Pekka Enberg
06d869ea5e core/ext: Switch vtab_modules from Rc to Arc 2025-09-17 10:36:12 +03:00
Pekka Enberg
17e9f05ea4 core: Convert Rc<Pager> to Arc<Pager> 2025-09-17 09:32:49 +03:00
Jussi Saurio
cae234818b Merge 'Inital support for window functions' from Piotr Rżysko
This adds basic support for window functions. For now:
* Only existing aggregate functions can be used as window functions.
* Specialized window-specific functions (`rank`, `row_number`, etc.) are
not yet supported.
* Only the default frame definition is implemented:
`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3079
2025-09-17 08:29:16 +03:00
Pekka Enberg
ae25a0f088 Merge 'Implement Min/Max aggregators' from Glauber Costa
We have not implemented them before because they require the raw
elements to be kept. It is easy to see why in the following example:
```
current_min = 3;
insert(2) => current_min = 2 // can be done without state
delete(2) => needs to look at the state to determine new min!
```
The aggregator state was a very simple key-value structure. To
accomodate for min/max, we will make it into a more complex table, where
we can encode a more complex structure.
The key insight is that we can use a primary key composed of:
```
1) storage_id
2) zset_id,
3) element
```
The storage_id and zset_id are our previous key, except they are now
exploded to support a larger range of storage_id. With more bits
available in the storage_id, we can encode information about which
column we are storing. For aggregations in multiple columns, we will
need to keep a different list of values for min/max!
The element is just the values of the columns.
Because this is a primary key, the data will be sorted in the btree. We
can then just do a prefix search in the first two components of the key
and easily find the min/max when needed.
This new format is also adequate for joins. Joins will just have a new
storage_id which encodes two "columns" (left side, right side).

Closes #3143
2025-09-16 16:19:59 +03:00
Pekka Enberg
74331898a3 Merge 'Add quoted identifier test cases for ALTER TABLE' from Levy A.
Resolves #2093
There is a small incompatibility on how we quote the added column on the
final schema, but doesn't change any behavior.

Closes #2943
2025-09-16 11:46:12 +03:00
Glauber Costa
3565e7978a Add an index to the dbsp internal table
And also change the schema of the main table. I have come to see the
current key-value schema as inadequate for non-aggregate operators.
Calculating Min/Max, for example, doesn't feat in this schema because
we have to be able to track existing values and index them.

Another alternative is to keep one table per operator type, but this
quickly leads to an explosion of tables.
2025-09-15 22:30:48 -05:00
Jussi Saurio
396091044e store tx_mode in conn.mv_tx
otherwise op_transaction works completely wrong because each separate
insert statement overrides the tx_mode to Write
2025-09-14 21:59:08 +03:00
Piotr Rzysko
1a95131c3c Include windows in ToTokens for SelectPlan 2025-09-13 11:12:44 +02:00
Piotr Rzysko
9ff2133ff2 Rewrite window function expressions in the optimizer
Currently, this is effectively a no-op because, at the optimization
stage, window function expressions are in the form
win_func(subquery_column1, subquery_column2, ...).

Nevertheless, expressions are rewritten to maintain consistency with
aggregates, which also hold cloned expressions from sources like result
columns. This ensures future changes in the optimizer won’t break window
function handling.
2025-09-13 11:12:44 +02:00
Piotr Rzysko
f5efcbe745 Add support for window functions
Adds initial support for window functions. For now, only existing
aggregate functions can be used as window functions—no specialized
window-specific functions are supported yet.

Currently, only the default frame definition is implemented:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS.
2025-09-13 11:12:44 +02:00
Piotr Rzysko
c81cd16230 Extract QueryDestination::placeholder_for_subquery 2025-09-13 10:49:14 +02:00
Piotr Rzysko
1826023c32 Decouple AggArgumentSource::Expression from Aggregate
This allows it to be reused for window function processing without
relying on the Aggregate struct.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6c3c44e204 Expose fewer details from AggArgumentSource
Hides unnecessary internals to decouple the API from the Aggregate struct.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
5f2a3e1242 Handle dummy argument for count() and count(*) in translation
Two main reasons for this change:
* Improve readability by moving the logic for this special case closer
  to the code that relies on it.
* Decouple AggFunc from the Aggregate struct. In the future, window
  function processing will use AggFunc directly, without necessarily
  depending on Aggregate.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6d84cbedc2 Fix delimiter handling in group_concat and string_agg
Non-literal delimiters must be translated by AggArgumentSource.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
110ffba2a1 Fix accumulator reset when arguments outnumber aggregates
Previously, while resetting accumulator registers, we would also
reset subsequent registers. This happened because the number of registers
to reset was computed as the sum of arguments rather than the number of
aggregate functions.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6224cdbbd3 Support WalkControl in walk_expr_mut
Now walk_expr_mut can use WalkControl to skip parts of the expression
tree. This makes it consistent with walk_expr.
2025-09-13 10:49:14 +02:00
Pekka Enberg
d8f07fe3da core: Panic on fsync() error by default
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicing on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
2025-09-13 10:21:12 +03:00
Pekka Enberg
dcd43ab8fc Merge 'Handle EXPLAIN QUERY PLAN like SQLite' from Lâm Hoàng Phúc
After this PR:
```
turso> EXPLAIN QUERY PLAN SELECT 1;
QUERY PLAN
`--SCAN CONSTANT ROW
turso> EXPLAIN QUERY PLAN SELECT 1 UNION SELECT 1;
QUERY PLAN
`--COMPOUND QUERY
   |--LEFT-MOST SUBQUERY
   |  `--SCAN CONSTANT ROW
   `--UNION USING TEMP B-TREE
      `--SCAN CONSTANT ROW
turso> CREATE TABLE x(y);
turso> CREATE TABLE z(y);
turso> EXPLAIN QUERY PLAN SELECT * from x,z;
QUERY PLAN
|--SCAN x
`--SCAN z
turso> EXPLAIN QUERY PLAN SELECT * from x,z ON x.y = z.y;
QUERY PLAN
|--SCAN x
`--SEARCH z USING INDEX ephemeral_z_t2
turso>
```

Closes #3057
2025-09-12 20:41:23 +03:00