Commit Graph

188 Commits

Author SHA1 Message Date
Jussi Saurio
6cf2072b51 translate: disallow correlated subqueries in HAVING and ORDER BY
These are supported by SQLite, but we cannot handle them correctly yet.
2025-10-29 15:37:19 +02:00
Jussi Saurio
4bf8ad8cfd Merge 'Support subqueries in all positions of a SELECT statement' from Jussi Saurio
Follow-up to #3847.
Adds support for subqueries in all other positions of a SELECT (the
result list, GROUP BY, ORDER BY, HAVING, LIMIT, OFFSET).
Turns out I am a SQL noob and didn't realize that correlated subqueries
are supported in basically all positions except LIMIT/OFFSET, so I added
support for those too, plus accompanying TCL tests.
Thankfully the abstractions introduced in #3847 carry over to this very
well, so the code change is relatively small (over half of the diff is
tests and a lot of the remaining diff is just moving logic around).

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3852
2025-10-29 10:19:39 +02:00
Jussi Saurio
29fe3b585a Add more tests and disable correlated IN-subqueries in HAVING position
I discovered a flaw in our current translation that makes queries of the form
`HAVING foo IN (SELECT ...)` not work properly: in these cases we need to
defer translation of the subquery until later.

I will fix this in a future PR because I suspect it's not trivial.
2025-10-29 09:57:55 +02:00
Nikita Sivukhin
0da3b4bfd3 fix after rebase 2025-10-28 11:27:35 +04:00
Nikita Sivukhin
e9b1ca12b6 add new access operation through IndexMethod 2025-10-28 11:27:35 +04:00
Jussi Saurio
d993ac8157 Merge 'index_method: implement basic trait and simple toy index' from Nikita Sivukhin
This PR adds the `index_method` trait and an implementation of a toy sparse
vector index.
To keep the PR lightweight, index methods are not yet deeply integrated
into the query planner; only the components needed for integration tests
that use the `index_method` API directly are added.
Primary changes introduced in this PR are:
1. `SymbolTable` is extended with an `index_methods` field, and the builtin
extensions now provide 2 native index methods: `backing_btree` and
`toy_vector_sparse_ivf`
2. The `Index` struct is extended with an `index_method` field, which holds
the `IndexMethodAttachment` constructed for the table with the given
parameters by the `IndexMethod` "factory" trait
The toy index implementation stores inverted-index pairs `(dimension,
rowid)` in an auxiliary BTree index. That index uses the special
`backing_btree` index_method, which is marked as `backing_btree: true` and
treated specially by the db core: it is a real BTree index that is not
managed by the tursodb core, so the index_method that created it is
responsible for its creation, data population, and destruction.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3846
2025-10-28 07:01:36 +02:00
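The inverted-index idea behind the toy index can be sketched as follows. This is a minimal, std-only illustration, not the tursodb `index_method` API: a `BTreeSet<(u32, i64)>` stands in for the auxiliary BTree, and `ToyInvertedIndex` with its methods is hypothetical. Each non-zero dimension of a sparse vector yields one `(dimension, rowid)` entry, and a query range-scans each of its own non-zero dimensions to collect candidate rowids.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the auxiliary BTree: an ordered set of
/// (dimension, rowid) pairs, one entry per non-zero dimension of a vector.
#[derive(Default)]
struct ToyInvertedIndex {
    entries: BTreeSet<(u32, i64)>,
}

impl ToyInvertedIndex {
    /// Index a sparse vector given as (dimension, value) pairs.
    fn insert(&mut self, rowid: i64, vector: &[(u32, f32)]) {
        for &(dim, value) in vector {
            if value != 0.0 {
                self.entries.insert((dim, rowid));
            }
        }
    }

    /// Remove the entries for `rowid`; the caller supplies the old vector.
    fn delete(&mut self, rowid: i64, vector: &[(u32, f32)]) {
        for &(dim, _) in vector {
            self.entries.remove(&(dim, rowid));
        }
    }

    /// Rowids sharing at least one non-zero dimension with `query`, paired
    /// with the number of shared dimensions (a crude overlap score).
    fn candidates(&self, query: &[(u32, f32)]) -> Vec<(i64, usize)> {
        let mut overlap: HashMap<i64, usize> = HashMap::new();
        for &(dim, value) in query {
            if value == 0.0 {
                continue;
            }
            // Range scan over every key whose first component is `dim`.
            for &(_, rowid) in self.entries.range((dim, i64::MIN)..=(dim, i64::MAX)) {
                *overlap.entry(rowid).or_insert(0) += 1;
            }
        }
        let mut out: Vec<(i64, usize)> = overlap.into_iter().collect();
        // Most overlap first; ties broken by rowid for deterministic output.
        out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        out
    }
}
```

Putting `dimension` first in the key is what makes the per-dimension range scan possible. A real index method would also rank candidates by actual vector distance, which this sketch omits.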
Jussi Saurio
8fecd82311 Emit non from clause subqueries in translation 2025-10-27 16:01:39 +02:00
Jussi Saurio
c54988192e Add SelectPlan::is_correlated() method 2025-10-27 16:01:39 +02:00
Jussi Saurio
580333ddd3 Add NonFromClauseSubquery struct and add a Vec of them to SelectPlan 2025-10-27 16:01:39 +02:00
Jussi Saurio
609d9957c1 Add new QueryDestination variants for subquery types 2025-10-27 16:01:39 +02:00
Nikita Sivukhin
05f0ee6a72 add more integration in order to properly skip backing_btree index_method 2025-10-27 17:00:26 +04:00
Jussi Saurio
5c05383cc1 Implement union for ColumnUsedMask 2025-10-27 13:57:56 +02:00
Jussi Saurio
de81af29e5 find_table_by_internal_id() returns whether table is an outer query reference
Unfortunately, our current translation machinery is unable to know for sure
whether a subquery reference to an outer table 't1' has opened a table cursor,
an index cursor, or both.

For this reason, return a flag from `TableReferences::find_table_by_internal_id()`
that tells the caller whether the table is an outer query reference, and further
commits will have some additional logic to decide which cursor a subquery will
read from when referencing a table from the outer query.
2025-10-27 13:47:49 +02:00
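A minimal sketch of the lookup shape described above, assuming a simplified `TableReferences` that keeps joined tables and outer-query references in two vectors. The field names, the `TableRef` type, and the `(bool, &TableRef)` return type are hypothetical, not the real signature.

```rust
/// Hypothetical, simplified table reference.
#[derive(Debug, Clone, PartialEq)]
struct TableRef {
    internal_id: usize,
    name: String,
}

/// Hypothetical stand-in: tables joined in this query plus tables
/// visible from enclosing (outer) queries.
struct TableReferences {
    joined_tables: Vec<TableRef>,
    outer_query_refs: Vec<TableRef>,
}

impl TableReferences {
    /// Returns the table plus a flag that is `true` when it was found among
    /// outer-query references, so the caller can decide which cursor to read.
    fn find_table_by_internal_id(&self, id: usize) -> Option<(bool, &TableRef)> {
        if let Some(t) = self.joined_tables.iter().find(|t| t.internal_id == id) {
            return Some((false, t));
        }
        self.outer_query_refs
            .iter()
            .find(|t| t.internal_id == id)
            .map(|t| (true, t))
    }
}
```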
Jussi Saurio
0173d31c04 clippy: collapse nested if 2025-10-14 15:51:31 +03:00
Jussi Saurio
4b80678898 Allow case where cursor for btree is already opened
When populating an ephemeral table for UPDATE, we may already open a cursor
on the (permanent) table; in that case we don't need to open it
again in the UPDATE loop.
2025-10-14 15:32:48 +03:00
Jussi Saurio
f5ee4807da Properly differentiate between source and target in UPDATE
- Encode information about ephemeral source table in OperationMode::UPDATE
  if present
- Use OperationMode information to correctly resolve cursors in UPDATE
2025-10-14 14:17:28 +03:00
Jussi Saurio
691dce6b8a Make decision about UpdatePlan::ephemeral_plan _after_ optimizer
An ephemeral table is required if the b-tree key of the table (rowid)
or the index (index key) is affected by the UPDATE.
2025-10-14 14:17:28 +03:00
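The rule above can be sketched as a small predicate. `AccessMethod` and `needs_ephemeral_table` are hypothetical, and the sketch assumes an index key ends with the rowid (as in SQLite indexes on rowid tables), so changing the rowid also changes the index key. The usual motivation is the Halloween problem: if the UPDATE rewrites the key of the b-tree being iterated, an updated row can move ahead of the cursor and be visited again.

```rust
/// Hypothetical description of the b-tree the UPDATE loop iterates.
enum AccessMethod {
    /// Table b-tree: its key is the rowid.
    Rowid,
    /// Index b-tree: its key is these column positions followed by the rowid.
    Index { key_columns: Vec<usize> },
}

/// Returns true if the UPDATE modifies the key of the b-tree being iterated,
/// in which case the rows to update should first be collected into an
/// ephemeral table.
fn needs_ephemeral_table(
    access: &AccessMethod,
    updated_columns: &[usize],
    updates_rowid: bool,
) -> bool {
    match access {
        AccessMethod::Rowid => updates_rowid,
        AccessMethod::Index { key_columns } => {
            updates_rowid || key_columns.iter().any(|c| updated_columns.contains(c))
        }
    }
}
```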
Jussi Saurio
c2fe13ad4f Update documentation of UpdatePlan::ephemeral_plan
It now better reflects when it is used.
2025-10-14 12:18:53 +03:00
Jussi Saurio
3669437482 Add vibecoded tests for ColumnUsedMask 2025-10-13 14:03:34 +03:00
Jussi Saurio
e055ed9a8d Allow arbitrarily many columns in a table
Use roaring bitmaps because ColumnUsedMask is likely to be
sparsely populated.
2025-10-13 13:30:26 +03:00
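A sketch of a sparse column mask, including the union operation added in 5c05383cc1. Roaring bitmaps come from an external crate, so to keep the example std-only a `BTreeSet<usize>` stands in as the sparse set; this `ColumnUsedMask` is a hypothetical reimplementation, not the real type. As with a roaring bitmap, its size grows with the number of columns actually used rather than with the highest column index.

```rust
use std::collections::BTreeSet;

/// Hypothetical sparse column mask; BTreeSet stands in for a roaring bitmap.
#[derive(Debug, Default, Clone, PartialEq)]
struct ColumnUsedMask {
    columns: BTreeSet<usize>,
}

impl ColumnUsedMask {
    /// Mark a column position as used.
    fn set(&mut self, column: usize) {
        self.columns.insert(column);
    }

    /// Whether a column position is marked as used.
    fn get(&self, column: usize) -> bool {
        self.columns.contains(&column)
    }

    fn is_empty(&self) -> bool {
        self.columns.is_empty()
    }

    /// In-place union: afterwards `self` contains every column used in either mask.
    fn union_with(&mut self, other: &ColumnUsedMask) {
        self.columns.extend(other.columns.iter().copied());
    }
}
```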
Jussi Saurio
59a1c2ae2e Disallow joining more than 63 tables
Returns an error instead of panicking
2025-10-13 13:30:03 +03:00
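A sketch of rejecting the query up front instead of panicking. It assumes, purely for illustration, that the join planner tracks sets of tables as bits in a `u64`; the commit itself only states the 63-table limit. `MAX_JOINED_TABLES`, `TableMask`, `check_join_table_count`, and the error text are all hypothetical.

```rust
/// Limit stated in the commit.
const MAX_JOINED_TABLES: usize = 63;

/// Hypothetical bitmask over join positions (bit i = table i).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct TableMask(u64);

impl TableMask {
    fn with_table(self, index: usize) -> TableMask {
        debug_assert!(index < MAX_JOINED_TABLES);
        TableMask(self.0 | (1u64 << index))
    }

    fn contains(self, index: usize) -> bool {
        // Short-circuit first so the shift never overflows.
        index < 64 && self.0 & (1u64 << index) != 0
    }
}

/// Return an error for oversized joins before any mask is built.
fn check_join_table_count(num_tables: usize) -> Result<(), String> {
    if num_tables > MAX_JOINED_TABLES {
        return Err(format!(
            "too many tables in join: {num_tables} (at most {MAX_JOINED_TABLES} supported)"
        ));
    }
    Ok(())
}
```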
Nikita Sivukhin
4313f57ecb Optimize range scans 2025-10-09 11:47:41 +03:00
Jussi Saurio
f02757fe11 Collate: add proper collation to FROM-clause subquery result cols 2025-10-02 21:49:33 +03:00
Jussi Saurio
c0da38e24a Merge 'Clear WhereTerm 'from_outer_join' state when LEFT JOIN is optimized to INNER JOIN' from Jussi Saurio
Closes #3470
## Background
In a query like `SELECT * FROM t LEFT JOIN s ON t.a=s.a WHERE s.a =
'foo'` we can remove the LEFT JOIN and replace it with an `INNER JOIN`
because NULL values will never be equal to 'foo'. Rewriting as `INNER
JOIN` allows the optimizer to also reorder the table join order to come
up with a more efficient query plan. In fact, we have this optimization
already.
## Problem
However, there is a dumb bug where `WhereTerm`s involving this join
still retain their `from_outer_join` state. That forces those terms to be
evaluated at the original join index, which produces completely wrong
bytecode if the join optimizer decides to reorder the join as `s JOIN t`
instead: effectively it evaluates `t.a=s.a` after table `s` is open but
before table `t` is.
## Fix
This PR fixes that issue by clearing `from_outer_join` properly from the
relevant `WhereTerm`s.

Closes #3475
2025-10-02 06:56:07 +03:00
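The optimization and the fix can be modeled in miniature. The structs below are hypothetical and heavily simplified: each `WhereTerm` records the tables it references, whether it came from an outer join's ON clause (`from_outer_join`), and a precomputed `null_rejecting` flag standing in for real NULL-rejection analysis. When a WHERE term rejects NULLs from the right-hand table of a LEFT JOIN, the join becomes INNER, and that join's ON-clause terms also lose their `from_outer_join` marker so they are no longer pinned to the original position.

```rust
/// Hypothetical join type of a table in the FROM clause.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    LeftOuter,
}

/// Hypothetical table entry; its index in the slice is its original join position.
#[derive(Debug)]
struct JoinedTable {
    join_type: JoinType,
}

/// Hypothetical, simplified WHERE/ON term.
#[derive(Debug)]
struct WhereTerm {
    /// Join positions of the tables this term references.
    tables: Vec<usize>,
    /// `Some(i)` if the term comes from the ON clause of the outer join at
    /// position `i`; such a term is pinned to that position during evaluation.
    from_outer_join: Option<usize>,
    /// Stand-in for NULL-rejection analysis: true if the term can never be
    /// true when a referenced column is NULL (e.g. `s.a = 'foo'`).
    null_rejecting: bool,
}

fn simplify_left_joins(tables: &mut [JoinedTable], terms: &mut [WhereTerm]) {
    for i in 0..tables.len() {
        if tables[i].join_type != JoinType::LeftOuter {
            continue;
        }
        // Only WHERE-clause terms count: an ON-clause term cannot filter out
        // the NULL-extended rows of its own outer join.
        let rejects_nulls = terms.iter().any(|t| {
            t.from_outer_join.is_none() && t.null_rejecting && t.tables.contains(&i)
        });
        if rejects_nulls {
            tables[i].join_type = JoinType::Inner;
            // The fix in miniature: without this, the ON-clause terms stay
            // pinned to position `i` even after the optimizer reorders tables.
            for t in terms.iter_mut() {
                if t.from_outer_join == Some(i) {
                    t.from_outer_join = None;
                }
            }
        }
    }
}
```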
Jussi Saurio
b2f9854b1c Add more documentation for WhereTerm::from_outer_join 2025-10-01 13:42:36 +03:00
Jussi Saurio
27b1c1a1db Merge 'Fix self-insert with nested subquery' from Mikaël Francoeur
There were 2 problems:
1. The SELECT wasn't propagating which register it used for its results,
so sometimes the INSERT read bad data.
2. `TableReferences::contains_table` was only checking the top-level
tables, not tables nested inside FROM-clause subqueries. This condition is
used to emit "template 4", the bytecode template for self-inserts.
Closes https://github.com/tursodatabase/turso/issues/3312

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3436
2025-10-01 08:56:16 +03:00
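The second fix amounts to making the table lookup recursive. A minimal sketch with a hypothetical `FromItem` type, where a FROM-clause subquery carries its own nested FROM clause; the real `TableReferences` structure differs.

```rust
/// Hypothetical simplified FROM-clause item: a base table, or a subquery
/// with its own nested FROM clause.
enum FromItem {
    Table(String),
    Subquery(Vec<FromItem>),
}

/// Returns true if `name` appears anywhere in the FROM clause, including
/// inside nested FROM-clause subqueries. Checking only the top level misses,
/// for example, `INSERT INTO t SELECT * FROM (SELECT * FROM t)`.
fn contains_table(items: &[FromItem], name: &str) -> bool {
    items.iter().any(|item| match item {
        FromItem::Table(t) => t == name,
        FromItem::Subquery(inner) => contains_table(inner, name),
    })
}
```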
Nikita Sivukhin
a32ed53bd8 remove optimization
- even if the index search returns only 1 row, the loop still calls next, so we can incorrectly process the same row values multiple times
- the following query failed with this optimization (the UPDATE should only clear c0 for id 1, but the final SELECT shows it cleared for id 2 as well):

turso> CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, k TEXT, c0 INT);
turso> CREATE UNIQUE INDEX idx_p1_0 ON t(c0);
turso> insert into t values (null, 'uu', -1);
turso> insert into t values (null, 'uu', -2);
turso> UPDATE t SET c0 = NULL WHERE c0 = -1;
turso> SELECT * FROM t
┌────┬────┬────┐
│ id │ k  │ c0 │
├────┼────┼────┤
│  1 │ uu │    │
├────┼────┼────┤
│  2 │ uu │    │
└────┴────┴────┘
2025-09-30 16:37:41 +04:00
Mikaël Francoeur
dc231abb2e fix self-insert bug 2025-09-29 17:18:19 -04:00
Nikita Sivukhin
c4b3074575 format 2025-09-26 13:01:49 +04:00
Nikita Sivukhin
fdf8ca88fd introduce exact(...) function - because enum variant will disappear 2025-09-26 13:01:49 +04:00
PThorpe92
6dc7d04c5a Replace translate_expr with translate_condition_expr and fix constraint error 2025-09-20 15:02:06 -04:00
TcMits
88119888d0 reduce allocation needed for break_predicate_at_and_boundaries 2025-09-18 10:52:29 +07:00
Piotr Rzysko
f5efcbe745 Add support for window functions
Adds initial support for window functions. For now, only existing
aggregate functions can be used as window functions; window-specific
functions are not supported yet.

Currently, only the default frame definition is implemented:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS.
2025-09-13 11:12:44 +02:00
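With the default frame, each row sees the aggregate over every preceding row plus all of its peers (rows with an equal ORDER BY value), because RANGE frames are defined in terms of ORDER BY values rather than physical rows. A sketch computing `SUM(value) OVER (ORDER BY key)` this way; `running_sum_default_frame` is hypothetical, ignores PARTITION BY, and assumes the input is already sorted by key.

```rust
/// SUM(value) OVER (ORDER BY key) with the default frame
/// RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
/// `rows` are (key, value) pairs already sorted by key.
fn running_sum_default_frame(rows: &[(i64, i64)]) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    let mut total = 0;
    let mut i = 0;
    while i < rows.len() {
        // Accumulate the whole peer group (equal keys) starting at row i.
        let mut j = i;
        while j < rows.len() && rows[j].0 == rows[i].0 {
            total += rows[j].1;
            j += 1;
        }
        // Every peer gets the same result: all earlier rows plus the full group.
        for _ in i..j {
            out.push(total);
        }
        i = j;
    }
    out
}
```

With ROWS instead of RANGE, the two rows with key 2 in the test data would get different running totals; under the default RANGE frame they share one.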
Piotr Rzysko
c81cd16230 Extract QueryDestination::placeholder_for_subquery 2025-09-13 10:49:14 +02:00
Piotr Rzysko
5f2a3e1242 Handle dummy argument for count() and count(*) in translation
Two main reasons for this change:
* Improve readability by moving the logic for this special case closer
  to the code that relies on it.
* Decouple AggFunc from the Aggregate struct. In the future, window
  function processing will use AggFunc directly, without necessarily
  depending on Aggregate.
2025-09-13 10:49:14 +02:00
Pekka Enberg
f88f39082a core/vdbe: Fix MakeRecord affinity handling
The MakeRecord instruction now accepts an optional affinity_str
parameter that applies column-specific type conversions before creating
records. When provided, the affinity string is applied
character-by-character to each register using the existing
apply_affinity_char() function, matching SQLite's behavior.

Fixes #2040
Fixes #2041
2025-09-08 18:49:13 +03:00
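A simplified sketch of applying an affinity string register by register. The characters follow SQLite's internal affinity codes ('A' BLOB, 'B' TEXT, 'C' NUMERIC, 'D' INTEGER, 'E' REAL); the `Value` enum and the functions below are hypothetical stand-ins rather than turso's `apply_affinity_char()`, and the conversions cover only common cases (for example, reals are rendered with Rust's `{:?}` formatting instead of SQLite's 15-significant-digit format).

```rust
/// Simplified SQLite value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Real(f64),
    Text(String),
    Blob(Vec<u8>),
}

/// Parses text that looks like a numeric literal. The character filter keeps
/// Rust's parser from accepting words such as "inf" or "NaN".
fn parse_numeric(s: &str) -> Option<Value> {
    let s = s.trim();
    let plausible = s.chars().any(|c| c.is_ascii_digit())
        && s.chars().all(|c| c.is_ascii_digit() || "+-.eE".contains(c));
    if !plausible {
        return None;
    }
    if let Ok(i) = s.parse::<i64>() {
        return Some(Value::Integer(i));
    }
    s.parse::<f64>().ok().map(Value::Real)
}

/// Applies one affinity character to one value.
fn apply_affinity_char(value: Value, affinity: char) -> Value {
    match (affinity, value) {
        // TEXT: numbers become text (approximation for reals, see lead-in).
        ('B', Value::Integer(i)) => Value::Text(i.to_string()),
        ('B', Value::Real(f)) => Value::Text(format!("{f:?}")),
        // NUMERIC / INTEGER: numeric-looking text becomes a number ...
        ('C' | 'D', Value::Text(s)) => match parse_numeric(&s) {
            Some(n) => apply_affinity_char(n, 'C'),
            None => Value::Text(s),
        },
        // ... and reals that are exact integers become integers.
        ('C' | 'D', Value::Real(f))
            if f.fract() == 0.0 && f >= i64::MIN as f64 && f < i64::MAX as f64 =>
        {
            Value::Integer(f as i64)
        }
        // REAL: integers and numeric-looking text become reals.
        ('E', Value::Integer(i)) => Value::Real(i as f64),
        ('E', Value::Text(s)) => match parse_numeric(&s) {
            Some(Value::Integer(i)) => Value::Real(i as f64),
            Some(n) => n,
            None => Value::Text(s),
        },
        // BLOB affinity, NULLs and all other combinations pass through.
        (_, v) => v,
    }
}

/// Applies the affinity string character by character, one per register.
/// In this sketch, registers beyond the end of the string are left untouched.
fn apply_affinity_str(values: Vec<Value>, affinity_str: &str) -> Vec<Value> {
    let mut affinities = affinity_str.chars();
    values
        .into_iter()
        .map(|v| match affinities.next() {
            Some(a) => apply_affinity_char(v, a),
            None => v,
        })
        .collect()
}
```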
Glauber Costa
08b2e685d5 Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized views.
It is hard to split because of all the interdependencies between components,
so it is one big change. This commit message will at least try to go into
detail about the basic architecture.

Materialized Views as tables
============================

Materialized views are now normal tables, whereas before they were virtual
tables. By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).

One of the advantages of doing this is that we can create indexes on view
columns. Later, we should also be able to write those views to separate files
with ATTACH write.

Materialized Views as Zsets
===========================

The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet is that ours is backed by a BTree rather than a hash map
(since SQLite tables are BTrees).

Aggregator State
================

In DBSP, the aggregator nodes also have state. To store that state, there is a
second table. It holds the state of all aggregators in the view, and there is
one such table per view: __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because the table holds many different aggregators, so we can't rely
on a particular format for the values.

The Materialized View Cursor
============================

Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
2025-09-05 07:04:33 -05:00
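The weight-merging step of the materialized view cursor can be sketched as an ordered merge of two sorted maps. In this hypothetical, std-only model a ZSet is a `BTreeMap` from a row key to its weight (rows are reduced to an `i64` key for brevity), the transaction's uncommitted changes form a second ZSet, weights for equal keys are summed, and rows whose total weight is zero are dropped. The commit describes doing this inside the cursor's seek implementation without materializing the merged result; this sketch builds a vector only to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical ZSet: row key -> weight. Weights that sum to zero mean the
/// row was inserted and then deleted, so it is absent from the view.
type ZSet = BTreeMap<i64, i64>;

/// Single ordered pass over both ZSets, summing weights for equal keys.
fn merge_zsets(persisted: &ZSet, tx_delta: &ZSet) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    let mut a = persisted.iter().peekable();
    let mut b = tx_delta.iter().peekable();
    loop {
        // Copy out the next (key, weight) of each side so peek() borrows end here.
        let next_a = a.peek().map(|&(&k, &w)| (k, w));
        let next_b = b.peek().map(|&(&k, &w)| (k, w));
        let (key, weight) = match (next_a, next_b) {
            (Some((ka, wa)), Some((kb, wb))) if ka == kb => {
                a.next();
                b.next();
                (ka, wa + wb)
            }
            (Some((ka, wa)), Some((kb, _))) if ka < kb => {
                a.next();
                (ka, wa)
            }
            // Either b has the smaller key, or a is exhausted.
            (_, Some((kb, wb))) => {
                b.next();
                (kb, wb)
            }
            (Some((ka, wa)), None) => {
                a.next();
                (ka, wa)
            }
            (None, None) => break,
        };
        if weight != 0 {
            out.push((key, weight));
        }
    }
    out
}
```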
Pekka Enberg
44357f93a2 Merge branch 'main' into 2025-08-21-make-limit-and-offset-expr 2025-09-04 09:54:45 +03:00
Piotr Rzysko
3ad4016080 Fix handling of zero-argument grouped aggregations
This commit consolidates the creation of the Aggregate struct, which was
previously handled differently in `prepare_one_select_plan` and
`resolve_aggregates`. That discrepancy caused inconsistent handling of
zero-argument aggregates.

The queries added in the new tests would previously trigger a panic.
2025-08-31 12:02:09 +02:00
bit-aloo
a16bee4574 move to new parser 2025-08-26 19:56:24 +05:30
bit-aloo
28439efd09 make offset and limit Expr 2025-08-26 19:56:11 +05:30
Pekka Enberg
26ba09c45f Revert "Merge 'Remove double indirection in the Parser' from Pedro Muniz"
This reverts commit 71c1b357e4, reversing
changes made to 6bc568ff69 because it
actually makes things slower.
2025-08-26 14:58:21 +03:00
pedrocarlo
d3240844ec refactor Core to remove the double indirection 2025-08-25 22:59:31 -03:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
Levy A.
186e2f5d8e switch to new parser 2025-08-21 15:19:16 -03:00
Jussi Saurio
5da76c9125 Allow index in UPDATE for point queries (i.e. max 1 row affected) 2025-08-14 15:58:01 +03:00
Nikita Sivukhin
5d0ada9fb9 add "updates" column for cdc table 2025-08-11 12:46:15 +04:00
Jussi Saurio
c498196c7b fix/perf: fix regression in SELECT 1 benchmark
Do not start a read transaction when a SELECT is not going to access
the database, which means we can avoid checking whether the schema has
changed.
2025-08-05 15:10:55 +03:00
Piotr Rzysko
8fb4fbf8af Make WhereTerm::consumed a plain bool
Now that virtual tables are integrated into the optimizer, this field no
longer needs to be wrapped in Cell<bool>.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
82491ceb6a Integrate virtual tables with optimizer
This change connects virtual tables with the query optimizer.
The optimizer now considers virtual tables during join order search
and invokes their best_index callbacks to determine feasible access
paths.

Currently, this is not a visible change, since none of the existing
extensions return information indicating that a plan is invalid.
2025-08-05 05:48:28 +02:00