Merge 'Where clause subquery support' from Jussi Saurio

Closes #1282
# Support for WHERE clause subqueries
This PR implements support for subqueries that appear in the WHERE
clause of SELECT statements.
## What are those lol
1. **EXISTS subqueries**: `WHERE EXISTS (SELECT ...)`
2. **Row value subqueries**: `WHERE x = (SELECT ...)` or `WHERE (x, y) =
(SELECT ...)`. The latter is not yet supported; only the single-column
("scalar subquery") case is.
3. **IN subqueries**: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN
(SELECT ...)`
## Correlated vs Uncorrelated Subqueries
- **Uncorrelated subqueries** reference only their own tables and can be
evaluated once.
- **Correlated subqueries** reference columns from the outer query
(e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must
be re-evaluated for each row of the outer query.
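As a rough illustration of the distinction, the sketch below classifies a subquery by checking whether any column it references belongs to a table outside its own FROM clause. `ColumnRef` and `is_correlated` are hypothetical names invented for this example and are not part of the PR.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified column reference (not a type from the PR).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ColumnRef {
    pub table: String,
    pub column: String,
}

/// A subquery counts as correlated if at least one referenced column
/// belongs to a table that is not in the subquery's own FROM clause.
pub fn is_correlated(own_tables: &HashSet<String>, referenced: &[ColumnRef]) -> bool {
    referenced.iter().any(|c| !own_tables.contains(&c.table))
}
```

For `SELECT * FROM t2 WHERE t2.id = t1.id`, the own tables are `{t2}` and the referenced columns are `t2.id` and `t1.id`, so the function returns `true`.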
## Implementation
### Planning
During query planning, the WHERE clause is walked to find subquery
expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each
subquery is:
1. Assigned a unique internal ID
2. Compiled into its own `SelectPlan` with outer query tables provided
as available references
3. Replaced in the AST with an `Expr::SubqueryResult` node that
references the subquery with its internal ID
4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan`
For IN subqueries, an ephemeral index is created to store the subquery
results; for other kinds, the results are stored in register(s).
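The rewrite step can be sketched on a toy AST. Everything here (`MiniExpr`, `SubqueryKind`, `PlannedSubquery`, `extract_subqueries`) is a hypothetical simplification for illustration: the subquery body is kept as a plain string instead of being compiled into a `SelectPlan`, and the ID is just the position in the output vector.

```rust
/// Toy expression tree; subquery bodies are kept as SQL text for brevity.
#[derive(Debug, Clone, PartialEq)]
pub enum MiniExpr {
    Column(String),
    Literal(i64),
    Binary(Box<MiniExpr>, &'static str, Box<MiniExpr>),
    Exists(String),
    Subquery(String),
    InSelect { lhs: Box<MiniExpr>, not: bool, select: String },
    /// Placeholder left behind once a subquery has been planned.
    SubqueryResult { subquery_id: usize, lhs: Option<Box<MiniExpr>>, not_in: bool },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubqueryKind { Exists, RowValue, In }

#[derive(Debug, PartialEq)]
pub struct PlannedSubquery { pub id: usize, pub kind: SubqueryKind, pub sql: String }

/// Walks `expr`, moves each subquery into `out`, and leaves a
/// `SubqueryResult` node carrying the assigned ID in its place.
/// Simplification: subqueries nested inside an IN left-hand side are not visited.
pub fn extract_subqueries(expr: &mut MiniExpr, out: &mut Vec<PlannedSubquery>) {
    if let MiniExpr::Binary(lhs, _, rhs) = expr {
        extract_subqueries(lhs, out);
        extract_subqueries(rhs, out);
        return;
    }
    let id = out.len(); // assumption: the ID is the index in `out`
    let old = std::mem::replace(expr, MiniExpr::Literal(0));
    *expr = match old {
        MiniExpr::Exists(sql) => {
            out.push(PlannedSubquery { id, kind: SubqueryKind::Exists, sql });
            MiniExpr::SubqueryResult { subquery_id: id, lhs: None, not_in: false }
        }
        MiniExpr::Subquery(sql) => {
            out.push(PlannedSubquery { id, kind: SubqueryKind::RowValue, sql });
            MiniExpr::SubqueryResult { subquery_id: id, lhs: None, not_in: false }
        }
        MiniExpr::InSelect { lhs, not, select } => {
            out.push(PlannedSubquery { id, kind: SubqueryKind::In, sql: select });
            // The left-hand side Box is moved into the placeholder, not cloned.
            MiniExpr::SubqueryResult { subquery_id: id, lhs: Some(lhs), not_in: not }
        }
        other => other, // not a subquery: put the node back unchanged
    };
}
```

Keeping the IN left-hand side on the placeholder node lets the rewrite move the existing `Box` rather than copy it.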
### Translation
Before emitting bytecode, we need to determine when each subquery should
be evaluated:
- **Uncorrelated**: Evaluated once before opening any table cursors
- **Correlated**: Evaluated at the appropriate nested loop depth after
all referenced outer tables are in scope
This is calculated by examining which outer query tables the subquery
references and finding the right-most (innermost) loop that opens one of
those tables. The mechanism is similar to the one we use for deciding when
to evaluate other `WhereTerm`s.
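A minimal sketch of that calculation, assuming the join order is given as a slice of table names with index 0 as the outermost loop. `eval_loop_depth` is a hypothetical helper, not a function from the PR.

```rust
/// `join_order[i]` is the table opened by loop `i` (0 = outermost).
/// Returns `None` for an uncorrelated subquery (evaluate once, before any
/// loop is opened), or `Some(depth)` of the innermost loop that opens one of
/// the referenced outer tables. References to unknown tables are ignored.
pub fn eval_loop_depth(join_order: &[&str], outer_refs: &[&str]) -> Option<usize> {
    outer_refs
        .iter()
        .filter_map(|t| join_order.iter().position(|j| j == t))
        .max()
}
```

With a join order of `t1, t2, t3`, a subquery that references both `t1` and `t2` can only be evaluated once the `t2` loop is open, i.e. at depth 1.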
### Code Generation
- **EXISTS**: Sets a register to 1 if any row is produced, 0 otherwise.
Uses the new `QueryDestination::ExistsSubqueryResult` variant.
- **IN**: Results are stored in an ephemeral index, which is then probed.
- **RowValue**: Results are stored in a range of registers. Uses the new
`QueryDestination::RowValueSubqueryResult` variant.
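The intended results of the three kinds can be modelled in plain Rust. This is only a semantic sketch under loud assumptions: rows are vectors of `i64`, a `BTreeSet` stands in for the ephemeral index, SQL NULL handling for IN is ignored, and the empty scalar subquery yielding NULL follows SQLite's documented behaviour rather than anything stated in this PR. All function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// EXISTS: 1 if the subquery produced any row, 0 otherwise.
pub fn exists_result(rows: &[Vec<i64>]) -> i64 {
    if rows.is_empty() { 0 } else { 1 }
}

/// IN / NOT IN: collect the first column of every row into an ordered set
/// (the stand-in for the ephemeral index), then probe it with `lhs`.
pub fn in_result(lhs: i64, rows: &[Vec<i64>], not_in: bool) -> bool {
    let index: BTreeSet<i64> = rows.iter().filter_map(|r| r.first().copied()).collect();
    index.contains(&lhs) != not_in
}

/// Row value: copy the first row into `num_regs` "registers";
/// `None` models NULL when the subquery returns no rows.
pub fn row_value_result(rows: &[Vec<i64>], num_regs: usize) -> Vec<Option<i64>> {
    match rows.first() {
        Some(row) => row.iter().take(num_regs).map(|v| Some(*v)).collect(),
        None => vec![None; num_regs],
    }
}
```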
## Annoying details
### Which cursor to read from in a subquery?
Sometimes a query will use a covering index, i.e. skip opening the table
cursor at all if the index contains All The Needed Stuff.
Correlated subqueries that read columns from outer tables are a bit
problematic in this regard: with our current translation code, the
subquery doesn't know whether the outer query opened a table cursor,
an index cursor, or both. So, for now, we try to find a table cursor first,
then fall back to any index cursor for that table.
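The fallback order can be sketched as follows. `OpenCursor` and `resolve_cursor` are hypothetical names for this example; the actual translation code tracks cursors differently.

```rust
/// Hypothetical record of a cursor the outer query has opened.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpenCursor {
    Table { table_id: usize, cursor_id: usize },
    Index { table_id: usize, cursor_id: usize },
}

/// Prefer a table cursor for `table`; if none is open (for example because
/// the outer query uses a covering index), fall back to any index cursor
/// on the same table.
pub fn resolve_cursor(open: &[OpenCursor], table: usize) -> Option<usize> {
    let table_cursor = open.iter().find_map(|c| match *c {
        OpenCursor::Table { table_id, cursor_id } if table_id == table => Some(cursor_id),
        _ => None,
    });
    table_cursor.or_else(|| {
        open.iter().find_map(|c| match *c {
            OpenCursor::Index { table_id, cursor_id } if table_id == table => Some(cursor_id),
            _ => None,
        })
    })
}
```

Note that this sketch does not check whether the chosen index actually contains the column being read; it only mirrors the lookup order described above.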

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3847
Jussi Saurio
2025-10-28 06:36:55 +02:00
committed by GitHub
32 changed files with 2116 additions and 129 deletions


@@ -476,6 +476,40 @@ pub enum Expr {
    Unary(UnaryOperator, Box<Expr>),
    /// Parameters
    Variable(String),
    /// Subqueries from e.g. the WHERE clause are planned separately
    /// and their results will be placed in registers or in an ephemeral index
    /// pointed to by this type.
    SubqueryResult {
        /// Internal "opaque" identifier for the subquery. When the translator encounters
        /// a [Expr::SubqueryResult], it needs to know which subquery in the corresponding
        /// query plan it references.
        subquery_id: TableInternalId,
        /// Left-hand side expression for IN subqueries.
        /// This property plus 'not_in' are only relevant for IN subqueries,
        /// and the reason they are not included in the [SubqueryType] enum is so that
        /// we don't have to clone this Box.
        lhs: Option<Box<Expr>>,
        /// Whether the IN subquery is a NOT IN subquery.
        not_in: bool,
        /// The type of subquery.
        query_type: SubqueryType,
    },
}

#[derive(Debug, Clone, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub enum SubqueryType {
    /// EXISTS subquery; result is stored in a single register.
    Exists { result_reg: usize },
    /// Row value subquery; result is stored in a range of registers.
    /// Example: x = (SELECT ...) or (x, y) = (SELECT ...)
    RowValue {
        result_reg_start: usize,
        num_regs: usize,
    },
    /// IN subquery; result is stored in an ephemeral index.
    /// Example: x <NOT> IN (SELECT ...)
    In { cursor_id: usize },
}

impl Expr {


@@ -707,6 +707,10 @@ impl ToTokens for Expr {
        context: &C,
    ) -> Result<(), S::Error> {
        match self {
            Self::SubqueryResult { .. } => {
                // FIXME: what to put here? This is a highly "artificial" AST node that has no meaning when stringified.
                Ok(())
            }
            Self::Between {
                lhs,
                not,