Closes#1282
# Support for WHERE clause subqueries
This PR implements support for subqueries that appear in the WHERE
clause of SELECT statements.
## What are those lol
1. **EXISTS subqueries**: `WHERE EXISTS (SELECT ...)`
2. **Row value subqueries**: `WHERE x = (SELECT ...)` or `WHERE (x, y) =
(SELECT ...)`. The latter are not yet supported - only the single-column
("scalar subquery") case is.
3. **IN subqueries**: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN
(SELECT ...)`
## Correlated vs Uncorrelated Subqueries
- **Uncorrelated subqueries** reference only their own tables and can be
evaluated once.
- **Correlated subqueries** reference columns from the outer query
(e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must
be re-evaluated for each row of the outer query
## Implementation
### Planning
During query planning, the WHERE clause is walked to find subquery
expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each
subquery is:
1. Assigned a unique internal ID
2. Compiled into its own `SelectPlan` with outer query tables provided
as available references
3. Replaced in the AST with an `Expr::SubqueryResult` node that
references the subquery with its internal ID
4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan`
For IN subqueries, an ephemeral index is created to store the subquery
results; for other kinds, the results are stored in register(s).
### Translation
Before emitting bytecode, we need to determine when each subquery should
be evaluated:
- **Uncorrelated**: Evaluated once before opening any table cursors
- **Correlated**: Evaluated at the appropriate nested loop depth after
all referenced outer tables are in scope
This is calculated by examining which outer query tables the subquery
references and finding the right-most (innermost) loop that opens those
tables - using similar mechanisms that we use for figuring out when to
evaluate other `WhereTerm`s too.
### Code Generation
- **EXISTS**: Sets a register to 1 if any row is produced, 0 otherwise.
Has new `QueryDestination::ExistsSubqueryResult` variant.
- **IN**: Results stored in an ephemeral index and the index is probed.
- **RowValue**: Results stored in a range of registers. Has new
`QueryDestination::RowValueSubqueryResult` variant.
## Annoying details
### Which cursor to read from in a subquery?
Sometimes a query will use a covering index, i.e. skip opening the table
cursor at all if the index contains All The Needed Stuff.
Correlated subqueries reading columns from outer tables is a bit
problematic in this regard: with our current translation code, the
subquery doesn't know whether the outer query opened a table cursor,
index cursor, or both. So, for now, we try to find a table cursor first,
then fall back to finding any index cursor for that table.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3847
Current error messages are too "low level", e.g returning tokens in
messages. This PR improves this a bit.
Before:
```text
turso> with t as (select * from pragma_schema_version); select c.schema_version from t as c;
× unexpected token at SourceSpan { offset: SourceOffset(47), length: 1 }
╭────
1 │ with t as (select * from pragma_schema_version); select c.schema_version from t as c;
· ┬
· ╰── here
╰────
help: expected [TK_SELECT, TK_VALUES, TK_UPDATE, TK_DELETE, TK_INSERT, TK_REPLACE] but found TK_SEMI
```
Now:
```text
turso> with t as (select * from pragma_schema_version); select c.schema_version from t as c;
× unexpected token ';' at offset 47
╭────
1 │ with t as (select * from pragma_schema_version);select c.schema_version from t as c;
· ┬
· ╰── here
╰────
help: expected SELECT, VALUES, UPDATE, DELETE, INSERT, or REPLACE but found ';'
```
@TcMits WDYT?
Closes#3190
Discovered this one while working on #3322
It was a bit more elusive because the original error was essentially a
red herring:
```
turso> CREATE INDEX idx ON "t t" (`a a`);
× unexpected token at SourceSpan { offset: SourceOffset(22), length: 1 }
╭────
1 │ CREATE INDEX idx ON "t t" (`a a`);
· ┬
· ╰── here
╰────
help: expected [TK_LP] but found TK_ID
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3345
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicing on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
Currently, when MVCC is enabled, every transaction mode supports
concurrent reads and writes, which makes it hard to adopt for existing
applications that use `BEGIN DEFERRED` or `BEGIN IMMEDIATE`.
Therefore, add support for `BEGIN CONCURRENT` transactions when MVCC is
enabled. The transaction mode allows multiple concurrent read/write
transactions that don't block each other, with conflicts resolved at
commit time. Furthermore, implement the correct semantics for `BEGIN
DEFERRED` and `BEGIN IMMEDIATE` by taking advantage of the pager level
write lock when transaction upgrades to write. This means that now
concurrent MVCC transactions are serialized against the legacy ones when
needed.
The implementation includes:
- Parser support for CONCURRENT keyword in BEGIN statements
- New Concurrent variant in TransactionMode to distinguish from regular
read/write transactions
- MVCC store tracking of exclusive transactions to support IMMEDIATE and
EXCLUSIVE modes alongside CONCURRENT
- Proper transaction state management for all transaction types in MVCC
This enables better concurrency for applications that can handle
optimistic concurrency control, while still supporting traditional
SQLite transaction semantics via IMMEDIATE and EXCLUSIVE modes.
@penberg this PR try to clean up `turso_parser`'s`fmt` code.
- `get_table_name` and `get_column_name` should return None when
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
/// Given an id, get the table name
/// First Option indicates whether the table exists
///
/// Currently not considering aliases
fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
None
}
/// Given a table id and a column index, get the column name
/// First Option indicates whether the column exists
/// Second Option indicates whether the column has a name
fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
None
}
// help function to handle missing table/column names
fn get_table_and_column_names(
&self,
table_id: TableInternalId,
col_idx: usize,
) -> (String, String) {
let table_name = self
.get_table_name(table_id)
.map(|s| s.to_owned())
.unwrap_or_else(|| format!("t{}", table_id.0));
let column_name = self
.get_column_name(table_id, col_idx)
.map(|opt| {
opt.map(|s| s.to_owned())
.unwrap_or_else(|| format!("c{col_idx}"))
})
.unwrap_or_else(|| format!("c{col_idx}"));
(table_name, column_name)
}
}
```
- remove `FmtTokenStream` because it is same as `WriteTokenStream `
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
/// Send token(s) to the specified stream with context
fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
&self,
s: &mut S,
context: &C,
) -> Result<(), S::Error>;
// Return displayer representation with context
fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
where
Self: Sized,
{
SqlDisplayer::new(ctx, self)
}
}
```
Closes#2748