136 Commits

Author SHA1 Message Date
Jussi Saurio
9c87b20cb2 Merge 'Where clause subquery support' from Jussi Saurio
Closes #1282
# Support for WHERE clause subqueries
This PR implements support for subqueries that appear in the WHERE
clause of SELECT statements.
## What are those lol
1. **EXISTS subqueries**: `WHERE EXISTS (SELECT ...)`
2. **Row value subqueries**: `WHERE x = (SELECT ...)` or `WHERE (x, y) =
(SELECT ...)`. The latter are not yet supported - only the single-column
("scalar subquery") case is.
3. **IN subqueries**: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN
(SELECT ...)`
## Correlated vs Uncorrelated Subqueries
- **Uncorrelated subqueries** reference only their own tables and can be
evaluated once.
- **Correlated subqueries** reference columns from the outer query
(e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must
be re-evaluated for each row of the outer query
## Implementation
### Planning
During query planning, the WHERE clause is walked to find subquery
expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each
subquery is:
1. Assigned a unique internal ID
2. Compiled into its own `SelectPlan` with outer query tables provided
as available references
3. Replaced in the AST with an `Expr::SubqueryResult` node that
references the subquery with its internal ID
4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan`
For IN subqueries, an ephemeral index is created to store the subquery
results; for other kinds, the results are stored in register(s).
### Translation
Before emitting bytecode, we need to determine when each subquery should
be evaluated:
- **Uncorrelated**: Evaluated once before opening any table cursors
- **Correlated**: Evaluated at the appropriate nested loop depth after
all referenced outer tables are in scope
This is calculated by examining which outer query tables the subquery
references and finding the right-most (innermost) loop that opens those
tables - using similar mechanisms that we use for figuring out when to
evaluate other `WhereTerm`s too.
### Code Generation
- **EXISTS**: Sets a register to 1 if any row is produced, 0 otherwise.
Has new `QueryDestination::ExistsSubqueryResult` variant.
- **IN**: Results stored in an ephemeral index and the index is probed.
- **RowValue**: Results stored in a range of registers. Has new
`QueryDestination::RowValueSubqueryResult` variant.
## Annoying details
### Which cursor to read from in a subquery?
Sometimes a query will use a covering index, i.e. skip opening the table
cursor at all if the index contains All The Needed Stuff.
Correlated subqueries reading columns from outer tables is a bit
problematic in this regard: with our current translation code, the
subquery doesn't know whether the outer query opened a table cursor,
index cursor, or both. So, for now, we try to find a table cursor first,
then fall back to finding any index cursor for that table.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3847
2025-10-28 06:36:55 +02:00
Jussi Saurio
5eb74ce8e6 AST: Add Expr::SubqueryResult variant and enum SubqueryType 2025-10-27 16:01:39 +02:00
Nikita Sivukhin
67e62fd6ea support USING ... WITH ... syntax for index creation 2025-10-27 12:13:43 +04:00
Jussi Saurio
ffc601b4b0 Merge 'Return better syntax error messages' from Diego Reis
Current error messages are too "low level", e.g returning tokens in
messages. This PR improves this a bit.
Before:
```text
 turso> with t as (select * from pragma_schema_version); select c.schema_version from t as c;

  × unexpected token at SourceSpan { offset: SourceOffset(47), length: 1 }
   ╭────
 1 │ with t as (select * from pragma_schema_version); select c.schema_version from t as c;
   ·                                                ┬
   ·                                                ╰── here
   ╰────
  help: expected [TK_SELECT, TK_VALUES, TK_UPDATE, TK_DELETE, TK_INSERT, TK_REPLACE] but found TK_SEMI
```
Now:
```text
 turso> with t as (select * from pragma_schema_version); select c.schema_version from t as c;

  × unexpected token ';' at offset 47
   ╭────
 1 │ with t as (select * from pragma_schema_version);select c.schema_version from t as c;
   ·                                                ┬
   ·                                                ╰── here
   ╰────
  help: expected SELECT, VALUES, UPDATE, DELETE, INSERT, or REPLACE but found ';'
  ```
@TcMits WDYT?

Closes #3190
2025-10-22 10:57:54 +03:00
PThorpe92
892fcc881d Handle TRUE|FALSE literal case for default column constraint in the parser 2025-10-21 21:15:35 -04:00
Jussi Saurio
c88061e1eb Merge 'Fix IN operator NULL handling' from Diego Reis
Closes #3277
Basically changed what we used to do to match what and how SQLite does.

Closes #3606
2025-10-10 10:17:57 +03:00
pedrocarlo
642ec3032d use parser's ColumnDefinition for Sql Generation Column struct 2025-10-09 17:25:04 -03:00
Diego Reis
d2d265a06f Small nits and code clean ups 2025-10-09 12:14:20 -03:00
Diego Reis
da323fa0c4 Some clean ups and correctly working on WHERE clauses 2025-10-09 11:57:15 -03:00
Diego Reis
52ed0f7997 Add in expr optimization at the parser level instead of translation.
lhs IN () and lhs NOT IN () can be translated to false and true.
2025-10-09 11:56:14 -03:00
PThorpe92
b40e784903 Update COMPAT.md, add fk related opcodes 2025-10-07 16:22:15 -04:00
bit-aloo
551dbf518e Add new mvcc_checkpoint_threshold pragma name 2025-10-07 10:17:05 +05:30
Diego Reis
ff11ba08c7 Makes tests pass 2025-09-30 09:41:54 -03:00
Diego Reis
e9b2c41c39 Return better syntax error messages 2025-09-30 09:17:22 -03:00
Nikita Sivukhin
759ffc1770 fix clippy 2025-09-30 14:03:25 +04:00
Nikita Sivukhin
30a9b2b860 auto-assign anonymous parameters directly in the Parser
- otherwise it will be easy to mess up with the traversal order
2025-09-30 13:58:59 +04:00
Jussi Saurio
13ca94755e Merge 'correct span in ParseUnexpectedToken' from Lâm Hoàng Phúc
Closes #3447
2025-09-30 10:35:49 +03:00
TcMits
04beead7fe correct span in ParseUnexpectedToken 2025-09-30 12:49:26 +07:00
TcMits
4211460ef3 fmt 2025-09-30 12:28:24 +07:00
TcMits
d3d6a015ec fix issue 3425 2025-09-30 12:25:20 +07:00
Nikita Sivukhin
86a95e813d Merge branch 'main' into quoting-fix-attempt-2 2025-09-29 10:58:51 +04:00
PThorpe92
30f80c2000 Correct spelling issue in ForeignKey ast node 2025-09-27 17:38:45 -04:00
PThorpe92
9bd852297a Allow in parser using rowid explicitly for a col when creating table 2025-09-26 17:31:02 -04:00
Nikita Sivukhin
faf3a970a7 fix clippy 2025-09-26 13:02:35 +04:00
Nikita Sivukhin
27c9506059 do not auto-lowercase Name identifiers
- this is complicated because column names must preserve case
2025-09-26 13:01:49 +04:00
Nikita Sivukhin
c4b3074575 format 2025-09-26 13:01:49 +04:00
Nikita Sivukhin
ae24d637a8 adjust edge-cases 2025-09-26 13:01:49 +04:00
Nikita Sivukhin
12b89fd2f1 do not use Name::new 2025-09-26 13:01:49 +04:00
Nikita Sivukhin
68fd940d73 small fix to make pragmas work 2025-09-26 13:01:49 +04:00
Nikita Sivukhin
f3f9219795 completely remove usage of enum variants 2025-09-26 13:01:49 +04:00
Nikita Sivukhin
fdf8ca88fd introduce exact(...) function - because enum variant will disappear 2025-09-26 13:01:49 +04:00
Jussi Saurio
064cc04b69 Merge 'Fix CREATE INDEX with quoted identifiers' from Iaroslav Zeigerman
Discovered this one while working on #3322
It was a bit more elusive because the original error was essentially a
red herring:
```
turso> CREATE INDEX idx ON "t t" (`a a`);
  × unexpected token at SourceSpan { offset: SourceOffset(22), length: 1 }
   ╭────
 1 │ CREATE INDEX idx ON "t t" (`a a`);
   ·                       ┬
   ·                       ╰── here
   ╰────
  help: expected [TK_LP] but found TK_ID
```

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3345
2025-09-26 09:13:30 +03:00
Iaroslav Zeigerman
b34e53d6c6 Fix CREATE INDEX with quoted identifiers 2025-09-25 12:55:06 -07:00
Pekka Enberg
b3a0008950 Merge 'parser: fix incorrect LIMIT/OFFSET parsing of form LIMIT x,y' from Jussi Saurio
In this case x is the OFFSET and y is the LIMIT, not the other way
around.
Closes #3303

Closes #3339
2025-09-25 21:01:40 +03:00
Diego Reis
7a56c93b81 Makes clippy happy 2025-09-25 10:42:14 -03:00
Jussi Saurio
152074d2d7 fix LIMIT OFFSET test 2025-09-25 15:42:32 +03:00
Jussi Saurio
bb082c25f5 Fix incorrect LIMIT/OFFSET parsing of form LIMIT x,y
In this case x is the OFFSET and y is the LIMIT, not the
other way around.
2025-09-25 15:10:14 +03:00
Nikita Sivukhin
57e52077be add link to the docs 2025-09-19 16:48:43 +04:00
Nikita Sivukhin
c63c820bb7 add busy_timeout pragma 2025-09-19 16:48:12 +04:00
Glauber Costa
cb7c04ffad return error instead of panic for invalid syntax on views
I have accidentally typed "create materialized views", and noticed
that this panics, instead of returning an error. Fix it.
2025-09-19 03:59:28 -05:00
pedrocarlo
3c91ae206b move as many dependencies as possible to workspace to avoid multiple versions of the same dependency 2025-09-15 17:19:36 -03:00
Pekka Enberg
d8f07fe3da core: Panic on fsync() error by default
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicing on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
2025-09-13 10:21:12 +03:00
Pekka Enberg
433b60555f Add BEGIN CONCURRENT support for MVCC mode
Currently, when MVCC is enabled, every transaction mode supports
concurrent reads and writes, which makes it hard to adopt for existing
applications that use `BEGIN DEFERRED` or `BEGIN IMMEDIATE`.

Therefore, add support for `BEGIN CONCURRENT` transactions when MVCC is
enabled. The transaction mode allows multiple concurrent read/write
transactions that don't block each other, with conflicts resolved at
commit time. Furthermore, implement the correct semantics for `BEGIN
DEFERRED` and `BEGIN IMMEDIATE` by taking advantage of the pager level
write lock when transaction upgrades to write. This means that now
concurrent MVCC transactions are serialized against the legacy ones when
needed.

The implementation includes:

- Parser support for CONCURRENT keyword in BEGIN statements

- New Concurrent variant in TransactionMode to distinguish from regular
  read/write transactions

- MVCC store tracking of exclusive transactions to support IMMEDIATE and
  EXCLUSIVE modes alongside CONCURRENT

- Proper transaction state management for all transaction types in MVCC

This enables better concurrency for applications that can handle
optimistic concurrency control, while still supporting traditional
SQLite transaction semantics via IMMEDIATE and EXCLUSIVE modes.
2025-09-11 16:05:52 +03:00
TcMits
e725814ce8 fix test 2025-09-05 13:38:33 +07:00
TcMits
0f9ae5853c test 2025-09-05 13:19:22 +07:00
TcMits
4726ffaa74 only check 'e' in eat_number 2025-09-05 13:08:42 +07:00
TcMits
168f6dcbb5 unrelated comments 2025-09-05 13:00:07 +07:00
TcMits
f2d4087462 support float without fractional part 2025-09-05 12:58:28 +07:00
Pekka Enberg
1de647758f Merge 'refactor parser fmt' from Lâm Hoàng Phúc
@penberg this PR try to clean up `turso_parser`'s`fmt` code.
- `get_table_name` and `get_column_name` should return None when
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
    /// Given an id, get the table name
    /// First Option indicates whether the table exists
    ///
    /// Currently not considering aliases
    fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
        None
    }

    /// Given a table id and a column index, get the column name
    /// First Option indicates whether the column exists
    /// Second Option indicates whether the column has a name
    fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
        None
    }

    // help function to handle missing table/column names
    fn get_table_and_column_names(
        &self,
        table_id: TableInternalId,
        col_idx: usize,
    ) -> (String, String) {
        let table_name = self
            .get_table_name(table_id)
            .map(|s| s.to_owned())
            .unwrap_or_else(|| format!("t{}", table_id.0));

        let column_name = self
            .get_column_name(table_id, col_idx)
            .map(|opt| {
                opt.map(|s| s.to_owned())
                    .unwrap_or_else(|| format!("c{col_idx}"))
            })
            .unwrap_or_else(|| format!("c{col_idx}"));

        (table_name, column_name)
    }
}
```
- remove `FmtTokenStream` because it is same as `WriteTokenStream `
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
    /// Send token(s) to the specified stream with context
    fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
        &self,
        s: &mut S,
        context: &C,
    ) -> Result<(), S::Error>;

    // Return displayer representation with context
    fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
    where
        Self: Sized,
    {
        SqlDisplayer::new(ctx, self)
    }
}
```

Closes #2748
2025-09-02 18:35:43 +03:00
Pekka Enberg
adc6cb008a Merge 'introduce match_ignore_ascii_case macro' from Lâm Hoàng Phúc
this PR converts generate code in `turso_parser`'s `build.rs` into macro
for reusability. `match_ignore_ascii_case` will generate trie-like tree
matching from normal match expression.
example:
```rust
    match_ignore_ascii_case!(match input {
        b"AB" => TokenType::TK_ABORT,
        b"AC" => TokenType::TK_ACTION,
        _ => TokenType::TK_ID,
    })
```
will generate:
```rust
    match input.get(0) {
        Some(b'A') | Some(b'a') => match input.get(1) {
            Some(b'B') | Some(b'b') => match input.get(2) {
                None => TokenType::TK_ABORT,
                _ => TokenType::TK_ID,
            },
            Some(b'C') | Some(b'c') => match input.get(2) {
                None => TokenType::TK_ACTION,
                _ => TokenType::TK_ID,
            },
            _ => TokenType::TK_ID,
        },
        _ => TokenType::TK_ID,
    }
```

Closes #2865
2025-09-02 18:34:55 +03:00