Commit Graph

71 Commits

Author SHA1 Message Date
Piotr Rzysko
f5efcbe745 Add support for window functions
Adds initial support for window functions. For now, only existing
aggregate functions can be used as window functions—no specialized
window-specific functions are supported yet.

Currently, only the default frame definition is implemented:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS.
2025-09-13 11:12:44 +02:00
Jussi Saurio
e3bd00883b Fix creation of automatic indexes
indexes with the naming scheme "sqlite_autoindex_<tblname>_<number>"
are automatically created when a table is created with UNIQUE or
PRIMARY KEY definitions.

these indexes must map to the table definition SQL in definition order,
i.e. sqlite_autoindex_foo_1 must be the first instance of UNIQUE or
PRIMARY KEY and so on.

this commit fixes our autoindex creation / parsing so that this invariant
is upheld.
2025-09-11 14:11:30 +03:00
Glauber Costa
08b2e685d5 Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized view.
It is hard to split because of all the interdependencies between components,
so it is a one big thing. This commit message will at least try to go into
details about the basic architecture.

Materialized Views as tables
============================

Materialized views are now a normal table - whereas before they were a virtual
table.  By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).

One of the advantages of doing this is that we can create indexes on view
columns.  Later, we should also be able to write those views to separate files
with ATTACH write.

Materialized Views as Zsets
===========================

The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet, is that obviously ours is backed by a BTree, not a Hash
(since SQLite tables are BTrees)

Aggregator State
================

In DBSP, the aggregator nodes also have state. To store that state, there is a
second table.  The table holds all aggregators in the view, and there is one
table per view. That is __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table. We can't rely
on a particular format for the values.

The Materialized View Cursor
============================

Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
2025-09-05 07:04:33 -05:00
Preston Thorpe
2ea2be6f85 Merge 'prevent modification to system tables.' from Glauber Costa
SQLite does not allow us to modify system tables, but we do. Let's fix
it.

Reviewed-by: Preston Thorpe <preston@turso.tech>
Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2855
2025-09-04 19:57:04 -04:00
Glauber Costa
032eabb3a4 prevent modification to system tables.
SQLite does not allow us to modify system tables, but we do.
Let's fix it.
2025-09-04 17:34:47 -05:00
bit-aloo
51d40092db add empty table references, and error out in case if the table references are present in limit/offset 2025-08-26 19:56:25 +05:30
bit-aloo
a3b87cd97f add review comments 2025-08-26 19:56:25 +05:30
Pekka Enberg
26ba09c45f Revert "Merge 'Remove double indirection in the Parser' from Pedro Muniz"
This reverts commit 71c1b357e4, reversing
changes made to 6bc568ff69 because it
actually makes things slower.
2025-08-26 14:58:21 +03:00
pedrocarlo
d3240844ec refactor Core to remove the double indirection 2025-08-25 22:59:31 -03:00
themixednuts
80eca66be9 fix: normalize quotes in update
fixes: #2744
2025-08-23 03:17:03 -05:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
Levy A.
186e2f5d8e switch to new parser 2025-08-21 15:19:16 -03:00
Jussi Saurio
9d44e97a7a Fix: all indexes need to be updated if the rowid changes 2025-08-21 15:48:46 +03:00
Nikita Sivukhin
5d0ada9fb9 add "updates" column for cdc table 2025-08-11 12:46:15 +04:00
Jussi Saurio
21dc2d0161 translate: return parse errors for unsupported features instead of silently ignoring 2025-08-08 11:39:30 +03:00
Piotr Rzysko
82491ceb6a Integrate virtual tables with optimizer
This change connects virtual tables with the query optimizer.
The optimizer now considers virtual tables during join order search
and invokes their best_index callbacks to determine feasible access
paths.

Currently, this is not a visible change, since none of the existing
extensions return information indicating that a plan is invalid.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
718598eab8 Introduce scan type
Different scan parameters are required for different table types.
Currently, index and iteration direction are only used by B-tree tables,
while the remaining table types don’t require any parameters. Planning
access to virtual tables, however, will require passing additional
information from the planner, such as the virtual table index (distinct
from a B-tree index) and the constraints that must be forwarded to the
`filter` method.
2025-08-04 20:27:22 +02:00
Jussi Saurio
86b1232268 chore: enable indexes by default 2025-08-01 15:44:56 +03:00
Diego Reis
ab01b4e8ca Refactor UPDATE .. SET row values logic and add some comments 2025-07-31 00:08:15 -03:00
Diego Reis
31c73f3c9a Add basic support for row values in UPDATE .. SET statements
e.g `.. SET (a, b) = (1, 2)` is equivalent to `.. SET a = 1, b = 2`.

Alongside, to repeated lhs values, `(a, a)`, the last rhs prevail; so
`.. SET (a, a) = (1, 2)` is equivalent to `.. SET a = 2`
2025-07-31 00:08:15 -03:00
Pere Diaz Bou
752a876f9a change every Rc to Arc in schema internals 2025-07-28 10:51:17 +02:00
Pekka Enberg
6bf6cc28e4 Merge 'Implement the Returning statement for inserts and updates' from Glauber Costa
They are very similar. DELETE is very different, so that one we'll do it
later.

Closes #2276
2025-07-27 09:11:16 +03:00
Glauber Costa
5d8d08d1b6 Implement the Returning statement for inserts and updates
They are very similar. DELETE is very different, so that one we'll
do it later.
2025-07-26 09:01:09 -05:00
Iaroslav Zeigerman
f13b9105b9 Fix error handling when binding column references while translating the UPDATE statement 2025-07-26 04:51:42 -07:00
Glauber Costa
b5927dcfd5 support doubly qualified identifiers 2025-07-25 14:52:45 -05:00
Pekka Enberg
669b231714 Merge 'parser: Distinguish quoted identifiers and unify Id into Name enum' from bit-aloo
Closes: #1947
This PR replaces the `Name(pub String)` struct with a `Name` enum that
explicitly models how the name appeared in the source either as an
unquoted identifier (`Ident`) or a quoted string (`Quoted`).
In the process, the separate `Id` wrapper type has been coalesced into
the `Name` enum, simplifying the AST and reducing duplication in
identifier handling logic.
While this increases the size of some AST nodes (notably
`yyStackEntry`).
cc: @levydsa

Reviewed-by: Levy A. (@levydsa)
Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2251
2025-07-25 12:08:54 +03:00
Glauber Costa
988b16f962 Support ATTACH (read only)
Support for attaching databases. The main difference from SQLite is that
we support an arbitrary number of attached databases, and we are not
bound to just 100ish.

We for now only support read-only databases. We open them as read-only,
but also, to keep things simple, we don't patch any of the insert
machinery to resolve foreign tables.  So if an insert is tried on an
attached database, it will just fail with a "no such table" error - this
is perfect for now.

The code in core/translate/attach.rs is written by Claude, who also
played a key part in the boilerplate for stuff like the .databases
command and extending the pragma database_list, and also aided me in
the test cases.
2025-07-24 19:19:48 -05:00
bit-aloo
9a54ef214e parser: Distinguish quoted identifiers and unify Id into Name enum
This commit replaces the `Name(pub String)` struct with a `Name` enum that
explicitly models how the name appeared in the source either as an
unquoted identifier (`Ident`) or a quoted string (`Quoted`).

In the process, the separate `Id` wrapper type has been coalesced into the
`Name` enum, simplifying the AST and reducing duplication in identifier
handling logic.

While this increases the size of some AST nodes (notably `yyStackEntry`),
it improves correctness and makes source structure more explicit for
later phases.
2025-07-24 14:40:19 +05:30
Ihor Andrianov
37dd5da436 clippy 2025-07-16 15:05:06 +03:00
Ihor Andrianov
6d4e542522 last set clause wins 2025-07-16 14:56:10 +03:00
Piotr Rzysko
000d70f1f3 Propagate info about hidden columns 2025-07-14 07:16:53 +02:00
Nikita Sivukhin
c9c5ef4e25 remote query_mode from ProgramBuilderOpts and from function arguments
- mode never changes and ProgramBuilder already created with proper mode set correctly
2025-07-02 13:24:12 +04:00
pedrocarlo
7e0225b1af add some comments 2025-06-29 17:37:46 -03:00
pedrocarlo
738e2cc06c do not emit ephemeral plan when doing a SeekRowId + emit Delete instruction when rowid in set clause 2025-06-29 17:12:24 -03:00
Pekka Enberg
725c3e4ddc Rename limbo_sqlite3_parser crate to turso_sqlite3_parser 2025-06-29 12:34:46 +03:00
Pekka Enberg
2fc5c0ce5c Switch to runtime flag for enabling indexes
Makes it easier to test the feature:

```
$ cargo run --  --experimental-indexes
Limbo v0.0.22
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE t(x);
limbo> CREATE INDEX t_idx ON t(x);
limbo> DROP INDEX t_idx;
```
2025-06-26 10:07:28 +03:00
Nils Koch
2827b86917 chore: fix clippy warnings 2025-06-23 19:52:13 +01:00
pedrocarlo
6596ee28a8 introduce EphemeralTable query destination 2025-06-20 16:30:21 -03:00
pedrocarlo
e53a290a48 move ephemeral table logic to update plan and reuse select logic for ephemeral index 2025-06-20 16:30:21 -03:00
pedrocarlo
9048ad398b modify loop functions to accomodate for ephemeral tables 2025-06-20 16:29:10 -03:00
Pere Diaz Bou
f91d2c5e99 fix disable in write cases 2025-06-17 19:33:23 +02:00
Pere Diaz Bou
b5f2f375b8 disable alter, delete, create index, insert and update for indexes 2025-06-17 19:33:23 +02:00
Levy A.
de2ac89ad2 feat: complete ALTER TABLE implementation 2025-06-11 14:17:36 -03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
d2a287f67f Add Schema reference to Resolver - needed for adhoc subquery planning 2025-05-27 19:12:47 +03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
pedrocarlo
53bf5d5ef5 adjust translate functions to take a program instead of Option<ProgramBuilder> + remove any Init emission in traslate functions + use epilogue in all places necessary 2025-05-21 16:41:10 -03:00
pedrocarlo
517c7c81cd refactor to include optional program builder argument 2025-05-21 12:47:51 -03:00
Levy A.
023a116b0d feat: initial implementation of ALTER TABLE
only supporting renaming tables
2025-05-08 09:24:56 -03:00
Jussi Saurio
306e097950 Merge 'Fix bug: we cant remove order by terms from the head of the list' from Jussi Saurio
we had an incorrect optimization in `eliminate_orderby_like_groupby()`
where it could remove e.g. the first term of the ORDER BY if it matched
the first GROUP BY term and the result set was naturally ordered by that
term. this is invalid. see e.g.:
```sql
main branch - BAD: removes the `ORDER BY id` term because the results are naturally ordered by id.
However, this results in sorting the entire thing by last name only!

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌──────┬───────────┬───────────┐
│ id   │ last_name │ count (1) │
├──────┼───────────┼───────────┤
│ 6235 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│ 8043 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│  944 │ Zimmerman │         1 │
└──────┴───────────┴───────────┘

after fix - GOOD:

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌────┬───────────┬───────────┐
│ id │ last_name │ count (1) │
├────┼───────────┼───────────┤
│  1 │ Foster    │         1 │
├────┼───────────┼───────────┤
│  2 │ Salazar   │         1 │
├────┼───────────┼───────────┤
│  3 │ Perry     │         1 │
└────┴───────────┴───────────┘

I also refactored sorters to always use the ast `SortOrder` instead of boolean vectors, and use the `compare_immutable()` utility we use inside btrees too.

Closes #1365
2025-05-03 12:48:08 +03:00