Commit Graph

83 Commits

Author SHA1 Message Date
Preston Thorpe
44dc4c9636 Merge 'translate/emitter: Implement partial indexes' from Preston Thorpe
This PR adds support for partial indexes, e.g. `CREATE INDEX` with a
provided predicate
```sql
CREATE UNIQUE INDEX idx_expensive ON products(sku) where price > 100;
```
The PR does not yet implement support for using the partial indexes in
the optimizer.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3228
2025-09-22 09:09:54 -04:00
Pekka Enberg
0144ea8059 Merge 'Support UNION queries in DBSP-based Materialized Views' from Glauber Costa
UNION queries, while useful on their own, are a cornerstone of recursive
CTEs.
This PR implements:
* the merge operator, required to merge both sides of a union query.
* the circuitry necessary to issue the Merge operator.
* extraction of tables mentioned in union and CTE expressions, so we can
correctly populate tables that contain them.

Closes #3234
2025-09-22 11:33:19 +03:00
Glauber Costa
2627ad44de support union statements in the DBSP circuit compiler 2025-09-21 21:00:27 -03:00
Glauber Costa
b419db489a Implement the DBSP merge operator
The Merge operator is a stateless operator that merges two deltas.
There are two modes: Distinct, where we merge together values that
are the same, and All, where we preserve all values. We use the rowid of
the hashable row to guarantee that: In Distinct mode, the rowid is set
to 0 in both sides. If they values are the same, they will hash to the
same thing. For All, the rowids are different.

The merge operator is used for the UNION statement, which is a
cornerstone of Recursive CTEs.
2025-09-21 21:00:27 -03:00
Glauber Costa
9f54f60d45 make sure that complex select statements are captured by MV populate
The population code extracts table information from the select statement
so it can populate the materialized view. But the code, as written
today, is naive. It doesn't capture table information correctly if there
is more than one select statement (such in the case of a union query).
2025-09-21 21:00:27 -03:00
Glauber Costa
13260349b0 Return a parse error for a non-equality join
We currently don't handle non equality, but end up just returning a
bogus result. Let's parse error.
2025-09-20 20:35:10 -03:00
PThorpe92
a0f574d279 Add where_clause expr field to Index 2025-09-20 14:38:47 -04:00
Glauber Costa
f2f7f817e4 populate all tables in IncrementalView
For joins to work, we have to populate all referenced tables when we
create the view.
2025-09-19 03:59:28 -05:00
Glauber Costa
e5a106d8d6 enable joins in IncrementalView 2025-09-19 03:59:28 -05:00
Glauber Costa
832a4d7034 generate projection nodes inside filter clauses
We are currently not able to properly compute things like WHERE a+b=2.
Let's generate a projection node inside a filter when needed.
2025-09-19 03:59:28 -05:00
Glauber Costa
627f61aa81 support column comparisons in the filter operator
We currently only support column / literal comparisons in the filter
operator. But with JOINs, comparisons are usually against two columns.

Do the work to support it.
2025-09-19 03:59:28 -05:00
Glauber Costa
47097fbec6 Add tests for project operator working with ambiguous columns
Unlike the other operators, project works just fine with ambiguous
columsn, because it works with compiled expressions. We don't need to
patch it, but let's make sure it keeps working by writing a test.
2025-09-19 03:59:28 -05:00
Glauber Costa
e80dd8e5e1 move the filter operator to accept indexes instead of names
We already did similarly for the AggregateOperator: for joins
you can have the same column name in many tables. And passing schema
information to the operator is a layering violation (the operator may be
operating on the result of a previous node, and at that point there is
no more "schema"). Therefore we pass indexes into the column set the
operator has.

The FilterOperator has a complication: we are using it to generate the
SQL for the populate statement, and that needs column names. However,
we should *not* be using the FilterOperator for that, and that is a
relic from the time where we had operator information directly inside
the IncrementalView.

To enable moving the FilterOperator to index-based, we rework that code.
For joins, we'll need to populate many tables anyway, so we take the
time to do that work here.
2025-09-19 03:59:28 -05:00
Glauber Costa
f149b40e75 Implement JOINs in the DBSP circuit
This PR improves the DBSP circuit so that it handles the JOIN operator.
The JOIN operator exposes a weakness of our current model: we usually
pass a list of columns between operators, and find the right column by
name when needed.

But with JOINs, many tables can have the same columns. The operators
will then find the wrong column (same name, different table), and
produce incorrect results.

To fix this, we must do two things:
1) Change the Logical Plan. It needs to track table provenance.
2) Fix the aggregators: it needs to operate on indexes, not names.

For the aggregators, note that table provenance is the wrong
abstraction. The aggregator is likely working with a logical table that
is the result of previous nodes in the circuit. So we just need to be
able to tell it which index in the column array it should use.
2025-09-19 03:59:28 -05:00
Glauber Costa
9f3d119a5a move hashable row tests to dbsp.rs
The operator.rs file was so huge, that we didn't even notice there was a
test block in the middle of the file that was testing things that were
long moved to dbsp.rs (the HashableRow). Move the tests there now.
2025-09-19 03:59:28 -05:00
Glauber Costa
e2f0e372a1 move the join operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:59:28 -05:00
Glauber Costa
aa8fcdbe54 move the aggregate operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:59:24 -05:00
Glauber Costa
7178d8d31c move the project operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:57:11 -05:00
Glauber Costa
ee914fc543 move the filter operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:57:11 -05:00
Glauber Costa
9747d6c6b6 move the input operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:57:11 -05:00
Glauber Costa
6be5eb74d9 Implement the Join Operator
The join operator is also a stateful operator. It keeps the input deltas
stored in the state, for both the left and right branches of the join.

JOINs extract a join key, which is the values that were used in the
join's equality statement. That key is now our zset_id, and it points
to a collection of rows.
2025-09-19 03:57:11 -05:00
Glauber Costa
5b4a6e5c2d view: catch all tables mentioned, instead of just one.
Ahead of the implementation of JOINs, we need to evolve the
IncrementalView, which currently only accepts a single base table,
to keep a list of tables mentioned in the statement.
2025-09-19 03:57:11 -05:00
Glauber Costa
0b3317d449 extract columns from all tables in case of joins.
Our code for view needs to extract the list of columns used in the view.
We currently extract only from "the base table", but once we have joins,
we need a more complex structure, that keeps the mapping of
(tables, columns).

This actually affects both views and materialized views: for views, the
queries with joins work just fine, because views are just aliases for
a query. But the list of columns returned by pragma table_info on the
view is incorrect. We add a test to make sure it is fixed.

For materialized views, we add extensive tests to make sure that the
columns are extracted correctly.
2025-09-19 03:57:11 -05:00
Pekka Enberg
3f35267b7c core/mvcc: Kill noop storage
We don't need it for anything.
2025-09-19 08:52:57 +03:00
Pekka Enberg
0ce6469a4b Merge 'Fix some Rust compilation warnings' from Samuel Marks
Nothing fancy yet, assuming you merge this I'll do this one next:
```
warning: function pointer comparisons do not produce meaningful results since their addresses are not guaranteed to be unique
   --> core/types.rs:403:5
    |
398 | #[derive(Debug, Clone, PartialEq)]
    |                        --------- in this derive macro expansion
...
402 |     pub step_fn: StepFunction,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^
403 |     pub finalize_fn: FinalizeFunction,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: the address of the same function can vary between different codegen units
    = note: furthermore, different functions could have the same address after being merged together
    = note: for more information visit <https://doc.rust-lang.org/nightly/core/ptr/fn.fn_addr_eq.html>
```
And fix a test failure that I resolved in Python (specific to macOS
hosts). Basically this PR is putting my toe in the water to see how open
you are to contribs!

Closes #3211
2025-09-19 08:28:53 +03:00
Samuel Marks
e333f151ba [*.rs] Resolve warnings (mostly "hiding a lifetime that's elided elsewhere is confusing") 2025-09-18 22:47:43 -05:00
Pere Diaz Bou
ff3c79d5d7 remove mvvmode and set logical log as default 2025-09-18 18:22:25 +02:00
Pere Diaz Bou
e2824835dc fix all open_file use cases for mvcc mode 2025-09-18 18:22:05 +02:00
Pere Diaz Bou
de8a975a0b core/mvcc: introduce MvccMode Logical Log 2025-09-18 18:21:04 +02:00
Jussi Saurio
91ef4e5e9d Merge 'Introduce instruction VTABLE' from Lâm Hoàng Phúc
this PR improves 3-6% for `prepare` benchmark without slowing down
others.  After this PR we don't have to store `InsnFunction`  in
`Program` and `ProgramBuilder` anymore, because `to_function` will
return result without matching.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3098
2025-09-18 09:18:48 +03:00
Pekka Enberg
c2b8bb0a2f core/incremental: Wrap ViewTransactionState in Arc
Make it Send.
2025-09-17 12:23:29 +03:00
Jussi Saurio
9a2797963a Merge 'Remove LimboResult enum and InsnFunctionStepResult::Busy variant' from Jussi Saurio
We can just use `LimboError::Busy` for both of these.

Reviewed-by: Pekka Enberg <penberg@iki.fi>

Closes #3170
2025-09-17 12:06:54 +03:00
TcMits
668f1f721c resolve conflict 2025-09-17 15:25:58 +07:00
Jussi Saurio
b9ceacc356 Remove InsnFunctionStepResult::Busy
we don't need all these busy variants, let's just handle
LimboError::Busy
2025-09-17 11:22:49 +03:00
Pekka Enberg
17e9f05ea4 core: Convert Rc<Pager> to Arc<Pager> 2025-09-17 09:32:49 +03:00
rajajisai
e605aff31b Merge branch 'main' into enc-page-1 2025-09-16 10:06:00 -04:00
rajajisai
89caa868f9 Encryption support for database header page 2025-09-16 10:04:30 -04:00
Glauber Costa
6bee6bb785 implement min/max
We have not implemented them before because they require the raw
elements to be kept. It is easy to see why in the following example:

current_min = 3;
insert(2) => current_min = 2 // can be done without state
delete(2) => needs to look at the state to determine new min!

The aggregator state was a very simple key-value structure. To
accomodate for min/max, we will make it into a more complex table, where
we can encode a more complex structure.

The key insight is that we can use a primary key composed of:

1) storage_id
2) zset_id,
3) element

The storage_id and zset_id are our previous key, except they are now
exploded to support a larger range of storage_id. With more bits
available in the storage_id, we can encode information about which
column we are storing. For aggregations in multiple columns, we will
need to keep a different list of values for min/max!

The element is just the values of the columns.

Because this is a primary key, the data will be sorted in the btree.
We can then just do a prefix search in the first two components of
the key and easily find the min/max when needed.

This new format is also adequate for joins. Joins will just have
a new storage_id which encodes two "columns" (left side, right side).
2025-09-15 22:30:48 -05:00
Glauber Costa
3565e7978a Add an index to the dbsp internal table
And also change the schema of the main table. I have come to see the
current key-value schema as inadequate for non-aggregate operators.
Calculating Min/Max, for example, doesn't feat in this schema because
we have to be able to track existing values and index them.

Another alternative is to keep one table per operator type, but this
quickly leads to an explosion of tables.
2025-09-15 22:30:48 -05:00
TcMits
cab0c7b545 peft tuning 2025-09-14 18:53:53 +07:00
TcMits
01da48fde9 introduce instruction virtual table 2025-09-13 16:35:17 +07:00
PThorpe92
5849819a59 Fix tests for views 2025-09-12 08:20:40 -04:00
Preston Thorpe
b09dcceeef Merge 'Fixes views' from Glauber Costa
This is a collection of fixes for materialized views ahead of adding
support for JOINs.
It is mostly issues with how we assume there is a single table, with a
single delta, but we have to send more than one.
Those are things that are just objectively wrong, so I am sending it
separately to make the JOIN PR smaller.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3009
2025-09-12 07:43:32 -04:00
Glauber Costa
874047276e views: pass a DeltaSet for merge_delta
A DeltaSet is a collection of Deltas, one per table.
We'll need that for joins. The populate step for now will still generate
a single set. That will be our next step to fix.
2025-09-11 05:30:46 -07:00
Glauber Costa
841de334b7 view: catch all tables mentioned, instead of just one.
Ahead of the implementation of JOINs, we need to evolve the
IncrementalView, which currently only accepts a single base table,
to keep a list of tables mentioned in the statement.
2025-09-11 05:30:46 -07:00
Glauber Costa
c15ac87a3c fix cursor validation
We are validating that the weights on the materialized view table are
-1, 0, and 1. This is only true for the aggregator operator. For DBSP
in general, any number will do.

Our algorithm, however, would have deleted anything from the BTree that
is <= 0. So we don't expect them here.
2025-09-11 05:30:46 -07:00
Glauber Costa
e6008e532a Add a second delta to the EvalState, Commit
We will assert that the second one is always empty for the existing
operators - as they should be!

But joins will need both.
2025-09-11 05:30:46 -07:00
Glauber Costa
6541a43670 move hashable_row to dbsp.rs
There will be a new type for joins, so it makes less sense to have
a separate file just for it. dbsp.rs is good.
2025-09-11 05:30:46 -07:00
Glauber Costa
1fd345f382 unify code used for persistence.
We have code written for BTree (ZSet) persistence in both compiler.rs
and operator.rs, because there are minor differences between them. With
joins coming, it is time to unify this code.
2025-09-11 05:30:46 -07:00
Jussi Saurio
e3bd00883b Fix creation of automatic indexes
indexes with the naming scheme "sqlite_autoindex_<tblname>_<number>"
are automatically created when a table is created with UNIQUE or
PRIMARY KEY definitions.

these indexes must map to the table definition SQL in definition order,
i.e. sqlite_autoindex_foo_1 must be the first instance of UNIQUE or
PRIMARY KEY and so on.

this commit fixes our autoindex creation / parsing so that this invariant
is upheld.
2025-09-11 14:11:30 +03:00