Commit Graph

13 Commits

Author SHA1 Message Date
Glauber Costa
b419db489a Implement the DBSP merge operator
The Merge operator is a stateless operator that merges two deltas.
There are two modes: Distinct, where we merge together values that
are the same, and All, where we preserve all values. We use the rowid of
the hashable row to guarantee that: In Distinct mode, the rowid is set
to 0 in both sides. If they values are the same, they will hash to the
same thing. For All, the rowids are different.

The merge operator is used for the UNION statement, which is a
cornerstone of Recursive CTEs.
2025-09-21 21:00:27 -03:00
Glauber Costa
e2f0e372a1 move the join operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:59:28 -05:00
Glauber Costa
aa8fcdbe54 move the aggregate operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:59:24 -05:00
Glauber Costa
7178d8d31c move the project operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:57:11 -05:00
Glauber Costa
ee914fc543 move the filter operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:57:11 -05:00
Glauber Costa
9747d6c6b6 move the input operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:57:11 -05:00
Glauber Costa
6541a43670 move hashable_row to dbsp.rs
There will be a new type for joins, so it makes less sense to have
a separate file just for it. dbsp.rs is good.
2025-09-11 05:30:46 -07:00
Glauber Costa
1fd345f382 unify code used for persistence.
We have code written for BTree (ZSet) persistence in both compiler.rs
and operator.rs, because there are minor differences between them. With
joins coming, it is time to unify this code.
2025-09-11 05:30:46 -07:00
Glauber Costa
08b2e685d5 Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized view.
It is hard to split because of all the interdependencies between components,
so it is a one big thing. This commit message will at least try to go into
details about the basic architecture.

Materialized Views as tables
============================

Materialized views are now a normal table - whereas before they were a virtual
table.  By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).

One of the advantages of doing this is that we can create indexes on view
columns.  Later, we should also be able to write those views to separate files
with ATTACH write.

Materialized Views as Zsets
===========================

The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet, is that obviously ours is backed by a BTree, not a Hash
(since SQLite tables are BTrees)

Aggregator State
================

In DBSP, the aggregator nodes also have state. To store that state, there is a
second table.  The table holds all aggregators in the view, and there is one
table per view. That is __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table. We can't rely
on a particular format for the values.

The Materialized View Cursor
============================

Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
2025-09-05 07:04:33 -05:00
Glauber Costa
29b93e3e58 add DBSP circuit compiler
The next step is to adapt the view code to use circuits instead of
listing the operators manually.
2025-08-27 14:21:32 -05:00
Glauber Costa
38def26704 Add expr_compiler
To be used in DBSP-based projections. This will compile an expression
to VDBE bytecode and execute it.

To do that we need to add a new type of Expression, which we call a
Register.

This is a way for us to pass parameters to a DBSP program which will be
not columns or literals, but inputs from the DBSP deltas.
2025-08-25 17:48:17 +03:00
Glauber Costa
145d6eede7 Implement very basic views using DBSP
This is just the bare minimum that I needed to convince myself that this
approach will work. The only views that we support are slices of the
main table: no aggregations, no joins, no projections.

drop view is implemented.
view population is implemented.
deletes, inserts and updates are implemented.

much like indexes before, a flag must be passed to enable views.
2025-08-10 23:34:04 -05:00
Glauber Costa
d5b7533ff8 Implement a DBSP module
We are not using the DBSP crate because it is very heavy on Tokio and
other dependencies that won't make sense for us to consume.
2025-08-10 23:15:26 -05:00