Yield is a completion that does not allocate any inner state. By design
it is completed from the start and has no errors. This allows us to
yield cheaply, without taking any locks or heap-allocating inner state.
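A minimal sketch of the shape this takes (the types and names here are
hypothetical, not the actual ones):

    use std::sync::{Arc, Mutex};

    // Hypothetical sketch: most completion kinds carry shared,
    // heap-allocated inner state, but Yield carries none and is
    // complete from the moment it is constructed.
    enum Completion {
        Yield,                   // no allocation, no lock, always complete
        Io(Arc<Mutex<IoState>>), // other kinds need shared inner state
    }

    struct IoState {
        done: bool,
    }

    impl Completion {
        fn is_completed(&self) -> bool {
            match self {
                Completion::Yield => true, // by design; it also never errors
                Completion::Io(state) => state.lock().unwrap().done,
            }
        }
    }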
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we widen root pages to i64 to make room for
the negative values.
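A hypothetical illustration of the convention (helper names invented):

    // Widening root pages to i64 reserves the negative range for MVCC
    // tables whose root page is not allocated yet.
    fn is_allocated(root_page: i64) -> bool {
        root_page > 0 // real B-tree root pages are positive page numbers
    }

    fn pending_root(mvcc_table_id: i64) -> i64 {
        // A negative root names a pending MVCC table, not a real page.
        -mvcc_table_id
    }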
The population code extracts table information from the select statement
so it can populate the materialized view. But the code, as written
today, is naive: it doesn't capture table information correctly if there
is more than one select statement (such as in the case of a union query).
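A sketch of the fix this implies, with a made-up AST for illustration:

    // Hypothetical AST: a compound select (e.g. a UNION) chains several
    // select bodies; the naive code only looked at the first one.
    struct Select {
        from_tables: Vec<String>,
        compound: Option<Box<Select>>, // the next arm of a UNION, if any
    }

    // Collect table names from every arm, not just the first.
    fn collect_tables(select: &Select, out: &mut Vec<String>) {
        out.extend(select.from_tables.iter().cloned());
        if let Some(next) = &select.compound {
            collect_tables(next, out);
        }
    }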
We already did something similar for the AggregateOperator: with joins
you can have the same column name in many tables. And passing schema
information to the operator is a layering violation (the operator may be
operating on the result of a previous node, and at that point there is
no "schema" anymore). Therefore we pass indexes into the column set the
operator has.
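A minimal sketch of what index-based references look like (names invented):

    // The operator references columns by position in its input row.
    // Two joined tables can both have a column named "id"; an index
    // into the operator's column set is unambiguous where a name is
    // not, and needs no schema.
    enum Value {
        Null,
        Integer(i64),
    }

    struct NotNullPredicate {
        column_index: usize, // index into the operator's input columns
    }

    fn eval(pred: &NotNullPredicate, row: &[Value]) -> bool {
        !matches!(row[pred.column_index], Value::Null)
    }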
The FilterOperator has a complication: we are using it to generate the
SQL for the populate statement, and that needs column names. However,
we should *not* be using the FilterOperator for that; it is a relic
from the time when we had operator information directly inside the
IncrementalView.
To enable moving the FilterOperator to index-based columns, we rework
that code. For joins, we'll need to populate many tables anyway, so we
take the time to do that work here.
Ahead of the implementation of JOINs, we need to evolve the
IncrementalView, which currently only accepts a single base table,
to keep a list of tables mentioned in the statement.
Our view code needs to extract the list of columns used in the view.
We currently extract only from "the base table", but once we have joins
we need a richer structure that keeps the mapping of (table, column)
pairs.
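A hypothetical sketch of such a structure:

    // Keep which table each column belongs to, instead of a flat list
    // of names from "the base table", so joins resolve unambiguously.
    struct ViewColumns {
        // (table name, column name) per output column, in output order.
        columns: Vec<(String, String)>,
    }

    impl ViewColumns {
        fn tables(&self) -> Vec<&str> {
            let mut tables: Vec<&str> =
                self.columns.iter().map(|(t, _)| t.as_str()).collect();
            tables.sort();
            tables.dedup();
            tables
        }
    }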
This actually affects both views and materialized views. For views,
queries with joins work just fine, because views are just aliases for
a query; but the list of columns returned by pragma table_info on the
view is incorrect. We add a test to make sure it is fixed.
For materialized views, we add extensive tests to make sure that the
columns are extracted correctly.
We also change the schema of the main table. I have come to see the
current key-value schema as inadequate for non-aggregate operators.
Calculating Min/Max, for example, doesn't fit in this schema, because
we have to be able to track the existing values and index them.
Another alternative is to keep one table per operator type, but that
quickly leads to an explosion of tables.
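To make the Min example concrete, here is a hypothetical sketch of the
state MIN needs (not the actual schema):

    use std::collections::BTreeMap;

    // If the current minimum is deleted, the next smallest must
    // already be known, so the state tracks every live value with its
    // weight (multiplicity). A single key-value pair cannot do this.
    struct MinState {
        values: BTreeMap<i64, i64>, // value -> weight
    }

    impl MinState {
        fn apply(&mut self, value: i64, weight: i64) {
            let w = self.values.entry(value).or_insert(0);
            *w += weight;
            if *w <= 0 {
                self.values.remove(&value);
            }
        }

        fn min(&self) -> Option<i64> {
            self.values.keys().next().copied()
        }
    }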
A DeltaSet is a collection of Deltas, one per table.
We'll need that for joins. The populate step, for now, will still
generate a single set; fixing that is our next step.
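A minimal sketch of the shape (names hypothetical):

    use std::collections::HashMap;

    type Row = Vec<String>; // stand-in for a record of column values

    // A Delta is a batch of weighted row changes (+1 insert,
    // -1 delete); a DeltaSet keeps one Delta per table, so a join can
    // consume the changes of each of its inputs separately.
    #[derive(Default)]
    struct Delta {
        changes: Vec<(i64 /* rowid */, Row, i64 /* weight */)>,
    }

    #[derive(Default)]
    struct DeltaSet {
        per_table: HashMap<String, Delta>,
    }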
This fairly long commit implements persistence for materialized views.
It is hard to split because of all the interdependencies between components,
so it is one big thing. This commit message will at least try to go into
detail about the basic architecture.
Materialized Views as tables
============================
Materialized views are now normal tables, whereas before they were
virtual tables. By making a materialized view a table, we can reuse all
the infrastructure for dealing with tables (cursors, etc.).
One of the advantages of doing this is that we can create indexes on view
columns. Later, we should also be able to write those views to separate files
with ATTACH write.
Materialized Views as ZSets
===========================
The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that, because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet
and the standard DBSP ZSet is that ours is, obviously, backed by a BTree,
not a Hash (since SQLite tables are BTrees).
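An in-memory analogue of the layout, purely for illustration:

    use std::collections::BTreeMap;

    // The persisted view is a ZSet ordered by rowid; on disk the "map"
    // is the view's own B-tree rather than this in-memory stand-in.
    struct MaterializedZSet {
        rows: BTreeMap<i64, (Vec<String>, i64)>, // rowid -> (values, weight)
    }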
Aggregator State
================
In DBSP, the aggregator nodes also have state. To store that state, there is a
second table. The table holds all aggregators in the view, and there is one
table per view. That is __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table. We can't rely
on a particular format for the values.
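Illustrative sketch of one row of that table (field names approximate):

    // Per-group state for every aggregator in the view is packed into
    // one opaque blob, because different aggregators need different
    // shapes of state.
    struct AggregatorStateRow {
        rowid: i64,
        serialized_values: Vec<u8>, // e.g. encoded [count, sum, min, ...]
        weight: i64,
    }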
The Materialized View Cursor
============================
Reading from a Materialized View essentially means reading from the
persisted ZSet and enhancing it with data that exists within the
transaction. Transaction data is ephemeral, so we do not materialize it
anywhere: we have a carefully crafted implementation of seek that takes
care of merging weights and stitching the two sets together.
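A hypothetical sketch of that merge, using in-memory maps in place of
the two B-tree sources:

    use std::collections::{BTreeMap, BTreeSet};

    // Both the persisted ZSet and the transaction's delta are ordered
    // by rowid, so the cursor can merge them key by key, summing
    // weights; a combined weight <= 0 means the row is invisible
    // (deleted) in this transaction.
    fn visible_rowids(
        persisted: &BTreeMap<i64, i64>, // rowid -> weight on disk
        txn_delta: &BTreeMap<i64, i64>, // rowid -> weight change in the txn
    ) -> Vec<i64> {
        let rowids: BTreeSet<i64> =
            persisted.keys().chain(txn_delta.keys()).copied().collect();
        rowids
            .into_iter()
            .filter(|rowid| {
                persisted.get(rowid).copied().unwrap_or(0)
                    + txn_delta.get(rowid).copied().unwrap_or(0)
                    > 0
            })
            .collect()
    }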
We need a read-only phase and a commit phase; otherwise we will never
be able to roll back changes properly. We currently do that, but we
do it in the view. Before we move to circuits, this needs to be
internalized by the operator.
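A minimal sketch of what internalizing the two phases could look like
(trait shape hypothetical):

    // The operator separates a pure evaluation phase from a commit
    // phase. A rollback is then just dropping the evaluated delta
    // without ever calling commit.
    trait IncrementalOperator {
        type Delta;

        // Read-only: compute the output delta without touching state.
        fn eval(&self, input: &Self::Delta) -> Self::Delta;

        // Called only when the transaction commits; absorbs the delta
        // into the operator's internal state.
        fn commit(&mut self, input: Self::Delta);
    }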
My goal with this patch is to be able to implement the ProjectOperator
for DBSP circuits using VDBE for expression evaluation.
*Not* doing so is dangerous for the following reason: we would end up
with different, subtle, and incompatible behavior between SQLite
expressions used in views and those used outside of views.
In fact, even our prototype had such bugs: our projection tests, which
used to pass, were actually wrong =) (sqlite would return something
different if those functions were executed outside the view context)
For optimization reasons, we single out trivial expressions: they don't
have to go through VDBE. Trivial expressions are expressions that only
involve Columns, Literals, and simple operators on elements of the same
type. Even type coercion takes an expression out of the realm of trivial.
Everything that is not trivial is translated with translate_expr, in
the same way SQLite would, and then compiled to VDBE.
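A hypothetical sketch of the classification, with a toy expression type:

    // Columns and literals are trivial, and so are simple operators
    // whose operands are trivial (a real implementation would also
    // check that both sides share a known type, since any coercion
    // disqualifies the expression). Everything else, e.g. function
    // calls, goes through VDBE.
    enum Expr {
        Column(usize),
        IntLiteral(i64),
        Add(Box<Expr>, Box<Expr>),
        Call(String, Vec<Expr>),
    }

    fn is_trivial(expr: &Expr) -> bool {
        match expr {
            Expr::Column(_) | Expr::IntLiteral(_) => true,
            Expr::Add(lhs, rhs) => is_trivial(lhs) && is_trivial(rhs),
            Expr::Call(_, _) => false, // always compiled to VDBE
        }
    }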
We can, over time, make this process much better. There are essentially
infinite opportunities for optimization here. But for now, the main
warts are:
* VDBE execution needs a connection.
* There is no good way in VDBE to pass parameters to a program.
* It is almost trivial to pollute the original connection. For example,
we need to issue HALT for the program to stop, but seeing that HALT
will usually cause the program to try to halt the original program.
Subprograms, like the ones we use in triggers, are a possible solution,
but they are much more expensive to execute, especially given that our
execution would essentially need a program with no other role than to
wrap the subprogram.
Therefore, what I am doing is:
* There is an in-memory database inside the projection operator (an
obvious optimization is to share it with *all* projection operators).
* We obtain a connection to that database when the operator is created.
* We use that connection to execute our VDBE, which offers a clean, safe
and isolated way to execute the expression.
* We feed the values to the program manually by editing the registers
directly.
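A miniature of the register-feeding idea (all names invented; the real
code runs an actual VDBE program on the private connection):

    struct MiniVm {
        registers: Vec<i64>,
    }

    impl MiniVm {
        fn run_projection(&mut self, inputs: &[i64]) -> i64 {
            // Feed inputs by editing the registers directly, instead
            // of going through a parameter-binding API.
            self.registers[..inputs.len()].copy_from_slice(inputs);
            // ... step the compiled expression until its HALT here;
            // because this VM is private to the operator, that HALT
            // cannot disturb the caller's program ...
            self.registers[inputs.len()] // read back the result register
        }
    }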
The populate step now has its own code path to apply changes to the
view. That was okay until now, because all we do is filter. But now
that we are also applying aggregations, we'd end up with two disjoint
code paths. A better approach is to put the results of our select into
a delta set, and apply that like any other change.
This is just the bare minimum that I needed to convince myself that this
approach will work. The only views that we support are slices of the
main table: no aggregations, no joins, no projections.
* drop view is implemented.
* view population is implemented.
* deletes, inserts and updates are implemented.
* much like indexes before, a flag must be passed to enable views.