Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.
We cannot use root page as the MVCC table ID directly because:
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
Hence, we:
- store the mapping between table id and btree rootpage
- sqlite_schema rows will have a negative rootpage column if the
table has not been checkpointed yet.
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
Our code for view needs to extract the list of columns used in the view.
We currently extract only from "the base table", but once we have joins,
we need a more complex structure, that keeps the mapping of
(tables, columns).
This actually affects both views and materialized views: for views, the
queries with joins work just fine, because views are just aliases for
a query. But the list of columns returned by pragma table_info on the
view is incorrect. We add a test to make sure it is fixed.
For materialized views, we add extensive tests to make sure that the
columns are extracted correctly.
Adds `round`, `hex`, `unhex`, `abs`, `lower`, `upper`, `sign` and `log`
(with base) to the expression fuzzer.
Rounding with the precision argument still has some incompatibilities.
Closes#3160
And also change the schema of the main table. I have come to see the
current key-value schema as inadequate for non-aggregate operators.
Calculating Min/Max, for example, doesn't feat in this schema because
we have to be able to track existing values and index them.
Another alternative is to keep one table per operator type, but this
quickly leads to an explosion of tables.
This fairly long commit implements persistence for materialized view.
It is hard to split because of all the interdependencies between components,
so it is a one big thing. This commit message will at least try to go into
details about the basic architecture.
Materialized Views as tables
============================
Materialized views are now a normal table - whereas before they were a virtual
table. By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).
One of the advantages of doing this is that we can create indexes on view
columns. Later, we should also be able to write those views to separate files
with ATTACH write.
Materialized Views as Zsets
===========================
The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet, is that obviously ours is backed by a BTree, not a Hash
(since SQLite tables are BTrees)
Aggregator State
================
In DBSP, the aggregator nodes also have state. To store that state, there is a
second table. The table holds all aggregators in the view, and there is one
table per view. That is __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table. We can't rely
on a particular format for the values.
The Materialized View Cursor
============================
Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
@penberg this PR try to clean up `turso_parser`'s`fmt` code.
- `get_table_name` and `get_column_name` should return None when
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
/// Given an id, get the table name
/// First Option indicates whether the table exists
///
/// Currently not considering aliases
fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
None
}
/// Given a table id and a column index, get the column name
/// First Option indicates whether the column exists
/// Second Option indicates whether the column has a name
fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
None
}
// help function to handle missing table/column names
fn get_table_and_column_names(
&self,
table_id: TableInternalId,
col_idx: usize,
) -> (String, String) {
let table_name = self
.get_table_name(table_id)
.map(|s| s.to_owned())
.unwrap_or_else(|| format!("t{}", table_id.0));
let column_name = self
.get_column_name(table_id, col_idx)
.map(|opt| {
opt.map(|s| s.to_owned())
.unwrap_or_else(|| format!("c{col_idx}"))
})
.unwrap_or_else(|| format!("c{col_idx}"));
(table_name, column_name)
}
}
```
- remove `FmtTokenStream` because it is same as `WriteTokenStream `
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
/// Send token(s) to the specified stream with context
fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
&self,
s: &mut S,
context: &C,
) -> Result<(), S::Error>;
// Return displayer representation with context
fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
where
Self: Sized,
{
SqlDisplayer::new(ctx, self)
}
}
```
Closes#2748
We have an issue at the moment that when a materialized view fails
to be created, we just swallow the error and leave the database in
a funny state.
We have can_create_view() to detect those issues early, but not all
errors can be detected that early.
We were not generating table_info for views. This PR fixes it. We were
so far storing columns as strings with just their names - since this is
all we needed - but we will move now to store Columns. We need to
convert the names to Column anyway for table_info to work.
Closes#2625
We were not generating table_info for views. This PR fixes it. We were
so far storing columns as strings with just their names - since this is
all we needed - but we will move now to store Columns. We need to
convert the names to Column anyway for table_info to work.
A lot of the structures we have - like the ones under Schema, are
specific for materialized views. In preparation to adding normal views,
rename them, so things are less confusing.