Indexes with the naming scheme "sqlite_autoindex_<tblname>_<number>"
are automatically created when a table is created with UNIQUE or
PRIMARY KEY definitions.
These indexes must map to the table definition SQL in definition order,
i.e. sqlite_autoindex_foo_1 must correspond to the first instance of UNIQUE
or PRIMARY KEY, and so on.
This commit fixes our autoindex creation / parsing so that this invariant
is upheld.
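As a minimal illustration of the invariant (the helper below is purely illustrative, not the codebase's actual function):
```rust
// Illustrative only: automatic index names are assigned by the position of
// the UNIQUE / PRIMARY KEY constraint within the CREATE TABLE statement.
fn autoindex_names(table: &str, n_constraints: usize) -> Vec<String> {
    (1..=n_constraints)
        .map(|i| format!("sqlite_autoindex_{table}_{i}"))
        .collect()
}

fn main() {
    // e.g. CREATE TABLE foo (a TEXT UNIQUE, b TEXT, PRIMARY KEY (b))
    // => the UNIQUE on `a` gets index 1, the PRIMARY KEY on `b` gets index 2.
    assert_eq!(
        autoindex_names("foo", 2),
        vec!["sqlite_autoindex_foo_1", "sqlite_autoindex_foo_2"]
    );
}
```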
The MakeRecord instruction now accepts an optional affinity_str
parameter that applies column-specific type conversions before creating
records. When provided, the affinity string is applied
character-by-character to each register using the existing
apply_affinity_char() function, matching SQLite's behavior.
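A rough sketch of that behavior; `Value` and the toy `apply_affinity_char` below are stand-ins, not the engine's real definitions:
```rust
// Stand-in register value type, for illustration only.
#[derive(Debug)]
enum Value {
    Integer(i64),
    Text(String),
}

// Toy stand-in for the existing apply_affinity_char(): here 'd' represents a
// numeric-style affinity that coerces text to an integer when possible.
fn apply_affinity_char(value: &mut Value, affinity: char) {
    if affinity != 'd' {
        return;
    }
    let parsed = match value {
        Value::Text(s) => s.parse::<i64>().ok(),
        _ => None,
    };
    if let Some(n) = parsed {
        *value = Value::Integer(n);
    }
}

// The affinity string is applied character-by-character, one character per
// register, before the registers are packed into a record.
fn apply_affinities(registers: &mut [Value], affinity_str: &str) {
    for (reg, aff) in registers.iter_mut().zip(affinity_str.chars()) {
        apply_affinity_char(reg, aff);
    }
}
```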
Fixes #2040
Fixes #2041
This fairly long commit implements persistence for materialized views.
It is hard to split because of all the interdependencies between components,
so it is one big change. This commit message will at least try to go into
details about the basic architecture.
Materialized Views as Tables
============================
Materialized views are now normal tables, whereas before they were virtual
tables. By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).
One of the advantages of doing this is that we can create indexes on view
columns. Later, we should also be able to write those views to separate files
with ATTACH write.
Materialized Views as ZSets
===========================
The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that, because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet is that ours is backed by a BTree, not a hash map
(since SQLite tables are BTrees).
Aggregator State
================
In DBSP, the aggregator nodes also have state. To store that state, there is a
second table, one per view, which holds all the aggregators in that view. It is
named __turso_internal_dbsp_state_{view_name}. The format of that table is
similar to a ZSet: rowid, serialized_values, weight. We serialize the values
because there will be many aggregators in the table, so we can't rely on a
particular format for the values.
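A rough sketch of the layout and naming described above; the struct and helper are illustrative, not the actual codebase types:
```rust
// Illustrative only: one row of the per-view aggregator state table.
struct DbspStateRow {
    rowid: i64,
    // Serialized blob: a view can hold many aggregators, so no single fixed
    // column layout would fit them all.
    serialized_values: Vec<u8>,
    weight: i64,
}

// One state table per view, named after the view itself.
fn dbsp_state_table_name(view_name: &str) -> String {
    format!("__turso_internal_dbsp_state_{view_name}")
}
```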
The Materialized View Cursor
============================
Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
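A rough sketch of the weight-merging rule, assuming both sources are keyed by rowid; the function is illustrative, not the actual cursor code:
```rust
// Illustrative only: combine the persisted weight for a rowid with the
// in-transaction delta for the same rowid. A missing entry counts as 0,
// and a combined weight of 0 means the row is not visible to the reader.
fn merged_weight(persisted: Option<i64>, tx_delta: Option<i64>) -> Option<i64> {
    let w = persisted.unwrap_or(0) + tx_delta.unwrap_or(0);
    (w != 0).then_some(w)
}
```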
@penberg this PR tries to clean up `turso_parser`'s `fmt` code.
- `get_table_name` and `get_column_name` should return None when the
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
    /// Given an id, get the table name.
    /// The Option indicates whether the table exists.
    ///
    /// Currently not considering aliases
    fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
        None
    }

    /// Given a table id and a column index, get the column name.
    /// The outer Option indicates whether the column exists.
    /// The inner Option indicates whether the column has a name.
    fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
        None
    }

    /// Helper function to handle missing table/column names
    fn get_table_and_column_names(
        &self,
        table_id: TableInternalId,
        col_idx: usize,
    ) -> (String, String) {
        let table_name = self
            .get_table_name(table_id)
            .map(|s| s.to_owned())
            .unwrap_or_else(|| format!("t{}", table_id.0));
        let column_name = self
            .get_column_name(table_id, col_idx)
            .map(|opt| {
                opt.map(|s| s.to_owned())
                    .unwrap_or_else(|| format!("c{col_idx}"))
            })
            .unwrap_or_else(|| format!("c{col_idx}"));
        (table_name, column_name)
    }
}
```
- remove `FmtTokenStream` because it is the same as `WriteTokenStream`
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
    /// Send token(s) to the specified stream with context
    fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
        &self,
        s: &mut S,
        context: &C,
    ) -> Result<(), S::Error>;

    /// Return displayer representation with context
    fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
    where
        Self: Sized,
    {
        SqlDisplayer::new(ctx, self)
    }
}
```
Closes #2748
SQLite does not store the rowid alias column in the record at all,
because the rowid is always stored anyway alongside the record in the
B-tree cell.
This PR adds a new `updates` column to the CDC table. This column holds
the updated fields of the row in the following format:
```
[C boolean values, where true marks the changed columns]
[C values with the updates, where NULL is set for unchanged columns]
```
For example:
```
turso> UPDATE t SET y = 'turso', q = 'db' WHERE rowid = 1;
turso> SELECT bin_record_json_object('["x","y","z","q","x","y","z","q"]', updates) as updates FROM turso_cdc;
┌──────────────────────────────────────────────────────────────────┐
│ updates │
├──────────────────────────────────────────────────────────────────┤
│ {"x":0,"y":1,"z":0,"q":1,"x":null,"y":"turso","z":null,"q":"db"} │
└──────────────────────────────────────────────────────────────────┘
```
Also, this column works differently for `ALTER TABLE` statements, where the
updated value for `sql` is equal to the original `ALTER TABLE` statement:
```
turso> ALTER TABLE t ADD COLUMN t;
turso> SELECT bin_record_json_object('["type","name","tbl_name","rootpage","sql","type","name","tbl_name","rootpage","sql"]', updates) as updates FROM turso_cdc WHERE rowid = 2;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ updates │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {"type":0,"name":0,"tbl_name":0,"rootpage":0,"sql":1,"type":null,"name":null,"tbl_name":null,"rootpage":null,"sql":"ALTER TABLE t ADD COLUMN t;"} │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
This will help turso-db implement logical replication that supports
both column-level updates and schema changes.
Closes #2538
This is just the bare minimum that I needed to convince myself that this
approach will work. The only views that we support are slices of the
main table: no aggregations, no joins, no projections.
DROP VIEW is implemented.
View population is implemented.
Deletes, inserts, and updates are implemented.
Much like indexes before, a flag must be passed to enable views.
When building views (soon), it will be important to know which table
is being deleted. Getting it from the cursor id is very cumbersome.
What we are doing here is symmetrical to op_insert, and SQLite also
passes table information in one of the registers (p4).
This commit replaces the `Name(pub String)` struct with a `Name` enum that
explicitly models how the name appeared in the source: either as an
unquoted identifier (`Ident`) or a quoted string (`Quoted`).
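In outline, the new shape looks roughly like this (a sketch; the real definition may differ in what the variants store):
```rust
// Sketch only: the enum records how the name was written in the source.
pub enum Name {
    /// An unquoted identifier, e.g. foo
    Ident(String),
    /// A quoted name, e.g. "foo"
    Quoted(String),
}
```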
In the process, the separate `Id` wrapper type has been coalesced into the
`Name` enum, simplifying the AST and reducing duplication in identifier
handling logic.
While this increases the size of some AST nodes (notably `yyStackEntry`),
it improves correctness and makes source structure more explicit for
later phases.
The SetCookie opcode is used, among other things, to notify the
transaction of schema changes. We are not issuing it on DropTable.
Without it, the transaction thinks the schema hasn't changed, and does
not update the schema of the connection back to the database.
SQLite will, of course, issue it:
35 DropTable 0 0 0 foo 0
36 SetCookie 0 1 2 0
Unfortunately I don't have a unit test that breaks with this, because
the one that is supposed to break is having, let's put it this way,
bigger problems.
There's no such thing as a read-only connection.
In a normal connection, you can have many attached databases. Some
r/o, some r/w.
To properly fix that, we also need to fix the OpenWrite opcode. Right
now we are passing a name, which is the name of the table. That
parameter is not used anywhere. That is also not what the SQLite opcode
specifies. As with OpenRead, the p3 operand should be the database
index.
With that change, we can, for now, pass index 0, which is all
we support anyway, and then use that to test if we are r/o.
Two of the opcodes we implement (OpenRead and Transaction) should have
an operand specifying the database to use, but they don't.
Add it, and for now always use 0 (the main database).
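Roughly, the shape of the change; the field and variant names below are illustrative, not the actual `Insn` definitions:
```rust
// Sketch only: both instructions gain an explicit database index operand.
// Index 0 refers to the main database; attached databases would use higher
// indexes once they are supported.
enum Insn {
    OpenRead { cursor_id: usize, root_page: i64, db: usize },
    Transaction { db: usize, write: bool },
    // ... other instructions elided
}
```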
First step toward resolving
https://github.com/tursodatabase/limbo/issues/1643.
### This PR
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.
Additionally, I fixed two bugs related to virtual tables.
### TODO (I'll handle this in a separate PR)
Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.
Adding support for column references is more nuanced for two main
reasons:
* We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the `one` table must be processed by the top-level loop, and `series` must
be nested.
* For outer joins involving TVFs, the arguments must be treated as `ON`
predicates, not `WHERE` predicates.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1727
Makes it easier to test the feature:
```
$ cargo run -- --experimental-indexes
Limbo v0.0.22
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE t(x);
limbo> CREATE INDEX t_idx ON t(x);
limbo> DROP INDEX t_idx;
```
- `Update` query doesn't update `n_changes`. Let's make it work
- Add `InsertFlags` to add meta information related to insert operations
- For update query, add `UPDATE` flag
- Currently, the update query executes `Insn::Delete` and `Insn::Insert`
internally, which increases `n_changes` by 2. So, for the update query,
let's skip increasing `n_changes` for the `Insn::Insert` (see the sketch below)
https://github.com/tursodatabase/limbo/issues/1681
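A rough sketch of the intent; the names are illustrative, not the engine's actual types:
```rust
// Sketch only: an UPDATE is executed internally as Delete + Insert, so the
// Insert half carries a flag and is skipped when counting changes; the
// Delete half accounts for the single change.
#[derive(Clone, Copy, Default)]
struct InsertFlags {
    update: bool,
}

struct Counters {
    n_changes: i64,
}

impl Counters {
    fn count_insert(&mut self, flags: InsertFlags) {
        if !flags.update {
            self.n_changes += 1;
        }
    }
}
```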
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1683