Yield is a completion that does not allocate any inner state. By design
it is completed from the start and has no errors. This allows lightly
yield without allocating any locks nor heap allocate inner state.
This PR add proper program abort in case of unfinished statement reset
and interruption.
Also, this PR makes rollback methods non-failing because otherwise of
their callers usually unclear (if rollback failed - what is the state of
statement/connection/transaction?)
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3591
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
In comparisons for joins, we were assuming that the left column belonged
to the left join (and vice-versa). That is incorrect, because you can
write the comparison condition in any order.
Fixes#3368
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3400
In comparisons for joins, we were assuming that the left column belonged
to the left join (and vice-versa). That is incorrect, because you can
write the comparison condition in any order.
Fixes#3368
We had code for this, but the code had a fatal flaw: it tried to detect
a complex operation (an operation that needs projection), and return
false (no need for projection), for the others.
This is the exact opposite of what we should do: we should identify the
*simple* operations, and then return true (needs projection) for the
rest.
CAST is a special beast, since it is not a function, but rather, a
special opcode. Everything else above is the true just the same. But for
CAST, we have to do the extra work to capture it in the logical plan and
pass it down.
Fixes#3372Fixes#3370Fixes#3369
Improve serialization for DBSP views.
The serialization code was written organically, without much forward
thinking about stability as we evolved the table and operator format.
Now that this is done, we are at at point where we can actually make it
suck less and take a considerable step towards making this production
ready.
We also add a simple version check (in the table name, because that is
much easier than reading contents in parse_schema_row) to prevent views
to be used if we had to do anything to evolve the format of the circuit
(including the operators)
Closes#3351
as we make changes to the way materialized views are generated (think
adding new operators, changing the id of existing operators, etc), we
will need to persist the topology of the circuit itself. This is a
change that I believe to be premature. For now, it is enough to reserve
the first operator id for it, and add a version number to the table
name. We can just detect that something changed, and ask the user to
drop the view. We can get away with it due to the fact that the views
are experimental.
We have used i64 before because that is the size of an integer in
SQLite. However, I believe that for large enough databases, the chances
of collision here are just too high. The effect of a collision is the
database silently returning incorrect data in the materialized view.
So now that everything else is working, we should move to i128.
The Materialized View code had custom serialization written so we could
move this code forward. Now that we have many operators and the views
work, replace it with something saner.
The main insight is that if we transform the AggregateState into Values
before the serialization, we are able to just use standard SQLite
serialization for the values. We then just have to add sizes, codes for
the functions, etc (which are also represented as Values).
fixes#1976
and #1605
```zsh
turso> DROP TABLE IF EXISTS t;
CREATE TABLE t (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 1 │
└──────┴─────┘
turso> DROP TABLE IF EXISTS t;
CREATE TABLE t (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 1 │
└──────┴─────┘
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 2 │
└──────┴─────┘
turso>
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2983
This PR adds support for partial indexes, e.g. `CREATE INDEX` with a
provided predicate
```sql
CREATE UNIQUE INDEX idx_expensive ON products(sku) where price > 100;
```
The PR does not yet implement support for using the partial indexes in
the optimizer.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3228
UNION queries, while useful on their own, are a cornerstone of recursive
CTEs.
This PR implements:
* the merge operator, required to merge both sides of a union query.
* the circuitry necessary to issue the Merge operator.
* extraction of tables mentioned in union and CTE expressions, so we can
correctly populate tables that contain them.
Closes#3234
The Merge operator is a stateless operator that merges two deltas.
There are two modes: Distinct, where we merge together values that
are the same, and All, where we preserve all values. We use the rowid of
the hashable row to guarantee that: In Distinct mode, the rowid is set
to 0 in both sides. If they values are the same, they will hash to the
same thing. For All, the rowids are different.
The merge operator is used for the UNION statement, which is a
cornerstone of Recursive CTEs.
The population code extracts table information from the select statement
so it can populate the materialized view. But the code, as written
today, is naive. It doesn't capture table information correctly if there
is more than one select statement (such in the case of a union query).
We currently only support column / literal comparisons in the filter
operator. But with JOINs, comparisons are usually against two columns.
Do the work to support it.
Unlike the other operators, project works just fine with ambiguous
columsn, because it works with compiled expressions. We don't need to
patch it, but let's make sure it keeps working by writing a test.
We already did similarly for the AggregateOperator: for joins
you can have the same column name in many tables. And passing schema
information to the operator is a layering violation (the operator may be
operating on the result of a previous node, and at that point there is
no more "schema"). Therefore we pass indexes into the column set the
operator has.
The FilterOperator has a complication: we are using it to generate the
SQL for the populate statement, and that needs column names. However,
we should *not* be using the FilterOperator for that, and that is a
relic from the time where we had operator information directly inside
the IncrementalView.
To enable moving the FilterOperator to index-based, we rework that code.
For joins, we'll need to populate many tables anyway, so we take the
time to do that work here.
This PR improves the DBSP circuit so that it handles the JOIN operator.
The JOIN operator exposes a weakness of our current model: we usually
pass a list of columns between operators, and find the right column by
name when needed.
But with JOINs, many tables can have the same columns. The operators
will then find the wrong column (same name, different table), and
produce incorrect results.
To fix this, we must do two things:
1) Change the Logical Plan. It needs to track table provenance.
2) Fix the aggregators: it needs to operate on indexes, not names.
For the aggregators, note that table provenance is the wrong
abstraction. The aggregator is likely working with a logical table that
is the result of previous nodes in the circuit. So we just need to be
able to tell it which index in the column array it should use.
The operator.rs file was so huge, that we didn't even notice there was a
test block in the middle of the file that was testing things that were
long moved to dbsp.rs (the HashableRow). Move the tests there now.