Trying to return integer sometimes to match SQLite led to more problems
that I anticipated. The reason being, we can't *really* match SQLite's
behavior unless we know the type of *every* element in the sum. This is
not impossible, but it is very hard, for very little gain.
Fixes#3831
Implements COUNT/SUM/AVG(DISTINCT) and SELECT DISTINCT for materialized views.
To do this we have to keep a list of the actual distinct values
(similarly to how we do for min/max). We then update the operator (and
issue deltas) only when there is a state transition (for example, if we
already count the value x = 1, and we see an insert for x = 1, we do
nothing).
SELECT DISTINCT (with no aggregator) is similar. We already have to keep
a list of the values anyway to power the aggregates. So we just issue
new deltas based on the transition, without updating the aggregator.
This patch pushes unsafe Send and Sync to individual components instead
of doing it at Database level. This makes it easier for us to
incrementally fix thread-safety, but avoid developers adding more thread
unsafe code.
Yield is a completion that does not allocate any inner state. By design
it is completed from the start and has no errors. This allows lightly
yield without allocating any locks nor heap allocate inner state.
This PR add proper program abort in case of unfinished statement reset
and interruption.
Also, this PR makes rollback methods non-failing because otherwise of
their callers usually unclear (if rollback failed - what is the state of
statement/connection/transaction?)
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#3591
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
In comparisons for joins, we were assuming that the left column belonged
to the left join (and vice-versa). That is incorrect, because you can
write the comparison condition in any order.
Fixes#3368
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3400
In comparisons for joins, we were assuming that the left column belonged
to the left join (and vice-versa). That is incorrect, because you can
write the comparison condition in any order.
Fixes#3368
We had code for this, but the code had a fatal flaw: it tried to detect
a complex operation (an operation that needs projection), and return
false (no need for projection), for the others.
This is the exact opposite of what we should do: we should identify the
*simple* operations, and then return true (needs projection) for the
rest.
CAST is a special beast, since it is not a function, but rather, a
special opcode. Everything else above is the true just the same. But for
CAST, we have to do the extra work to capture it in the logical plan and
pass it down.
Fixes#3372Fixes#3370Fixes#3369
Improve serialization for DBSP views.
The serialization code was written organically, without much forward
thinking about stability as we evolved the table and operator format.
Now that this is done, we are at at point where we can actually make it
suck less and take a considerable step towards making this production
ready.
We also add a simple version check (in the table name, because that is
much easier than reading contents in parse_schema_row) to prevent views
to be used if we had to do anything to evolve the format of the circuit
(including the operators)
Closes#3351
as we make changes to the way materialized views are generated (think
adding new operators, changing the id of existing operators, etc), we
will need to persist the topology of the circuit itself. This is a
change that I believe to be premature. For now, it is enough to reserve
the first operator id for it, and add a version number to the table
name. We can just detect that something changed, and ask the user to
drop the view. We can get away with it due to the fact that the views
are experimental.
We have used i64 before because that is the size of an integer in
SQLite. However, I believe that for large enough databases, the chances
of collision here are just too high. The effect of a collision is the
database silently returning incorrect data in the materialized view.
So now that everything else is working, we should move to i128.
The Materialized View code had custom serialization written so we could
move this code forward. Now that we have many operators and the views
work, replace it with something saner.
The main insight is that if we transform the AggregateState into Values
before the serialization, we are able to just use standard SQLite
serialization for the values. We then just have to add sizes, codes for
the functions, etc (which are also represented as Values).
fixes#1976
and #1605
```zsh
turso> DROP TABLE IF EXISTS t;
CREATE TABLE t (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 1 │
└──────┴─────┘
turso> DROP TABLE IF EXISTS t;
CREATE TABLE t (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 1 │
└──────┴─────┘
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t │ 2 │
└──────┴─────┘
turso>
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2983
This PR adds support for partial indexes, e.g. `CREATE INDEX` with a
provided predicate
```sql
CREATE UNIQUE INDEX idx_expensive ON products(sku) where price > 100;
```
The PR does not yet implement support for using the partial indexes in
the optimizer.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3228
UNION queries, while useful on their own, are a cornerstone of recursive
CTEs.
This PR implements:
* the merge operator, required to merge both sides of a union query.
* the circuitry necessary to issue the Merge operator.
* extraction of tables mentioned in union and CTE expressions, so we can
correctly populate tables that contain them.
Closes#3234
The Merge operator is a stateless operator that merges two deltas.
There are two modes: Distinct, where we merge together values that
are the same, and All, where we preserve all values. We use the rowid of
the hashable row to guarantee that: In Distinct mode, the rowid is set
to 0 in both sides. If they values are the same, they will hash to the
same thing. For All, the rowids are different.
The merge operator is used for the UNION statement, which is a
cornerstone of Recursive CTEs.
The population code extracts table information from the select statement
so it can populate the materialized view. But the code, as written
today, is naive. It doesn't capture table information correctly if there
is more than one select statement (such in the case of a union query).