Instead of using static elements, use a dynamically generated DBSP-
circuit to keep views.
The DBSP circuit is generated from the logical plan, which only supports
enough for us to generate the DBSP circuit at the moment.
The state of the view is still kept inside the IncrementalView, instead
of materialized at the operator level. As a consequence, this still
depends on us always populating the view at startup. Fixing this is the
next step.
Closes#2815
```
Benchmarking Prepare `SELECT first_name, count(1) FROM users GROUP BY first_name HAVING count(1) > 1 ORDER BY cou...: Collecting 100 samples in estimated 5.008
Prepare `SELECT first_name, count(1) FROM users GROUP BY first_name HAVING count(1) > 1 ORDER BY cou...
time: [4.0081 µs 4.0223 µs 4.0364 µs]
change: [-2.9298% -2.2538% -1.6786%] (p = 0.00 < 0.05)
Performance has improved.
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2847
It's a hot instruction for TPC-H, for example, so worth optimizing.
Reduces op_zero_or_null() from 5.6% to 2.4% in CPU flamegraph for TCP-H
Q1.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2842
This adds support for "OFF" and "FULL" (default) synchronous modes. As
future work, we need to add NORMAL and EXTRA as well because
applications expect them.
Closes#2833
- check free list trunk and pages
- use shared hash map to check for duplicate references for pages
- properly check overflow pages
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#2816
This adds support for "OFF" and "FULL" (default) synchronous modes. As
future work, we need to add NORMAL and EXTRA as well because
applications expect them.
The removed comment no longer matches the current code. The
OrderByRemapping struct and the surrounding comments are sufficient to
explain deduplication and remapping.
I added `IOContext` to `DatbaseStorage` IO trait and this struct will
carry the necessary ctx required for encryption (or checksums.). This
lets us set the encryption at outside and let the IO layer handle it
properly
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2812
We need a read only phase and a commit phase. Otherwise we will never
be able to rollback changes properly. We currently do that, but we
do that in the view. Before we move to circuits, this needs to be
internalized by the operator.
I am 100% sure they are total bullshit by now, since we don't implement
the join operator yet. The code evolved a lot, and in every turn there
are issues with aggregators, projectors, filters... some subtle, some
not so subtle.
We keep having to patch join slightly as we make changes to the API, but
we don't truly exercise whether or not they keep working because there
is no support for them in the views. Therefore: let's remove it. We'll
bring it back later.
min/max require O(N) storage because of deletions. It is easy to see
why: if you *add* a new row, you can quickly and incrementally check
if it is smaller / larger than the previous accumulator.
But when you *delete* a row you can't do that and have to check the
previous values.
Feldera uses something called "traces" which to me look a lot like
indexes. When we implement materialization, this is easy to do. But to
avoid having something broken, we'll just disable min / max until then.
The operator itself should handle deletions and updates that change
the rowid by consolidating its state.
Our current materialized views track state themselves, so we don't
see this problem now. But it becomes apparent once we switch the
views to use circuits.
This is a first pass on logical plans. The idea is that the DBSP
compiler will have an easier time operating on a logical plan, that
exposes linear algebra operators, than on SQL expr.
To keep this simple, we only support filters, aggregates and projections
for now, and will add more later as we agree on the core of the
implementation.
To make sure that the implementations is reasonable, I tried my best to
generate a couple of logical plans using Datafusion and seeing if we
were generating something similar.
Our plans are not the same as Datafusion's, though. There are two
important differences:
* SQLite is weird, and it allows columns that are not part of the group
by statement to appear in aggregated statements. For example:
select a, count(b) from table group by c; <== that "a" is usually not
permitted and datafusion will reject it. SQLite will be happy to
accept it
* Datafusion will not generate a projection on queries like this:
select sum(hex(a)) from table, and just keep the complex expression
hex(a) inside the aggregation. For DBSP to work well, we'll need an
explicit aggregation there.
Because there are no users yet, I am marking this as [cfg(test)], but
I wanted to put this out there ASAP.