I am 100% sure they are total bullshit by now, since we don't implement
the join operator yet. The code evolved a lot, and at every turn there
are issues with aggregators, projectors, filters... some subtle, some
not so subtle.
We keep having to patch the join code slightly as we change the API, but
we never truly exercise whether it keeps working, because the views
have no support for joins. Therefore: let's remove it. We'll
bring it back later.
min/max require O(N) storage because of deletions. It is easy to see
why: if you *add* a new row, you can quickly and incrementally check
if it is smaller / larger than the previous accumulator.
But when you *delete* a row you can't do that: if the deleted row held
the current min / max, you have to go back through the remaining values
to find the new one.
Feldera uses something called "traces" which to me look a lot like
indexes. When we implement materialization, this is easy to do. But to
avoid having something broken, we'll just disable min / max until then.
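To make the storage argument concrete, here is a minimal sketch of a MIN
aggregator that survives deletions. It is not the operator from this
codebase: `MinState` and its methods are hypothetical names, and a
`BTreeMap` stands in for whatever index or trace a real implementation
would use.

```rust
use std::collections::BTreeMap;

/// Hypothetical incremental MIN aggregator (illustration only).
/// It keeps a multiset of every live value, hence O(N) storage.
#[derive(Default)]
pub struct MinState {
    // value -> number of live rows currently holding that value
    counts: BTreeMap<i64, usize>,
}

impl MinState {
    /// Insertion. With inserts alone, a single accumulator would be enough.
    pub fn insert(&mut self, v: i64) {
        *self.counts.entry(v).or_insert(0) += 1;
    }

    /// Deletion. If the current minimum goes away, the next minimum has to
    /// come from the values we retained, which is why we retain them all.
    pub fn delete(&mut self, v: i64) {
        if let Some(c) = self.counts.get_mut(&v) {
            *c -= 1;
            if *c == 0 {
                self.counts.remove(&v);
            }
        }
    }

    /// Current minimum: the smallest key still present, if any.
    pub fn min(&self) -> Option<i64> {
        self.counts.keys().next().copied()
    }
}
```

The ordered map makes both updates O(log N), but the memory cost stays
proportional to the number of distinct live values.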
The operator itself should handle deletions and updates that change
the rowid by consolidating its state.
Our current materialized views track state themselves, so we don't
see this problem now. But it becomes apparent once we switch the
views to use circuits.
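As a rough illustration of what "consolidating" could mean here, the
sketch below folds a weighted delta into operator state. The `Delta`
alias, the `consolidate` function and the `(rowid, row)` key are all
assumptions made for the example, not the actual operator code. An
update that changes the rowid is modelled as a deletion of the old rowid
plus an insertion of the new one.

```rust
use std::collections::HashMap;

/// Hypothetical delta: ((rowid, row), weight).
/// Weight +1 is an insertion, -1 is a deletion.
pub type Delta = Vec<((i64, String), i64)>;

/// Fold a delta into the operator state, summing weights per key and
/// dropping entries whose weights cancel out to zero.
pub fn consolidate(state: &mut HashMap<(i64, String), i64>, delta: Delta) {
    for (key, weight) in delta {
        let w = state.entry(key.clone()).or_insert(0);
        *w += weight;
        if *w == 0 {
            state.remove(&key);
        }
    }
}
```

After a rowid change from 1 to 7, the state holds only the entry for
rowid 7: the old entry's weights sum to zero and it is removed.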
This is a first pass on logical plans. The idea is that the DBSP
compiler will have an easier time operating on a logical plan, that
exposes linear algebra operators, than on SQL expr.
To keep this simple, we only support filters, aggregates and projections
for now, and will add more later as we agree on the core of the
implementation.
To make sure that the implementation is reasonable, I tried my best to
generate a couple of logical plans using Datafusion and see if we
were generating something similar.
Our plans are not the same as Datafusion's, though. There are two
important differences:
* SQLite is weird, and it allows columns that are not part of the group
by clause to appear in aggregated statements. For example:
select a, count(b) from table group by c; <== that "a" is usually not
permitted and Datafusion will reject it. SQLite will be happy to
accept it.
* Datafusion will not generate a projection on queries like this:
select sum(hex(a)) from table, and will just keep the complex expression
hex(a) inside the aggregation. For DBSP to work well, we'll need an
explicit projection there.
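A sketch of the second difference, under stated assumptions: the `Expr`
and `Plan` types, the `plan_aggregate` function and the `__agg_arg_0`
alias below are hypothetical and much simpler than the real logical
plan. The point is only the shape: a complex aggregate argument such as
hex(a) is pushed into its own projection, so the aggregate sees a plain
column.

```rust
/// Hypothetical, simplified expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Call(String, Vec<Expr>), // scalar function call, e.g. hex(a)
}

/// Hypothetical, simplified logical plan.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan(String),
    Filter { input: Box<Plan>, predicate: Expr },
    // Each projected expression is paired with its output column name.
    Project { input: Box<Plan>, exprs: Vec<(Expr, String)> },
    Aggregate { input: Box<Plan>, group_by: Vec<String>, func: String, arg: String },
}

/// Plan `select <func>(<arg>) from <table>`.
/// A bare column is aggregated directly; anything more complex is first
/// computed by an explicit projection under a generated alias.
pub fn plan_aggregate(table: &str, func: &str, arg: Expr) -> Plan {
    let scan = Plan::Scan(table.to_string());
    match arg {
        Expr::Column(name) => Plan::Aggregate {
            input: Box::new(scan),
            group_by: vec![],
            func: func.to_string(),
            arg: name,
        },
        complex => {
            let alias = "__agg_arg_0".to_string();
            let project = Plan::Project {
                input: Box::new(scan),
                exprs: vec![(complex, alias.clone())],
            };
            Plan::Aggregate {
                input: Box::new(project),
                group_by: vec![],
                func: func.to_string(),
                arg: alias,
            }
        }
    }
}
```

For select sum(hex(a)) from t this yields Aggregate over Project over
Scan, whereas select sum(a) from t yields Aggregate directly over Scan.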
Because there are no users yet, I am marking this as `#[cfg(test)]`, but
I wanted to put this out there ASAP.
This PR makes plenty of changes to the sync engine, the main ones
being:
1. We now have a single database file, which we use to directly
execute `pull` and `push` requests (so there are no separate `draft` /
`synced` databases)
2. The last-write-wins strategy was fixed: before this PR, insert-insert
conflicts weren't resolved automatically
3. The sync engine can now apply an arbitrary `transform` function to
the logical changes
4. A sync-engine-aware checkpoint was implemented. A database created
by the sync engine now has an explicit `checkpoint()` method, which
under the hood uses an additional file to save the frames needed for
the revert operation during pull
5. The pull operation was split into 2 phases internally: wait for
changes & apply changes
* The problem is that the pull operation itself (e.g. apply) currently
requires an exclusive lock on the sync engine, and if the user wants to
pull & push independently this is problematic (as pull will lock the db
and push will never succeed)
6. Added support for the V1 pull protocol
7. Exposed a simple `stats()` method which returns the number of pending
CDC operations and the current WAL size
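The two-phase pull from item 5 can be sketched roughly as follows. This
is not the sync engine's API: `Engine`, `wait_changes`, `apply_changes`
and `push` are hypothetical names, an `mpsc` channel stands in for the
remote, and the assumption is that only the apply phase needs the
exclusive lock.

```rust
use std::sync::{mpsc, Mutex};

/// Hypothetical stand-in for the state guarded by the exclusive lock.
#[derive(Default)]
pub struct Engine {
    pub applied: Vec<String>,
    pub pushed: Vec<String>,
}

/// Phase 1: block until the remote has changes. No engine lock is held
/// here, so a concurrent push is free to proceed while we wait.
pub fn wait_changes(remote: &mpsc::Receiver<Vec<String>>) -> Option<Vec<String>> {
    remote.recv().ok()
}

/// Phase 2: apply the changes. The exclusive lock is held only for the
/// duration of the apply, not for the whole pull.
pub fn apply_changes(engine: &Mutex<Engine>, changes: Vec<String>) {
    let mut e = engine.lock().unwrap();
    e.applied.extend(changes);
}

/// Push also needs the lock, but only contends with the apply phase.
pub fn push(engine: &Mutex<Engine>, change: &str) {
    engine.lock().unwrap().pushed.push(change.to_string());
}
```

With this split, a caller can sit in `wait_changes` on one thread while
another thread calls `push`, and only the short apply step serializes
the two.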
Closes #2810
Go bindings have moved to their own repo
https://github.com/tursodatabase/turso-go
Otherwise it is a pain in the ass to have a Go module as a subdirectory
of another repository. Now they can get the TLC that they need: I found
out that they never worked on Windows, and that the embedding
wasn't working properly in some cases. We want them to be plug and play.
Closes #2807
There's no need to call io.now() unless debug tracing is on. Let's
micro-optimize commit_dirty_pages() to avoid the unnecessary call.
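The pattern is roughly the one below. This is a sketch, not the actual
commit_dirty_pages() code: `start_timer` and the `debug_tracing` flag
are hypothetical, and a closure stands in for io.now().

```rust
use std::time::Instant;

/// Hypothetical helper: take a start timestamp only when debug tracing
/// is enabled, so the clock is never read on the common path.
pub fn start_timer(debug_tracing: bool, now: impl FnOnce() -> Instant) -> Option<Instant> {
    if debug_tracing {
        Some(now())
    } else {
        None
    }
}
```

The caller later checks the `Option` and only computes and logs the
elapsed time when a timestamp was actually taken.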
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2808
When we reverted #2789, we had already merged #2793. That PR used
some helper methods that were created in #2789, so I just added them
back here and fixed the simulator Dockerfile.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2803
We had missed #2794 because we don't get any visible feedback on PRs if
we return wrong results from TPC-H, so let's make that happen here.
Closes #2798
Decouple SQL generation code from simulator code, so that it can
potentially be reused for fuzzing in other crates, and create a
`GenerationContext` trait so that it becomes easier to create
`Simulation Profiles`. Ideally, in further PRs, I want to expand the
`GenerationContext` trait so we can guide the generation with context
from the simulation profile.
Depends on #2789.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #2793