This PR improves the DBSP circuit so that it handles the JOIN operator.
The JOIN operator exposes a weakness of our current model: we usually
pass a list of columns between operators, and find the right column by
name when needed.
But with JOINs, many tables can have the same columns. The operators
will then find the wrong column (same name, different table), and
produce incorrect results.
To fix this, we must do two things:
1) Change the Logical Plan. It needs to track table provenance.
2) Fix the aggregators: it needs to operate on indexes, not names.
For the aggregators, note that table provenance is the wrong
abstraction. The aggregator is likely working with a logical table that
is the result of previous nodes in the circuit. So we just need to be
able to tell it which index in the column array it should use.
The operator.rs file was so huge, that we didn't even notice there was a
test block in the middle of the file that was testing things that were
long moved to dbsp.rs (the HashableRow). Move the tests there now.
The join operator is also a stateful operator. It keeps the input deltas
stored in the state, for both the left and right branches of the join.
JOINs extract a join key, which is the values that were used in the
join's equality statement. That key is now our zset_id, and it points
to a collection of rows.
Ahead of the implementation of JOINs, we need to evolve the
IncrementalView, which currently only accepts a single base table,
to keep a list of tables mentioned in the statement.
Our code for view needs to extract the list of columns used in the view.
We currently extract only from "the base table", but once we have joins,
we need a more complex structure, that keeps the mapping of
(tables, columns).
This actually affects both views and materialized views: for views, the
queries with joins work just fine, because views are just aliases for
a query. But the list of columns returned by pragma table_info on the
view is incorrect. We add a test to make sure it is fixed.
For materialized views, we add extensive tests to make sure that the
columns are extracted correctly.
Adds `round`, `hex`, `unhex`, `abs`, `lower`, `upper`, `sign` and `log`
(with base) to the expression fuzzer.
Rounding with the precision argument still has some incompatibilities.
Closes#3160
Nothing fancy yet, assuming you merge this I'll do this one next:
```
warning: function pointer comparisons do not produce meaningful results since their addresses are not guaranteed to be unique
--> core/types.rs:403:5
|
398 | #[derive(Debug, Clone, PartialEq)]
| --------- in this derive macro expansion
...
402 | pub step_fn: StepFunction,
| ^^^^^^^^^^^^^^^^^^^^^^^^^
403 | pub finalize_fn: FinalizeFunction,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: the address of the same function can vary between different codegen units
= note: furthermore, different functions could have the same address after being merged together
= note: for more information visit <https://doc.rust-lang.org/nightly/core/ptr/fn.fn_addr_eq.html>
```
And fix a test failure that I resolved in Python (specific to macOS
hosts). Basically this PR is putting my toe in the water to see how open
you are to contribs!
Closes#3211
This is wip but since I have it working on a single connection maybe
this is worth to review right now.
I haven't done any header work right now for the on disk format, but
instead I've basically added the skeleton for logical log transactions
on MVCC to work.
**before merging this pr I have to remove cherry picked commit from
https://github.com/tursodatabase/turso/pull/3156**
Closes#3158