Commit Graph

6 Commits

Author SHA1 Message Date
Levy A.
77a412f6af refactor: remove unsafe reference semantics from RefValue
also renames `RefValue` to `ValueRef`, to align with rusqlite and other
crates
2025-10-07 10:43:44 -03:00
Pere Diaz Bou
0f631101df core: change page idx type from usize to i64
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
2025-09-29 18:38:43 +02:00
Glauber Costa
3dc1dca5a8 use 128-bit hashes for the zset_id
We have used i64 before because that is the size of an integer in
SQLite. However, I believe that for large enough databases, the chances
of collision here are just too high. The effect of a collision is the
database silently returning incorrect data in the materialized view.

So now that everything else is working, we should move to i128.
2025-09-25 22:52:08 -03:00
Glauber Costa
b9011dfa16 Replace custom serialization with a saner version
The Materialized View code had custom serialization written so we could
move this code forward. Now that we have many operators and the views
work, replace it with something saner.

The main insight is that if we transform the AggregateState into Values
before the serialization, we are able to just use standard SQLite
serialization for the values. We then just have to add sizes, codes for
the functions, etc (which are also represented as Values).
2025-09-25 22:52:08 -03:00
Glauber Costa
f149b40e75 Implement JOINs in the DBSP circuit
This PR improves the DBSP circuit so that it handles the JOIN operator.
The JOIN operator exposes a weakness of our current model: we usually
pass a list of columns between operators, and find the right column by
name when needed.

But with JOINs, many tables can have the same columns. The operators
will then find the wrong column (same name, different table), and
produce incorrect results.

To fix this, we must do two things:
1) Change the Logical Plan. It needs to track table provenance.
2) Fix the aggregators: it needs to operate on indexes, not names.

For the aggregators, note that table provenance is the wrong
abstraction. The aggregator is likely working with a logical table that
is the result of previous nodes in the circuit. So we just need to be
able to tell it which index in the column array it should use.
2025-09-19 03:59:28 -05:00
Glauber Costa
aa8fcdbe54 move the aggregate operator to its own file.
The code is becoming impossible to reason about with everything in
operator.rs
2025-09-19 03:59:24 -05:00