# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs or subqueries etc can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
- Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
- Removed Projection operator (`plan.result_columns`)
- Removed Limit operator (`plan.limit`)
- Removed Aggregate operator (`plan.group_by` and `plan.aggregates`)
- Removed Order operator (`plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution. I.e. no need for
`resolve_ident_table()` etc
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER_BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions which sucks (e.g. `sum(x) != SUM(x)` -- I've marked the parts
where this is used with a TODO, we should have a custom expression
equality comparison function instead...). This is not a correctness-
breaking thing, but still.
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed
Closes#416
- Add support for division in SQL expressions
- Fix issues with subtraction
- Support multiplication of integers and floats
- Support aggregate functions in mathematical expressions
- Add compatibility tests for mathematical operations, also with
aggregate functions
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#418
Implemeted checksums so that sqlite3 is able to read our WAL. This also
helps with future work on proper recovery of WAL.
Create some frames with CREATE TABLE and kill the process so that there
is no checkpoint.
```
Limbo v0.0.6
Enter ".help" for usage hints.
limbo> create table x(x);
limbo> [1] 15910 killed cargo run xlimbo.db
```
Now sqlite3 is able to recover from this WAL created in limbo:
```
sqlite3 xlimbo.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE x (x);
```
Closes#413