Commit Graph

1213 Commits

Author SHA1 Message Date
jussisaurio
3e9883bfbd update COMPAT 2024-11-30 10:06:37 +02:00
jussisaurio
3f80e41e7a support HAVING 2024-11-30 10:05:13 +02:00
jussisaurio
fceb9ac62b Merge 'core: (another) refactor of read path query processing logic' from Jussi Saurio
# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs or subqueries etc can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
   - Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
   - Removed Projection operator (`plan.result_columns`)
   - Removed Limit operator (`plan.limit`)
   - Removed Aggregate operator (`plan.group_by` and `plan.aggregates`)
   - Removed Order operator (`plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution. I.e. no need for
`resolve_ident_table()` etc
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER_BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions which sucks (e.g. `sum(x) != SUM(x)` -- I've marked the parts
where this is used with a TODO, we should have a custom expression
equality comparison function instead...). This is not a correctness-
breaking thing, but still.
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed

Closes #416
2024-11-30 10:03:58 +02:00
jussisaurio
84742b81fa Obsolete comment 2024-11-27 22:43:36 +02:00
jussisaurio
da811dc403 add doc comments for members of Plan struct 2024-11-27 19:30:07 +02:00
jussisaurio
db462530f1 metadata instead of m 2024-11-27 19:27:36 +02:00
jussisaurio
7d569aee1f fix stupid comment 2024-11-26 18:37:06 +02:00
jussisaurio
1b34698872 add comments and rename some misleading label variables 2024-11-26 18:28:19 +02:00
jussisaurio
7f04f8e88f rename 2024-11-26 17:41:08 +02:00
jussisaurio
122546444f extract function order_by_sorter_insert() 2024-11-26 17:40:49 +02:00
jussisaurio
3d27ef90f5 emitting result columns generally works the same way -> extract it 2024-11-26 17:31:51 +02:00
jussisaurio
c74981873e Extract ORDER BY result column deduping into a function 2024-11-26 17:31:51 +02:00
jussisaurio
89569fa7a3 Remove redundant if-else after refactoring ResultSetColumn to struct 2024-11-26 17:31:51 +02:00
jussisaurio
ac12e9c7fd No need for ResultSetColumn to be an enum 2024-11-26 17:31:51 +02:00
jussisaurio
bb8ba7fb01 add tests for arithmetic on two aggregates with no from clause 2024-11-26 17:31:51 +02:00
jussisaurio
7d5fa12bb7 fix allocating wrong number of registers upfront for aggregation results 2024-11-26 17:31:51 +02:00
jussisaurio
4636f71522 test ordering by aggregate not mentioned in select 2024-11-26 17:31:51 +02:00
jussisaurio
56b15193d0 resolve aggregates from orderby as well 2024-11-26 17:31:51 +02:00
jussisaurio
885b6ecd76 Remove 'cursor_hint': it is never needed 2024-11-26 17:31:51 +02:00
jussisaurio
008be10cfd Add TODO about expression equality comparisons 2024-11-26 17:31:51 +02:00
jussisaurio
cfb7e79601 Function doc comments 2024-11-26 17:31:51 +02:00
jussisaurio
fc33c70481 remove many unnecessary fields from SortMetadata and GroupByMetadata 2024-11-26 17:31:51 +02:00
jussisaurio
ebce78bcd9 rename 2024-11-26 17:31:51 +02:00
jussisaurio
0510e150d3 fix comment 2024-11-26 17:31:51 +02:00
jussisaurio
1c37d8b24b extract function sorter_insert() 2024-11-26 17:31:51 +02:00
jussisaurio
4f3da982c0 extract function emit_result_row() 2024-11-26 17:31:51 +02:00
jussisaurio
52beeabd45 tweaks 2024-11-26 17:31:51 +02:00
jussisaurio
120601f732 fix metadata comments 2024-11-26 17:31:51 +02:00
jussisaurio
97ba4a788e remove sorts hashmap - only one sortmetadata struct is needed 2024-11-26 17:31:51 +02:00
jussisaurio
d2f84edd2e fix accidentally removing push_scan_direction() 2024-11-26 17:31:51 +02:00
jussisaurio
7ecc252507 fix rest of the failing tests 2024-11-26 17:31:51 +02:00
jussisaurio
9a557516b8 Fixes for expressions with aggregate arguments + limit 0 2024-11-26 17:31:51 +02:00
jussisaurio
cc902ed25d GROUP BY and ORDER BY mostly work 2024-11-26 17:31:51 +02:00
jussisaurio
3f9e60633f select refactor: order by and basic agg kinda work 2024-11-26 17:31:51 +02:00
jussisaurio
d0466e1cae introduce Column member of ast::Expr and bind idents to columns 2024-11-26 17:31:51 +02:00
Pekka Enberg
4c5f9eb73b Merge 'contributing: Add note about testing against TPC-H databases' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #419
2024-11-26 15:56:43 +02:00
jussisaurio
574f52ddbb Add note about testing against TPC-H databases 2024-11-25 21:57:34 +02:00
jussisaurio
418ad40401 Merge 'Fix some Clippy warnings' from Lauri Virtanen
Reviewed-by: Pere Diaz Bou <limeng.1@bytedance.com>

Closes #417
2024-11-25 16:43:06 +02:00
jussisaurio
1651779e4c Merge 'Improve maths support' from Lauri Virtanen
- Add support for division in SQL expressions
- Fix issues with subtraction
- Support multiplication of integers and floats
- Support aggregate functions in mathematical expressions
- Add compatibility tests for mathematical operations, also with
aggregate functions

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #418
2024-11-25 16:36:45 +02:00
Lauri Virtanen
1b2835b316 Add math operator compatibility tests 2024-11-24 22:12:23 +02:00
Lauri Virtanen
70c4d6b360 Support multiplying combinations of different types 2024-11-24 22:11:37 +02:00
Lauri Virtanen
af9d407dee Fix issues with subtraction of different type combinations 2024-11-24 22:10:23 +02:00
Lauri Virtanen
cafbf5499f Support divide operator in expressions 2024-11-24 22:10:07 +02:00
Lauri Virtanen
afeb1cbe74 Clippy warning fixes 2024-11-24 20:24:47 +02:00
Lauri Virtanen
a7100d8e9b Autofix clippy issues with cargo fix --clippy 2024-11-24 20:24:47 +02:00
Pekka Enberg
aa4dd5c8e7 Merge 'wal: checksums' from Pere Diaz Bou
Implemeted checksums so that sqlite3 is able to read our WAL. This also
helps with future work on proper recovery of WAL.
Create some frames with CREATE TABLE and kill the process so that there
is no checkpoint.
```
Limbo v0.0.6
Enter ".help" for usage hints.
limbo> create table x(x);
limbo> [1]    15910 killed     cargo run xlimbo.db
```
Now sqlite3 is able to recover from this WAL created in limbo:
```
sqlite3 xlimbo.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE x (x);
```

Closes #413
2024-11-22 13:21:01 +02:00
jussisaurio
046c4933a5 Merge 'io macro' from Jussi Saurio
just having some dang fun

Reviewed-by: Pere Diaz Bou <pere-altea@hotmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@hotmail.com>

Closes #385
2024-11-21 20:36:07 +02:00
jussisaurio
c722074016 missing cursorresult handling 2024-11-21 20:29:46 +02:00
jussisaurio
f945795ae6 consistent naming 2024-11-21 20:25:51 +02:00
jussisaurio
d8eb4be424 better, less cool names 2024-11-21 20:23:53 +02:00