Commit Graph

1218 Commits

Author SHA1 Message Date
Alex Miller
f7bb7f8dee Fix typo and improve comment 2024-12-08 14:20:23 -08:00
Alex Miller
c2e3957d73 I misunderstood what a constant instruction was 2024-12-08 14:12:45 -08:00
Alex Miller
eb00226cfe Add support for CASE expressions.
There are two forms of CASE:
  CASE (WHEN [bool expr] THEN [value])+ (ELSE [value])? END
which checks a series of boolean conditions, and:
  CASE expr (WHEN [expr] THEN [value])+ (ELSE [value])? END
which checks a series of equality conditions.

This implements support for both. Note that the ELSE is optional, and
will be equivalent to `ELSE null` if not specified.
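
As a rough illustration of the two forms described above (not the code from this patch), the evaluation rules can be sketched in Python. The function names and the use of `None` for SQL NULL are assumptions made for this example only.

```python
from typing import Any, Optional, Sequence, Tuple


def eval_searched_case(arms: Sequence[Tuple[Any, Any]],
                       else_value: Optional[Any] = None) -> Any:
    """CASE (WHEN cond THEN value)+ (ELSE value)? END.

    Returns the value of the first arm whose condition is true.
    None stands in for SQL NULL, so a NULL condition falls through.
    """
    for cond, value in arms:
        if cond:
            return value
    # A missing ELSE behaves like `ELSE null`.
    return else_value


def eval_simple_case(base: Any,
                     arms: Sequence[Tuple[Any, Any]],
                     else_value: Optional[Any] = None) -> Any:
    """CASE base (WHEN candidate THEN value)+ (ELSE value)? END.

    Returns the value of the first arm whose candidate equals `base`.
    A NULL on either side of the comparison never matches.
    """
    for candidate, value in arms:
        if base is not None and candidate is not None and base == candidate:
            return value
    return else_value
```

For the query shown below, `case a WHEN a THEN b WHEN c THEN d ELSE 0 END` would correspond to `eval_simple_case(a, [(a, b), (c, d)], 0)` in this sketch.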

sqlite3 gives the implementation as:

sqlite> explain select case a WHEN a THEN b WHEN c THEN d ELSE 0 END from casetest;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     OpenRead       0     3     0     4              0   root=3 iDb=0; casetest
2     Rewind         0     15    0                    0
3       Column         0     0     2                    0   r[2]= cursor 0 column 0
4       Column         0     0     3                    0   r[3]= cursor 0 column 0
5       Ne             3     8     2     BINARY-8       83  if r[2]!=r[3] goto 8
6       Column         0     1     1                    0   r[1]= cursor 0 column 1
7       Goto           0     13    0                    0
8       Column         0     2     3                    0   r[3]= cursor 0 column 2
9       Ne             3     12    2     BINARY-8       83  if r[2]!=r[3] goto 12
10      Column         0     3     1                    0   r[1]= cursor 0 column 3
11      Goto           0     13    0                    0
12      Integer        0     1     0                    0   r[1]=0
13      ResultRow      1     1     0                    0   output=r[1]
14    Next           0     3     0                    1
15    Halt           0     0     0                    0
16    Transaction    0     0     2     0              1   usesStmtJournal=0
17    Goto           0     1     0                    0

and after this patch, limbo gives:

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenReadAsync      0     4     0                    0   table=casetest, root=4
2     OpenReadAwait      0     0     0                    0
3     RewindAsync        0     0     0                    0
4     RewindAwait        0     17    0                    0   Rewind table casetest
5       Column           0     0     2                    0   r[2]=casetest.a
6       Column           0     0     3                    0   r[3]=casetest.a
7       Ne               2     3     10                   0   if r[2]!=r[3] goto 10
8       Column           0     1     1                    0   r[1]=casetest.b
9       Goto             0     14    0                    0
10      Column           0     2     3                    0   r[3]=casetest.c
11      Ne               2     3     14                   0   if r[2]!=r[3] goto 14
12      Column           0     3     1                    0   r[1]=casetest.d
13      Goto             0     14    0                    0
14      ResultRow        1     1     0                    0   output=r[1]
15    NextAsync          0     0     0                    0
16    NextAwait          0     5     0                    0
17    Halt               0     0     0                    0
18    Transaction        0     0     0                    0
19    Integer            0     1     0                    0   r[1]=0
20    Goto               0     1     0                    0

And then as there's nowhere to annotate this new support in COMPAT.md, I
added a corresponding heading for SELECT expressions and what is/isn't
supported.
2024-12-08 14:09:03 -08:00
jussisaurio
9bc3ccc394 fmt 2024-12-03 19:11:08 +02:00
jussisaurio
885136a511 Merge 'fix(core/translate): fix bug with multiway joins and clean up left join implementation' from Jussi Saurio
There was a bug where this kind of query (a 3-way join with two seeks
and only one scan loop) would emit a wrong jump target for DecrJumpZero:
```
limbo> explain select u.first_name, u2.last_name, p.name from users u join users u2 on u.id=u2.id join products p on u2.id = p.id limit 3;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     2     0                    0   table=u2, root=2
4     OpenReadAwait      0     0     0                    0
5     OpenReadAsync      2     3     0                    0   table=p, root=3
6     OpenReadAwait      0     0     0                    0
7     RewindAsync        0     0     0                    0
8     RewindAwait        0     18    0                    0   Rewind table u
9       RowId            0     1     0                    0   r[1]=u.rowid
10      SeekRowid        1     1     18                   0   if (r[1]!=u2.rowid) goto 18
11      RowId            1     2     0                    0   r[2]=u2.rowid
12      SeekRowid        2     2     18                   0   if (r[2]!=p.rowid) goto 18
13      Column           0     1     3                    0   r[3]=u.first_name
14      Column           1     2     4                    0   r[4]=u2.last_name
15      Column           2     1     5                    0   r[5]=p.name
16      ResultRow        3     3     0                    0   output=r[3..5]
17      DecrJumpZero     6     18    0                    0   if (--r[6]==0) goto 18 <--- this should go to Halt!!!
18    NextAsync          0     0     0                    0
19    NextAwait          0     9     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0
22    Integer            3     6     0                    0   r[6]=3
23    Goto               0     1     0                    0
```
due to incorrect label bookkeeping.
This fixes the bookkeeping and, at the same time, cleans up unnecessary
bookkeeping in the left join implementation.
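
For readers unfamiliar with the term, "label bookkeeping" here means emitting jumps to symbolic labels and resolving them to addresses later. A minimal sketch, assuming a hypothetical `ProgramSketch` builder (these class and method names are made up for illustration and are not limbo's API):

```python
class Label:
    """Placeholder for a jump target whose address is not known yet."""


class ProgramSketch:
    def __init__(self):
        self.insns = []    # each entry is [opcode, p1, p2]
        self.targets = {}  # Label -> resolved instruction address

    def emit(self, opcode, p1=0, p2=0):
        self.insns.append([opcode, p1, p2])

    def place_label(self, label):
        # The label resolves to the address of the next emitted instruction.
        self.targets[label] = len(self.insns)

    def finish(self):
        # Backpatch every p2 that still holds a Label.
        for insn in self.insns:
            if isinstance(insn[2], Label):
                insn[2] = self.targets[insn[2]]
        return self.insns


def build_limited_scan():
    p = ProgramSketch()
    loop_start, loop_next, halt = Label(), Label(), Label()
    p.emit("Rewind", 0, halt)
    p.place_label(loop_start)
    p.emit("ResultRow", 1, 1)        # p2 is a register count here, not a jump
    # Passing loop_next instead of halt would keep the scan running after
    # the limit is reached, which is the kind of wrong target shown above.
    p.emit("DecrJumpZero", 6, halt)
    p.place_label(loop_next)
    p.emit("Next", 0, loop_start)
    p.place_label(halt)
    p.emit("Halt")
    return p.finish()
```

In this sketch `build_limited_scan()` resolves the `DecrJumpZero` target to the address of `Halt`, while `Next` jumps back to the start of the loop body.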

Closes #423
2024-12-03 19:02:28 +02:00
jussisaurio
ca25a73c95 a little bit more explanation about left join handling 2024-11-30 20:54:22 +02:00
jussisaurio
83f8ea1b13 Fix bug with multiway joins and clean up left join implementation 2024-11-30 20:47:48 +02:00
jussisaurio
fceb9ac62b Merge 'core: (another) refactor of read path query processing logic' from Jussi Saurio
# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs or subqueries etc can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
   - Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
   - Removed Projection operator (`plan.result_columns`)
   - Removed Limit operator (`plan.limit`)
   - Removed Aggregate operator (`plan.group_by` and `plan.aggregates`)
   - Removed Order operator (`plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution. I.e. no need for
`resolve_ident_table()` etc
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER_BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions which sucks (e.g. `sum(x) != SUM(x)` -- I've marked the parts
where this is used with a TODO, we should have a custom expression
equality comparison function instead...). This is not a correctness-
breaking thing, but still.
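
A custom expression equality function of the kind the TODO asks for could look roughly like the sketch below. The `Column`, `Literal` and `Call` types are hypothetical stand-ins for the parser's AST, not the real `ast::Expr`; the only rule assumed is that function and column names compare case-insensitively while literal values do not.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Column, Literal, Call]


def exprs_equivalent(a: Expr, b: Expr) -> bool:
    """Structural comparison that ignores case in function and column names."""
    if isinstance(a, Column) and isinstance(b, Column):
        return a.name.lower() == b.name.lower()
    if isinstance(a, Literal) and isinstance(b, Literal):
        # Literal values stay case- and type-sensitive.
        return type(a.value) is type(b.value) and a.value == b.value
    if isinstance(a, Call) and isinstance(b, Call):
        return (
            a.func.lower() == b.func.lower()
            and len(a.args) == len(b.args)
            and all(exprs_equivalent(x, y) for x, y in zip(a.args, b.args))
        )
    return False
```

With this sketch, `Call("sum", (Column("x"),))` and `Call("SUM", (Column("x"),))` differ under plain `==` but are treated as the same expression by `exprs_equivalent`.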
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed
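
To give a feel for what "a series of linear conditional steps" means, here is a deliberately simplified sketch. `PlanSketch` and `emit_program_sketch` are hypothetical names, the steps are plain strings rather than bytecode, and the exact ordering is an assumption for illustration, not a description of the real `emit_program()`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanSketch:
    result_columns: List[str]
    where_clause: List[str] = field(default_factory=list)
    group_by: Optional[List[str]] = None
    order_by: Optional[List[str]] = None


def emit_program_sketch(plan: PlanSketch) -> List[str]:
    """Walk the plan top to bottom; no per-operator state machines."""
    steps = ["open cursors", "open loops"]
    if plan.where_clause:
        steps.append("evaluate WHERE predicates")
    # Inside the loops, a row goes to a sorter or straight to the output.
    if plan.group_by is not None:
        steps.append("insert row into GROUP BY sorter")
    elif plan.order_by is not None:
        steps.append("insert row into ORDER BY sorter")
    else:
        steps.append("emit result row")
    steps.append("close loops")
    if plan.group_by is not None:
        steps.append("aggregate each group")
        if plan.order_by is not None:
            steps.append("insert row into ORDER BY sorter")
        else:
            steps.append("emit result row")
    if plan.order_by is not None:
        steps.append("read ORDER BY sorter and emit result rows")
    return steps
```

Each branch reads directly off a field of the plan, which is the readability argument made above.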

Closes #416
2024-11-30 10:03:58 +02:00
jussisaurio
84742b81fa Obsolete comment 2024-11-27 22:43:36 +02:00
jussisaurio
da811dc403 add doc comments for members of Plan struct 2024-11-27 19:30:07 +02:00
jussisaurio
db462530f1 metadata instead of m 2024-11-27 19:27:36 +02:00
jussisaurio
7d569aee1f fix stupid comment 2024-11-26 18:37:06 +02:00
jussisaurio
1b34698872 add comments and rename some misleading label variables 2024-11-26 18:28:19 +02:00
jussisaurio
7f04f8e88f rename 2024-11-26 17:41:08 +02:00
jussisaurio
122546444f extract function order_by_sorter_insert() 2024-11-26 17:40:49 +02:00
jussisaurio
3d27ef90f5 emitting result columns generally works the same way -> extract it 2024-11-26 17:31:51 +02:00
jussisaurio
c74981873e Extract ORDER BY result column deduping into a function 2024-11-26 17:31:51 +02:00
jussisaurio
89569fa7a3 Remove redundant if-else after refactoring ResultSetColumn to struct 2024-11-26 17:31:51 +02:00
jussisaurio
ac12e9c7fd No need for ResultSetColumn to be an enum 2024-11-26 17:31:51 +02:00
jussisaurio
bb8ba7fb01 add tests for arithmetic on two aggregates with no from clause 2024-11-26 17:31:51 +02:00
jussisaurio
7d5fa12bb7 fix allocating wrong number of registers upfront for aggregation results 2024-11-26 17:31:51 +02:00
jussisaurio
4636f71522 test ordering by aggregate not mentioned in select 2024-11-26 17:31:51 +02:00
jussisaurio
56b15193d0 resolve aggregates from orderby as well 2024-11-26 17:31:51 +02:00
jussisaurio
885b6ecd76 Remove 'cursor_hint': it is never needed 2024-11-26 17:31:51 +02:00
jussisaurio
008be10cfd Add TODO about expression equality comparisons 2024-11-26 17:31:51 +02:00
jussisaurio
cfb7e79601 Function doc comments 2024-11-26 17:31:51 +02:00
jussisaurio
fc33c70481 remove many unnecessary fields from SortMetadata and GroupByMetadata 2024-11-26 17:31:51 +02:00
jussisaurio
ebce78bcd9 rename 2024-11-26 17:31:51 +02:00
jussisaurio
0510e150d3 fix comment 2024-11-26 17:31:51 +02:00
jussisaurio
1c37d8b24b extract function sorter_insert() 2024-11-26 17:31:51 +02:00
jussisaurio
4f3da982c0 extract function emit_result_row() 2024-11-26 17:31:51 +02:00
jussisaurio
52beeabd45 tweaks 2024-11-26 17:31:51 +02:00
jussisaurio
120601f732 fix metadata comments 2024-11-26 17:31:51 +02:00
jussisaurio
97ba4a788e remove sorts hashmap - only one sortmetadata struct is needed 2024-11-26 17:31:51 +02:00
jussisaurio
d2f84edd2e fix accidentally removing push_scan_direction() 2024-11-26 17:31:51 +02:00
jussisaurio
7ecc252507 fix rest of the failing tests 2024-11-26 17:31:51 +02:00
jussisaurio
9a557516b8 Fixes for expressions with aggregate arguments + limit 0 2024-11-26 17:31:51 +02:00
jussisaurio
cc902ed25d GROUP BY and ORDER BY mostly work 2024-11-26 17:31:51 +02:00
jussisaurio
3f9e60633f select refactor: order by and basic agg kinda work 2024-11-26 17:31:51 +02:00
jussisaurio
d0466e1cae introduce Column member of ast::Expr and bind idents to columns 2024-11-26 17:31:51 +02:00
Pekka Enberg
4c5f9eb73b Merge 'contributing: Add note about testing against TPC-H databases' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #419
2024-11-26 15:56:43 +02:00
jussisaurio
574f52ddbb Add note about testing against TPC-H databases 2024-11-25 21:57:34 +02:00
jussisaurio
418ad40401 Merge 'Fix some Clippy warnings' from Lauri Virtanen
Reviewed-by: Pere Diaz Bou <limeng.1@bytedance.com>

Closes #417
2024-11-25 16:43:06 +02:00
jussisaurio
1651779e4c Merge 'Improve maths support' from Lauri Virtanen
- Add support for division in SQL expressions
- Fix issues with subtraction
- Support multiplication of integers and floats
- Support aggregate functions in mathematical expressions
- Add compatibility tests for mathematical operations, also with
aggregate functions
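
For context, the SQLite behaviour that such compatibility tests would compare against can be sketched as follows. This is an illustration only, not the code from this PR: the function names are made up, `None` stands in for NULL, and text/blob operands and 64-bit integer overflow are ignored.

```python
def sql_divide(a, b):
    """SQLite-style `/` for integer and float operands."""
    if a is None or b is None:
        return None
    if b == 0:
        return None  # division by zero yields NULL
    if isinstance(a, int) and isinstance(b, int):
        # Integer division truncates toward zero, unlike Python's `//`.
        q = abs(a) // abs(b)
        return q if (a < 0) == (b < 0) else -q
    return float(a) / float(b)


def sql_multiply(a, b):
    """SQLite-style `*`: integer * integer stays integer, otherwise float."""
    if a is None or b is None:
        return None
    return a * b
```

For example, `sql_divide(-7, 2)` gives `-3` in this sketch, whereas Python's `-7 // 2` gives `-4`.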

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #418
2024-11-25 16:36:45 +02:00
Lauri Virtanen
1b2835b316 Add math operator compatibility tests 2024-11-24 22:12:23 +02:00
Lauri Virtanen
70c4d6b360 Support multiplying combinations of different types 2024-11-24 22:11:37 +02:00
Lauri Virtanen
af9d407dee Fix issues with subtraction of different type combinations 2024-11-24 22:10:23 +02:00
Lauri Virtanen
cafbf5499f Support divide operator in expressions 2024-11-24 22:10:07 +02:00
Lauri Virtanen
afeb1cbe74 Clippy warning fixes 2024-11-24 20:24:47 +02:00
Lauri Virtanen
a7100d8e9b Autofix clippy issues with cargo fix --clippy 2024-11-24 20:24:47 +02:00