Commit Graph

1238 Commits

Author SHA1 Message Date
Pekka Enberg
ca272ba937 Merge 'Support JOIN USING and NATURAL JOIN' from Jussi Saurio
Closes #360
Closes #361

Closes #422
2024-12-11 09:17:51 +02:00
Pekka Enberg
eda1f5396c Merge 'Add octet_length scalar function' from Kacper Kołodziej
Adds the `octet_length` scalar function.
Part of the solution for #144.

Closes #430
2024-12-11 07:44:04 +02:00
Pekka Enberg
1024432b11 Merge 'Fix: length function characters counting' from Kacper Kołodziej
`String::len()` returns the number of bytes. Here we need to use
`chars().count()` to count characters instead, as specified in
https://www.sqlite.org/lang_corefunc.html#length
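
A minimal sketch of the difference, for illustration only. The helper names `byte_length` and `char_length` are hypothetical and are not the functions touched by this change:

```rust
/// Byte length of a UTF-8 string: this is what `str::len()` reports.
fn byte_length(s: &str) -> usize {
    s.len()
}

/// Character length: counts Unicode scalar values, which is what the
/// SQLite documentation for `length()` asks for on text values.
fn char_length(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // U+00E9 (e with acute accent) is one character but two bytes in UTF-8.
    let s = "h\u{e9}llo";
    println!("bytes = {}, chars = {}", byte_length(s), char_length(s));
}
```

For ASCII-only input the two functions agree, which is why a test with multibyte characters is needed to tell them apart.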

Fixes #431

Closes #429
2024-12-11 07:42:47 +02:00
Pekka Enberg
9a004a1742 Merge 'Fixed typo' from Wincent Balin
Closes #435
2024-12-11 07:41:46 +02:00
Pekka Enberg
e3d9082feb scripts/merge-pr.py: Fix pull from a fork 2024-12-11 07:41:33 +02:00
Wincent Balin
3f747feb5b Fixed typo 2024-12-11 03:42:21 +01:00
Kacper Kołodziej
e4d31cbe34 add tests for octet_length scalar function 2024-12-10 22:56:38 +01:00
Kacper Kołodziej
d4bff2c93e add octet_length scalar function 2024-12-10 22:56:38 +01:00
Kacper Kołodziej
660d3e8d07 fix: count characters in string in length function
`length` function should count characters, not bytes.

https://www.sqlite.org/lang_corefunc.html#length
2024-12-10 22:48:50 +01:00
Kacper Kołodziej
e68a86532a tests: length function with multibyte characters
Depending on the encoding, some characters take more than one byte. Add
a failing test to verify whether the current implementation of the
scalar function `length` takes that into account.
2024-12-10 22:47:22 +01:00
jussisaurio
fe88d45e5e Add more comments to push_predicate/push_predicates 2024-12-09 17:50:29 +02:00
jussisaurio
840caed2f7 Fix bug with multiway joins that include the same table multiple times 2024-12-09 17:50:29 +02:00
jussisaurio
7924f9b64d consider all joined tables instead of just previous in natural/using 2024-12-09 17:50:29 +02:00
jussisaurio
4f027035de tests for multiple joins 2024-12-09 17:50:29 +02:00
jussisaurio
81b6605453 support NATURAL JOIN 2024-12-09 17:50:29 +02:00
jussisaurio
bed932c186 Support join USING 2024-12-09 17:50:29 +02:00
Pekka Enberg
f9b300a608 Update CHANGELOG 2024-12-09 17:31:24 +02:00
Pekka Enberg
ba1f7cd16f Merge 'feat(core/translate): support HAVING' from Jussi Saurio
Support the HAVING clause.
Note that SQLite (and, I think, standard SQL) supports HAVING even
without GROUP BY, but `sqlite3-parser` doesn't.
Also fixes some issues with the `PartialOrd` implementation of
`OwnedValue` and the implementations of `concat` and `round`, which I
discovered because my HAVING Tcl tests were failing.

Closes #420
2024-12-09 17:30:40 +02:00
Pekka Enberg
98a8dc58b1 Update CHANGELOG 2024-12-09 17:29:15 +02:00
Pekka Enberg
36f9565910 Merge 'feat(wasm): add get and iterate func' from Jean Arhancet
Add `get` and `iterate` functions to the wasm module

Closes #421
2024-12-09 17:28:43 +02:00
jussisaurio
9bc3ccc394 fmt 2024-12-03 19:11:08 +02:00
jussisaurio
885136a511 Merge 'fix(core/translate): fix bug with multiway joins and clean up left join implementation' from Jussi Saurio
There was a bug where this kind of query (a 3-way join with two seeks
and only one scan loop) would emit a wrong jump target for DecrJumpZero:
```
limbo> explain select u.first_name, u2.last_name, p.name from users u join users u2 on u.id=u2.id join products p on u2.id = p.id limit 3;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     2     0                    0   table=u2, root=2
4     OpenReadAwait      0     0     0                    0
5     OpenReadAsync      2     3     0                    0   table=p, root=3
6     OpenReadAwait      0     0     0                    0
7     RewindAsync        0     0     0                    0
8     RewindAwait        0     18    0                    0   Rewind table u
9       RowId            0     1     0                    0   r[1]=u.rowid
10      SeekRowid        1     1     18                   0   if (r[1]!=u2.rowid) goto 18
11      RowId            1     2     0                    0   r[2]=u2.rowid
12      SeekRowid        2     2     18                   0   if (r[2]!=p.rowid) goto 18
13      Column           0     1     3                    0   r[3]=u.first_name
14      Column           1     2     4                    0   r[4]=u2.last_name
15      Column           2     1     5                    0   r[5]=p.name
16      ResultRow        3     3     0                    0   output=r[3..5]
17      DecrJumpZero     6     18    0                    0   if (--r[6]==0) goto 18 <--- this should go to Halt!!!
18    NextAsync          0     0     0                    0
19    NextAwait          0     9     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0
22    Integer            3     6     0                    0   r[6]=3
23    Goto               0     1     0                    0
```
due to incorrect label bookkeeping.
Fixed the bookkeeping and, at the same time, cleaned up unnecessary
cruft from the left join bookkeeping.
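
To see why the jump target matters, here is a toy model of the loop in the listing above. It is a sketch, not Limbo's VM: the `Insn` enum, `run`, and `build` are hypothetical and keep only the four instructions needed to show the effect.

```rust
/// Hypothetical, stripped-down instruction set (not Limbo's real one).
#[derive(Clone, Copy)]
enum Insn {
    ResultRow,
    /// Decrement the LIMIT counter; jump to `target` when it reaches zero.
    DecrJumpZero { target: usize },
    /// Advance the scan; jump back to `loop_start` while rows remain.
    Next { loop_start: usize },
    Halt,
}

/// Runs `program` over `rows` input rows with the given LIMIT and
/// returns how many result rows were emitted.
fn run(program: &[Insn], rows: usize, limit: i64) -> usize {
    if rows == 0 {
        return 0;
    }
    let mut pc: usize = 0;
    let mut cursor: usize = 0;
    let mut counter: i64 = limit;
    let mut emitted: usize = 0;
    loop {
        match program[pc] {
            Insn::ResultRow => {
                emitted += 1;
                pc += 1;
            }
            Insn::DecrJumpZero { target } => {
                counter -= 1;
                pc = if counter == 0 { target } else { pc + 1 };
            }
            Insn::Next { loop_start } => {
                cursor += 1;
                pc = if cursor < rows { loop_start } else { pc + 1 };
            }
            Insn::Halt => return emitted,
        }
    }
}

/// Builds the loop body; `limit_target` is where DecrJumpZero jumps.
fn build(limit_target: usize) -> Vec<Insn> {
    vec![
        Insn::ResultRow,                             // 0
        Insn::DecrJumpZero { target: limit_target }, // 1
        Insn::Next { loop_start: 0 },                // 2
        Insn::Halt,                                  // 3
    ]
}

fn main() {
    // Target 2 (the Next instruction) mirrors the bug; target 3 is Halt.
    println!("buggy: {} rows", run(&build(2), 10, 3));
    println!("fixed: {} rows", run(&build(3), 10, 3));
}
```

In this model, pointing DecrJumpZero at the instruction that advances the scan means the counter passes zero and the loop keeps producing rows, so the LIMIT is effectively ignored.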

Closes #423
2024-12-03 19:02:28 +02:00
jussisaurio
ca25a73c95 a little bit more explanation about left join handling 2024-11-30 20:54:22 +02:00
jussisaurio
83f8ea1b13 Fix bug with multiway joins and clean up left join implementation 2024-11-30 20:47:48 +02:00
jussisaurio
3e9883bfbd update COMPAT 2024-11-30 10:06:37 +02:00
jussisaurio
3f80e41e7a support HAVING 2024-11-30 10:05:13 +02:00
jussisaurio
fceb9ac62b Merge 'core: (another) refactor of read path query processing logic' from Jussi Saurio
# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs or subqueries etc can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
   - Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
   - Removed Projection operator (`plan.result_columns`)
   - Removed Limit operator (`plan.limit`)
   - Removed Aggregate operator (`plan.group_by` and `plan.aggregates`)
   - Removed Order operator (`plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution. I.e. no need for
`resolve_ident_table()` etc
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions, which is fragile (e.g. `sum(x) != SUM(x)`). I've marked the
parts where this is used with a TODO; we should have a custom expression
equality comparison function instead. This is not a
correctness-breaking issue, but still.
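
One possible shape for such a comparison function, as a sketch only. The `Expr` type below is a hypothetical two-variant stand-in for the parser's AST, and `exprs_equivalent` is an assumed name, not an existing function:

```rust
/// Hypothetical, heavily simplified expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { table: usize, column: usize },
    FunctionCall { name: String, args: Vec<Expr> },
}

/// Structural equality that ignores ASCII case in function names, so
/// that `sum(x)` and `SUM(x)` count as the same expression.
fn exprs_equivalent(a: &Expr, b: &Expr) -> bool {
    match (a, b) {
        (
            Expr::FunctionCall { name: n1, args: a1 },
            Expr::FunctionCall { name: n2, args: a2 },
        ) => {
            n1.eq_ignore_ascii_case(n2)
                && a1.len() == a2.len()
                && a1.iter().zip(a2).all(|(x, y)| exprs_equivalent(x, y))
        }
        // Everything else falls back to the derived `==`.
        _ => a == b,
    }
}

fn main() {
    let x = Expr::Column { table: 0, column: 1 };
    let lower = Expr::FunctionCall { name: "sum".into(), args: vec![x.clone()] };
    let upper = Expr::FunctionCall { name: "SUM".into(), args: vec![x] };
    println!("derived ==: {}", lower == upper);
    println!("equivalent: {}", exprs_equivalent(&lower, &upper));
}
```

A real version would also have to decide how to treat quoted identifiers, whitespace, and parenthesization, which this sketch ignores.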
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed
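
The "linear conditional steps" idea can be sketched roughly as follows. Only the `Plan` field names `result_columns`, `group_by`, `order_by` and `limit` come from the description above. The field types, the step strings and their exact ordering are assumptions made for illustration, and this `emit_program` returns step descriptions rather than bytecode.

```rust
/// Hypothetical, simplified plan: field names follow the description
/// above, but the types are placeholders.
#[derive(Default)]
struct Plan {
    result_columns: Vec<String>,
    group_by: Option<Vec<String>>,
    order_by: Option<Vec<String>>,
    limit: Option<usize>,
}

/// Returns the codegen steps in order, as plain descriptions.
fn emit_program(plan: &Plan) -> Vec<String> {
    let limit_note = match plan.limit {
        Some(n) => format!(", stopping after {} row(s)", n),
        None => String::new(),
    };
    // A ResultRow is always produced by evaluating the result columns.
    let emit_row = format!(
        "emit ResultRow from {} result column(s){}",
        plan.result_columns.len(),
        limit_note
    );

    let mut steps = vec!["open the cursors".to_string(), "open the loops".to_string()];
    // Inside the innermost loop a row goes to a sorter or to the output.
    if plan.group_by.is_some() {
        steps.push("insert row into GROUP BY sorter".to_string());
    } else if plan.order_by.is_some() {
        steps.push("insert row into ORDER BY sorter".to_string());
    } else {
        steps.push(emit_row.clone());
    }
    steps.push("close the loops".to_string());
    if plan.group_by.is_some() {
        steps.push("aggregate each group".to_string());
        if plan.order_by.is_some() {
            steps.push("insert group into ORDER BY sorter".to_string());
        } else {
            steps.push(emit_row.clone());
        }
    }
    if plan.order_by.is_some() {
        steps.push("read back the ORDER BY sorter".to_string());
        steps.push(emit_row.clone());
    }
    steps
}

fn main() {
    let plan = Plan {
        result_columns: vec!["first_name".into(), "age".into()],
        order_by: Some(vec!["age".into()]),
        limit: Some(10),
        ..Default::default()
    };
    for step in emit_program(&plan) {
        println!("{}", step);
    }
}
```

The point of the sketch is that every branch is visible in one function, rather than being spread across operators that each track their own progress.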

Closes #416
2024-11-30 10:03:58 +02:00
JeanArhancet
5693cd1ae0 feat(wasm): add get and iterate func 2024-11-29 21:48:20 +01:00
jussisaurio
84742b81fa Obsolete comment 2024-11-27 22:43:36 +02:00
jussisaurio
da811dc403 add doc comments for members of Plan struct 2024-11-27 19:30:07 +02:00
jussisaurio
db462530f1 metadata instead of m 2024-11-27 19:27:36 +02:00
jussisaurio
7d569aee1f fix stupid comment 2024-11-26 18:37:06 +02:00
jussisaurio
1b34698872 add comments and rename some misleading label variables 2024-11-26 18:28:19 +02:00
jussisaurio
7f04f8e88f rename 2024-11-26 17:41:08 +02:00
jussisaurio
122546444f extract function order_by_sorter_insert() 2024-11-26 17:40:49 +02:00
jussisaurio
3d27ef90f5 emitting result columns generally works the same way -> extract it 2024-11-26 17:31:51 +02:00
jussisaurio
c74981873e Extract ORDER BY result column deduping into a function 2024-11-26 17:31:51 +02:00
jussisaurio
89569fa7a3 Remove redundant if-else after refactoring ResultSetColumn to struct 2024-11-26 17:31:51 +02:00
jussisaurio
ac12e9c7fd No need for ResultSetColumn to be an enum 2024-11-26 17:31:51 +02:00
jussisaurio
bb8ba7fb01 add tests for arithmetic on two aggregates with no from clause 2024-11-26 17:31:51 +02:00
jussisaurio
7d5fa12bb7 fix allocating wrong number of registers upfront for aggregation results 2024-11-26 17:31:51 +02:00
jussisaurio
4636f71522 test ordering by aggregate not mentioned in select 2024-11-26 17:31:51 +02:00
jussisaurio
56b15193d0 resolve aggregates from orderby as well 2024-11-26 17:31:51 +02:00
jussisaurio
885b6ecd76 Remove 'cursor_hint': it is never needed 2024-11-26 17:31:51 +02:00
jussisaurio
008be10cfd Add TODO about expression equality comparisons 2024-11-26 17:31:51 +02:00
jussisaurio
cfb7e79601 Function doc comments 2024-11-26 17:31:51 +02:00
jussisaurio
fc33c70481 remove many unnecessary fields from SortMetadata and GroupByMetadata 2024-11-26 17:31:51 +02:00
jussisaurio
ebce78bcd9 rename 2024-11-26 17:31:51 +02:00
jussisaurio
0510e150d3 fix comment 2024-11-26 17:31:51 +02:00
jussisaurio
1c37d8b24b extract function sorter_insert() 2024-11-26 17:31:51 +02:00