Pekka Enberg f39120dbb1 Merge 'GROUP BY support' from Jussi Saurio
This PR implements GROUP BY (and also ordering by the groups or the
aggregate expressions). See `groupby.test` for the kinds of things that
are now supported.
This PR is a rabbit hole and insanely big, sorry.
---
I thought about how to explain how GROUP BY works in SQLite bytecode,
and opted to go for just adding a bunch of Insn comments, so here's an
example that uses both GROUP BY and ORDER BY:
**LIMBO**
```
limbo> explain select u.first_name, sum(u.age) from users u group by u.first_name order by sum(u.age);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     56    0                    0   Start at 56
1     SorterOpen         0     1     0     k(1,B)         0   cursor=0
2     SorterOpen         1     2     0     k(1,B)         0   cursor=1
3     Integer            0     2     0                    0   r[2]=0; clear group by abort flag
4     Null               0     4     0                    0   r[4]=NULL; initialize group by comparison registers to NULL
5     Gosub              8     45    0                    0   ; go to clear accumulator subroutine
6     OpenReadAsync      2     2     0                    0   table=u, root=2
7     OpenReadAwait      0     0     0                    0
8     RewindAsync        2     0     0                    0
9     RewindAwait        2     38    0                    0   Rewind table u
10      Column           2     1     10                   0   r[10]=u.first_name
11      Column           2     9     11                   0   r[11]=u.age
12      MakeRecord       10    2     7                    0   r[7]=mkrec(r[10..11])
13      SorterInsert     1     7     0     0              0   key=r[7]
14    NextAsync          2     0     0                    0
15    NextAwait          2     9     0                    0
16    OpenPseudo         3     7     2                    0   2 columns in r[7]
17    SorterSort         1     32    0                    0
18      SorterData       1     7     3                    0   r[7]=data
19      Column           3     0     12                   0   r[12]=cursor 3.u.first_name
20      Compare          4     12    1                    0   r[4..4]==r[12..12]
21      Jump             22    26    22                   0   ; start new group if comparison is not equal
22      Move             12    4     1                    0   r[4..4]=r[12..12]
23      Gosub            9     36    0                    0   ; check if ended group had data, and output if so
24      IfPos            2     55    0                    0   r[2]>0 -> r[2]-=0, goto 55; check abort flag
25      Gosub            8     45    0                    0   ; goto clear accumulator subroutine
26      Column           3     1     13                   0   r[13]=cursor 3.u.age
27      AggStep          0     13    6     sum            0   accum=r[6] step(r[13])
28      If               3     30    0                    0   if r[3] goto 30; don't emit group columns if continuing existing group
29      Column           3     0     5                    0   r[5]=cursor 3.u.first_name
30      Integer          1     3     0                    0   r[3]=1; indicate data in accumulator
31    SorterNext         1     18    0                    0
32    Gosub              9     36    0                    0   ; emit row for final group
33    Goto               0     48    0                    0   ; group by finished
34    Integer            1     2     0                    0   r[2]=1
35    Return             9     0     0                    0
36    IfPos              3     38    0                    0   r[3]>0 -> r[3]-=0, goto 38; output group by row subroutine start
37    Return             9     0     0                    0
38    AggFinal           0     6     0     sum            0   accum=r[6]
39    Copy               5     15    0                    0   r[15]=r[5]
40    Copy               6     16    0                    0   r[16]=r[6]
41    Copy               6     14    0                    0   r[14]=r[6]
42    MakeRecord         14    3     1                    0   r[1]=mkrec(r[14..16])
43    SorterInsert       0     1     0     0              0   key=r[1]
44    Return             9     0     0                    0
45    Null               0     5     6                    0   r[5..6]=NULL; clear accumulator subroutine start
46    Integer            0     3     0                    0   r[3]=0
47    Return             8     0     0                    0
48    OpenPseudo         4     1     3                    0   3 columns in r[1]
49    SorterSort         0     55    0                    0
50      SorterData       0     1     4                    0   r[1]=data
51      Column           4     1     17                   0   r[17]=cursor 4.sum
52      Column           4     2     18                   0   r[18]=cursor 4.u.first_name
53      ResultRow        17    2     0                    0   output=r[17..18]
54    SorterNext         0     50    0                    0
55    Halt               0     0     0                    0
56    Transaction        0     0     0                    0
57    Goto               0     1     0                    0
```
**SQLITE3**:
```
sqlite> explain select u.first_name, sum(u.age) from users u group by u.first_name order by sum(u.age);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     52    0                    0   Start at 52
1     SorterOpen     1     4     0     k(1,B)         0
2     SorterOpen     2     2     0     k(1,B)         0
3     Integer        0     2     0                    0   r[2]=0; clear abort flag
4     Null           0     5     5                    0   r[5..5]=NULL
5     Gosub          4     41    0                    0
6     OpenRead       0     2     0     10             0   root=2 iDb=0; users
7     Rewind         0     13    0                    0
8       Column         0     1     10                   0   r[10]= cursor 0 column 1
9       Column         0     9     11                   0   r[11]= cursor 0 column 9
10      MakeRecord     10    2     12                   0   r[12]=mkrec(r[10..11])
11      SorterInsert   2     12    0                    0   key=r[12]
12    Next           0     8     0                    1
13    OpenPseudo     3     12    2                    0   2 columns in r[12]
14    SorterSort     2     44    0                    0   GROUP BY sort
15      SorterData     2     12    3                    0   r[12]=data
16      Column         3     0     6                    0   r[6]= cursor 3 column 0
17      Compare        5     6     1     k(1,B)         0   r[5] <-> r[6]
18      Jump           19    23    19                   0
19      Move           6     5     1                    0   r[5]=r[6]
20      Gosub          3     33    0                    0   output one row
21      IfPos          2     44    0                    0   if r[2]>0 then r[2]-=0, goto 44; check abort flag
22      Gosub          4     41    0                    0   reset accumulator
23      Column         3     1     13                   0   r[13]=users.age
24      AggStep        0     13    9     sum(1)         1   accum=r[9] step(r[13])
25      If             1     27    0                    0
26      Column         3     0     7                    0   r[7]=users.first_name
27      Integer        1     1     0                    0   r[1]=1; indicate data in accumulator
28    SorterNext     2     15    0                    0
29    Gosub          3     33    0                    0   output final row
30    Goto           0     44    0                    0
31    Integer        1     2     0                    0   r[2]=1; set abort flag
32    Return         3     0     0                    0
33    IfPos          1     35    0                    0   if r[1]>0 then r[1]-=0, goto 35; Groupby result generator entry point
34    Return         3     0     0                    0
35    AggFinal       9     1     0     sum(1)         0   accum=r[9] N=1
36    Copy           7     15    0                    0   r[15]=r[7]
37    Copy           9     14    0                    0   r[14]=r[9]
38    MakeRecord     14    2     17                   0   r[17]=mkrec(r[14..15])
39    SorterInsert   1     17    14    2              0   key=r[17]
40    Return         3     0     0                    0   end groupby result generator
41    Null           0     7     9                    0   r[7..9]=NULL
42    Integer        0     1     0                    0   r[1]=0; indicate accumulator empty
43    Return         4     0     0                    0
44    OpenPseudo     4     18    4                    0   4 columns in r[18]
45    SorterSort     1     51    0                    0
46      SorterData     1     18    4                    0   r[18]=data
47      Column         4     0     16                   0   r[16]=sum(u.age)
48      Column         4     1     15                   0   r[15]=u.first_name
49      ResultRow      15    2     0                    0   output=r[15..16]
50    SorterNext     1     46    0                    0
51    Halt           0     0     0                    0
52    Transaction    0     0     2     0              1   usesStmtJournal=0
53    Goto           0     1     0                    0
```
As you can see the bytecodes are fairly close in this scenario.
SQLite opts to use an ephemeral index in certain cases (e.g. when you
use `LIMIT` or have multiple grouping columns). Will not implement those
branching strategies as part of this PR
---
Example operator tree:
```
limbo> explain query plan select u.first_name, p.name, sum(u.age) from users u join products p on u.id = p.id group by u.first_name, p.name order by p.name limit 10;

QUERY PLAN
`--TAKE 10
   `--SORT p.name ASC
   |  `--PROJECT u.first_name, p.name, sum (u.age)
   |  |  `--AGGREGATE Sum(u.age)
   |  |  |  `--JOIN
   |  |  |  |  |--SCAN users AS u
   |  |  |  |  `--SEEK products.rowid ON rowid=u.id
```

Reviewed-by: Pere Diaz Bou <pere-altea@hotmail.com>

Closes #309
2024-09-14 16:33:09 +03:00
2024-09-01 16:11:00 +03:00
2024-09-14 16:14:45 +03:00
2024-08-25 22:55:15 +08:00
2024-09-10 14:32:25 -07:00
2024-09-13 09:15:55 +02:00
2024-09-14 16:14:45 +03:00
2024-09-13 09:15:55 +02:00
2024-09-13 07:56:21 +02:00
2024-09-02 14:30:33 +03:00
2024-09-14 16:14:45 +03:00
2024-07-24 09:04:49 +03:00
2024-05-07 16:33:44 -03:00
2024-07-05 09:51:56 +03:00
2024-07-12 13:07:34 -07:00
2024-07-12 12:38:56 -07:00
2024-08-31 12:02:37 +03:00

Limbo

Limbo

Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.

Build badge MIT Discord


Features

  • In-process OLTP database engine library
  • Asynchronous I/O support with io_uring
  • SQLite compatibility (status)
    • SQL dialect support
    • File format support
    • SQLite C API
  • JavaScript/WebAssembly bindings (wip)

Getting Started

Installing:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/penberg/limbo/releases/latest/download/limbo-installer.sh | sh

Limbo is currently work-in-progress so it's recommended that you either use the sqlite3 program to create a test database:

$ sqlite3 database.db
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
sqlite> INSERT INTO users VALUES (1, 'alice');
sqlite> INSERT INTO users VALUES (2, 'bob');

or use the testing script to generate one for you:

pipenv run ./testing/gen-database.py

You can then start the Limbo shell with:

$ limbo database.db
Welcome to Limbo SQL shell!
> SELECT * FROM users LIMIT 1;
|1|Cody|Miller|mhurst@example.org|525.595.7319x21268|33667 Shaw Extension Suite 104|West Robert|VA|45161|`

Developing

Run tests:

cargo test

Test coverage report:

cargo tarpaulin -o html

Run benchmarks:

cargo bench

Run benchmarks and generate flamegraphs:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench benchmark -- --profile-time=5

FAQ

How is Limbo different from libSQL?

Limbo is a research project to build a SQLite compatible in-process database in Rust with native async support. The libSQL project, on the other hand, is an open source, open contribution fork of SQLite, with focus on production features such as replication, backups, encryption, and so on. There is no hard dependency between the two projects. Of course, if Limbo becomes widely successful, we might consider merging with libSQL, but that is something that will be decided in the future.

Publications

  • Pekka Enberg, Sasu Tarkoma, Jon Crowcroft Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys 24. [PDF]
  • Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW 23. [PDF] [Slides]

Contributing

We'd love to have you contribute to Limbo! Check out the contribution guide to get started.

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.

Description
No description provided
Readme 43 MiB
Languages
Rust 76.8%
Tcl 6.6%
C 6.4%
Dart 2.4%
Java 2.3%
Other 5.3%