This PR implements GROUP BY (and also ordering by the groups or the aggregate expressions). See `groupby.test` for the kinds of things that are now supported. This PR is a rabbit hole and insanely big, sorry. --- I thought about how to explain how GROUP BY works in SQLite bytecode, and opted to go for just adding a bunch of Insn comments, so here's an example that uses both GROUP BY and ORDER BY: **LIMBO** ``` limbo> explain select u.first_name, sum(u.age) from users u group by u.first_name order by sum(u.age); addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 56 0 0 Start at 56 1 SorterOpen 0 1 0 k(1,B) 0 cursor=0 2 SorterOpen 1 2 0 k(1,B) 0 cursor=1 3 Integer 0 2 0 0 r[2]=0; clear group by abort flag 4 Null 0 4 0 0 r[4]=NULL; initialize group by comparison registers to NULL 5 Gosub 8 45 0 0 ; go to clear accumulator subroutine 6 OpenReadAsync 2 2 0 0 table=u, root=2 7 OpenReadAwait 0 0 0 0 8 RewindAsync 2 0 0 0 9 RewindAwait 2 38 0 0 Rewind table u 10 Column 2 1 10 0 r[10]=u.first_name 11 Column 2 9 11 0 r[11]=u.age 12 MakeRecord 10 2 7 0 r[7]=mkrec(r[10..11]) 13 SorterInsert 1 7 0 0 0 key=r[7] 14 NextAsync 2 0 0 0 15 NextAwait 2 9 0 0 16 OpenPseudo 3 7 2 0 2 columns in r[7] 17 SorterSort 1 32 0 0 18 SorterData 1 7 3 0 r[7]=data 19 Column 3 0 12 0 r[12]=cursor 3.u.first_name 20 Compare 4 12 1 0 r[4..4]==r[12..12] 21 Jump 22 26 22 0 ; start new group if comparison is not equal 22 Move 12 4 1 0 r[4..4]=r[12..12] 23 Gosub 9 36 0 0 ; check if ended group had data, and output if so 24 IfPos 2 55 0 0 r[2]>0 -> r[2]-=0, goto 55; check abort flag 25 Gosub 8 45 0 0 ; goto clear accumulator subroutine 26 Column 3 1 13 0 r[13]=cursor 3.u.age 27 AggStep 0 13 6 sum 0 accum=r[6] step(r[13]) 28 If 3 30 0 0 if r[3] goto 30; don't emit group columns if continuing existing group 29 Column 3 0 5 0 r[5]=cursor 3.u.first_name 30 Integer 1 3 0 0 r[3]=1; indicate data in accumulator 31 SorterNext 1 18 0 0 32 Gosub 9 36 0 0 ; emit row for final group 33 Goto 0 48 0 0 ; group by finished 34 Integer 1 2 0 0 r[2]=1 35 Return 9 0 0 0 36 IfPos 3 38 0 0 r[3]>0 -> r[3]-=0, goto 38; output group by row subroutine start 37 Return 9 0 0 0 38 AggFinal 0 6 0 sum 0 accum=r[6] 39 Copy 5 15 0 0 r[15]=r[5] 40 Copy 6 16 0 0 r[16]=r[6] 41 Copy 6 14 0 0 r[14]=r[6] 42 MakeRecord 14 3 1 0 r[1]=mkrec(r[14..16]) 43 SorterInsert 0 1 0 0 0 key=r[1] 44 Return 9 0 0 0 45 Null 0 5 6 0 r[5..6]=NULL; clear accumulator subroutine start 46 Integer 0 3 0 0 r[3]=0 47 Return 8 0 0 0 48 OpenPseudo 4 1 3 0 3 columns in r[1] 49 SorterSort 0 55 0 0 50 SorterData 0 1 4 0 r[1]=data 51 Column 4 1 17 0 r[17]=cursor 4.sum 52 Column 4 2 18 0 r[18]=cursor 4.u.first_name 53 ResultRow 17 2 0 0 output=r[17..18] 54 SorterNext 0 50 0 0 55 Halt 0 0 0 0 56 Transaction 0 0 0 0 57 Goto 0 1 0 0 ``` **SQLITE3**: ``` sqlite> explain select u.first_name, sum(u.age) from users u group by u.first_name order by sum(u.age); addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 52 0 0 Start at 52 1 SorterOpen 1 4 0 k(1,B) 0 2 SorterOpen 2 2 0 k(1,B) 0 3 Integer 0 2 0 0 r[2]=0; clear abort flag 4 Null 0 5 5 0 r[5..5]=NULL 5 Gosub 4 41 0 0 6 OpenRead 0 2 0 10 0 root=2 iDb=0; users 7 Rewind 0 13 0 0 8 Column 0 1 10 0 r[10]= cursor 0 column 1 9 Column 0 9 11 0 r[11]= cursor 0 column 9 10 MakeRecord 10 2 12 0 r[12]=mkrec(r[10..11]) 11 SorterInsert 2 12 0 0 key=r[12] 12 Next 0 8 0 1 13 OpenPseudo 3 12 2 0 2 columns in r[12] 14 SorterSort 2 44 0 0 GROUP BY sort 15 SorterData 2 12 3 0 r[12]=data 16 Column 3 0 6 0 r[6]= cursor 3 column 0 17 Compare 5 6 1 k(1,B) 0 r[5] <-> r[6] 18 Jump 19 23 19 0 19 Move 6 5 1 0 r[5]=r[6] 20 Gosub 3 33 0 0 output one row 21 IfPos 2 44 0 0 if r[2]>0 then r[2]-=0, goto 44; check abort flag 22 Gosub 4 41 0 0 reset accumulator 23 Column 3 1 13 0 r[13]=users.age 24 AggStep 0 13 9 sum(1) 1 accum=r[9] step(r[13]) 25 If 1 27 0 0 26 Column 3 0 7 0 r[7]=users.first_name 27 Integer 1 1 0 0 r[1]=1; indicate data in accumulator 28 SorterNext 2 15 0 0 29 Gosub 3 33 0 0 output final row 30 Goto 0 44 0 0 31 Integer 1 2 0 0 r[2]=1; set abort flag 32 Return 3 0 0 0 33 IfPos 1 35 0 0 if r[1]>0 then r[1]-=0, goto 35; Groupby result generator entry point 34 Return 3 0 0 0 35 AggFinal 9 1 0 sum(1) 0 accum=r[9] N=1 36 Copy 7 15 0 0 r[15]=r[7] 37 Copy 9 14 0 0 r[14]=r[9] 38 MakeRecord 14 2 17 0 r[17]=mkrec(r[14..15]) 39 SorterInsert 1 17 14 2 0 key=r[17] 40 Return 3 0 0 0 end groupby result generator 41 Null 0 7 9 0 r[7..9]=NULL 42 Integer 0 1 0 0 r[1]=0; indicate accumulator empty 43 Return 4 0 0 0 44 OpenPseudo 4 18 4 0 4 columns in r[18] 45 SorterSort 1 51 0 0 46 SorterData 1 18 4 0 r[18]=data 47 Column 4 0 16 0 r[16]=sum(u.age) 48 Column 4 1 15 0 r[15]=u.first_name 49 ResultRow 15 2 0 0 output=r[15..16] 50 SorterNext 1 46 0 0 51 Halt 0 0 0 0 52 Transaction 0 0 2 0 1 usesStmtJournal=0 53 Goto 0 1 0 0 ``` As you can see the bytecodes are fairly close in this scenario. SQLite opts to use an ephemeral index in certain cases (e.g. when you use `LIMIT` or have multiple grouping columns). Will not implement those branching strategies as part of this PR --- Example operator tree: ``` limbo> explain query plan select u.first_name, p.name, sum(u.age) from users u join products p on u.id = p.id group by u.first_name, p.name order by p.name limit 10; QUERY PLAN `--TAKE 10 `--SORT p.name ASC | `--PROJECT u.first_name, p.name, sum (u.age) | | `--AGGREGATE Sum(u.age) | | | `--JOIN | | | | |--SCAN users AS u | | | | `--SEEK products.rowid ON rowid=u.id ``` Reviewed-by: Pere Diaz Bou <pere-altea@hotmail.com> Closes #309
Limbo
Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.
Features
- In-process OLTP database engine library
- Asynchronous I/O support with
io_uring - SQLite compatibility (status)
- SQL dialect support
- File format support
- SQLite C API
- JavaScript/WebAssembly bindings (wip)
Getting Started
Installing:
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/penberg/limbo/releases/latest/download/limbo-installer.sh | sh
Limbo is currently work-in-progress so it's recommended that you either use the sqlite3 program to create a test database:
$ sqlite3 database.db
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
sqlite> INSERT INTO users VALUES (1, 'alice');
sqlite> INSERT INTO users VALUES (2, 'bob');
or use the testing script to generate one for you:
pipenv run ./testing/gen-database.py
You can then start the Limbo shell with:
$ limbo database.db
Welcome to Limbo SQL shell!
> SELECT * FROM users LIMIT 1;
|1|Cody|Miller|mhurst@example.org|525.595.7319x21268|33667 Shaw Extension Suite 104|West Robert|VA|45161|`
Developing
Run tests:
cargo test
Test coverage report:
cargo tarpaulin -o html
Run benchmarks:
cargo bench
Run benchmarks and generate flamegraphs:
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench benchmark -- --profile-time=5
FAQ
How is Limbo different from libSQL?
Limbo is a research project to build a SQLite compatible in-process database in Rust with native async support. The libSQL project, on the other hand, is an open source, open contribution fork of SQLite, with focus on production features such as replication, backups, encryption, and so on. There is no hard dependency between the two projects. Of course, if Limbo becomes widely successful, we might consider merging with libSQL, but that is something that will be decided in the future.
Publications
- Pekka Enberg, Sasu Tarkoma, Jon Crowcroft Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys ‘24. [PDF]
- Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW ’23. [PDF] [Slides]
Contributing
We'd love to have you contribute to Limbo! Check out the contribution guide to get started.
License
This project is licensed under the MIT license.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.
