I will warn that this PR is quite big out of necessity, since subqueries
are, as the name implies, queries within queries, so everything that
works with a regular query should also work with a subquery, roughly
speaking.
---
- Adds support for:
* uncorrelated subqueries in FROM clause (i.e. appear as a "table",
and do not refer to outer tables). Example of this at the end of the PR
description.
* column and subquery aliasing (`select sub.renamed from (select
name as renamed from products) sub`)
* inner and outer filtering of subqueries (`select sub.name from
(select name from products where name = 'joe') sub`, and, `select
sub.name from (select name from products) sub where sub.name = 'joe'`)
* joining between regular tables and subqueries
* joining between multiple subqueries
* in general working with subqueries should roughly equal working
with regular tables
- Main idea: subqueries are just wrappers of a `SelectPlan` that never
emit ResultRows, instead they `Yield` control back to the parent query,
and the parent query can copy the subquery result values into a
ResultRow. New variant `SourceOperator::Subquery` that wraps a subquery
`SelectPlan`.
- Plans can now not only refer to btree tables (`select p.name from
products`) but also subqueries (`select sub.foo from (select name as foo
from products) sub`. Hence this PR also adds support for column aliases
which didn't exist before.
* An `Expr::Column` that refers to a regular table will result in an
`Insn::Column` (i.e. a read from disk/memory) whereas an `Expr::Column`
that refers to a subquery will result in an `Insn::Copy` (from register
to register) instead
- Subquery handling is entirely unoptimized, there's no predicate
pushdown from outer query to subqueries, or elimination of redundant
subqueries (e.g. in the trivial example `SELECT * FROM (SELECT * FROM
users) sub` the subquery can just be entirely removed)
---
This PR does not add support (yet) for:
- subqueries in result columns: `SELECT t.foo, (SELECT .......) as
column_from_subquery FROM t`
- subqueries in WHERE clauses e.g. `SELECT * FROM t1 WHERE t1.foo IN
(SELECT ...)`
- subquery-related optimizations, of which there are plenty available.
No analysis is done regarding e.g. whether predicates on the outer query
level could be pushed into the subquery, or whether the subquery could
be entirely eliminated. Both of the above can probably be done fairly
easily for a bunch of trivial cases.
---
Example bytecode with comments added:
```
limbo> EXPLAIN SELECT p.name, sub.funny_name FROM products p JOIN (
select id, concat(name, '-lol') as funny_name from products
) sub USING (id) LIMIT 3;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 31 0 0 Start at 31
// Coroutine implementation starts at insn 2, jump immediately to 14
1 InitCoroutine 1 14 2 0
2 OpenReadAsync 0 3 0 0 table=products, root=3
3 OpenReadAwait 0 0 0 0
4 RewindAsync 0 0 0 0
5 RewindAwait 0 13 0 0 Rewind table products
6 RowId 0 2 0 0 r[2]=products.rowid
7 Column 0 1 4 0 r[4]=products.name
8 String8 0 5 0 -lol 0 r[5]='-lol'
9 Function 0 4 3 concat 0 r[3]=func(r[4..5])
// jump back to main loop of query (insn 20)
10 Yield 1 0 0 0
11 NextAsync 0 0 0 0
12 NextAwait 0 6 0 0
13 EndCoroutine 1 0 0 0
14 OpenReadAsync 1 3 0 0 table=p, root=3
15 OpenReadAwait 0 0 0 0
16 RewindAsync 1 0 0 0
17 RewindAwait 1 30 0 0 Rewind table p
// Since this subquery is the inner loop of the join, reinitialize it on every iteration of the outer loop
18 InitCoroutine 1 0 2 0
// Jump back to the subquery implementation to assign another row into registers
19 Yield 1 28 0 0
20 RowId 1 8 0 0 r[8]=p.rowid
// Copy sub.id
21 Copy 2 9 0 0 r[9]=r[2]
// p.id == sub.id?
22 Ne 8 9 27 0 if r[8]!=r[9] goto 27
23 Column 1 1 6 0 r[6]=p.name
// copy sub.funny_name
24 Copy 3 7 0 0 r[7]=r[3]
25 ResultRow 6 2 0 0 output=r[6..7]
26 DecrJumpZero 10 30 0 0 if (--r[10]==0) goto 30
27 Goto 0 19 0 0
28 NextAsync 1 0 0 0
29 NextAwait 1 18 0 0
30 Halt 0 0 0 0
31 Transaction 0 0 0 0
32 Integer 3 10 0 0 r[10]=3
33 Goto 0 1 0 0
```
Closes #566
Limbo
Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.
Features
- In-process OLTP database engine library
- Asynchronous I/O support on Linux with
io_uring - SQLite compatibility (status)
- SQL dialect support
- File format support
- SQLite C API
- JavaScript/WebAssembly bindings (wip)
- Support for Linux, macOS, and Windows
Getting Started
CLI
Install limbo with:
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/penberg/limbo/releases/latest/download/limbo-installer.sh | sh
Then use the SQL shell to create and query a database:
$ limbo database.db
Limbo v0.0.6
Enter ".help" for usage hints.
limbo> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> INSERT INTO users VALUES (2, 'bob');
limbo> SELECT * FROM users;
1|alice
2|bob
JavaScript (wip)
Installation:
npm i limbo-wasm
Example usage:
import { Database } from 'limbo-wasm';
const db = new Database('sqlite.db');
const stmt = db.prepare('SELECT * FROM users');
const users = stmt.all();
console.log(users);
Python (wip)
pip install pylimbo
Example usage:
import limbo
con = limbo.connect("sqlite.db")
cur = con.cursor()
res = cur.execute("SELECT * FROM users")
print(res.fetchone())
Developing
Build and run limbo cli:
cargo run --package limbo --bin limbo database.db
Run tests:
cargo test
Test coverage report:
cargo tarpaulin -o html
Run benchmarks:
cargo bench
Run benchmarks and generate flamegraphs:
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench benchmark -- --profile-time=5
FAQ
How is Limbo different from libSQL?
Limbo is a research project to build a SQLite compatible in-process database in Rust with native async support. The libSQL project, on the other hand, is an open source, open contribution fork of SQLite, with focus on production features such as replication, backups, encryption, and so on. There is no hard dependency between the two projects. Of course, if Limbo becomes widely successful, we might consider merging with libSQL, but that is something that will be decided in the future.
Publications
- Pekka Enberg, Sasu Tarkoma, Jon Crowcroft Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys ‘24. [PDF]
- Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW ’23. [PDF] [Slides]
Contributing
We'd love to have you contribute to Limbo! Check out the contribution guide to get started.
License
This project is licensed under the MIT license.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.
