Pekka Enberg d97a434e29 Merge 'Use SeekRowId to avoid nested scans' from Jussi Saurio
The SQLite `SeekRowid` instruction tries to find a record in a B-tree table with a particular `rowid`  (or INTEGER PRIMARY KEY, if defined), and jumps if it is not found. In this PR, constraints like `tbl.id = 5` or `tbl.id = tbl2.id` are transformed into special `SeekRowid` expressions that emit `SeekRowid` VM instructions. This avoids a table scan and instead completes in `log N` time (for a table of `N` rows), because of how B-Trees work.


`LoopInfo` now contains a `Plan` which is either `Scan` or `SeekRowid` -- in the case of `SeekRowid` no `Rewind`/`Next` instructions are emitted - i.e. no looping is done.

`BTreeCursor` now implements a `btree_seek_rowid()` method that tries to find a row by `rowid`.

Our loop order is currently static, i.e. `SELECT * from a join b join c` always results in "loop a, loop b, loop c", so `SeekRowId` is only supported for equality expressions where the non-PK side of the expression only refers to outer loops or constants. Examples:

**Because `u.id` refers to an outer loop compared to the primary key `p.id`, `p` is selected for SeekRowid optimization:**
```
limbo> explain SELECT u.age FROM users u JOIN products p ON u.id = p.id

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     14    0                    0   Start at 14
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     3     0                    0   table=p, root=3
4     OpenReadAwait      0     0     0                    0   
5     RewindAsync        0     0     0                    0   
6     RewindAwait        0     -5    0                    0   Rewind table u
7       RowId            0     1     0                    0   r[1]=u.rowid
8       SeekRowid        1     1     11                   0   if (r[1]!=p.rowid) goto 11
9       Column           0     9     2                    0   r[2]=u.age
10      ResultRow        2     1     0                    0   output=r[2]
11    NextAsync          0     0     0                    0   
12    NextAwait          0     6     0                    0   
13    Halt               0     0     0                    0   
14    Transaction        0     0     0                    0   
15    Goto               0     1     0                    0

limbo> SELECT u.age FROM users u JOIN products p ON u.id = p.id
94
37
18
33
15
89
24
63
77
13
22
```

**Because `5` refers to a constant and `u.id` is a primary key, `u` is selected for SeekRowid optimization:**
```
limbo> explain SELECT u.first_name, p.name FROM users u JOIN products p ON u.id = 5;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     3     0                    0   table=p, root=3
4     OpenReadAwait      0     0     0                    0   
5     Integer            5     1     0                    0   r[1]=5
6     SeekRowid          0     1     14                   0   if (r[1]!=u.rowid) goto 14
7     RewindAsync        1     0     0                    0   
8     RewindAwait        1     -8    0                    0   Rewind table p
9       Column           0     1     2                    0   r[2]=u.first_name
10      Column           1     1     3                    0   r[3]=p.name
11      ResultRow        2     2     0                    0   output=r[2..3]
12    NextAsync          1     0     0                    0   
13    NextAwait          1     8     0                    0   
14    Halt               0     0     0                    0   
15    Transaction        0     0     0                    0   
16    Goto               0     1     0                    0   
limbo> SELECT u.first_name, p.name FROM users u JOIN products p ON u.id = 5;
Edward|hat
Edward|cap
Edward|shirt
Edward|sweater
Edward|sweatshirt
Edward|shorts
Edward|jeans
Edward|sneakers
Edward|boots
Edward|coat
Edward|accessories
```

**Same, but LEFT JOIN:**
```
limbo> EXPLAIN SELECT u.first_name, p.name FROM users u LEFT JOIN products p ON u.id = 5;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     20    0                    0   Start at 20
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     3     0                    0   table=p, root=3
4     OpenReadAwait      0     0     0                    0   
5     Integer            0     1     0                    0   r[1]=0
6     RewindAsync        1     0     0                    0   
7     RewindAwait        1     -11   0                    0   Rewind table p
8       Integer          5     2     0                    0   r[2]=5
9       SeekRowid        0     2     14                   0   if (r[2]!=u.rowid) goto 14
10      Integer          1     1     0                    0   r[1]=1
11      Column           0     1     3                    0   r[3]=u.first_name
12      Column           1     1     4                    0   r[4]=p.name
13      ResultRow        3     2     0                    0   output=r[3..4]
14    NextAsync          1     0     0                    0   
15    NextAwait          1     7     0                    0   
16    IfPos              1     19    0                    0   r[1]>0 -> r[1]-=0, goto 19
17    NullRow            1     0     0                    0   Set cursor 1 to a (pseudo) NULL row
18    Goto               0     10    0                    0   
19    Halt               0     0     0                    0   
20    Transaction        0     0     0                    0   
21    Goto               0     1     0                    0   
limbo> SELECT u.first_name, p.name FROM users u LEFT JOIN products p ON u.id = 5;
Edward|hat
Edward|cap
Edward|shirt
Edward|sweater
Edward|sweatshirt
Edward|shorts
Edward|jeans
Edward|sneakers
Edward|boots
Edward|coat
Edward|accessories
```

**Both `p` and `u` selected for optimization:**
```
limbo> EXPLAIN SELECT u.first_name, p.name FROM users u JOIN products p ON u.id = p.id and u.id = 5;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     13    0                    0   Start at 13
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     3     0                    0   table=p, root=3
4     OpenReadAwait      0     0     0                    0   
5     Integer            5     1     0                    0   r[1]=5
6     SeekRowid          0     1     12                   0   if (r[1]!=u.rowid) goto 12
7     RowId              0     2     0                    0   r[2]=u.rowid
8     SeekRowid          1     2     12                   0   if (r[2]!=p.rowid) goto 12
9     Column             0     1     3                    0   r[3]=u.first_name
10    Column             1     1     4                    0   r[4]=p.name
11    ResultRow          3     2     0                    0   output=r[3..4]
12    Halt               0     0     0                    0   
13    Transaction        0     0     0                    0   
14    Goto               0     1     0                    0   

limbo> SELECT u.first_name, p.name FROM users u JOIN products p ON u.id = p.id and u.id = 5;
Edward|sweatshirt
```

**`p.id + 1` refers to an INNER loop compared to the primary key `u.id`, so optimization is skipped:**
```
limbo> EXPLAIN SELECT u.age FROM users u JOIN products p ON u.id = p.id + 1;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     3     0                    0   table=p, root=3
4     OpenReadAwait      0     0     0                    0   
5     RewindAsync        0     0     0                    0   
6     RewindAwait        0     -5    0                    0   Rewind table u
7       RewindAsync      1     0     0                    0   
8       RewindAwait      1     -8    0                    0   Rewind table p
9         RowId          0     1     0                    0   r[1]=u.rowid
10        RowId          1     3     0                    0   r[3]=p.rowid
11        Integer        1     4     0                    0   r[4]=1
12        Add            3     4     2                    0   r[2]=r[3]+r[4]
13        Ne             1     2     16                   0   if r[1]!=r[2] goto 16
14        Column         0     9     5                    0   r[5]=u.age
15        ResultRow      5     1     0                    0   output=r[5]
16      NextAsync        1     0     0                    0   
17      NextAwait        1     8     0                    0   
18    NextAsync          0     0     0                    0   
19    NextAwait          0     6     0                    0   
20    Halt               0     0     0                    0   
21    Transaction        0     0     0                    0   
22    Goto               0     1     0                    0   
limbo> SELECT u.age FROM users u JOIN products p ON u.id = p.id + 1;
37
18
33
15
89
24
63
77
13
22
18
```

This whole thing is a bit ad-hoc / "something that works" kind of thing, probably me and @benclmnt need to put our noses down into some books and start to actually build some sort of graph-based query planner after this...

Closes #247
2024-08-01 18:54:33 +03:00
2024-07-31 17:38:33 +02:00
2024-07-31 17:27:02 +02:00
2024-08-01 18:32:01 +03:00
2024-08-01 10:23:06 +03:00
2024-07-23 15:04:40 +03:00
2024-07-07 19:12:48 +08:00
2024-07-25 17:15:08 +03:00
2024-07-24 11:14:31 +03:00
2024-07-31 20:08:46 +03:00
2024-07-24 09:04:49 +03:00
2024-07-16 19:52:39 -07:00
2024-05-07 16:33:44 -03:00
2024-07-05 09:51:56 +03:00
2024-07-12 13:07:34 -07:00
2024-07-12 12:38:56 -07:00
2024-07-31 20:08:46 +03:00

Limbo

Limbo

Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.

Build badge MIT


Features

  • In-process OLTP database engine library
  • Asynchronous I/O support with io_uring
  • SQLite compatibility (status)
    • SQL dialect support
    • File format support
    • SQLite C API
  • JavaScript/WebAssembly bindings (wip)

Getting Started

Installing:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/penberg/limbo/releases/latest/download/limbo-installer.sh | sh

Limbo is currently work-in-progress so it's recommended that you either use the sqlite3 program to create a test database:

$ sqlite3 database.db
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
sqlite> INSERT INTO users VALUES (1, 'alice');
sqlite> INSERT INTO users VALUES (2, 'bob');

or use the testing script to generate one for you:

pipenv run ./testing/gen-database.py

You can then start the Limbo shell with:

$ cargo run database.db
Welcome to Limbo SQL shell!
> SELECT * FROM users LIMIT 1;
|1|Cody|Miller|mhurst@example.org|525.595.7319x21268|33667 Shaw Extension Suite 104|West Robert|VA|45161|`

Developing

Run tests:

cargo test

Test coverage report:

cargo tarpaulin -o html

Run benchmarks:

cargo bench

Run benchmarks and generate flamegraphs:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench benchmark -- --profile-time=5

FAQ

How is Limbo different from libSQL?

Limbo is a research project to build a SQLite compatible in-process database in Rust with native async support. The libSQL project, on the other hand, is an open source, open contribution fork of SQLite, with focus on production features such as replication, backups, encryption, and so on. There is no hard dependency between the two projects. Of course, if Limbo becomes widely successful, we might consider merging with libSQL, but that is something that will be decided in the future.

Publications

  • Pekka Enberg, Sasu Tarkoma, Jon Crowcroft Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys 24. [PDF]
  • Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW 23. [PDF] [Slides]

Contributing

We'd love to have you contribute to Limbo! Check out the contribution guide to get started.

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.

Description
No description provided
Readme 43 MiB
Languages
Rust 76.8%
Tcl 6.6%
C 6.4%
Dart 2.4%
Java 2.3%
Other 5.3%