Jussi Saurio 88501fc76f Merge 'Restructure optimizer to support join reordering' from Jussi Saurio
Closes #1283
In lieu of a better description, I will just copypaste the contents of
the new `OPTIMIZER.MD` here:
# Overview of the current state of the query optimizer in Limbo
Query optimization is obviously an important part of any SQL-based
database engine. This document is an overview of what we currently do.
## Structure of the optimizer directory
1. `mod.rs`
   - Provides the high-level optimization interface through
`optimize_plan()`
2. `access_method.rs`
   - Determines the best index to use when joining a table to a set of preceding tables
3. `constraints.rs`
   - Manages query constraints: extracts them from the WHERE clause and determines which constraints are usable with indexes
4. `cost.rs`
   - Calculates the cost of doing a seek vs a scan, for example
5. `join.rs`
   - Implements the System R style dynamic programming join ordering
algorithm
6. `order.rs`
   - Determines if sort operations can be eliminated based on the chosen
access methods and join order
## Join reordering and optimal index selection
**The goals of query optimization are at least the following:**
1. Do as little page I/O as possible
2. Do as little CPU work as possible
3. Retain query correctness.
**The most important ways to achieve no. 1 and no. 2 are:**
1. Choose the optimal access method for each table (e.g. an index or a
rowid-based seek, or a full table scan if all else fails).
2. Choose the best or near-best way to reorder the tables in the query
so that those optimal access methods can be used.
3. Also factor in whether the chosen join order and indexes allow removal of any sort operations that would otherwise be necessary for query correctness.
## Limbo's optimizer
Limbo's optimizer is an implementation of an extremely traditional [IBM System R](https://www.cs.cmu.edu/~15721-f24/slides/02-Selinger-SystemR-opt.pdf) style optimizer, i.e. straight from the 70s! The DP algorithm is explained below.
### Current high level flow of the optimizer
1. **SQL rewriting**
  - Rewrite certain SQL expressions to another form (not a lot
currently; e.g. rewrite BETWEEN as two comparisons)
  - Eliminate constant conditions: e.g. `WHERE 1` is removed, `WHERE 0`
short-circuits the whole query because it is trivially false.
2. **Check whether there is an "interesting order"** that we should
consider when evaluating indexes and join orders
    - Is there a GROUP BY? an ORDER BY? Both?
3. **Convert WHERE clause conjuncts to Constraints**
    - E.g. in `WHERE t.x = 5`, the expression `5` _constrains_ table `t` to values of `x` that are exactly `5`.
    - E.g. in `WHERE t.x = u.x`, the expression `u.x` constrains `t`, AND `t.x` constrains `u`.
    - Per table, each constraint has an estimated _selectivity_ (how much it filters the result set); this affects join order calculations, see the paragraph on _Estimation_ below.
    - Per table, constraints are also analyzed for whether one or multiple of them can be used as an index seek key to avoid a full scan.
4. **Compute the best join order using a dynamic programming algorithm** (a rough sketch follows this list):
  - `n` = number of tables considered
  - `n=1`: find the lowest-cost way to access each single table, given the constraints of the query. Memoize the result.
  - `n=2`: for each table found in the `n=1` step, find the best way to join that table with each other table. Memoize the result.
  - `n=3`: for each 2-table subset found, find the best way to join that result to each other table. Memoize the result.
  - `n=m`: for each `(m-1)`-table subset found, find the best way to join that result to the `m`-th table. Memoize the result.
  - **Use pruning to reduce search space:**
    - Compute the cost of the join order as literally written in the query first, and store it as an upper threshold
    - If at any point a considered join order exceeds the upper threshold, discard that search path, since it cannot be better than the current best.
      - For example, take `SELECT * FROM a JOIN b JOIN c JOIN d`. Compute `JOIN(a,b,c,d)` first. If `JOIN(b,a)` is already worse than `JOIN(a,b,c,d)`, we don't even have to try `JOIN(b,a,c)`.
    - Also keep track of the best plan per _subset_:
      - If we find that `JOIN(b,a,c)` is better than any other
permutation of the same tables, e.g. `JOIN(a,b,c)`, then we can discard
_ALL_ of the other permutations for that subset. For example, we don't
need to consider `JOIN(a,b,c,d)` because we know it's worse than
`JOIN(b,a,c,d)`.
      - This is possible due to the associativity and commutativity of
INNER JOINs.
  - Also keep track of the best _ordered plan_, i.e. one that provides the "interesting order" mentioned above.
  - At the end, apply a sorting cost penalty to the best overall plan
    - If it is now worse than the best ordered plan, then choose the ordered plan as the best plan for the query.
      - This allows us to eliminate a sorting operation.
    - If the best overall plan is still best even with the sorting penalty, then keep it. A sort operation is later applied to order the rows as desired.
5. **Mutate the plan's `join_order` and `Operation`s to match the
computed best plan.**
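As a rough illustration of step 4, below is a minimal sketch of this bottom-up enumeration in Rust, with memoization per table subset and cost-based pruning. Everything here is hypothetical: `Plan`, `best_join_order`, and the cost inputs are illustrative stand-ins, not the actual types in `join.rs`/`cost.rs`, and the real implementation also tracks access methods, constraints, and the best ordered plan per subset as described above.
```
// A minimal, hypothetical sketch of the System R-style DP enumeration.
// Table subsets are represented as bitmasks, so this toy handles n <= 32.
use std::collections::HashMap;

#[derive(Clone)]
struct Plan {
    join_order: Vec<usize>, // table indices, left to right
    cost: f64,              // estimated total cost so far
    output_rows: f64,       // estimated output cardinality so far
}

fn best_join_order(
    n: usize,
    scan_cost: &[f64],       // PAGE_IO cost of scanning each table once
    base_rows: &[f64],       // rows each table outputs after its own filters
    matches_per_row: &[f64], // rows each table matches per joined input row
    upper_bound: f64,        // cost of the join order as written in the query
) -> Option<Plan> {
    let mut memo: HashMap<u32, Plan> = HashMap::new();
    // n=1: the cheapest way to access each single table.
    for t in 0..n {
        memo.insert(
            1 << t,
            Plan { join_order: vec![t], cost: scan_cost[t], output_rows: base_rows[t] },
        );
    }
    // n=m: extend each memoized (m-1)-table plan with one more table.
    for size in 2..=n {
        let prev: Vec<(u32, Plan)> = memo
            .iter()
            .filter(|(mask, _)| mask.count_ones() as usize == size - 1)
            .map(|(mask, plan)| (*mask, plan.clone()))
            .collect();
        for (mask, plan) in prev {
            for t in 0..n {
                if mask & (1 << t) != 0 {
                    continue; // table t is already part of this plan
                }
                // Nested-loop cost: rescan t once per row the plan outputs.
                let cost = plan.cost + plan.output_rows * scan_cost[t];
                // Prune: cost only grows, so anything over the threshold is dead.
                if cost > upper_bound {
                    continue;
                }
                let rows = plan.output_rows * matches_per_row[t];
                let new_mask = mask | (1 << t);
                // Keep only the best plan per subset; this is valid because
                // INNER JOINs are associative and commutative.
                if memo.get(&new_mask).map_or(true, |best| cost < best.cost) {
                    let mut join_order = plan.join_order.clone();
                    join_order.push(t);
                    memo.insert(new_mask, Plan { join_order, cost, output_rows: rows });
                }
            }
        }
    }
    // The best plan covering all n tables, if any survived pruning.
    memo.remove(&((1u32 << n) - 1))
}

fn main() {
    // Three hypothetical tables: scan costs in pages, filtered row counts,
    // and per-row join match estimates (all magic numbers for illustration).
    let best = best_join_order(
        3,
        &[128.0, 160.0, 4.0],
        &[6400.0, 2000.0, 100.0],
        &[1.0, 0.25, 3.0],
        f64::INFINITY,
    );
    if let Some(p) = best {
        println!("best join order: {:?} (cost {})", p.join_order, p.cost);
    }
}
```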
### Estimation of cost and cardinalities + a note on table statistics
Currently, in the absence of `ANALYZE`, `sqlite_stat1`, etc., we assume the following:
1. Each table has `1,000,000` rows.
2. Each equality (`=`) filter will filter out some percentage of the
result set.
3. Each nonequality (e.g. `>`) will filter out some smaller percentage
of the result set.
4. Each `4096`-byte database page holds `50` rows, i.e. roughly `80` bytes per row.
5. Sort operations have some CPU cost dependent on the number of input
rows to the sort operation.
From the above, we derive the following formula for estimating the cost of joining `t1` with `t2`:
```
JOIN_COST = PAGE_IO(t1.input_rows) + t1.output_rows * PAGE_IO(t2.input_rows)
```
For example, let's take the query `SELECT * FROM t1 JOIN t2 USING(foo)
WHERE t2.foo > 10`. Let's assume the following:
- `t1` has `6400` rows and `t2` has `8000` rows
- there are no indexes at all
- let's ignore the CPU cost from the equation for simplicity.
The best access method for both is a full table scan. The output
cardinality of `t1` is the full table, because nothing is filtering it.
Hence, the cost of `t1 JOIN t2` becomes:
```
JOIN_COST = PAGE_IO(t1.input_rows) + t1.output_rows * PAGE_IO(t2.input_rows)

// plugging in the values:

JOIN_COST = PAGE_IO(6400) + 6400 * PAGE_IO(8000)
JOIN_COST = 128 + 6400 * 160 = 1024128
```
Now let's consider `t2 JOIN t1`. The best access method for both is
still a full scan, but since we can filter on `t2.foo > 10`, its output
cardinality decreases. Let's assume only 1/4 of the rows of `t2` match
the condition `t2.foo > 10`. Hence, the cost of `t2 join t1` becomes:
```
JOIN_COST = PAGE_IO(t2.input_rows) + t2.output_rows * PAGE_IO(t1.input_rows)

// plugging in the values:

JOIN_COST = PAGE_IO(8000) + 1/4 * 8000 * PAGE_IO(6400)
JOIN_COST = 160 + 2000 * 128 = 256160
```
Even though `t2` is the larger table, this order is dramatically cheaper, because filtering `t2` first reduces the input set to the join operation.
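To make the arithmetic above concrete, here is a tiny Rust sketch of the same computation under the stated magic constant of `50` rows per page; `page_io` and `join_cost` are illustrative names, not the actual `cost.rs` API.
```
// Illustrative only: mirrors the JOIN_COST formula with the assumed
// constant of 50 rows per 4096-byte page.
const ROWS_PER_PAGE: f64 = 50.0;

// Estimated page reads needed to scan `rows` rows once.
fn page_io(rows: f64) -> f64 {
    (rows / ROWS_PER_PAGE).ceil()
}

// Nested-loop join: scan the outer table once, then scan the inner
// table once per row the outer table outputs.
fn join_cost(outer_input: f64, outer_output: f64, inner_input: f64) -> f64 {
    page_io(outer_input) + outer_output * page_io(inner_input)
}

fn main() {
    // t1 JOIN t2: nothing filters t1, so it outputs all 6400 rows.
    println!("{}", join_cost(6400.0, 6400.0, 8000.0)); // 128 + 6400 * 160 = 1024128
    // t2 JOIN t1: `t2.foo > 10` keeps ~1/4 of t2's 8000 rows.
    println!("{}", join_cost(8000.0, 2000.0, 6400.0)); // 160 + 2000 * 128 = 256160
}
```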
#### Statistics
Since we don't support `ANALYZE`, nor can we assume that users will call
`ANALYZE` anyway, we use simple magic constants to estimate the
selectivity of join predicates, row count of tables, and so on. When we
have support for `ANALYZE`, we should plug the statistics from
`sqlite_stat1` and friends into the optimizer to make more informed
decisions.
### Estimating the output cardinality of a join
The output cardinality (output row count) of a join is estimated as follows:
```
OUTPUT_CARDINALITY_JOIN = INPUT_CARDINALITY_RHS * OUTPUT_CARDINALITY_RHS

where

INPUT_CARDINALITY_RHS  = OUTPUT_CARDINALITY_LHS
OUTPUT_CARDINALITY_RHS = estimated RHS rows matched per input row
```
Example:
```
SELECT * FROM products p JOIN order_lines o ON p.id = o.product_id
```
Assuming there are 100 products, i.e. just selecting all products would
yield 100 rows:
```
OUTPUT_CARDINALITY_LHS = 100
INPUT_CARDINALITY_RHS = 100
```
Assuming `p.id = o.product_id` matches three order lines per product:
```
OUTPUT_CARDINALITY_RHS = 3

OUTPUT_CARDINALITY_JOIN = 100 * 3 = 300
```
i.e. the join is estimated to return 300 rows, 3 for each product.
Again, in the absence of statistics, we use magic constants to estimate
these cardinalities.
Estimating them is important because in multi-way joins the output
cardinality of the previous join becomes the input cardinality of the
next one.
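As a small illustration of that chaining, here is a hypothetical helper that folds per-join match estimates into a running cardinality; the numbers mirror the products/order_lines example above, plus one assumed extra join.
```
// Illustrative only: each join's output cardinality becomes the next
// join's input cardinality, so the estimates multiply down the pipeline.
fn estimate_join_cardinality(base_rows: f64, matches_per_row: &[f64]) -> f64 {
    matches_per_row.iter().fold(base_rows, |acc, m| acc * m)
}

fn main() {
    // 100 products, ~3 order lines per product => ~300 output rows.
    assert_eq!(estimate_join_cardinality(100.0, &[3.0]), 300.0);
    // Joining a third table matching ~2 rows per order line => ~600 rows.
    assert_eq!(estimate_join_cardinality(100.0, &[3.0, 2.0]), 600.0);
}
```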

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1462
2025-05-14 14:02:33 +03:00

Project Limbo

Limbo is a project to build the modern evolution of SQLite.


Chat with developers on Discord


Features and Roadmap

Limbo is a work-in-progress, in-process OLTP database engine library written in Rust that has:

  • Asynchronous I/O support on Linux with io_uring
  • SQLite compatibility [doc] for SQL dialect, file formats, and the C API
  • Language bindings for JavaScript/WebAssembly, Rust, Go, Python, and Java
  • OS support for Linux, macOS, and Windows

In the future, we will be also working on:

  • Integrated vector search for embeddings and vector similarity.
  • BEGIN CONCURRENT for improved write throughput.
  • Improved schema management including better ALTER support and strict column types by default.

Getting Started

💻 Command Line
You can install the latest `limbo` release with:
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/tursodatabase/limbo/releases/latest/download/limbo_cli-installer.sh | sh

Then launch the shell to execute SQL statements:

Limbo
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> INSERT INTO users VALUES (2, 'bob');
limbo> SELECT * FROM users;
1|alice
2|bob

You can also build and run the latest development version with:

cargo run
🦀 Rust
cargo add limbo

Example usage:

use limbo::Builder;

let db = Builder::new_local("sqlite.db").build().await?;
let conn = db.connect()?;

let res = conn.query("SELECT * FROM users", ()).await?;
JavaScript
npm i limbo-wasm

Example usage:

import { Database } from 'limbo-wasm';

const db = new Database('sqlite.db');
const stmt = db.prepare('SELECT * FROM users');
const users = stmt.all();
console.log(users);
🐍 Python
pip install pylimbo

Example usage:

import limbo

con = limbo.connect("sqlite.db")
cur = con.cursor()
res = cur.execute("SELECT * FROM users")
print(res.fetchone())
🐹 Go
  1. Clone the repository
  2. Build the library and set your LD_LIBRARY_PATH to include limbo's target directory
cargo build --package limbo-go
export LD_LIBRARY_PATH=/path/to/limbo/target/debug:$LD_LIBRARY_PATH
  3. Use the driver
go get github.com/tursodatabase/limbo
go install github.com/tursodatabase/limbo

Example usage:

import (
    "database/sql"
    "fmt"

    _ "github.com/tursodatabase/limbo"
)

conn, _ := sql.Open("sqlite3", "sqlite.db")
defer conn.Close()

stmt, _ := conn.Prepare("select * from users")
defer stmt.Close()

rows, _ := stmt.Query()
for rows.Next() {
    var id int
    var username string
    _ = rows.Scan(&id, &username)
    fmt.Printf("User: ID: %d, Username: %s\n", id, username)
}
Java

We integrated Limbo into JDBC. For detailed instructions on how to use Limbo with Java, please refer to the README.md under bindings/java.

Contributing

We'd love to have you contribute to Limbo! Please check out the contribution guide to get started.

FAQ

How is Limbo different from Turso's libSQL?

Limbo is a project to build the modern evolution of SQLite in Rust, with a strong open contribution focus and features like native async support, vector search, and more. The libSQL project is also an attempt to evolve SQLite in a similar direction, but through a fork rather than a rewrite.

Rewriting SQLite in Rust started as an unassuming experiment, and due to its incredible success, it now replaces libSQL as our intended direction. At this point, libSQL is production ready and Limbo is not, although it is evolving rapidly. As the project nears production readiness, we plan to rename it to just "Turso". More details here.

Publications

  • Pekka Enberg, Sasu Tarkoma, Jon Crowcroft, and Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys 24. [PDF]
  • Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW 23. [PDF] [Slides]

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.

Contributors

Thanks to all the contributors to Limbo!
