mirror of https://github.com/aljazceru/turso.git synced 2026-02-15 13:04:20 +01:00

jussisaurio fceb9ac62b Merge 'core: (another) refactor of read path query processing logic' from Jussi Saurio

# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs or subqueries etc can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
   - Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
   - Removed Projection operator (replaced by `plan.result_columns`)
   - Removed Limit operator (replaced by `plan.limit`)
   - Removed Aggregate operator (replaced by `plan.group_by` and
`plan.aggregates`)
   - Removed Order operator (replaced by `plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution. I.e. no need for
`resolve_ident_table()` etc
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER_BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions which sucks (e.g. `sum(x) != SUM(x)` -- I've marked the parts
where this is used with a TODO, we should have a custom expression
equality comparison function instead...). This is not a correctness-
breaking thing, but still.
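
The custom equality function mentioned in the HACK note could look roughly like the sketch below. It is only an illustration: the `Expr` enum here is a hypothetical, heavily simplified stand-in for the vendored parser's `ast::Expr`, and `exprs_are_equivalent` is an assumed name, not something that exists in this PR.
```rust
// Hypothetical, simplified expression type. The real `ast::Expr` in the
// vendored sqlite3 parser has many more variants; this is an assumption
// made only to keep the example self-contained.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { table: usize, column: usize },
    Literal(String),
    FunctionCall { name: String, args: Vec<Expr> },
}

/// Structural equality that ignores ASCII case in function names,
/// so `sum(x)` and `SUM(x)` are treated as the same expression.
fn exprs_are_equivalent(a: &Expr, b: &Expr) -> bool {
    match (a, b) {
        (
            Expr::FunctionCall { name: n1, args: a1 },
            Expr::FunctionCall { name: n2, args: a2 },
        ) => {
            n1.eq_ignore_ascii_case(n2)
                && a1.len() == a2.len()
                && a1.iter().zip(a2).all(|(x, y)| exprs_are_equivalent(x, y))
        }
        // Columns and literals fall back to the derived `==`.
        _ => a == b,
    }
}

fn main() {
    let x = Expr::Column { table: 0, column: 1 };
    let lower = Expr::FunctionCall { name: "sum".to_string(), args: vec![x.clone()] };
    let upper = Expr::FunctionCall { name: "SUM".to_string(), args: vec![x] };

    // Derived equality sees two different expressions...
    assert!(lower != upper);
    // ...while the custom comparison treats them as the same computation.
    assert!(exprs_are_equivalent(&lower, &upper));

    // String literals stay case-sensitive.
    let a = Expr::Literal("abc".to_string());
    let b = Expr::Literal("ABC".to_string());
    assert!(!exprs_are_equivalent(&a, &b));
}
```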
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed
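
To make the "linear conditional steps" idea concrete, here is a minimal sketch. The `Plan` fields and the body of `emit_program` below are assumptions chosen for illustration; they do not mirror the actual definitions in `core`, and the steps are plain strings rather than real bytecode instructions.
```rust
// Hypothetical stand-in for the real `Plan`; field types are assumptions.
#[derive(Debug, Default)]
struct Plan {
    source_tables: Vec<String>, // stands in for the "join tree"
    group_by: Option<Vec<String>>,
    order_by: Option<Vec<String>>,
    result_columns: Vec<String>,
}

/// Walks the plan once, top to bottom, and records each codegen step.
/// No per-operator state machine is involved.
fn emit_program(plan: &Plan) -> Vec<String> {
    let mut steps = Vec::new();
    let result_row = format!("result row ({} columns)", plan.result_columns.len());

    for table in &plan.source_tables {
        steps.push(format!("open cursor for {table}"));
    }
    for table in &plan.source_tables {
        steps.push(format!("open loop over {table}"));
    }

    // Inside the innermost loop a record goes either into a sorter
    // or straight to the main output.
    if plan.group_by.is_some() {
        steps.push("emit record into GROUP BY sorter".to_string());
    } else if plan.order_by.is_some() {
        steps.push("emit record into ORDER BY sorter".to_string());
    } else {
        steps.push(format!("emit {result_row}"));
    }

    for table in plan.source_tables.iter().rev() {
        steps.push(format!("close loop over {table}"));
    }

    // When a sorter was used, the result rows are produced afterwards,
    // still by evaluating the result columns.
    if plan.group_by.is_some() || plan.order_by.is_some() {
        steps.push(format!("read sorter and emit {result_row}"));
    }
    steps
}

fn main() {
    let plan = Plan {
        source_tables: vec!["users".to_string()],
        order_by: Some(vec!["age".to_string()]),
        result_columns: vec!["first_name".to_string(), "age".to_string()],
        ..Default::default()
    };
    for step in emit_program(&plan) {
        println!("{step}");
    }
}
```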

Closes #416

2024-11-30 10:03:58 +02:00

| Name | Last commit | Date |
| --- | --- | --- |
| .github/workflows | github: Fix Python release workflow | 2024-11-20 19:50:43 +02:00 |
| bindings | Autofix clippy issues with cargo fix --clippy | 2024-11-24 20:24:47 +02:00 |
| cli | various fixes in btree | 2024-11-19 17:15:19 +01:00 |
| core | Obsolete comment | 2024-11-27 22:43:36 +02:00 |
| perf/latency | perf/latency: Update Cargo.lock | 2024-11-18 18:46:46 +02:00 |
| scripts | scripts/merge-pr.py: Manually map Github username to email address | 2024-09-22 07:05:08 -04:00 |
| simulator | Clippy warning fixes | 2024-11-24 20:24:47 +02:00 |
| sqlite3 | sqlite3: Disable env_logger default features | 2024-11-16 09:47:12 +02:00 |
| test | Autofix clippy issues with cargo fix --clippy | 2024-11-24 20:24:47 +02:00 |
| testing | add tests for arithmetic on two aggregates with no from clause | 2024-11-26 17:31:51 +02:00 |
| vendored/sqlite3-parser | GROUP BY and ORDER BY mostly work | 2024-11-26 17:31:51 +02:00 |
| .github.json | github: Fix Pekka's email address | 2024-11-15 15:55:33 +02:00 |
| .gitignore | Ignore .vscode directory | 2024-10-14 00:00:33 +03:00 |
| Cargo.lock | Limbo 0.0.8 | 2024-11-20 19:16:11 +02:00 |
| Cargo.toml | Limbo 0.0.8 | 2024-11-20 19:16:11 +02:00 |
| CHANGELOG.md | Limbo 0.0.8 | 2024-11-20 19:16:11 +02:00 |
| COMPAT.md | Support multiplying combinations of different types | 2024-11-24 22:11:37 +02:00 |
| CONTRIBUTING.md | Add note about testing against TPC-H databases | 2024-11-25 21:57:34 +02:00 |
| flake.lock | improve nix flake by moving to fenix | 2024-07-16 19:52:39 -07:00 |
| flake.nix | update flake to include tcl latest (addresses #243 ) and add core foundation lib and link flags for darwin | 2024-08-01 18:00:46 -07:00 |
| LICENSE.md | License | 2024-05-07 16:33:44 -03:00 |
| limbo.png | Limbo mascot | 2024-07-05 09:51:56 +03:00 |
| Makefile | update flake to include tcl latest (addresses #243 ) and add core foundation lib and link flags for darwin | 2024-08-01 18:00:46 -07:00 |
| Pipfile | Updated Pipfile | 2024-07-12 13:07:34 -07:00 |
| Pipfile.lock | Added Pipfile and Pipfile.lock | 2024-07-12 12:38:56 -07:00 |
| README.md | Fix Python example in README | 2024-11-20 20:00:16 +02:00 |

README.md

Limbo

Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.

Features

- In-process OLTP database engine library
- Asynchronous I/O support with io_uring
- SQLite compatibility (status)
  - SQL dialect support
  - File format support
  - SQLite C API
- JavaScript/WebAssembly bindings (wip)

Getting Started

CLI

Install limbo with:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/penberg/limbo/releases/latest/download/limbo-installer.sh | sh

Then use the SQL shell to create and query a database:

$ limbo database.db
Limbo v0.0.6
Enter ".help" for usage hints.
limbo> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> INSERT INTO users VALUES (2, 'bob');
limbo> SELECT * FROM users;
1|alice
2|bob

JavaScript (wip)

Installation:

npm i limbo-wasm

Example usage:

import { Database } from 'limbo-wasm';

const db = new Database('sqlite.db');
const stmt = db.prepare('SELECT * FROM users');
const users = stmt.all();
console.log(users);

Python (wip)

pip install pylimbo

Example usage:

import limbo

con = limbo.connect("sqlite.db")
cur = con.cursor()
res = cur.execute("SELECT * FROM users")
print(res.fetchone())

Developing

Run tests:

cargo test

Test coverage report:

cargo tarpaulin -o html

Run benchmarks:

cargo bench

Run benchmarks and generate flamegraphs:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench benchmark -- --profile-time=5

FAQ

How is Limbo different from libSQL?

Limbo is a research project to build a SQLite-compatible in-process database in Rust with native async support. The libSQL project, on the other hand, is an open source, open contribution fork of SQLite, with a focus on production features such as replication, backups, and encryption. There is no hard dependency between the two projects. If Limbo becomes widely successful, we might consider merging with libSQL, but that will be decided in the future.

Publications

Pekka Enberg, Sasu Tarkoma, Jon Crowcroft, and Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys ’24. [PDF]
Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW ’23. [PDF] [Slides]

Contributing

We'd love to have you contribute to Limbo! Check out the contribution guide to get started.

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.

Languages

Rust 76.8%

Tcl 6.6%

C 6.4%

Dart 2.4%

Java 2.3%

Other 5.3%
