Jussi Saurio bda6526d28 Merge 'GROUP BY: refactor logic to support cases where no sorting is needed' from Jussi Saurio
Right now we have the following problem with GROUP BY:
- it always allocates a sorter and sorts the input rows, even when the
rows are already sorted in the right order
This PR is a refactor supporting a future PR that introduces a new
version of the optimizer which does 1. join reordering and 2. sorting
elimination based on plan cost. The PR splits GROUP BY into multiple
subsections:
1. Initializing the sorter, if needed
2. Reading rows from the sorter, if needed
3. Doing the actual grouping (this is done regardless of whether sorting
is needed)
4. Emitting rows during grouping in a subroutine (this is done
regardless of whether sorting is needed)
For example, you might currently have the following pseudo-bytecode for
GROUP BY:
```
SorterOpen (groupby_sorter)
OpenRead (users)
Rewind (users)
   <read columns from users>
   SorterInsert (groupby_sorter)
Next (users)
SorterSort (groupby_sorter)
   <do grouping>
SorterNext (groupby_sorter)
ResultRow
```
This PR allows us to do the following in cases where the rows are
already sorted:
```
OpenRead (users)
Rewind (users)
  <read columns from users>
  <do grouping>
Next (users)
ResultRow
```
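
To see why pre-sorted input makes the sorter unnecessary, here is a minimal standalone sketch in plain Rust (independent of Limbo's actual VDBE bytecode) of streaming grouping: because the input is sorted, a group is complete the moment the grouping key changes, so it can be emitted immediately without buffering the whole input.

```rust
/// Streaming COUNT(*) grouping over rows already sorted by key.
/// No sorter is needed: a group is finished as soon as the key changes.
fn group_count_sorted(rows: &[(&str, i64)]) -> Vec<(String, i64)> {
    let mut out = Vec::new();
    let mut current: Option<(String, i64)> = None;
    for &(key, _val) in rows {
        match &mut current {
            Some((k, n)) if k.as_str() == key => *n += 1,
            other => {
                // Key changed (or first row): emit the finished group,
                // then start a new one.
                if let Some(done) = other.take() {
                    out.push(done);
                }
                *other = Some((key.to_string(), 1));
            }
        }
    }
    // Emit the final group.
    if let Some(done) = current.take() {
        out.push(done);
    }
    out
}
```

For example, `group_count_sorted(&[("a", 1), ("a", 2), ("b", 3)])` yields `[("a", 2), ("b", 1)]`.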
---
In fact, this is where the vast majority of the changes in this PR come
from -- eliminating the implicit assumption that sorting for GROUP BY is
always required. The PR does not change current behavior (sorting is
still always performed for GROUP BY), but it adds the _ability_ to skip
sorting if the planner so decides.
The most important changes to understand are these:
```rust
/// Enum representing the source for the rows processed during a GROUP BY.
/// In case sorting is needed (which is most of the time), the variant
/// [GroupByRowSource::Sorter] encodes the necessary information about that
/// sorter.
///
/// In the case where the rows are already ordered, for example:
/// "SELECT indexed_col, count(1) FROM t GROUP BY indexed_col"
/// the rows are processed directly in the order they arrive from
/// the main query loop.
#[derive(Debug)]
pub enum GroupByRowSource {
    Sorter {
        /// Cursor opened for the pseudo table that GROUP BY reads rows from.
        pseudo_cursor: usize,
        /// The sorter opened for ensuring the rows are in GROUP BY order.
        sort_cursor: usize,
        /// Register holding the key used for sorting in the Sorter
        reg_sorter_key: usize,
        /// Number of columns in the GROUP BY sorter
        sorter_column_count: usize,
        /// In case some result columns of the SELECT query are equivalent to GROUP BY members,
        /// this mapping encodes their position.
        column_register_mapping: Vec<Option<usize>>,
    },
    MainLoop {
        /// If GROUP BY rows are read directly in the main loop, start_reg is the first register
        /// holding the value of a relevant column.
        start_reg_src: usize,
        /// The grouping columns for a group that is not yet finalized must be placed in new registers,
        /// so that they don't get overwritten by the next group's data.
        /// This is because the emission of a group that is "done" is made after a comparison between the "current" and "next" grouping
        /// columns returns nonequal. If we don't store the "current" group in a separate set of registers, the "next" group's data will
        /// overwrite the "current" group's columns and the wrong grouping column values will be emitted.
        /// Aggregation results do not require new registers as they are not at risk of being overwritten before a given group
        /// is processed.
        start_reg_dest: usize,
    },
}

/// Enum representing the source of the aggregate function arguments
/// emitted for a group by aggregation.
/// In the common case, the aggregate function arguments are first inserted
/// into a sorter in the main loop, and in the group by aggregation phase
/// we read the data from the sorter.
///
/// In the alternative case, no sorting is required for group by,
/// and the aggregate function arguments are retrieved directly from
/// registers allocated in the main loop.
pub enum GroupByAggArgumentSource<'a> {
    /// The aggregate function arguments are retrieved from a pseudo cursor
    /// which reads from the GROUP BY sorter.
    PseudoCursor {
        cursor_id: usize,
        col_start: usize,
        dest_reg_start: usize,
        aggregate: &'a Aggregate,
    },
    /// The aggregate function arguments are retrieved from a contiguous block of registers
    /// allocated in the main loop for that given aggregate function.
    Register {
        src_reg_start: usize,
        aggregate: &'a Aggregate,
    },
}
```
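
To make the dispatch concrete, here is a minimal sketch of how code generation might branch on `GroupByRowSource`. This is not Limbo's actual codegen; `Program`, `Insn`, and the opcode variants are hypothetical stand-ins for the real bytecode builder.

```rust
// Hypothetical stand-ins for the real bytecode builder and opcodes,
// just enough to make the dispatch below concrete.
struct Program { insns: Vec<Insn> }

enum Insn {
    SorterSort { cursor: usize },
    Copy { src: usize, dst: usize },
}

impl Program {
    fn emit(&mut self, insn: Insn) { self.insns.push(insn); }
}

fn emit_group_by_input(program: &mut Program, source: &GroupByRowSource) {
    match source {
        GroupByRowSource::Sorter { sort_cursor, .. } => {
            // Rows were inserted into the sorter during the main loop;
            // sort them, then read them back (via the pseudo cursor) in
            // GROUP BY order before grouping.
            program.emit(Insn::SorterSort { cursor: *sort_cursor });
            // ... loop: read a sorted row, group, SorterNext ...
        }
        GroupByRowSource::MainLoop { start_reg_src, start_reg_dest } => {
            // Rows arrive pre-sorted. Copy the grouping columns into
            // dedicated registers so the next row's values cannot overwrite
            // the group that is still being compared and emitted.
            program.emit(Insn::Copy { src: *start_reg_src, dst: *start_reg_dest });
            // ... compare with the previous group's registers; on change,
            // jump to the subroutine that emits the finished group ...
        }
    }
}
```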

Closes #1438
2025-05-09 08:56:38 +03:00

Limbo

Limbo is a project to build the modern evolution of SQLite.

Chat with developers on Discord


Features and Roadmap

Limbo is a work-in-progress, in-process OLTP database engine library written in Rust that has:

  • Asynchronous I/O support on Linux with io_uring
  • SQLite compatibility [doc] for SQL dialect, file formats, and the C API
  • Language bindings for JavaScript/WebAssembly, Rust, Go, Python, and Java
  • OS support for Linux, macOS, and Windows

In the future, we will also be working on:

  • Integrated vector search for embeddings and vector similarity.
  • BEGIN CONCURRENT for improved write throughput.
  • Improved schema management including better ALTER support and strict column types by default.

Getting Started

💻 Command Line
You can install the latest `limbo` release with:
```shell
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/tursodatabase/limbo/releases/latest/download/limbo_cli-installer.sh | sh
```

Then launch the shell to execute SQL statements:

```
Limbo
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> INSERT INTO users VALUES (2, 'bob');
limbo> SELECT * FROM users;
1|alice
2|bob
```

You can also build and run the latest development version with:

```shell
cargo run
```
🦀 Rust
```shell
cargo add limbo
```

Example usage:

```rust
use limbo::Builder;

// These calls are async and must run inside an async context.
let db = Builder::new_local("sqlite.db").build().await?;
let conn = db.connect()?;

let res = conn.query("SELECT * FROM users", ()).await?;
```
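
As a complete program, the same calls can be wrapped in an async main. A minimal sketch, assuming the `limbo` crate exposes `Builder` at the crate root and that a `tokio` runtime is available:

```rust
use limbo::Builder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) a local database file and connect to it.
    let db = Builder::new_local("sqlite.db").build().await?;
    let conn = db.connect()?;

    // Run a query; see the crate documentation for iterating over rows.
    let _res = conn.query("SELECT * FROM users", ()).await?;
    Ok(())
}
```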
JavaScript
```shell
npm i limbo-wasm
```

Example usage:

```js
import { Database } from 'limbo-wasm';

const db = new Database('sqlite.db');
const stmt = db.prepare('SELECT * FROM users');
const users = stmt.all();
console.log(users);
```
🐍 Python
```shell
pip install pylimbo
```

Example usage:

```python
import limbo

con = limbo.connect("sqlite.db")
cur = con.cursor()
res = cur.execute("SELECT * FROM users")
print(res.fetchone())
```
🐹 Go
  1. Clone the repository
  2. Build the library and set your LD_LIBRARY_PATH to include limbo's target directory:
```shell
cargo build --package limbo-go
export LD_LIBRARY_PATH=/path/to/limbo/target/debug:$LD_LIBRARY_PATH
```
  3. Use the driver:
```shell
go get github.com/tursodatabase/limbo
go install github.com/tursodatabase/limbo
```

Example usage:

```go
import (
    "database/sql"
    "fmt"

    _ "github.com/tursodatabase/limbo"
)

conn, _ := sql.Open("sqlite3", "sqlite.db")
defer conn.Close()

stmt, _ := conn.Prepare("select * from users")
defer stmt.Close()

rows, _ := stmt.Query()
for rows.Next() {
    var id int
    var username string
    _ = rows.Scan(&id, &username)
    fmt.Printf("User: ID: %d, Username: %s\n", id, username)
}
```
Java

We integrated Limbo into JDBC. For detailed instructions on how to use Limbo with Java, please refer to the README.md under bindings/java.

Contributing

We'd love to have you contribute to Limbo! Please check out the contribution guide to get started.

FAQ

How is Limbo different from Turso's libSQL?

Limbo is a project to build the modern evolution of SQLite in Rust, with a strong open contribution focus and features like native async support, vector search, and more. The libSQL project is also an attempt to evolve SQLite in a similar direction, but through a fork rather than a rewrite.

Rewriting SQLite in Rust started as an unassuming experiment, and due to its incredible success, it now replaces libSQL as our intended direction. At this point, libSQL is production-ready and Limbo is not, although it is evolving rapidly. As the project nears production readiness, we plan to rename it to just "Turso". More details here.

Publications

  • Pekka Enberg, Sasu Tarkoma, Jon Crowcroft, and Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys '24. [PDF]
  • Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW '23. [PDF] [Slides]

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.

Contributors

Thanks to all the contributors to Limbo!
