Jussi Saurio bda6526d28 Merge 'GROUP BY: refactor logic to support cases where no sorting is needed' from Jussi Saurio
Right now we have the following problem with GROUP BY:
- it always allocates a sorter and sorts the input rows, even when the
rows are already sorted in the right order
This PR is a refactor supporting a future PR that introduces a new
version of the optimizer which does 1. join reordering and 2. sorting
elimination based on plan cost. The PR splits GROUP BY into multiple
subsections:
1. Initializing the sorter, if needed
2. Reading rows from the sorter, if needed
3. Doing the actual grouping (this is done regardless of whether sorting
is needed)
4. Emitting rows during grouping in a subroutine (this is done
regardless of whether sorting is needed)
For example, you might currently have the following pseudo-bytecode for
GROUP BY:
```
SorterOpen (groupby_sorter)
OpenRead (users)
Rewind (users)
   <read columns from users>
   SorterInsert (groupby_sorter)
Next (users)
SorterSort (groupby_sorter)
   <do grouping>
SorterNext (groupby_sorter)
ResultRow
```
This PR allows us to do the following in cases where the rows are
already sorted:
```
OpenRead (users)
Rewind (users)
  <read columns from users>
  <do grouping>
Next (users)
ResultRow
```
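
To see why pre-sorted input makes the sorter unnecessary, here is a minimal standalone sketch in plain Rust (independent of Limbo's actual VDBE bytecode) of streaming grouping: because the input is sorted, a group is complete the moment the grouping key changes, so it can be emitted immediately without buffering the whole input.

```rust
/// Streaming COUNT(*) grouping over rows already sorted by key.
/// No sorter is needed: a group is finished as soon as the key changes.
fn group_count_sorted(rows: &[(&str, i64)]) -> Vec<(String, i64)> {
    let mut out = Vec::new();
    let mut current: Option<(String, i64)> = None;
    for &(key, _val) in rows {
        match &mut current {
            Some((k, n)) if k.as_str() == key => *n += 1,
            other => {
                // Key changed (or first row): emit the finished group,
                // then start a new one.
                if let Some(done) = other.take() {
                    out.push(done);
                }
                *other = Some((key.to_string(), 1));
            }
        }
    }
    // Emit the final group.
    if let Some(done) = current.take() {
        out.push(done);
    }
    out
}
```

For example, `group_count_sorted(&[("a", 1), ("a", 2), ("b", 3)])` yields `[("a", 2), ("b", 1)]`.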
---
In fact, this is where the vast majority of the changes in this PR come
from -- eliminating the implicit assumption that sorting for GROUP BY is
always required. The PR does not change current behavior (sorting is
still always performed for GROUP BY), but it adds the _ability_ to skip
sorting if the planner so decides.
The most important changes to understand are these:
```rust
/// Enum representing the source for the rows processed during a GROUP BY.
/// In case sorting is needed (which is most of the time), the variant
/// [GroupByRowSource::Sorter] encodes the necessary information about that
/// sorter.
///
/// In the case where the rows are already ordered, for example:
/// "SELECT indexed_col, count(1) FROM t GROUP BY indexed_col"
/// the rows are processed directly in the order they arrive from
/// the main query loop.
#[derive(Debug)]
pub enum GroupByRowSource {
    Sorter {
        /// Cursor opened for the pseudo table that GROUP BY reads rows from.
        pseudo_cursor: usize,
        /// The sorter opened for ensuring the rows are in GROUP BY order.
        sort_cursor: usize,
        /// Register holding the key used for sorting in the Sorter
        reg_sorter_key: usize,
        /// Number of columns in the GROUP BY sorter
        sorter_column_count: usize,
        /// In case some result columns of the SELECT query are equivalent to GROUP BY members,
        /// this mapping encodes their position.
        column_register_mapping: Vec<Option<usize>>,
    },
    MainLoop {
        /// If GROUP BY rows are read directly in the main loop, start_reg is the first register
        /// holding the value of a relevant column.
        start_reg_src: usize,
        /// The grouping columns for a group that is not yet finalized must be placed in new registers,
        /// so that they don't get overwritten by the next group's data.
        /// This is because the emission of a group that is "done" is made after a comparison between the "current" and "next" grouping
        /// columns returns nonequal. If we don't store the "current" group in a separate set of registers, the "next" group's data will
        /// overwrite the "current" group's columns and the wrong grouping column values will be emitted.
        /// Aggregation results do not require new registers as they are not at risk of being overwritten before a given group
        /// is processed.
        start_reg_dest: usize,
    },
}

/// Enum representing the source of the aggregate function arguments
/// emitted for a group by aggregation.
/// In the common case, the aggregate function arguments are first inserted
/// into a sorter in the main loop, and in the group by aggregation phase
/// we read the data from the sorter.
///
/// In the alternative case, no sorting is required for group by,
/// and the aggregate function arguments are retrieved directly from
/// registers allocated in the main loop.
pub enum GroupByAggArgumentSource<'a> {
    /// The aggregate function arguments are retrieved from a pseudo cursor
    /// which reads from the GROUP BY sorter.
    PseudoCursor {
        cursor_id: usize,
        col_start: usize,
        dest_reg_start: usize,
        aggregate: &'a Aggregate,
    },
    /// The aggregate function arguments are retrieved from a contiguous block of registers
    /// allocated in the main loop for that given aggregate function.
    Register {
        src_reg_start: usize,
        aggregate: &'a Aggregate,
    },
}
```
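
To make the dispatch concrete, here is a minimal sketch of how code generation might branch on `GroupByRowSource`. This is not Limbo's actual codegen; `Program`, `Insn`, and the opcode variants are hypothetical stand-ins for the real bytecode builder.

```rust
// Hypothetical stand-ins for the real bytecode builder and opcodes,
// just enough to make the dispatch below concrete.
struct Program { insns: Vec<Insn> }

enum Insn {
    SorterSort { cursor: usize },
    Copy { src: usize, dst: usize },
}

impl Program {
    fn emit(&mut self, insn: Insn) { self.insns.push(insn); }
}

fn emit_group_by_input(program: &mut Program, source: &GroupByRowSource) {
    match source {
        GroupByRowSource::Sorter { sort_cursor, .. } => {
            // Rows were inserted into the sorter during the main loop;
            // sort them, then read them back (via the pseudo cursor) in
            // GROUP BY order before grouping.
            program.emit(Insn::SorterSort { cursor: *sort_cursor });
            // ... loop: read a sorted row, group, SorterNext ...
        }
        GroupByRowSource::MainLoop { start_reg_src, start_reg_dest } => {
            // Rows arrive pre-sorted. Copy the grouping columns into
            // dedicated registers so the next row's values cannot overwrite
            // the group that is still being compared and emitted.
            program.emit(Insn::Copy { src: *start_reg_src, dst: *start_reg_dest });
            // ... compare with the previous group's registers; on change,
            // jump to the subroutine that emits the finished group ...
        }
    }
}
```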

Closes #1438
2025-05-09 08:56:38 +03:00

Limbo

Limbo is a project to build the modern evolution of SQLite.

Chat with developers on Discord


Features and Roadmap

Limbo is a work-in-progress, in-process OLTP database engine library written in Rust that has:

  • Asynchronous I/O support on Linux with io_uring
  • SQLite compatibility [doc] for SQL dialect, file formats, and the C API
  • Language bindings for JavaScript/WebAssembly, Rust, Go, Python, and Java
  • OS support for Linux, macOS, and Windows

In the future, we will also be working on:

  • Integrated vector search for embeddings and vector similarity.
  • BEGIN CONCURRENT for improved write throughput.
  • Improved schema management including better ALTER support and strict column types by default.

Getting Started

💻 Command Line
You can install the latest `limbo` release with:
```shell
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/tursodatabase/limbo/releases/latest/download/limbo_cli-installer.sh | sh
```

Then launch the shell to execute SQL statements:

```
Limbo
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database
limbo> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> INSERT INTO users VALUES (2, 'bob');
limbo> SELECT * FROM users;
1|alice
2|bob
```

You can also build and run the latest development version with:

```shell
cargo run
```
🦀 Rust
```shell
cargo add limbo
```

Example usage:

```rust
use limbo::Builder;

// These calls are async and must run inside an async context.
let db = Builder::new_local("sqlite.db").build().await?;
let conn = db.connect()?;

let res = conn.query("SELECT * FROM users", ()).await?;
```
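
As a complete program, the same calls can be wrapped in an async main. A minimal sketch, assuming the `limbo` crate exposes `Builder` at the crate root and that a `tokio` runtime is available:

```rust
use limbo::Builder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) a local database file and connect to it.
    let db = Builder::new_local("sqlite.db").build().await?;
    let conn = db.connect()?;

    // Run a query; see the crate documentation for iterating over rows.
    let _res = conn.query("SELECT * FROM users", ()).await?;
    Ok(())
}
```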
JavaScript
```shell
npm i limbo-wasm
```

Example usage:

```js
import { Database } from 'limbo-wasm';

const db = new Database('sqlite.db');
const stmt = db.prepare('SELECT * FROM users');
const users = stmt.all();
console.log(users);
```
🐍 Python
```shell
pip install pylimbo
```

Example usage:

```python
import limbo

con = limbo.connect("sqlite.db")
cur = con.cursor()
res = cur.execute("SELECT * FROM users")
print(res.fetchone())
```
🐹 Go
  1. Clone the repository
  2. Build the library and set your LD_LIBRARY_PATH to include limbo's target directory:
```shell
cargo build --package limbo-go
export LD_LIBRARY_PATH=/path/to/limbo/target/debug:$LD_LIBRARY_PATH
```
  3. Use the driver:
```shell
go get github.com/tursodatabase/limbo
go install github.com/tursodatabase/limbo
```

Example usage:

```go
import (
    "database/sql"
    "fmt"

    _ "github.com/tursodatabase/limbo"
)

conn, _ := sql.Open("sqlite3", "sqlite.db")
defer conn.Close()

stmt, _ := conn.Prepare("select * from users")
defer stmt.Close()

rows, _ := stmt.Query()
for rows.Next() {
    var id int
    var username string
    _ = rows.Scan(&id, &username)
    fmt.Printf("User: ID: %d, Username: %s\n", id, username)
}
```
Java

We integrated Limbo into JDBC. For detailed instructions on how to use Limbo with Java, please refer to the README.md under bindings/java.

Contributing

We'd love to have you contribute to Limbo! Please check out the contribution guide to get started.

FAQ

How is Limbo different from Turso's libSQL?

Limbo is a project to build the modern evolution of SQLite in Rust, with a strong open contribution focus and features like native async support, vector search, and more. The libSQL project is also an attempt to evolve SQLite in a similar direction, but through a fork rather than a rewrite.

Rewriting SQLite in Rust started as an unassuming experiment, and due to its incredible success, it now replaces libSQL as our intended direction. At this point, libSQL is production-ready and Limbo is not, although it is evolving rapidly. As the project nears production readiness, we plan to rename it to just "Turso". More details here.

Publications

  • Pekka Enberg, Sasu Tarkoma, Jon Crowcroft, and Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys '24. [PDF]
  • Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW '23. [PDF] [Slides]

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.

Contributors

Thanks to all the contributors to Limbo!
