Commit Graph

4282 Commits

Author SHA1 Message Date
Pekka Enberg
0949e7d2f2 Merge 'bindings/go: Upgrade ebitengine/purego to allow for use with go 1.23.9' from Preston Thorpe
Go 1.23.9 introduced a change to its linker that caused a
`duplicate symbol` error with purego 1.82.
This is the recommended fix per
https://github.com/golang/go/issues/73617

Closes #1466
2025-05-10 07:53:44 +03:00
PThorpe92
efd4767b6a Bindings/Go: Upgrade ebitengine/purego to allow for use with go 1.23.9 2025-05-09 20:25:29 -04:00
Jussi Saurio
9a5990f87e Merge 'Add tests for INSERT with specified column-name list' from Anton Harniakou
Let's add some missing tests for the INSERT statement.

Closes #1455
2025-05-09 08:59:39 +03:00
Jussi Saurio
bda6526d28 Merge 'GROUP BY: refactor logic to support cases where no sorting is needed' from Jussi Saurio
Right now we have the following problem with GROUP BY:
- it always allocates a sorter and sorts the input rows, even when the
rows are already sorted in the right order
This PR is a refactor supporting a future PR that introduces a new
version of the optimizer which does 1. join reordering and 2. sorting
elimination based on plan cost. The PR splits GROUP BY into multiple
subsections:
1. Initializing the sorter, if needed
2. Reading rows from the sorter, if needed
3. Doing the actual grouping (this is done regardless of whether sorting
is needed)
4. Emitting rows during grouping in a subroutine (this is done
regardless of whether sorting is needed)
For example, you might currently have the following pseudo-bytecode for
GROUP BY:
```
SorterOpen (groupby_sorter)
OpenRead (users)
Rewind (users)
   <read columns from users>
   SorterInsert (groupby_sorter)
Next (users)
SorterSort (groupby_sorter)
   <do grouping>
SorterNext (groupby_sorter)
ResultRow
```
This PR allows us to do the following in cases where the rows are
already sorted:
```
OpenRead (users)
Rewind (users)
  <read columns from users>
  <do grouping>
Next (users)
ResultRow
```
---
In fact this is where the vast majority of the changes in this PR come
from -- eliminating the implied assumption that sorting for GROUP BY is
always required. The PR does not change current behavior, i.e. sorting
is always done for GROUP BY, but it adds the _ability_ to not do sorting
if the planner so decides.
The most important changes to understand are these:
```rust
/// Enum representing the source for the rows processed during a GROUP BY.
/// In case sorting is needed (which is most of the time), the variant
/// [GroupByRowSource::Sorter] encodes the necessary information about that
/// sorter.
///
/// In case where the rows are already ordered, for example:
/// "SELECT indexed_col, count(1) FROM t GROUP BY indexed_col"
/// the rows are processed directly in the order they arrive from
/// the main query loop.
#[derive(Debug)]
pub enum GroupByRowSource {
    Sorter {
        /// Cursor opened for the pseudo table that GROUP BY reads rows from.
        pseudo_cursor: usize,
        /// The sorter opened for ensuring the rows are in GROUP BY order.
        sort_cursor: usize,
        /// Register holding the key used for sorting in the Sorter
        reg_sorter_key: usize,
        /// Number of columns in the GROUP BY sorter
        sorter_column_count: usize,
        /// In case some result columns of the SELECT query are equivalent to GROUP BY members,
        /// this mapping encodes their position.
        column_register_mapping: Vec<Option<usize>>,
    },
    MainLoop {
        /// If GROUP BY rows are read directly in the main loop, start_reg_src is the first register
        /// holding the value of a relevant column.
        start_reg_src: usize,
        /// The grouping columns for a group that is not yet finalized must be placed in new registers,
        /// so that they don't get overwritten by the next group's data.
        /// This is because the emission of a group that is "done" is made after a comparison between the "current" and "next" grouping
        /// columns returns nonequal. If we don't store the "current" group in a separate set of registers, the "next" group's data will
        /// overwrite the "current" group's columns and the wrong grouping column values will be emitted.
        /// Aggregation results do not require new registers as they are not at risk of being overwritten before a given group
        /// is processed.
        start_reg_dest: usize,
    },
}

/// Enum representing the source of the aggregate function arguments
/// emitted for a group by aggregation.
/// In the common case, the aggregate function arguments are first inserted
/// into a sorter in the main loop, and in the group by aggregation phase
/// we read the data from the sorter.
///
/// In the alternative case, no sorting is required for group by,
/// and the aggregate function arguments are retrieved directly from
/// registers allocated in the main loop.
pub enum GroupByAggArgumentSource<'a> {
    /// The aggregate function arguments are retrieved from a pseudo cursor
    /// which reads from the GROUP BY sorter.
    PseudoCursor {
        cursor_id: usize,
        col_start: usize,
        dest_reg_start: usize,
        aggregate: &'a Aggregate,
    },
    /// The aggregate function arguments are retrieved from a contiguous block of registers
    /// allocated in the main loop for that given aggregate function.
    Register {
        src_reg_start: usize,
        aggregate: &'a Aggregate,
    },
}
```
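
To illustrate the `MainLoop` case, here is a minimal sketch (not code from this PR) of grouping rows that already arrive in GROUP BY order, so no sorter is needed. The function `group_sorted_sum` and its `(key, value)` row layout are hypothetical; the separately held `current` value plays roughly the role that `start_reg_dest` plays above.

```rust
/// Sums `value` per `key` for rows that are already sorted by `key`.
/// Hypothetical helper for illustration; not part of the Limbo code base.
fn group_sorted_sum(rows: &[(i64, i64)]) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    // The group being accumulated is kept apart from the incoming row,
    // so the next row cannot overwrite it before the group is emitted.
    let mut current: Option<(i64, i64)> = None;
    for &(key, value) in rows {
        current = match current {
            // Same key as the current group: keep accumulating.
            Some((k, sum)) if k == key => Some((k, sum + value)),
            // Key changed: emit the finished group, then start a new one.
            Some(done) => {
                out.push(done);
                Some((key, value))
            }
            // First row: start the first group.
            None => Some((key, value)),
        };
    }
    // Emit the last group, if there was any input at all.
    if let Some(done) = current {
        out.push(done);
    }
    out
}
```

A group is emitted only when the next row's key differs, which is why the current group's key has to live somewhere the next row cannot clobber.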

Closes #1438
2025-05-09 08:56:38 +03:00
Jussi Saurio
37097e01ae GROUP BY: refactor logic to support cases where no sorting is needed 2025-05-08 12:39:26 +03:00
Jussi Saurio
ae2561dbca Merge 'Fix memory leak caused by unclosed virtual table cursors' from Piotr Rżysko
The following code reproduces the leak (memory usage increases over
time):
```rust
#[tokio::main]
async fn main() {
    let db = Builder::new_local(":memory:").build().await.unwrap();
    let conn = db.connect().unwrap();

    conn.execute("SELECT load_extension('./target/debug/liblimbo_series');", ())
        .await
        .unwrap();

    loop {
        conn.execute("SELECT * FROM generate_series(1,10,2);", ())
            .await
            .unwrap();
    }
}
```
After switching to the system allocator, the leak becomes detectable
with Valgrind:
```
32,000 bytes in 1,000 blocks are definitely lost in loss record 24 of 24
   at 0x538580F: malloc (vg_replace_malloc.c:446)
   by 0x62E15FA: alloc::alloc::alloc (alloc.rs:99)
   by 0x62E172C: alloc::alloc::Global::alloc_impl (alloc.rs:192)
   by 0x62E1530: allocate (alloc.rs:254)
   by 0x62E1530: alloc::alloc::exchange_malloc (alloc.rs:349)
   by 0x62E0271: new<limbo_series::GenerateSeriesCursor> (boxed.rs:257)
   by 0x62E0271: open_GenerateSeriesVTab (lib.rs:19)
   by 0x425D8FA: limbo_core::VirtualTable::open (lib.rs:732)
   by 0x4285DDA: limbo_core::vdbe::execute::op_vopen (execute.rs:890)
   by 0x42351E8: limbo_core::vdbe::Program::step (mod.rs:396)
   by 0x425C638: limbo_core::Statement::step (lib.rs:610)
   by 0x40DB238: limbo::Statement::execute::{{closure}} (lib.rs:181)
   by 0x40D9EAF: limbo::Connection::execute::{{closure}} (lib.rs:109)
   by 0x40D54A1: example::main::{{closure}} (example.rs:26)
```
Interestingly, when using mimalloc, neither Valgrind nor mimalloc’s
internal statistics report the leak.
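
For illustration only, the following sketch shows how a boxed cursor that is handed out as a raw pointer leaks unless something reclaims it. `Cursor`, `open_cursor`, `close_cursor` and `live_cursors` are hypothetical names, not the Limbo extension API, and this is not necessarily how the fix is implemented.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of cursors currently allocated (demonstration only).
static LIVE_CURSORS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a virtual table cursor.
pub struct Cursor {
    pub position: i64,
}

impl Drop for Cursor {
    fn drop(&mut self) {
        LIVE_CURSORS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Allocates a cursor on the heap and gives up ownership as a raw pointer.
/// From here on, nothing frees the allocation automatically.
pub fn open_cursor() -> *mut Cursor {
    LIVE_CURSORS.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(Cursor { position: 0 }))
}

/// Reclaims a cursor previously returned by `open_cursor`.
///
/// # Safety
/// `ptr` must come from `open_cursor` and must not have been closed already.
pub unsafe fn close_cursor(ptr: *mut Cursor) {
    if !ptr.is_null() {
        // Rebuilding the Box lets Rust run `Drop` and free the memory.
        drop(unsafe { Box::from_raw(ptr) });
    }
}

/// Returns how many cursors are still alive.
pub fn live_cursors() -> usize {
    LIVE_CURSORS.load(Ordering::SeqCst)
}
```

If every `open_cursor` call is not paired with a `close_cursor` call, `live_cursors()` grows with each query, which mirrors the steadily increasing memory usage in the reproduction above.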

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1447
2025-05-08 10:48:23 +03:00
Pekka Enberg
2bd221e5db Merge 'Add embedded library support to Go adapter' from Jonathan Ness
This change enables the Go adapter to embed platform-specific libraries
and extract them at runtime, eliminating the need for users to set
LD_LIBRARY_PATH or other environment variables.
- Add embedded.go with core library extraction functionality
- Update limbo_unix.go and limbo_windows.go to use embedded libraries
- Add build_lib.sh script to generate platform-specific libraries
- Update README.md with documentation for the new feature
- Add .gitignore to prevent committing binary files
- Add test coverage for Vector operations (vector(), vector_extract(),
vector_distance_cos()) and sqlite core features
The implementation maintains backward compatibility with the traditional
library loading mechanism as a fallback. This approach is inspired by
projects like go-embed-python that use a similar technique for native
library distribution.
https://github.com/tursodatabase/limbo/issues/506

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1434
2025-05-07 22:31:29 +03:00
Pekka Enberg
e7902541cd Merge 'Add time.Time support to Go driver parameter binding' from Jonathan Ness
### Problem
The Limbo Go driver currently throws an "unsupported type" error when
trying to bind `time.Time` values as query parameters. This requires
applications to manually convert datetime values to strings before
passing them to queries.
### Solution
Added `time.Time` support to the `buildArgs` function in `types.go`. The
implementation:
- Converts `time.Time` to RFC3339 format strings
- Uses the existing `textVal` type for storage
- Maintains compatibility with Limbo's datetime handling
### Example Usage
```go
// Previously failed with "unsupported type: time.Time"
now := time.Now()
db.Exec("INSERT INTO events (timestamp) VALUES (?)", now)

// Now works as expected
```
### Testing
I tested with various datetime operations:
- Parameter binding with time.Time
- Round-trip storage and retrieval
- Compatibility with existing datetime functions
Values are stored in standard ISO8601/RFC3339 format, which I believe is
the same as SQLite.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1442
2025-05-07 22:30:44 +03:00
Jussi Saurio
d592e8ec8a Merge 'Show explanation for the NewRowid opcode' from Anton Harniakou
After this commit, `explain` shows a comment for NewRowid:
```
limbo> explain insert into t(id) values (1);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     17    0                    0   Start at 17
...
6     NewRowid           0     1     0                    0   r[1]=rowid
...
18    Goto               0     1     0                    0
```

Closes #1445
2025-05-07 09:15:46 +03:00
Jussi Saurio
57b16d5b2b Merge 'Add notion of join ordering to plan' from Jussi Saurio
This PR is an enabler for our (Coming Soon ™️ ) join reordering
optimizer -- simply adds the notion of a join order to the current query
execution. This PR does not do any join ordering -- the join order is
always the same as expressed in the SQL query.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1439
2025-05-07 08:55:13 +03:00
jnesss
a4c0f57f82 added embedded library usage notes 2025-05-06 21:43:18 -07:00
jnesss
548bcd4692 accept debug or release parameter - default to release 2025-05-06 20:44:02 -07:00
Anton Harniakou
d907f95716 Add tests for INSERT with specified column-name list 2025-05-06 11:31:41 +03:00
Piotr Rzysko
977b6b331a Fix memory leak caused by unclosed virtual table cursors
The following code reproduces the leak, with memory usage increasing
over time:

```
#[tokio::main]
async fn main() {
    let db = Builder::new_local(":memory:").build().await.unwrap();
    let conn = db.connect().unwrap();

    conn.execute("SELECT load_extension('./target/debug/liblimbo_series');", ())
        .await
        .unwrap();

    loop {
        conn.execute("SELECT * FROM generate_series(1,10,2);", ())
            .await
            .unwrap();
    }
}
```

After switching to the system allocator, the leak becomes detectable
with Valgrind:

```
32,000 bytes in 1,000 blocks are definitely lost in loss record 24 of 24
   at 0x538580F: malloc (vg_replace_malloc.c:446)
   by 0x62E15FA: alloc::alloc::alloc (alloc.rs:99)
   by 0x62E172C: alloc::alloc::Global::alloc_impl (alloc.rs:192)
   by 0x62E1530: allocate (alloc.rs:254)
   by 0x62E1530: alloc::alloc::exchange_malloc (alloc.rs:349)
   by 0x62E0271: new<limbo_series::GenerateSeriesCursor> (boxed.rs:257)
   by 0x62E0271: open_GenerateSeriesVTab (lib.rs:19)
   by 0x425D8FA: limbo_core::VirtualTable::open (lib.rs:732)
   by 0x4285DDA: limbo_core::vdbe::execute::op_vopen (execute.rs:890)
   by 0x42351E8: limbo_core::vdbe::Program::step (mod.rs:396)
   by 0x425C638: limbo_core::Statement::step (lib.rs:610)
   by 0x40DB238: limbo::Statement::execute::{{closure}} (lib.rs:181)
   by 0x40D9EAF: limbo::Connection::execute::{{closure}} (lib.rs:109)
   by 0x40D54A1: example::main::{{closure}} (example.rs:26)
```

Interestingly, when using mimalloc, neither Valgrind nor mimalloc’s
internal statistics report the leak.
2025-05-05 21:26:23 +02:00
Anton Harniakou
a971bce353 Show explanation for the NewRowid opcode 2025-05-04 14:14:19 +03:00
jnesss
061579e716 Add time.Time support to Go driver parameter binding 2025-05-03 14:09:25 -07:00
jnesss
091169af38 Go uses amd64 to refer to the 64-bit x86 architecture, while Linux
systems report x86_64. This change maps the architecture names to ensure the built libraries are placed in the correct directories for Go's embedded loading system (e.g., libs/linux_amd64)
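
A rough sketch of the mapping described above. The helper names `go_arch` and `lib_dir` are hypothetical, and the `aarch64` to `arm64` entry is an extra assumption that this commit message does not mention.

```rust
/// Maps a machine name as reported by `uname -m` to Go's GOARCH spelling.
/// Hypothetical helper; unknown names are passed through unchanged.
fn go_arch(machine: &str) -> &str {
    match machine {
        "x86_64" => "amd64",
        // Assumption: not stated in the commit message.
        "aarch64" => "arm64",
        other => other,
    }
}

/// Builds the directory name used for an embedded library, e.g. `libs/linux_amd64`.
fn lib_dir(os: &str, machine: &str) -> String {
    format!("libs/{}_{}", os, go_arch(machine))
}
```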
2025-05-03 10:23:27 -07:00
Jussi Saurio
46c915b13c Merge 'Add static feature to Cargo.toml to support extensions written inside core' from Pedro Muniz
Added the static feature from the extensions crate to core, as it is
needed to write extensions that are defined in core and depend on code
from core. Added some docs to account for this.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1070
2025-05-03 19:26:06 +03:00
Jussi Saurio
4e05023bd3 Merge branch 'main' into ext-static-feature 2025-05-03 19:18:28 +03:00
Jussi Saurio
40c04c7074 Merge 'Adjust vtab schema creation to display the underlying columns' from Preston Thorpe
### The problem:
SQLite displays the column names of the underlying vtab module when
displaying the `.schema`:
![image](https://github.com/user-attachments/assets/ca6aa1c9-0af7-4f34-a5c4-c8336fa23858)
Previously Limbo omitted this, which made it difficult for the user to
see what/how many columns the module's table has.
This PR matches SQLite's behavior by fetching the module's schema when the
schema entry is being inserted in translation.
![image](https://github.com/user-attachments/assets/a56b8239-0f65-420b-a0b6-536ede117fba)

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1168
2025-05-03 19:17:25 +03:00
Jussi Saurio
c9eb56b54a Merge 'Read only mode' from Pedro Muniz
Closes #1413. Basically, SQLite emits a check in a transaction to see
if it is attempting to write. If the db is in read-only mode, it throws
an error; otherwise the statement is executed. Mirroring how rusqlite does
it, I modified the `OpenFlags` to use bitflags to better configure how
we open our VFS. This modification will enable us to run tests against
the same database in parallel.
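
A minimal sketch of the idea, assuming a hand-rolled flag type instead of the `bitflags` crate so that it stays self-contained. The flag names, `TxError` and `begin_transaction` are hypothetical and only illustrate the write check, not the actual Limbo API.

```rust
use std::ops::BitOr;

/// Hypothetical open flags stored as individual bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OpenFlags(u32);

impl OpenFlags {
    pub const READ_ONLY: OpenFlags = OpenFlags(1 << 0);
    pub const CREATE: OpenFlags = OpenFlags(1 << 1);

    /// True if every bit set in `other` is also set in `self`.
    pub fn contains(self, other: OpenFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for OpenFlags {
    type Output = OpenFlags;
    fn bitor(self, rhs: OpenFlags) -> OpenFlags {
        OpenFlags(self.0 | rhs.0)
    }
}

/// Hypothetical error returned when a write is attempted on a read-only db.
#[derive(Debug, PartialEq, Eq)]
pub enum TxError {
    ReadOnly,
}

/// Rejects write transactions when the database was opened read-only.
pub fn begin_transaction(flags: OpenFlags, write: bool) -> Result<(), TxError> {
    if write && flags.contains(OpenFlags::READ_ONLY) {
        return Err(TxError::ReadOnly);
    }
    Ok(())
}
```

Read transactions pass the check regardless of the flags, which is what would allow several read-only connections to share one database file in tests.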

Closes #1433
2025-05-03 19:15:06 +03:00
Jussi Saurio
c2f30d796e Merge 'Test that DROP TABLE also deletes the related indices' from Anton Harniakou
These tests were commented out; maybe they were written before CREATE
INDEX was added.

Closes #1437
2025-05-03 18:35:20 +03:00
Jussi Saurio
e57cea8de7 Merge 'reset statement before executing in rust binding' from Pedro Muniz
Closes #1426

Closes #1436
2025-05-03 18:34:44 +03:00
Jussi Saurio
7920161efc update Cargo.lock 2025-05-03 18:32:58 +03:00
Jussi Saurio
9ea958561b Merge 'Bump assorted dependencies' from Preston Thorpe
Closes #1425
2025-05-03 18:31:58 +03:00
Jussi Saurio
b86123a82e Merge 'Fix panic on async io due to reading locked page' from Preston Thorpe
Closes #1417
Man, chasing this down was much, much harder than it should have been.
We very frequently call `read_page` and then push the return value onto the
page stack, or otherwise use it in ways that do not require its IO to
have completed, so it was tricky to figure out where this was
happening, and it had me thinking that something was wrong with the
changes to `io_uring` on my branch.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1418
2025-05-03 18:30:29 +03:00
Jussi Saurio
5f91d30d94 Merge 'implement Clone for Arc<Mutex> types' from Pete Hayman
`Statement` and `Rows` both hold a private `Arc`; implementing `Clone`
means users no longer need to wrap them in `Arc<Mutex>` again.
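
As a rough illustration of the pattern (the `Handle` type below is hypothetical and is not the binding's `Statement` or `Rows`), a wrapper around `Arc<Mutex<_>>` can derive `Clone` so that every clone shares the same inner state:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical handle type that keeps its shared state private.
#[derive(Clone)]
pub struct Handle {
    inner: Arc<Mutex<Vec<i64>>>,
}

impl Handle {
    pub fn new() -> Self {
        Handle {
            inner: Arc::new(Mutex::new(Vec::new())),
        }
    }

    /// Appends a value to the shared state.
    pub fn push(&self, value: i64) {
        self.inner.lock().unwrap().push(value);
    }

    /// Number of values stored in the shared state.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```

Cloning only bumps the reference count of the `Arc`, so callers can pass clones to other tasks without adding another `Arc<Mutex<_>>` layer of their own.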

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1412
2025-05-03 18:30:00 +03:00
Jussi Saurio
fafeabd081 Merge 'Eliminate a superfluous read transaction when doing PRAGMA user_version' from Anton Harniakou
This PR removes an unnecessary read transaction.
Bytecode before this PR:
```
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     5     0                    0   Start at 5
1     Transaction        0     0     0                    0   write=false
2     ReadCookie         0     1     6                    0
3     ResultRow          1     1     0                    0   output=r[1]
4     Halt               0     0     0                    0
5     Transaction        0     0     0                    0   write=false
6     Goto               0     1     0                    0
```
Bytecode after this PR:
```
limbo> explain PRAGMA user_version;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     4     0                    0   Start at 4
1     ReadCookie         0     1     6                    0
2     ResultRow          1     1     0                    0   output=r[1]
3     Halt               0     0     0                    0
4     Transaction        0     0     0                    0   write=false
5     Goto               0     1     0                    0
```

Closes #1431
2025-05-03 15:40:27 +03:00
Jussi Saurio
330fedbc2f Add notion of join ordering to plan + make determining where to eval expr dynamic always 2025-05-03 15:32:06 +03:00
Jussi Saurio
306e097950 Merge 'Fix bug: we cant remove order by terms from the head of the list' from Jussi Saurio
We had an incorrect optimization in `eliminate_orderby_like_groupby()`
where it could remove e.g. the first term of the ORDER BY if it matched
the first GROUP BY term and the result set was naturally ordered by that
term. This is invalid; see e.g.:
```sql
main branch - BAD: removes the `ORDER BY id` term because the results are naturally ordered by id.
However, this results in sorting the entire thing by last name only!

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌──────┬───────────┬───────────┐
│ id   │ last_name │ count (1) │
├──────┼───────────┼───────────┤
│ 6235 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│ 8043 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│  944 │ Zimmerman │         1 │
└──────┴───────────┴───────────┘

after fix - GOOD:

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌────┬───────────┬───────────┐
│ id │ last_name │ count (1) │
├────┼───────────┼───────────┤
│  1 │ Foster    │         1 │
├────┼───────────┼───────────┤
│  2 │ Salazar   │         1 │
├────┼───────────┼───────────┤
│  3 │ Perry     │         1 │
└────┴───────────┴───────────┘
```

I also refactored sorters to always use the ast `SortOrder` instead of boolean vectors, and use the `compare_immutable()` utility we use inside btrees too.

Closes #1365
2025-05-03 12:48:08 +03:00
Anton Harniakou
b6a5cbe626 Test that DROP TABLE also deletes the related indices 2025-05-03 12:41:19 +03:00
Anton Harniakou
3c0b7cad74 Eliminate a superfluous read transaction when doing PRAGMA user_version 2025-05-03 10:48:27 +03:00
Jussi Saurio
5689f0ef5e Merge 'update index on updated indexed columns' from Pere Diaz Bou
Previously columns that were indexed were updated only in the
BtreeTable, but not on Index table. This commit basically enables
updates on indexes too if they are needed.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1428
2025-05-03 10:41:13 +03:00
pedrocarlo
7cc190a12b reset statement before executing 2025-05-02 19:26:44 -03:00
jnesss
69965b4eee remove TestTransactions that was being skipped. added back in second PR 2025-05-02 14:44:49 -07:00
jnesss
aed36c5b30 change to address CI failure. libs dir must exist with at least one file during build time. Add libs/.gitkeep as placeholder. Update .gitignore to exclude built libs but keep placeholder. This change is for CI purposes only 2025-05-02 13:02:52 -07:00
jnesss
bcb2f9f307 add documentation for new embedded library feature, including usage instructions and implementation notes 2025-05-02 12:45:30 -07:00
jnesss
1a3c3866b8 Update Windows library loading to prioritize the embedded library while maintaining compatibility with PATH-based lookup 2025-05-02 12:44:46 -07:00
jnesss
1e0b4676dc Update library loading mechanism to first attempt using the embedded library before falling back to traditional LD_LIBRARY_PATH lookup 2025-05-02 12:44:24 -07:00
jnesss
2476d2c6c2 embeds and extracts platform-specific libraries at runtime using Go's embed package 2025-05-02 12:43:36 -07:00
jnesss
322c2859e6 platform-specific build script that generates and organizes library for embedding into Go binaries 2025-05-02 12:42:38 -07:00
jnesss
992324f318 Add .gitignore for generated library files 2025-05-02 12:41:47 -07:00
pedrocarlo
2b3285d669 test opening in read only mode 2025-05-02 16:31:11 -03:00
pedrocarlo
0c22382f3c shared lock on file and throw ReadOnly error in transaction 2025-05-02 16:30:48 -03:00
jnesss
a9b5fc7f63 Add tests for vector operations and date/time functions in Go adapter 2025-05-02 11:30:31 -07:00
PThorpe92
d4cf8367ba Wrap return_if_locked in balance non root in debug assertion cfg 2025-05-02 10:55:00 -04:00
PThorpe92
f025f7e91e Fix panic on async io due to reading locked page 2025-05-02 10:55:00 -04:00
Pere Diaz Bou
f15a17699b check indexes are not added twice in update plan 2025-05-01 12:38:34 +03:00
Pere Diaz Bou
c808863256 test update with index 2025-05-01 11:44:23 +03:00
Pere Diaz Bou
e503bb4641 run_query helper for test_write_path 2025-05-01 11:36:29 +03:00