Go 1.23.9 introduced a change to its linker that caused a
`duplicate symbol` error with purego 1.82.
This is the recommended fix per
https://github.com/golang/go/issues/73617

Closes #1466
Right now we have the following problem with GROUP BY:
- it always allocates a sorter and sorts the input rows, even when the
rows are already sorted in the right order
This PR is a refactor supporting a future PR that introduces a new
version of the optimizer which does 1. join reordering and 2. sorting
elimination based on plan cost. The PR splits GROUP BY into multiple
subsections:
1. Initializing the sorter, if needed
2. Reading rows from the sorter, if needed
3. Doing the actual grouping (this is done regardless of whether sorting
is needed)
4. Emitting rows during grouping in a subroutine (this is done
regardless of whether sorting is needed)
For example, you might currently have the following pseudo-bytecode for
GROUP BY:
```
SorterOpen (groupby_sorter)
OpenRead (users)
Rewind (users)
<read columns from users>
SorterInsert (groupby_sorter)
Next (users)
SorterSort (groupby_sorter)
<do grouping>
SorterNext (groupby_sorter)
ResultRow
```
This PR allows us to do the following in cases where the rows are
already sorted:
```
OpenRead (users)
Rewind (users)
<read columns from users>
<do grouping>
Next (users)
ResultRow
```
---
In fact this is where the vast majority of the changes in this PR come
from -- eliminating the implied assumption that sorting for GROUP BY is
always required. The PR does not change current behavior, i.e. sorting
is always done for GROUP BY, but it adds the _ability_ to not do sorting
if the planner so decides.
The most important changes to understand are these:
```rust
/// Enum representing the source of the rows processed during a GROUP BY.
/// In case sorting is needed (which is most of the time), the variant
/// [GroupByRowSource::Sorter] encodes the necessary information about that
/// sorter.
///
/// In the case where the rows are already ordered, for example:
/// "SELECT indexed_col, count(1) FROM t GROUP BY indexed_col"
/// the rows are processed directly in the order they arrive from
/// the main query loop.
#[derive(Debug)]
pub enum GroupByRowSource {
    Sorter {
        /// Cursor opened for the pseudo table that GROUP BY reads rows from.
        pseudo_cursor: usize,
        /// The sorter opened for ensuring the rows are in GROUP BY order.
        sort_cursor: usize,
        /// Register holding the key used for sorting in the sorter.
        reg_sorter_key: usize,
        /// Number of columns in the GROUP BY sorter.
        sorter_column_count: usize,
        /// In case some result columns of the SELECT query are equivalent to
        /// GROUP BY members, this mapping encodes their position.
        column_register_mapping: Vec<Option<usize>>,
    },
    MainLoop {
        /// If GROUP BY rows are read directly in the main loop, start_reg_src
        /// is the first register holding the value of a relevant column.
        start_reg_src: usize,
        /// The grouping columns for a group that is not yet finalized must be
        /// placed in new registers so that they don't get overwritten by the
        /// next group's data. This is because a "done" group is emitted only
        /// after a comparison between the "current" and "next" grouping columns
        /// returns nonequal; if we didn't store the "current" group in a
        /// separate set of registers, the "next" group's data would overwrite
        /// the "current" group's columns and the wrong grouping column values
        /// would be emitted.
        /// Aggregation results do not require new registers, as they are not at
        /// risk of being overwritten before a given group is processed.
        start_reg_dest: usize,
    },
}
/// Enum representing the source of the aggregate function arguments
/// emitted for a group by aggregation.
/// In the common case, the aggregate function arguments are first inserted
/// into a sorter in the main loop, and in the group by aggregation phase
/// we read the data from the sorter.
///
/// In the alternative case, no sorting is required for group by,
/// and the aggregate function arguments are retrieved directly from
/// registers allocated in the main loop.
pub enum GroupByAggArgumentSource<'a> {
    /// The aggregate function arguments are retrieved from a pseudo cursor
    /// which reads from the GROUP BY sorter.
    PseudoCursor {
        cursor_id: usize,
        col_start: usize,
        dest_reg_start: usize,
        aggregate: &'a Aggregate,
    },
    /// The aggregate function arguments are retrieved from a contiguous block
    /// of registers allocated in the main loop for that given aggregate function.
    Register {
        src_reg_start: usize,
        aggregate: &'a Aggregate,
    },
}
```
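To make the dispatch concrete, here is a hypothetical, simplified sketch of how codegen could branch on the argument source. The type and the emitted instruction strings are invented stand-ins for illustration; the real enum also carries the `Aggregate` and destination registers shown above.

```rust
// Hypothetical, simplified stand-in for GroupByAggArgumentSource; the real
// variants also carry the aggregate itself and destination registers.
enum AggArgSource {
    PseudoCursor { cursor_id: usize, col_start: usize },
    Register { src_reg_start: usize },
}

// Sketch: describe the (fake) instruction that loads the i-th aggregate argument.
fn emit_arg_load(src: &AggArgSource, arg_idx: usize) -> String {
    match src {
        // Sorted path: read the argument through the sorter's pseudo cursor.
        AggArgSource::PseudoCursor { cursor_id, col_start } => {
            format!("Column cursor={} col={}", cursor_id, col_start + arg_idx)
        }
        // Pre-sorted path: the argument already sits in a main-loop register.
        AggArgSource::Register { src_reg_start } => {
            format!("Copy src_reg={}", src_reg_start + arg_idx)
        }
    }
}
```

Either way, the grouping code downstream is identical; only the argument-loading instructions differ.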
Closes #1438
The following code reproduces the leak (memory usage increases over
time):
```rust
use limbo::Builder;

#[tokio::main]
async fn main() {
    let db = Builder::new_local(":memory:").build().await.unwrap();
    let conn = db.connect().unwrap();
    conn.execute("SELECT load_extension('./target/debug/liblimbo_series');", ())
        .await
        .unwrap();
    loop {
        conn.execute("SELECT * FROM generate_series(1,10,2);", ())
            .await
            .unwrap();
    }
}
```
After switching to the system allocator, the leak becomes detectable
with Valgrind:
```
32,000 bytes in 1,000 blocks are definitely lost in loss record 24 of 24
at 0x538580F: malloc (vg_replace_malloc.c:446)
by 0x62E15FA: alloc::alloc::alloc (alloc.rs:99)
by 0x62E172C: alloc::alloc::Global::alloc_impl (alloc.rs:192)
by 0x62E1530: allocate (alloc.rs:254)
by 0x62E1530: alloc::alloc::exchange_malloc (alloc.rs:349)
by 0x62E0271: new<limbo_series::GenerateSeriesCursor> (boxed.rs:257)
by 0x62E0271: open_GenerateSeriesVTab (lib.rs:19)
by 0x425D8FA: limbo_core::VirtualTable::open (lib.rs:732)
by 0x4285DDA: limbo_core::vdbe::execute::op_vopen (execute.rs:890)
by 0x42351E8: limbo_core::vdbe::Program::step (mod.rs:396)
by 0x425C638: limbo_core::Statement::step (lib.rs:610)
by 0x40DB238: limbo::Statement::execute::{{closure}} (lib.rs:181)
by 0x40D9EAF: limbo::Connection::execute::{{closure}} (lib.rs:109)
by 0x40D54A1: example::main::{{closure}} (example.rs:26)
```
Interestingly, when using mimalloc, neither Valgrind nor mimalloc’s
internal statistics report the leak.
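For reference, forcing the system allocator so that Valgrind can intercept allocations is a one-liner; this is a generic sketch, not the exact diff from the PR:

```rust
use std::alloc::System;

// Route all Rust allocations through the system allocator so Valgrind's
// malloc interception can track them (custom global allocators like mimalloc
// bypass malloc/free, which is why the leak was invisible before).
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    // Any heap allocation now goes through malloc/free as seen by Valgrind.
    let v = vec![1u8; 32];
    assert_eq!(v.len(), 32);
}
```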
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1447
This change enables the Go adapter to embed platform-specific libraries
and extract them at runtime, eliminating the need for users to set
LD_LIBRARY_PATH or other environment variables.
- Add embedded.go with core library extraction functionality
- Update limbo_unix.go and limbo_windows.go to use embedded libraries
- Add build_lib.sh script to generate platform-specific libraries
- Update README.md with documentation for the new feature
- Add .gitignore to prevent committing binary files
- Add test coverage for Vector operations (vector(), vector_extract(),
vector_distance_cos()) and sqlite core features
The implementation maintains backward compatibility with the traditional
library loading mechanism as a fallback. This approach is inspired by
projects like go-embed-python that use a similar technique for native
library distribution.
https://github.com/tursodatabase/limbo/issues/506
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1434
### Problem
The Limbo Go driver currently throws an "unsupported type" error when
trying to bind `time.Time` values as query parameters. This requires
applications to manually convert datetime values to strings before
passing them to queries.
### Solution
Added `time.Time` support to the `buildArgs` function in `types.go`. The
implementation:
- Converts `time.Time` to RFC3339 format strings
- Uses the existing `textVal` type for storage
- Maintains compatibility with Limbo's datetime handling
### Example Usage
```go
// Previously failed with "unsupported type: time.Time"
now := time.Now()
db.Exec("INSERT INTO events (timestamp) VALUES (?)", now)
// Now works as expected
```
### Testing
I tested with various datetime operations:
- Parameter binding with time.Time
- Round-trip storage and retrieval
- Compatibility with existing datetime functions
Values are stored in the standard ISO8601/RFC3339 format, which I believe
matches SQLite's behavior.
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1442
This PR is an enabler for our (Coming Soon ™️ ) join reordering
optimizer -- it simply adds the notion of a join order to the current query
execution. This PR does not do any join reordering -- the join order is
always the same as expressed in the SQL query.
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1439
systems report x86_64. This change maps the architecture names to ensure the built libraries are placed in the correct directories for Go's embedded loading system (e.g. `libs/linux_amd64`).
Added the `static` feature from the extensions crate to core, as it is
needed to write extensions that are defined in, and depend on, code
from core. Added some docs to account for this.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1070
### The problem:
SQLite displays the column names of the underlying vtab module when
displaying the `.schema` output.

Previously Limbo omitted this, which made it difficult for the user to
see what and how many columns the module's table has.
This PR matches SQLite's behavior by fetching the module's schema when the
schema entry is inserted during translation.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1168
Closes #1413. SQLite emits a check in a transaction to see
if it is attempting to write; if the db is in read-only mode, it throws
an error, otherwise the statement is executed. Mirroring how Rusqlite does
it, I modified `OpenFlags` to use bitflags to better configure how
we open our VFS. This modification will enable us to run tests against
the same database in parallel.
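As a rough illustration of the idea, here is a hand-rolled sketch of bit-flag open modes and the read-only write check. The PR itself uses the `bitflags` crate, and the flag names and values below are invented for illustration:

```rust
// Hand-rolled sketch of bit-flag open modes; the PR uses the bitflags crate,
// and these particular flag names/values are invented for illustration.
#[derive(Clone, Copy, PartialEq)]
pub struct OpenFlags(u32);

impl OpenFlags {
    pub const READ_ONLY: OpenFlags = OpenFlags(0b01);
    pub const CREATE: OpenFlags = OpenFlags(0b10);

    // True if all bits of `other` are set in `self`.
    pub fn contains(self, other: OpenFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl std::ops::BitOr for OpenFlags {
    type Output = OpenFlags;
    fn bitor(self, rhs: OpenFlags) -> OpenFlags {
        OpenFlags(self.0 | rhs.0)
    }
}

// SQLite-style write check: refuse writes on a read-only connection.
fn check_write_allowed(flags: OpenFlags) -> Result<(), String> {
    if flags.contains(OpenFlags::READ_ONLY) {
        Err("attempt to write a readonly database".to_string())
    } else {
        Ok(())
    }
}
```

Flags compose with `|`, so a caller can open with `OpenFlags::CREATE | OpenFlags::READ_ONLY` and the write check still fires.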
Closes #1433
Closes #1417
Man, chasing this down was much harder than it should have been.
We very frequently call `read_page` and then push the return value onto the
page stack, or otherwise use it without requiring that its IO has
completed, so it was tricky to figure out where this was happening,
and it had me thinking that something was wrong with the
changes to `io_uring` on my branch.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1418
`Statement` and `Rows` both hold a private `Arc` internally; implementing
`Clone` avoids users needing to wrap them in an `Arc<Mutex>` again.
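Since the handle is just a reference-counted pointer internally, cloning is cheap. A minimal sketch of the shape (field and type contents are hypothetical simplifications):

```rust
use std::sync::{Arc, Mutex};

// Hypothetical simplified shape of the handle: cloning only bumps the Arc
// refcount, so both clones share the same underlying statement state.
#[derive(Clone)]
pub struct Statement {
    inner: Arc<Mutex<String>>, // stands in for the real statement internals
}

impl Statement {
    pub fn new(sql: &str) -> Self {
        Statement { inner: Arc::new(Mutex::new(sql.to_string())) }
    }

    pub fn sql(&self) -> String {
        self.inner.lock().unwrap().clone()
    }
}
```

With this, `let stmt2 = stmt.clone();` shares state instead of requiring callers to wrap the handle in another `Arc<Mutex<_>>` themselves.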
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1412
we had an incorrect optimization in `eliminate_orderby_like_groupby()`
where it could remove e.g. the first term of the ORDER BY if it matched
the first GROUP BY term and the result set was naturally ordered by that
term. this is invalid. see e.g.:
main branch - BAD: removes the `ORDER BY id` term because the results are
naturally ordered by id. However, this results in sorting the entire thing
by last name only!
```
limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌──────┬───────────┬───────────┐
│  id  │ last_name │ count (1) │
├──────┼───────────┼───────────┤
│ 6235 │ Zuniga    │ 1         │
├──────┼───────────┼───────────┤
│ 8043 │ Zuniga    │ 1         │
├──────┼───────────┼───────────┤
│ 944  │ Zimmerman │ 1         │
└──────┴───────────┴───────────┘
```
after fix - GOOD:
```
limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌────┬───────────┬───────────┐
│ id │ last_name │ count (1) │
├────┼───────────┼───────────┤
│ 1  │ Foster    │ 1         │
├────┼───────────┼───────────┤
│ 2  │ Salazar   │ 1         │
├────┼───────────┼───────────┤
│ 3  │ Perry     │ 1         │
└────┴───────────┴───────────┘
```
I also refactored sorters to always use the AST `SortOrder` instead of boolean vectors, and to use the `compare_immutable()` utility we also use inside btrees.
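The `SortOrder`-based comparison boils down to flipping the per-column ordering. A self-contained sketch with simple integer keys (the real `compare_immutable()` works on full SQL values, not `i64`s):

```rust
use std::cmp::Ordering;

// Mirrors the AST's SortOrder; in the PR this replaces Vec<bool> in sorters.
#[derive(Clone, Copy)]
pub enum SortOrder {
    Asc,
    Desc,
}

// Compare two multi-column keys, honoring each column's sort order.
// Simplified to i64 columns; the real comparator handles full SQL values.
pub fn compare_keys(a: &[i64], b: &[i64], orders: &[SortOrder]) -> Ordering {
    for ((x, y), order) in a.iter().zip(b.iter()).zip(orders.iter()) {
        let cmp = match order {
            SortOrder::Asc => x.cmp(y),
            // Descending: larger values sort first, so reverse the comparison.
            SortOrder::Desc => y.cmp(x),
        };
        if cmp != Ordering::Equal {
            return cmp;
        }
    }
    Ordering::Equal
}
```

Using the enum end to end avoids the easy-to-misread `true`/`false` convention of the old boolean vectors.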
Closes #1365
Previously, indexed columns were updated only in the btree table, but
not in the index. This commit enables updates on indexes too when
they are needed.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1428