I was baffled previously, because any time that `free` was called on a
type from an extension, it would hang even when I knew it wasn't in use
any longer, and hadn't been double free'd.
After #737 was merged, I tried it again and noticed that it would no
longer hang... but only for extensions that were staticly linked.
Then I realized that we are using a global allocator, that likely wasn't
getting used in the shared library that is built separately that won't
inherit from our global allocator in core, causing some symbol mismatch
and the subsequent hanging on calls to `free`.
This PR adds the global allocator to extensions behind a feature flag in
the macro that will prevent it from being used in `wasm` and staticly
linked environments where it would conflict with limbos normal global
allocator. This allows us to properly free the memory from returning
extension functions over FFI.
This PR also changes the Extension type to a union field so we can store
int + float values inline without boxing them.
any additional tips or thoughts anyone else has on improving this would
be appreciated 👍Closes#803
This patch adds some libSQL vector extension functions such as
`vector()` and `vector_distance_cos()`, which can be used for exact
nearest neighbor search as follows:
```
limbo> SELECT embedding, vector_distance_cos(embedding, '[9, 9, 9]')
...> FROM movies ORDER BY vector_distance_cos(embedding, '[9, 9, 9]');
[4, 5, 6]|0.013072490692138672
[1, 2, 3]|0.07417994737625122
```
The current status of the PR is halfway. The new framing of simulation
runner where `setup_simulation` is separated from `run_simulation`
allows for injecting custom plans easily. The PR is currently missing
the functionality to update the `SimulatorEnv` ad hoc from the plan, as
the environment tables were typically created during the planning phase.
The next steps will be to implement a function `fn
mk_env(InteractionPlan, SimulatorEnv) -> SimulatorEnv`, add `--load`
flag to the CLI for loading a serialized plan file, making a
corresponding environment and running the simulation.
We can optionally combine this with a `--save` option, in which we keep
a seed-vault as part of limbo simulator, corresponding each seed with
its generated plan and save the time to regenerate existing seeds by
just loading them into memory. I am curious to hear thoughts on this?
Would the maintainers be open to adding such a seed-vault? Do you think
the saved time would be worth the complexity of the approach?
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes#720
- add `--watch` flag
- start saving seeds in persistent storage
- make a separate version of execution functions that use `vector of interaction` instead of `InteractionPlan`
#708
This PR adds basic support for the following API for defining
Aggregates, and changes the existing API for scalars.
```rust
register_extension! {
scalars: { Double },
aggregates: { MedianState },
}
#[derive(ScalarDerive)]
struct Double;
impl Scalar for Double {
fn name(&self) -> &'static str {
"double"
}
fn call(&self, args: &[Value]) -> Value {
if let Some(arg) = args.first() {
match arg.value_type() {
ValueType::Float => {
let val = arg.to_float().unwrap();
Value::from_float(val * 2.0)
}
ValueType::Integer => {
let val = arg.to_integer().unwrap();
Value::from_integer(val * 2)
}
_ => {
println!("arg: {:?}", arg);
Value::null()
}
}
} else {
Value::null()
}
}
}
#[derive(AggregateDerive)]
struct MedianState;
impl AggFunc for MedianState {
type State = Vec<f64>;
fn name(&self) -> &'static str {
"median"
}
fn args(&self) -> i32 { 1 }
fn step(state: &mut Self::State, args: &[Value]) {
if let Some(val) = args.first().and_then(Value::to_float) {
state.push(val);
}
}
fn finalize(state: Self::State) -> Value {
if state.is_empty() {
return Value::null();
}
let mut sorted = state;
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
let len = sorted.len();
if len % 2 == 1 {
Value::from_float(sorted[len / 2])
} else {
let mid1 = sorted[len / 2 - 1];
let mid2 = sorted[len / 2];
Value::from_float((mid1 + mid2) / 2.0)
}
}
}
```
I know it's a bit more verbose than the previous version, but I think in
the long run this will be more robust, and it matches the aggregate API
of implementing a trait on a struct that you derive the relevant trait
on.
Also this allows for better registration of functions, I think passing
in the struct identifiers just feels much better than the `"func_name"
=> function_ptr`
Closes#721
- makes interaction plans serializable
- fixes the shadowing bug where non-created tables were assumed to be created in the shadow tables map
- makes small changes to make clippy happy
- reorganizes simulation running flow to remove unnecessary plan regenerations while shrinking and double checking
This shaves off `nix` as a dependency in core, which was only being used
in the io_uring backend for the `fcntl` advisory record locking. Since a
previous PR made `rustix` a dep not only for Mac but for any Unix, and
`rustix` also has `fcntl`, `nix` becomes redundant.
Furthermore, it reduces `libc` usage across core. The only remaining
uses are:
```rust
io_uring::opcode::Readv::new(fd, iovec as *const iovec as *const libc::iovec, 1)
io_uring::opcode::Writev::new(fd, iovec as *const iovec as *const libc::iovec, 1)
```
These two are a little ugly, but sadly the `io_uring` crate requires
both opcodes to take a `libc::iovec` while it doesn't export the type.
See tokio-rs/io-uring#259 for a request to use `std::io::IoSlice` or to
export the type directly.
Apart from those two, there are no other usages of libc, so if those are
resolved, we could also drop the dependency on libc.
Closes#668
This implements libSQL compatible Rust API on top of Limbo's core. The
purpose of this is to allow libraries and apps that build on libSQL to
use Limbo.
Lots of cleanup still left to do. Draft PR for adding support for OPFS
for WASM build (add support for limbo in browser).
Overall explanation of the architecture: this follows the sqlite wasm
architecture for OPFS.
main <> (limbo-worker.js) limbo (VFS - opfs.js) <> opfs-sync-proxy
The main thread loads limbo-worker.js which bootstraps the opfs.js and
opfs-sync-proxy.js and then launches the limbo-wasm.js.
At that point it can be used with worker.postmessage and
worker.onmessage interactions from the main thread.
The VFS provided by opfs.js provides a sync API by offloading async
operations to opfs-sync-proxy.js. This is done through SharedArrayBuffer
and Atomic.wait() to make the actual async operations appear synchronous
for limbo.
resolves#531Closes#594
I noticed that the parse errors were a bit hard to read - only the nearest token and the line/col offsets were printed.
I made a first attempt at improving the errors using [miette](https://github.com/zkat/miette).
- Added derive for `miette::Diagnostic` to both the parser's error type and LimboError.
- Added miette dependency to both sqlite3_parser and core. The `fancy` feature is only enabled for CLI.
Some future improvements that can be made further:
- Add spans to AST nodes so that errors can better point to the correct token. See upstream issue: https://github.com/gwenn/lemon-rs/issues/33
- Construct more errors with offset information. I noticed that most parser errors are constructed with `None` as the offset.
Comparisons.
Before:
```
❯ cargo run --package limbo --bin limbo database.db --output-mode pretty
...
limbo> selet * from a;
[2025-01-05T11:22:55Z ERROR sqlite3Parser] near "Token([115, 101, 108, 101, 116])": syntax error
Parse error: near "selet": syntax error at (1, 6)
```
After:
```
❯ cargo run --package limbo --bin limbo database.db --output-mode pretty
...
limbo> selet * from a;
[2025-01-05T12:25:52Z ERROR sqlite3Parser] near "Token([115, 101, 108, 101, 116])": syntax error
× near "selet": syntax error at (1, 6)
╭────
1 │ selet * from a
· ▲
· ╰── syntax error
╰────
```