Commit Graph

2383 Commits

Author SHA1 Message Date
Jussi Saurio
37097e01ae GROUP BY: refactor logic to support cases where no sorting is needed 2025-05-08 12:39:26 +03:00
Jussi Saurio
ae2561dbca Merge 'Fix memory leak caused by unclosed virtual table cursors' from Piotr Rżysko
The following code reproduces the leak (memory usage increases over
time):
```rust
#[tokio::main]
async fn main() {
    let db = Builder::new_local(":memory:").build().await.unwrap();
    let conn = db.connect().unwrap();

    conn.execute("SELECT load_extension('./target/debug/liblimbo_series');", ())
        .await
        .unwrap();

    loop {
        conn.execute("SELECT * FROM generate_series(1,10,2);", ())
            .await
            .unwrap();
    }
}
```
After switching to the system allocator, the leak becomes detectable
with Valgrind:
```
32,000 bytes in 1,000 blocks are definitely lost in loss record 24 of 24
   at 0x538580F: malloc (vg_replace_malloc.c:446)
   by 0x62E15FA: alloc::alloc::alloc (alloc.rs:99)
   by 0x62E172C: alloc::alloc::Global::alloc_impl (alloc.rs:192)
   by 0x62E1530: allocate (alloc.rs:254)
   by 0x62E1530: alloc::alloc::exchange_malloc (alloc.rs:349)
   by 0x62E0271: new<limbo_series::GenerateSeriesCursor> (boxed.rs:257)
   by 0x62E0271: open_GenerateSeriesVTab (lib.rs:19)
   by 0x425D8FA: limbo_core::VirtualTable::open (lib.rs:732)
   by 0x4285DDA: limbo_core::vdbe::execute::op_vopen (execute.rs:890)
   by 0x42351E8: limbo_core::vdbe::Program::step (mod.rs:396)
   by 0x425C638: limbo_core::Statement::step (lib.rs:610)
   by 0x40DB238: limbo::Statement::execute::{{closure}} (lib.rs:181)
   by 0x40D9EAF: limbo::Connection::execute::{{closure}} (lib.rs:109)
   by 0x40D54A1: example::main::{{closure}} (example.rs:26)
```
Interestingly, when using mimalloc, neither Valgrind nor mimalloc’s
internal statistics report the leak.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1447
2025-05-08 10:48:23 +03:00
Jussi Saurio
d592e8ec8a Merge 'Show explanation for the NewRowid opcode' from Anton Harniakou
After this commit explain will start showing a comment for NewRowid.
```
limbo> explain insert into t(id) values (1);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     17    0                    0   Start at 17
...
6     NewRowid           0     1     0                    0   r[1]=rowid
...
18    Goto               0     1     0                    0
```

Closes #1445
2025-05-07 09:15:46 +03:00
Jussi Saurio
57b16d5b2b Merge 'Add notion of join ordering to plan' from Jussi Saurio
This PR is an enabler for our (Coming Soon ™️ ) join reordering
optimizer -- simply adds the notion of a join order to the current query
execution. This PR does not do any join ordering -- the join order is
always the same as expressed in the SQL query.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1439
2025-05-07 08:55:13 +03:00
Piotr Rzysko
977b6b331a Fix memory leak caused by unclosed virtual table cursors
The following code reproduces the leak, with memory usage increasing
over time:

```
#[tokio::main]
async fn main() {
    let db = Builder::new_local(":memory:").build().await.unwrap();
    let conn = db.connect().unwrap();

    conn.execute("SELECT load_extension('./target/debug/liblimbo_series');", ())
        .await
        .unwrap();

    loop {
        conn.execute("SELECT * FROM generate_series(1,10,2);", ())
            .await
            .unwrap();
    }
}
```

After switching to the system allocator, the leak becomes detectable
with Valgrind:

```
32,000 bytes in 1,000 blocks are definitely lost in loss record 24 of 24
   at 0x538580F: malloc (vg_replace_malloc.c:446)
   by 0x62E15FA: alloc::alloc::alloc (alloc.rs:99)
   by 0x62E172C: alloc::alloc::Global::alloc_impl (alloc.rs:192)
   by 0x62E1530: allocate (alloc.rs:254)
   by 0x62E1530: alloc::alloc::exchange_malloc (alloc.rs:349)
   by 0x62E0271: new<limbo_series::GenerateSeriesCursor> (boxed.rs:257)
   by 0x62E0271: open_GenerateSeriesVTab (lib.rs:19)
   by 0x425D8FA: limbo_core::VirtualTable::open (lib.rs:732)
   by 0x4285DDA: limbo_core::vdbe::execute::op_vopen (execute.rs:890)
   by 0x42351E8: limbo_core::vdbe::Program::step (mod.rs:396)
   by 0x425C638: limbo_core::Statement::step (lib.rs:610)
   by 0x40DB238: limbo::Statement::execute::{{closure}} (lib.rs:181)
   by 0x40D9EAF: limbo::Connection::execute::{{closure}} (lib.rs:109)
   by 0x40D54A1: example::main::{{closure}} (example.rs:26)
```

Interestingly, when using mimalloc, neither Valgrind nor mimalloc’s
internal statistics report the leak.
2025-05-05 21:26:23 +02:00
Anton Harniakou
a971bce353 Show explanation for the NewRowid opcode 2025-05-04 14:14:19 +03:00
Jussi Saurio
4e05023bd3 Merge branch 'main' into ext-static-feature 2025-05-03 19:18:28 +03:00
Jussi Saurio
40c04c7074 Merge 'Adjust vtab schema creation to display the underlying columns' from Preston Thorpe
### The problem:
Sqlite displays the column names of the underlying vtab module when
displaying the `.schema`
![image](https://github.com/user-
attachments/assets/ca6aa1c9-0af7-4f34-a5c4-c8336fa23858)
Previously limbo omitted this, which makes it difficult for the user to
see what/how many columns the module's table has.
This matches sqlite's behavior by fetching the module's schema when the
schema entry is being inserted in translation.
![image](https://github.com/user-
attachments/assets/a56b8239-0f65-420b-a0b6-536ede117fba)

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1168
2025-05-03 19:17:25 +03:00
Jussi Saurio
c9eb56b54a Merge 'Read only mode' from Pedro Muniz
Closes #1413 . Basically, SQLite emits a check in a transaction to see
if it is attempting to write. If the db is in read only mode, it throws
an error, else the statement is executed. Mirroring how Rusqlite does
it, I modified the `OpenFlags` to use bitflags to better configure how
we open our VFS. This modification, will enable us to run tests against
the same database in parallel.

Closes #1433
2025-05-03 19:15:06 +03:00
Jussi Saurio
9ea958561b Merge 'Bump assorted dependencies' from Preston Thorpe
Closes #1425
2025-05-03 18:31:58 +03:00
Jussi Saurio
b86123a82e Merge 'Fix panic on async io due to reading locked page' from Preston Thorpe
closes  #1417
Man chasing this down was much much harder than it should have been.
We very frequently call `read_page` then push the return value onto the
page stack, or otherwise use it without it necessarily needing to not be
'in progress' of IO, so it was tricky to figure out where this was
happening and it had me thinking that it was something wrong with the
changes to `io_uring` on my branch.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1418
2025-05-03 18:30:29 +03:00
Jussi Saurio
fafeabd081 Merge 'Eliminate a superfluous read transaction when doing PRAGMA user_version' from Anton Harniakou
This PR removes an unnecessary read transaction.
Bytecode before this PR:
```
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     5     0                    0   Start at 5
1     Transaction        0     0     0                    0   write=false
2     ReadCookie         0     1     6                    0
3     ResultRow          1     1     0                    0   output=r[1]
4     Halt               0     0     0                    0
5     Transaction        0     0     0                    0   write=false
6     Goto               0     1     0                    0
```
Bytecode after this PR:
```limbo> explain PRAGMA user_version;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     4     0                    0   Start at 4
1     ReadCookie         0     1     6                    0
2     ResultRow          1     1     0                    0   output=r[1]
3     Halt               0     0     0                    0
4     Transaction        0     0     0                    0   write=false
5     Goto               0     1     0                    0
```

Closes #1431
2025-05-03 15:40:27 +03:00
Jussi Saurio
330fedbc2f Add notion of join ordering to plan + make determining where to eval expr dynamic always 2025-05-03 15:32:06 +03:00
Jussi Saurio
306e097950 Merge 'Fix bug: we cant remove order by terms from the head of the list' from Jussi Saurio
we had an incorrect optimization in `eliminate_orderby_like_groupby()`
where it could remove e.g. the first term of the ORDER BY if it matched
the first GROUP BY term and the result set was naturally ordered by that
term. this is invalid. see e.g.:
```sql
main branch - BAD: removes the `ORDER BY id` term because the results are naturally ordered by id.
However, this results in sorting the entire thing by last name only!

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌──────┬───────────┬───────────┐
│ id   │ last_name │ count (1) │
├──────┼───────────┼───────────┤
│ 6235 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│ 8043 │ Zuniga    │         1 │
├──────┼───────────┼───────────┤
│  944 │ Zimmerman │         1 │
└──────┴───────────┴───────────┘

after fix - GOOD:

limbo> select id, last_name, count(1) from users GROUP BY 1,2 order by id, last_name desc limit 3;
┌────┬───────────┬───────────┐
│ id │ last_name │ count (1) │
├────┼───────────┼───────────┤
│  1 │ Foster    │         1 │
├────┼───────────┼───────────┤
│  2 │ Salazar   │         1 │
├────┼───────────┼───────────┤
│  3 │ Perry     │         1 │
└────┴───────────┴───────────┘

I also refactored sorters to always use the ast `SortOrder` instead of boolean vectors, and use the `compare_immutable()` utility we use inside btrees too.

Closes #1365
2025-05-03 12:48:08 +03:00
Anton Harniakou
3c0b7cad74 Eliminate a superfluous read transaction when doing PRAGMA user_version 2025-05-03 10:48:27 +03:00
pedrocarlo
0c22382f3c shared lock on file and throw ReadOnly error in transaction 2025-05-02 16:30:48 -03:00
PThorpe92
d4cf8367ba Wrap return_if_locked in balance non root in debug assertion cfg 2025-05-02 10:55:00 -04:00
PThorpe92
f025f7e91e Fix panic on async io due to reading locked page 2025-05-02 10:55:00 -04:00
Pere Diaz Bou
f15a17699b check indexes are not added twice in update plan 2025-05-01 12:38:34 +03:00
Pere Diaz Bou
c808863256 test update with index 2025-05-01 11:44:23 +03:00
Pere Diaz Bou
64a12ed887 update index on indexed columns
Previously columns that were indexed were updated only in the
BtreeTable, but not on Index table. This commit basically enables
updates on indexes too if they are needed.
2025-05-01 11:16:29 +03:00
Jussi Saurio
6096cfb3d8 Merge 'Add PRAGMA schema_version' from Anton Harniakou
This PR adds `PRAGMA schema_version` to get the value of the schema-
version integer at offset 40 in the database header.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1427
2025-05-01 10:50:02 +03:00
Jussi Saurio
a25f228ea7 Merge 'Fix setting default value for primary key on UPDATE' from Pere Diaz Bou
I noticed when updating a table with a primary key, it would sometimes
set primary key column to null. I believe the problem was due to
incorrect condition that was inconsistent with the comment above: "don't
emit null for pkey of virtual tables."
cc: @PThorpe92

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1422
2025-05-01 10:47:17 +03:00
Jussi Saurio
a525feb7ad Merge 'Fix: allow page_size=65536' from meteorgan
Since `page_size` in `DatabaseHeader` can be 1 representing 65526 bytes,
it can't be used it directly.  Additionally, we should use `u32` instead
of `u16` or `usize` in other contexts.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1411
2025-05-01 10:46:19 +03:00
Jussi Saurio
7643b7666c Merge 'Fix page_count pragma' from meteorgan
This issue was introduced in #819. However, I believe the solution is
suboptimal because `pragma page_count` can never return 1, which is
inconsistent with SQLite.
<img width="442" alt="image" src="https://github.com/user-
attachments/assets/c772eae7-3e9f-4687-a94a-230deb0eb034" />
To align with SQLite's behavior, we should allocate the first page when
the first schema object is created, rather than immediately after
creating database. And it's always preferable to return an accurate page
count.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1407
2025-05-01 10:36:36 +03:00
Pere Diaz Bou
1a2a383635 fix setting default value for primary key on UPDATE
I noticed when updating a table with a primary key, it would sometimes
set primary key column to null. A primary key can be nullified if it
isn't a rowid alias, meaning it isn't a INTEGER PRIMAR KEY.
2025-05-01 09:46:48 +03:00
Anton Harniakou
525b7fdbaa Add PRAGMA schema_version 2025-04-30 09:41:04 +03:00
PThorpe92
1e2be35e3b Add fs feature to rustix dependency 2025-04-29 23:07:28 -04:00
Pekka Enberg
f60fc26578 Merge 'Support literal-value current_time, current_date and current_timestamp' from meteorgan
I haven't found a way to automate these tests.
<p><img width="361" alt="image" src="https://github.com/user-
attachments/assets/a1563776-97e0-4aa5-844a-b9b23c5273e5" /></p>
<p><img width="279" alt="image" src="https://github.com/user-
attachments/assets/df036951-2649-4835-bffa-f25e6f59bb07" /></p>

Closes #1424
2025-04-29 21:51:33 +03:00
PThorpe92
7b6452034b Bump lru dependency to 0.14.0 2025-04-29 10:44:26 -04:00
PThorpe92
7a3d949bd1 Bump mimalloc dependency to 0.1.46 2025-04-29 10:43:46 -04:00
PThorpe92
f581d1de3a Bump miette dependency to 7.6.0 2025-04-29 10:43:07 -04:00
PThorpe92
ba225ade0d Bump libc dependency to 0.2.172 2025-04-29 10:42:10 -04:00
PThorpe92
582ca68640 Bump rustix dependency to v1.0.5 2025-04-29 10:39:26 -04:00
PThorpe92
2785fd5d4a Bump polling crate dependency to 3.7.4 2025-04-29 10:38:46 -04:00
PThorpe92
be5ae7d0e3 Bump io_uring dependency to 0.7.5 2025-04-29 10:38:01 -04:00
meteorgan
51d43074f3 Support literal-value current_time, current_date and current_timestamp 2025-04-29 22:35:26 +08:00
Pere Diaz Bou
a30241ca91 Add state machine for op_idx_delete + DeleteState simplification
DeleteState had a bit too many unnecessary states so I removed them.
Usually we care about having a different state when I/O is triggered
requiring a state to be stored for later.

Furthermore, there was a bug with op_idx_delete where if balance is
triggered, op_idx_delete wouldn't be re-entrant. So a state machine was
added to prevent that from happening.
2025-04-29 14:58:20 +03:00
Preston Thorpe
d837f89d74 Merge branch 'main' into vtab_schema 2025-04-28 22:09:10 -04:00
meteorgan
d1a50f8a69 skip unneccessary conversion 2025-04-28 16:13:07 +08:00
meteorgan
d2dce740f7 fix some issues about page_size 2025-04-28 16:13:07 +08:00
Jussi Saurio
3459c1f7dd Merge 'btree/tablebtree_move_to: micro-optimizations' from Jussi Saurio
```bash
jussi@Jussis-MacBook-Pro limbo % git co main && cargo build --bin limbo --release && hyperfine --shell=none --warmup 5 './target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = '\''FURNITURE'\'' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('\''1995-03-29'\'' as datetime) and l_shipdate > cast('\''1995-03-29'\'' as datetime);"'

...

Benchmark 1: ./target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime);"
  Time (mean ± σ):      2.104 s ±  0.006 s    [User: 1.952 s, System: 0.151 s]
  Range (min … max):    2.094 s …  2.115 s    10 runs

jussi@Jussis-MacBook-Pro limbo % git co move-to-micro-opt && cargo build --bin limbo --release && hyperfine --shell=none --warmup 5 './target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = '\''FURNITURE'\'' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('\''1995-03-29'\'' as datetime) and l_shipdate > cast('\''1995-03-29'\'' as datetime);"'

...

Benchmark 1: ./target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime);"
  Time (mean ± σ):      1.883 s ±  0.012 s    [User: 1.733 s, System: 0.146 s]
  Range (min … max):    1.866 s …  1.908 s    10 runs
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1408
2025-04-28 10:27:59 +03:00
Pere Diaz Bou
63a94e7c62 Merge 'Emit IdxDelete instruction and some fixes on seek after deletion' from Pere Diaz Bou
Previously `DELETE FROM ...` only emitted deletes for main table, but
this is incorrect as we want to remove entries from index tables as
well.

Closes #1383
2025-04-28 09:13:54 +03:00
meteorgan
f3f09a5b7b Fix pragma page_count 2025-04-26 21:45:18 +08:00
Jussi Saurio
46d45a6bf4 don't recompute cell_count 2025-04-26 14:50:42 +03:00
Jussi Saurio
75c6678a06 sqlite3_ondisk: use debug asserts for cell_table_interior_read... funcs 2025-04-26 14:50:42 +03:00
Jussi Saurio
ac1bc17ea4 btree/tablebtree_seek: remove some more useless calls to set_cell_index() 2025-04-26 13:41:30 +03:00
Jussi Saurio
5060f1a1fa don't use integer division in binary search halfpoint calculation 2025-04-26 12:38:15 +03:00
Jussi Saurio
23ae5a27c4 btree/tablebtree_move_to: micro-optimizations 2025-04-26 11:32:24 +03:00
Jussi Saurio
454a409cae Merge 'refactor database open_file and open' from meteorgan
reduce redundant code

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1406
2025-04-25 22:03:49 +03:00