Commit Graph

4395 Commits

Author SHA1 Message Date
Jussi Saurio
c8f5bd3f4f rename 2025-05-14 09:42:26 +03:00
Jussi Saurio
630a6093aa refactor join_lhs_tables_to_rhs_table 2025-05-14 09:42:26 +03:00
Jussi Saurio
62d2ee8eb6 rename 2025-05-14 09:42:26 +03:00
Jussi Saurio
5f9ebe26a0 as_binary_components() helper 2025-05-14 09:42:26 +03:00
Jussi Saurio
a92d94270a Get rid of useless ScanCost struct 2025-05-14 09:42:26 +03:00
Jussi Saurio
de9e8442e8 fix ephemeral 2025-05-14 09:42:25 +03:00
Jussi Saurio
3b1aef4a9e Do Less Work (tm) - everything works except ephemeral 2025-05-14 09:42:01 +03:00
Jussi Saurio
87850e5706 simplify 2025-05-14 09:41:14 +03:00
Jussi Saurio
77f11ba004 simplify AccessMethodKind 2025-05-14 09:41:14 +03:00
Jussi Saurio
5f724d6b2e Add more comments to join ordering logic 2025-05-14 09:41:14 +03:00
Jussi Saurio
c02d3f8bcd Do groupby/orderby sort elimination based on optimizer decision 2025-05-14 09:41:13 +03:00
Jussi Saurio
1e46f1d9de Feature: join reordering optimizer 2025-05-14 09:40:48 +03:00
Jussi Saurio
c8c83fc6e6 OPTIMIZER.MD docs 2025-05-14 09:39:47 +03:00
Jussi Saurio
67a080bfa0 dont mutate where clause during individual index selection phase 2025-05-14 09:39:47 +03:00
Jussi Saurio
1b71f58bbf Merge 'Redesign parameter binding in query translator' from Preston Thorpe
closes #1467
## Example:
Previously as explained in #1449, our parameter binding wasn't working
properly because we would essentially
assign the first index of whatever was translated first
```console
limbo> create table t (id integer primary key, name text, age integer);
limbo> explain select * from t where name = ? and id > ? and age between ? and ?;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     20    0                    0   Start at 20
1     OpenRead           0     2     0                    0   table=t, root=2
2     Variable           1     4     0                    0   r[4]=parameter(1) # always 1
3     IsNull             4     19    0                    0   if (r[4]==NULL) goto 19
4     SeekGT             0     19    4                    0   key=[4..4]
5       Column           0     1     5                    0   r[5]=t.name
6       Variable         2     6     0                    0   r[6]=parameter(2) # always 2
7       Ne               5     6     18                   0   if r[5]!=r[6] goto 18
8       Variable         3     7     0                    0   r[7]=parameter(3) # etc...
9       Column           0     2     8                    0   r[8]=t.age
10      Gt               7     8     18                   0   if r[7]>r[8] goto 18
11      Column           0     2     9                    0   r[9]=t.age
12      Variable         4     10    0                    0   r[10]=parameter(4)
13      Gt               9     10    18                   0   if r[9]>r[10] goto 18
14      RowId            0     1     0                    0   r[1]=t.rowid
15      Column           0     1     2                    0   r[2]=t.name
16      Column           0     2     3                    0   r[3]=t.age
17      ResultRow        1     3     0                    0   output=r[1..3]
18    Next               0     5     0                    0
19    Halt               0     0     0                    0
20    Transaction        0     0     0                    0   write=false
21    Goto               0     1     0                    0
```
## Solution:
`rewrite_expr` currently is used to transform `true|false` to `1|0`, so
it has been adapted to transform anonymous `Expr::Variable`s to named
variables, inserting the appropriate index of the parameter by passing
in a counter.
```rust
        ast::Expr::Variable(var) => {
            if var.is_empty() {
                // rewrite anonymous variables only, ensure that the `param_idx` starts at 1 and
                // all the expressions are rewritten in the order they come in the statement
                *expr = ast::Expr::Variable(format!("{}{param_idx}", PARAM_PREFIX));
                *param_idx += 1;
            }
            Ok(())
        }
```
# Corrected output: (notice the seek)
```console
limbo> explain select * from t where name = ? and id > ? and age between ? and ?;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     20    0                    0   Start at 20
1     OpenRead           0     2     0                    0   table=t, root=2
2     Variable           2     4     0                    0   r[4]=parameter(2)
3     IsNull             4     19    0                    0   if (r[4]==NULL) goto 19
4     SeekGT             0     19    4                    0   key=[4..4]
5       Column           0     1     5                    0   r[5]=t.name
6       Variable         1     6     0                    0   r[6]=parameter(1)
7       Ne               5     6     18                   0   if r[5]!=r[6] goto 18
8       Variable         3     7     0                    0   r[7]=parameter(3)
9       Column           0     2     8                    0   r[8]=t.age
10      Gt               7     8     18                   0   if r[7]>r[8] goto 18
11      Column           0     2     9                    0   r[9]=t.age
12      Variable         4     10    0                    0   r[10]=parameter(4)
13      Gt               9     10    18                   0   if r[9]>r[10] goto 18
14      RowId            0     1     0                    0   r[1]=t.rowid
15      Column           0     1     2                    0   r[2]=t.name
16      Column           0     2     3                    0   r[3]=t.age
17      ResultRow        1     3     0                    0   output=r[1..3]
18    Next               0     5     0                    0
19    Halt               0     0     0                    0
20    Transaction        0     0     0                    0   write=false
21    Goto               0     1     0                    0
```
## And a `Delete`:
```console
limbo> explain delete from t where name = ? and age > ? and id > ?;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenWrite          0     2     0                    0
2     Variable           3     1     0                    0   r[1]=parameter(3)
3     IsNull             1     14    0                    0   if (r[1]==NULL) goto 14
4     SeekGT             0     14    1                    0   key=[1..1]
5       Column           0     1     2                    0   r[2]=t.name
6       Variable         1     3     0                    0   r[3]=parameter(1)
7       Ne               2     3     13                   0   if r[2]!=r[3] goto 13
8       Column           0     2     4                    0   r[4]=t.age
9       Variable         2     5     0                    0   r[5]=parameter(2)
10      Le               4     5     13                   0   if r[4]<=r[5] goto 13
11      RowId            0     6     0                    0   r[6]=t.rowid
12      Delete           0     0     0                    0
13    Next               0     5     0                    0
14    Halt               0     0     0                    0
15    Transaction        0     1     0                    0   write=true
16    Goto               0     1     0                    0
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1475
2025-05-14 09:26:06 +03:00
Jussi Saurio
a0f973cb34 Merge 'Fix infinite loop when inserting multiple rows' from Jussi Saurio
Due to constant instruction reshuffling introduced in #1359, it is
advisable not to do either of the following:
1. Use raw offsets as jump targets
2. Use `program.resolve_label(my_label, program.offset())` when it is
uncertain what the next instruction will be
Instead, if you want a label to point to "whatever instruction follows
the last one", you should use
`program.preassign_label_to_next_insn(label)`, which will work correctly
even with instruction rerdering

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1474
2025-05-14 09:24:47 +03:00
Pekka Enberg
bef665b7f3 Limbo 0.0.20-pre.2 2025-05-14 09:17:07 +03:00
Pekka Enberg
d912f14528 Update CHANGELOG 2025-05-14 09:16:53 +03:00
Pekka Enberg
67775fbc1d Merge 'github: Ensure rustmft is installed' from Pekka Enberg
Closes #1478
2025-05-14 09:16:10 +03:00
Pekka Enberg
da3815e1cb github: Ensure rustmft is installed 2025-05-14 09:12:35 +03:00
Pekka Enberg
3be9807e4f Update CHANGELOG 2025-05-14 08:58:08 +03:00
PThorpe92
a0b2b6e85d Consolidate match case in parameters push to handle all anonymous params in one case 2025-05-13 14:42:12 -04:00
PThorpe92
2f255524bd Remove unused import and unnecessary mut annotations in insert.rs 2025-05-13 14:34:22 -04:00
PThorpe92
94aa9cd99d Add cases to rewrite_expr in the optimizer 2025-05-13 14:33:45 -04:00
PThorpe92
16ac6ab918 Fix parameter push method to re-convert anonymous parameters 2025-05-13 14:33:11 -04:00
PThorpe92
e91d17f06e Add tests for parameter binding for update, select and delete queries 2025-05-13 12:50:10 -04:00
PThorpe92
0593a99f0e Remove insertCtx from parameters and replace fix with expr rewriting 2025-05-13 12:49:16 -04:00
Jussi Saurio
3cc9147f6c Merge 'testing/py: rename debug_print() to run_debug()' from Jussi Saurio
I wasted a few minutes staring at #1471 because I thought
`limbo.debug_print()` just prints the sql, but it actually executes it
too

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1472
2025-05-13 10:12:27 +03:00
Jussi Saurio
a2e577ad01 Merge 'Fix handling of empty strings in prepared statements' from Diego Reis
When `prepare()` is called with an empty string it should throw an error
(e.g `ApiMisuse` in rusqlite). I'm testing only in JS but it should
throw to any bind.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1473
2025-05-13 10:12:09 +03:00
Jussi Saurio
44e282f630 Add multi-row insert regression test 2025-05-13 09:03:01 +03:00
Jussi Saurio
957fe1b446 Fix infinite loop when inserting multiple rows 2025-05-13 08:54:25 +03:00
Diego Reis
07bfeadd56 core: Simplify error handling of malformed strings for prepared statements 2025-05-12 13:25:11 -03:00
Diego Reis
f7ab8b11d6 cargo fmt 2025-05-12 10:56:53 -03:00
Jussi Saurio
2b6b09d435 Merge 'btree: Coalesce free blocks in page_free_array()' from Mohamed Hossam
Coalesce adjacent free blocks during `page_free_array()` in
`core/storage/btree`.
Instead of immediately passing free cells to `free_cell_range()`, buffer
up to 10 free cells and try to merge adjacent free blocks. Break on the
first merge to avoid time complexity, `free_cell_range()` coalesces
blocks afterwards anyways. This follows SQLite's [`pageFreeArray()`](htt
ps://github.com/sqlite/sqlite/blob/d7324103b196c572a98724a5658970b4000b8
c39/src/btree.c#L7729) implementation.
Removed this TODO:
```rust
fn page_free_array( . . )
    .
    .
    // TODO: implement fancy smart free block coalescing procedure instead of dumb free to
    // then defragment
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1448
2025-05-12 16:52:11 +03:00
Diego Reis
bc72c396f0 binds: Add empty prepared statement tests 2025-05-12 10:41:05 -03:00
Diego Reis
c4e7be04f8 core: Handles prepared statement with empty SQL 2025-05-12 10:38:58 -03:00
m0hossam
70b9438f80 Add comments 2025-05-12 15:58:36 +03:00
Jussi Saurio
6926d7b931 testing/py: rename debug_print() to run_debug() 2025-05-12 10:52:13 +03:00
Jussi Saurio
9b96e2bcc3 Merge 'CREATE VIRTUAL TABLE fixes' from Piotr Rżysko
This PR fixes two bugs in `CREATE VIRTUAL TABLE` (see individual commits
for details).
I added a test in `extensions.py` instead of the TCL test suite because
it's currently more convenient - there’s no existing framework for
loading extensions in TCL tests. However, I believe the test should
eventually be moved to TCL, as it verifies behavior expected to be
compatible with SQLite and is independent of any specific extension.
I've seen a PR proposing to migrate TCL tests to Rust, so please let me
know if moving this test to TCL would still be valuable.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1471
2025-05-12 10:47:51 +03:00
Jussi Saurio
7b388b9696 Merge 'Bindings/Go: Fix symbols for FFI calls' from Preston Thorpe
While debugging another issue for #1469, I noticed that the symbols
aren't being properly referenced for `getError` for either limboRows or
limboStmt types.

Closes #1470
2025-05-12 10:44:31 +03:00
Jussi Saurio
501e95637a Merge 'Support isnull and notnull expr' from meteorgan
Limbo `isnull` output:
```
limbo> explain select 1 isnull;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     3     0                    0   Start at 3
1     ResultRow          1     1     0                    0   output=r[1]
2     Halt               0     0     0                    0
3     Integer            1     2     0                    0   r[2]=1
4     Integer            1     1     0                    0   r[1]=1
5     IsNull             2     7     0                    0   if (r[2]==NULL) goto 7
6     Integer            0     1     0                    0   r[1]=0
7     Goto               0     1     0                    0
```
Sqlite `isnull` output:
```
sqlite>  explain select 1 isnull;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     6     0                    0   Start at 6
1     Integer        1     1     0                    0   r[1]=1
2     IsNull         2     4     0                    0   if r[2]==NULL goto 4
3     Integer        0     1     0                    0   r[1]=0
4     ResultRow      1     1     0                    0   output=r[1]
5     Halt           0     0     0                    0
6     Integer        1     2     0                    0   r[2]=1
7     Goto           0     1     0                    0
```
------------------------------------------------------------------------
-------------------
Limbo `notnull` output:
```
limbo> explain select 1 notnull;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     3     0                    0   Start at 3
1     ResultRow          1     1     0                    0   output=r[1]
2     Halt               0     0     0                    0
3     Integer            1     2     0                    0   r[2]=1
4     Integer            1     1     0                    0   r[1]=1
5     NotNull            2     7     0                    0   r[2]!=NULL -> goto 7
6     Integer            0     1     0                    0   r[1]=0
7     Goto               0     1     0                    0
```
Sqlite `notnull` output:
```
sqlite> explain select 1 notnull;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     6     0                    0   Start at 6
1     Integer        1     1     0                    0   r[1]=1
2     NotNull        2     4     0                    0   if r[2]!=NULL goto 4
3     Integer        0     1     0                    0   r[1]=0
4     ResultRow      1     1     0                    0   output=r[1]
5     Halt           0     0     0                    0
6     Integer        1     2     0                    0   r[2]=1
7     Goto           0     1     0                    0
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1468
2025-05-12 10:06:35 +03:00
Jussi Saurio
07415dac07 Merge 'Count optimization' from Pedro Muniz
After reading #1123, I wanted to see what optimizations I could do.
Sqlite optimizes `count` aggregation for the following case: `SELECT
count() FROM <tbl>`. This is so widely used, that they made an
optimization just for it in the form of the `COUNT` opcode.
This PR thus implements this optimization by creating the `COUNT`
opcode, and checking in the select emitter if we the query is a Simple
Count Query. If it is, we just emit the Opcode instead of going through
a Rewind loop, saving on execution time.
The screenshots below show a huge decrease in execution time.
- **Main**
<img width="383" alt="image" src="https://github.com/user-
attachments/assets/99a9dec4-e7c5-41db-ba67-4eafa80dd2e6" />
- **Count Optimization**
<img width="435" alt="image" src="https://github.com/user-
attachments/assets/e93b3233-92e6-4736-aa60-b52b2477179f" />

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1443
2025-05-12 10:01:50 +03:00
Jussi Saurio
6d11035f92 Merge 'Fix bound parameters on insert statements with out of order column indexes' from Preston Thorpe
closes #1449
A quick `explain` will show the root of the problem:
![image](https://github.com/user-
attachments/assets/c5e9f2cf-494f-410a-9e6f-399d4256cbc8)
vs `sqlite3`
![image](https://github.com/user-
attachments/assets/79b18254-4a5d-40bb-a5f2-1322326b745f)
We process the columns in the order they need to be to create the
record, and although we store the index of the value meant to be
inserted into that column based on the order they are bound, we still
lose the original ordering.
## EDIT: Fixed
![image](https://github.com/user-
attachments/assets/4b40451a-6eda-4ff1-8f21-a36ba8598d00)
### Multi-row insert:
![image](https://github.com/user-
attachments/assets/5b68b44c-1a96-442b-80cd-3a4df0e598ae)
## Solution:
We just needed to traverse the insert values beforehand and create an
array of N integers each representing the index of each
`ast::Expr::Variable`: then later on we can lookup the `value_index` in
that array, and grab the index of matching integer to get  the
parameters index to be bound to the relevant variable.
(yes I know that sounds like a leetcode problem)
Thanks to @jnesss for the write-up for this bug

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1459
2025-05-12 10:00:38 +03:00
PThorpe92
755a19658f Bindings/Go: Fix symbols for rows + stmt getError FFI calls 2025-05-11 15:26:07 -04:00
Piotr Rzysko
d5984445a9 Fix panic on CREATE VIRTUAL TABLE IF NOT EXISTS by halting VM properly
Fixes a runtime panic caused by failing to halt the virtual machine
after executing CREATE VIRTUAL TABLE IF NOT EXISTS.

Previously resulted in:
thread 'main' panicked at core/vdbe/mod.rs:408:52:
index out of bounds: the len is 3 but the index is 3
2025-05-11 21:21:18 +02:00
Piotr Rzysko
fdffbc9534 Ensure virtual table name uniqueness 2025-05-11 21:21:18 +02:00
PThorpe92
ab23f2a24f Add comments and reorganize fix of ordering parameters for insert statements 2025-05-11 14:20:57 -04:00
meteorgan
5185f4bf9e Support isnull and notnull expr 2025-05-11 23:47:30 +08:00
pedrocarlo
9f726dbe62 simplify simple count detection 2025-05-10 22:36:43 -03:00
pedrocarlo
977d09fd36 small fixes 2025-05-10 22:23:01 -03:00