## Problem:
- We have cases where we evaluate an expression in a hot loop even though
it only needs to be evaluated once. For example: `CAST('2025-01-01' as
DATETIME)` -- the value of this never changes, so we should only run it
once.
- We have no robust way of doing this right now for entire _expressions_
-- the only existing facility we have is
`program.mark_last_insn_constant()`, which has no concept of how many
instructions translating a given _expression_ spends, and breaks very
easily for this reason.
## Main ideas of this PR:
- Add `expr.is_constant()` determining whether the expression is
compile-time constant. Tries to be conservative and not deem something
compile-time constant if there is no certainty.
- Whenever a compile-time constant expression is about to be
translated into bytecode in `translate_expr()`, start a so-called
`constant span`, which is a range of instructions that belong to a
compile-time constant expression.
- At the end of translating the program, all `constant spans` are
hoisted outside of any table loops so they only get evaluated once.
- The target offsets of any jump instructions (e.g. `Goto`) are moved to
the correct place, taking into account all instructions whose offsets
were shifted due to moving the compile-time constant expressions around.
- An escape hatch wrapper `translate_expr_no_constant_opt()` is added
for cases where we should not hoist constants even if we otherwise
could. Right now the only example of this is cases where we are reusing
the same register(s) in multiple iterations of some kind of loop, e.g.
`VALUES(...)` or in the `coalesce()` function implementation.
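As a rough illustration of the hoisting and jump-retargeting steps above, here is a minimal sketch. The `Insn` enum and the `hoist_constant_spans()` helper are hypothetical stand-ins, not the types or functions used in this PR, and the sketch assumes the program ends with the `Goto` that jumps from the init section back to the main program.

```rust
// Toy instruction set for illustration; not the real instruction type.
#[derive(Debug, Clone, PartialEq)]
enum Insn {
    Init { target: usize },
    Work(&'static str), // any instruction without a jump target
    Next { target: usize },
    Goto { target: usize },
    Halt,
}

// Hypothetical helper. `spans` are inclusive, non-overlapping offset ranges.
// Assumes a non-empty program whose last instruction is the Goto that
// jumps from the init section back to the main program.
fn hoist_constant_spans(insns: Vec<Insn>, spans: &[(usize, usize)]) -> Vec<Insn> {
    let last = insns.len() - 1;
    let is_const = |i: usize| spans.iter().any(|&(start, end)| start <= i && i <= end);

    // New order: everything that stays put, then the hoisted spans, then the final Goto.
    let mut order: Vec<usize> = (0..last).filter(|&i| !is_const(i)).collect();
    order.extend((0..last).filter(|&i| is_const(i)));
    order.push(last);

    // Map each old offset to its new offset.
    let mut new_pos = vec![0usize; insns.len()];
    for (new, &old) in order.iter().enumerate() {
        new_pos[old] = new;
    }

    // A jump that pointed at a hoisted instruction should land on the next
    // instruction that stayed in place.
    let remap = |target: usize| {
        let mut t = target;
        while t < last && is_const(t) {
            t += 1;
        }
        new_pos[t]
    };

    order
        .iter()
        .map(|&old| match &insns[old] {
            Insn::Init { target } => Insn::Init { target: remap(*target) },
            Insn::Next { target } => Insn::Next { target: remap(*target) },
            Insn::Goto { target } => Insn::Goto { target: remap(*target) },
            other => other.clone(),
        })
        .collect()
}
```

In this sketch the hoisted instructions land after the existing init-section instructions and before the final `Goto`, which matches the shape of the "after" plan in the example further down.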
## Performance effects
Here is an example of a modified/simplified TPC-H query where the
`CAST()` calls were previously run millions of times in a hot loop, but
now they are optimized out of the loop.
**BYTECODE PLAN BEFORE:**
```sql
limbo> explain select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 26 0 0 Start at 26
1 OpenRead 0 10 0 0 table=lineitem, root=10
2 OpenRead 1 9 0 0 table=orders, root=9
3 OpenRead 2 8 0 0 table=customer, root=8
4 Rewind 0 25 0 0 Rewind lineitem
5 Column 0 10 5 0 r[5]=lineitem.l_shipdate
6 String8 0 7 0 1995-03-29 0 r[7]='1995-03-29'
7 Function 0 7 6 cast 0 r[6]=func(r[7..8]) <-- CAST() executed millions of times
8 Le 5 6 24 0 if r[5]<=r[6] goto 24
9 Column 0 0 9 0 r[9]=lineitem.l_orderkey
10 SeekRowid 1 9 24 0 if (r[9]!=orders.rowid) goto 24
11 Column 1 4 10 0 r[10]=orders.o_orderdate
12 String8 0 12 0 1995-03-29 0 r[12]='1995-03-29'
13 Function 0 12 11 cast 0 r[11]=func(r[12..13])
14 Ge 10 11 24 0 if r[10]>=r[11] goto 24
15 Column 1 1 14 0 r[14]=orders.o_custkey
16 SeekRowid 2 14 24 0 if (r[14]!=customer.rowid) goto 24
17 Column 2 6 15 0 r[15]=customer.c_mktsegment
18 Ne 15 16 24 0 if r[15]!=r[16] goto 24
19 Column 0 0 1 0 r[1]=lineitem.l_orderkey
20 Integer 3 2 0 0 r[2]=3
21 Column 1 4 3 0 r[3]=orders.o_orderdate
22 Column 1 7 4 0 r[4]=orders.o_shippriority
23 ResultRow 1 4 0 0 output=r[1..4]
24 Next 0 5 0 0
25 Halt 0 0 0 0
26 Transaction 0 0 0 0 write=false
27 String8 0 8 0 DATETIME 0 r[8]='DATETIME'
28 String8 0 13 0 DATETIME 0 r[13]='DATETIME'
29 String8 0 16 0 FURNITURE 0 r[16]='FURNITURE'
30 Goto 0 1 0
```
**BYTECODE PLAN AFTER**:
```sql
limbo> explain select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 21 0 0 Start at 21
1 OpenRead 0 10 0 0 table=lineitem, root=10
2 OpenRead 1 9 0 0 table=orders, root=9
3 OpenRead 2 8 0 0 table=customer, root=8
4 Rewind 0 20 0 0 Rewind lineitem
5 Column 0 10 5 0 r[5]=lineitem.l_shipdate
6 Le 5 6 19 0 if r[5]<=r[6] goto 19
7 Column 0 0 9 0 r[9]=lineitem.l_orderkey
8 SeekRowid 1 9 19 0 if (r[9]!=orders.rowid) goto 19
9 Column 1 4 10 0 r[10]=orders.o_orderdate
10 Ge 10 11 19 0 if r[10]>=r[11] goto 19
11 Column 1 1 14 0 r[14]=orders.o_custkey
12 SeekRowid 2 14 19 0 if (r[14]!=customer.rowid) goto 19
13 Column 2 6 15 0 r[15]=customer.c_mktsegment
14 Ne 15 16 19 0 if r[15]!=r[16] goto 19
15 Column 0 0 1 0 r[1]=lineitem.l_orderkey
16 Column 1 4 3 0 r[3]=orders.o_orderdate
17 Column 1 7 4 0 r[4]=orders.o_shippriority
18 ResultRow 1 4 0 0 output=r[1..4]
19 Next 0 5 0 0
20 Halt 0 0 0 0
21 Transaction 0 0 0 0 write=false
22 String8 0 7 0 1995-03-29 0 r[7]='1995-03-29'
23 String8 0 8 0 DATETIME 0 r[8]='DATETIME'
24 Function 1 7 6 cast 0 r[6]=func(r[7..8]) <-- CAST() executed twice
25 String8 0 12 0 1995-03-29 0 r[12]='1995-03-29'
26 String8 0 13 0 DATETIME 0 r[13]='DATETIME'
27 Function 1 12 11 cast 0 r[11]=func(r[12..13])
28 String8 0 16 0 FURNITURE 0 r[16]='FURNITURE'
29 Integer 3 2 0 0 r[2]=3
30 Goto 0 1 0 0
```
**EXECUTION RUNTIME BEFORE:**
```sql
limbo> select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.633396667 s (this includes parsing/coloring of cli app)
```
**EXECUTION RUNTIME AFTER:**
```sql
limbo> select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 2.0923475 s (this includes parsing/coloring of cli app)
```
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1359
### The problem:
I often need to copy the output of an `Explain` statement to my
clipboard. Currently this is not possible because it is only ever
written to stdout.
For all other limbo output, I can run `.output file` in the CLI, enter
my query, and then in another tmux pane simply run `cat file | xclip -in
-selection clipboard`.
### The solution:
Expose a `statement.explain()` method that returns the query explanation
as a string. If the user uses something like `execute` instead of
`prepare`, the output still goes to `stdout` as expected, but this allows
the user to access the query plan on the prepared statement and do with
it what they please.
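To illustrate the idea of returning the plan as a string instead of printing it, here is a minimal sketch. The `Statement` struct and its `rows` field are hypothetical stand-ins for a prepared statement; only the `explain()` method name comes from the description above.

```rust
use std::fmt::Write;

// Hypothetical stand-in for a prepared statement: one (opcode, comment)
// pair per bytecode instruction.
struct Statement {
    rows: Vec<(&'static str, &'static str)>,
}

impl Statement {
    // Renders the plan into a String instead of printing it, so the caller
    // decides where it goes (file, clipboard, log, ...).
    fn explain(&self) -> String {
        let mut out = String::from("addr  opcode        comment\n");
        for (addr, (opcode, comment)) in self.rows.iter().enumerate() {
            // Writing into a String cannot fail, so the Result is ignored.
            let _ = writeln!(out, "{:<4}  {:<12}  {}", addr, opcode, comment);
        }
        out
    }
}
```

A caller could then pass the returned string to something like `std::fs::write` rather than relying on stdout redirection.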
Closes #1166
This PR adds support for `DROP TABLE` and addresses issue
https://github.com/tursodatabase/limbo/issues/894
It depends on https://github.com/tursodatabase/limbo/pull/785 being
merged in because it requires the implementation of `free_page`.
EDIT: The PR above has been merged.
It adds the following:
* an implementation for the `DropTable` AST instruction via a method
called `translate_drop_table`
* a couple of new instructions - `Destroy` and `DropTable`. The former
is to modify physical b-tree pages and the latter is to modify in-memory
structures like the schema hash table.
* `btree_destroy` on `BTreeCursor` to walk the tree of pages for this
table and place them on the free list.
* state machine traversal for both `btree_destroy` and
`clear_overflow_pages` to ensure performant, correct code.
* unit & tcl tests
* modifies the `Null` instruction to follow SQLite semantics and accept
a second register. It will set all registers in this range to null. This
is required for `DROP TABLE`.
The screenshots below have a comparison of the bytecodes generated via
SQLite & Limbo.
Limbo has the same instruction set except for the subroutines which
involve opening an ephemeral table, copying over the triggers from the
`sqlite_schema` table and then re-inserting them back into the
`sqlite_schema` table.
This is because `OpenEphemeral` is still a WIP and is being tracked at
https://github.com/tursodatabase/limbo/pull/768


Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #897
Modified the `Null` instruction to more closely match SQLite semantics. It now accepts a second register, and all registers in the range r1..r2 are set to null.
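A minimal sketch of the range behavior, following SQLite's documented `Null` opcode semantics (P2 is the first register; if P3 is greater than P2, every register from P2 through P3 is set to null, otherwise only P2 is). The `Value` enum and `exec_null()` function are hypothetical and only illustrate the rule.

```rust
// Hypothetical register value type for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

// Sets register `p2` to NULL. If `p3 > p2`, also sets every register
// up to and including `p3` to NULL; otherwise only `p2` is touched.
fn exec_null(registers: &mut [Value], p2: usize, p3: usize) {
    let end = if p3 > p2 { p3 } else { p2 };
    for reg in &mut registers[p2..=end] {
        *reg = Value::Null;
    }
}
```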
- a right-nested expression can generate multiple labels, all of which
need to be resolved to the next generated instruction
- for example, `COALESCE(0, COALESCE(0, 1))`
Use knowledge of the query plan to decide how much memory to initially
allocate for the `ProgramBuilder` vectors.
Some of the sizes are exact, some are semi-random estimates.
```sql
Prepare `SELECT 1`/Limbo/SELECT 1
time: [756.93 ns 758.11 ns 759.59 ns]
change: [-4.5974% -4.3153% -4.0393%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
Prepare `SELECT * FROM users LIMIT 1`/Limbo/SELECT * FROM users LIMIT 1
time: [1.4739 µs 1.4769 µs 1.4800 µs]
change: [-7.9364% -7.7171% -7.4979%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Prepare `SELECT first_name, count(1) FROM users GROUP BY first_name HAVING count(1) > 1 ORDER BY cou...`
time: [3.7440 µs 3.7520 µs 3.7596 µs]
change: [-5.4627% -5.1578% -4.8445%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
```
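The preallocation idea can be sketched as follows. `CapacityHints`, `ProgramBuilderSketch`, and the numbers in `hints_for_select()` are all hypothetical and made up for illustration; they are not the estimates used in this change.

```rust
// Hypothetical sizing hints derived from the query plan.
struct CapacityHints {
    insn_count: usize,
    label_count: usize,
}

// Hypothetical, heavily simplified builder holding only the two vectors
// that are sized up front in this sketch.
struct ProgramBuilderSketch {
    insns: Vec<u32>,
    labels: Vec<u32>,
}

impl ProgramBuilderSketch {
    fn new(hints: CapacityHints) -> Self {
        Self {
            // Reserving up front avoids repeated reallocation while emitting.
            insns: Vec::with_capacity(hints.insn_count),
            labels: Vec::with_capacity(hints.label_count),
        }
    }
}

// Made-up estimate: a fixed overhead plus a per-table and per-column cost.
fn hints_for_select(num_tables: usize, num_result_columns: usize) -> CapacityHints {
    CapacityHints {
        insn_count: 10 + 5 * num_tables + num_result_columns,
        label_count: 2 + 3 * num_tables,
    }
}
```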
Closes #899
The main difference between = and != is how null values are handled.
SQLite passes a flag "NULLEQ" to Eq and Ne to disambiguate that.
In the presence of that flag, NULL = NULL.
Some prep work is done to make sure we can pass a flag instead of a
boolean to Eq and Ne. I looked into the bitflags crate but was put off
by its list of dependencies.
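A minimal sketch of the flag-based comparison, using a plain `u8` constant rather than the bitflags crate. The function names and the `Option<i64>` value representation are hypothetical; the `0x80` value mirrors SQLite's `SQLITE_NULLEQ` and is used here purely for illustration.

```rust
// Flag bit passed alongside Eq/Ne; a plain constant, no extra crate needed.
const NULLEQ: u8 = 0x80;

// `None` models SQL NULL. Returns `None` when the comparison result is NULL,
// which can only happen when the NULLEQ flag is not set.
fn eq(lhs: Option<i64>, rhs: Option<i64>, flags: u8) -> Option<bool> {
    match (lhs, rhs) {
        (Some(a), Some(b)) => Some(a == b),
        // With NULLEQ, NULL equals NULL and differs from any non-NULL value.
        _ if flags & NULLEQ != 0 => Some(lhs.is_none() && rhs.is_none()),
        // Without NULLEQ, any comparison involving NULL yields NULL.
        _ => None,
    }
}

fn ne(lhs: Option<i64>, rhs: Option<i64>, flags: u8) -> Option<bool> {
    eq(lhs, rhs, flags).map(|equal| !equal)
}
```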
Warning:
The following query produces a different result for Limbo:
```
select * from demo where value is null or id == 2;
```
I strongly suspect the issue is with the OR implementation, though. The
bytecode generated is quite different.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #847
Instead of always having the caller specify all instructions, this
work introduces convenience functions into the program builder,
making the code a lot cleaner.
TLDR: no need to call either of:
- `program.emit_insn_with_label_dependency()` -> just call `program.emit_insn()`
- `program.defer_label_resolution()` -> just call `program.resolve_label()`
Changes:
- make BranchOffset an explicit enum (Label, Offset, Placeholder)
- remove program.emit_insn_with_label_dependency() - label dependency is automatically detected
- for label to offset mapping, use a hashmap from label(negative i32) to offset (positive u32)
- resolve all labels in program.build()
- remove program.defer_label_resolution() - all labels are resolved in build()
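The resulting flow can be sketched roughly as below. This is a simplified, hypothetical `Builder`, not the real program builder: only `BranchOffset` with its three variants, `emit_insn()`, `resolve_label()`, and `build()` are names taken from the list above, and their signatures here are assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum BranchOffset {
    Label(i32),  // unresolved, identified by a negative id
    Offset(u32), // resolved instruction offset
    Placeholder, // not yet assigned (unused in this sketch)
}

// Toy instruction set for illustration.
#[derive(Debug, PartialEq)]
enum Insn {
    Goto { target: BranchOffset },
    Noop,
}

#[derive(Default)]
struct Builder {
    insns: Vec<Insn>,
    next_label: i32,
    label_to_offset: HashMap<i32, u32>,
}

impl Builder {
    fn allocate_label(&mut self) -> BranchOffset {
        self.next_label -= 1;
        BranchOffset::Label(self.next_label)
    }

    fn emit_insn(&mut self, insn: Insn) {
        self.insns.push(insn);
    }

    // Binds the label to the offset of the next instruction to be emitted.
    fn resolve_label(&mut self, label: BranchOffset) {
        if let BranchOffset::Label(id) = label {
            self.label_to_offset.insert(id, self.insns.len() as u32);
        }
    }

    // Rewrites every label into a concrete offset in one pass.
    // Panics if a label was never resolved.
    fn build(mut self) -> Vec<Insn> {
        for insn in &mut self.insns {
            if let Insn::Goto { target } = insn {
                if let BranchOffset::Label(id) = *target {
                    *target = BranchOffset::Offset(self.label_to_offset[&id]);
                }
            }
        }
        self.insns
    }
}
```

Because resolution happens in `build()`, several labels can be bound to the same upcoming instruction without the caller tracking which instructions depend on them.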