Commit Graph

4134 Commits

Author SHA1 Message Date
Jussi Saurio
029e5eddde Fix existing resolve_label() calls to work with new system 2025-04-24 11:05:21 +03:00
Jussi Saurio
e557503091 expr.rs: use constant spans to optimize constant expressions 2025-04-24 11:05:21 +03:00
Jussi Saurio
0f5c791784 vdbe: refactor label resolution to account for insn offsets changing 2025-04-24 11:05:21 +03:00
Jussi Saurio
b4b38bdb3c vdbe: resolve labels for InitCoroutine::start_offset 2025-04-24 11:05:21 +03:00
Jussi Saurio
47f3f3bda3 vdbe: replace constant_insns with constant_spans 2025-04-24 11:05:21 +03:00
Jussi Saurio
e5bab63522 add expr.is_constant() 2025-04-24 11:05:21 +03:00
Jussi Saurio
5bed331505 add Func::is_deterministic() 2025-04-24 11:05:21 +03:00
Jussi Saurio
b36c898842 rename check_constant() to less confusing name 2025-04-24 11:05:21 +03:00
Jussi Saurio
6ff5ff49b7 Merge 'perf/btree: use binary search for Index seek operations' from Jussi Saurio
## Beef
Followup to #1357 which did the same treatment for table btrees only.
After this PR, all of our seeks use binary search for both interior and
leaf pages.
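
A minimal sketch of the idea, assuming the cells of a page can be viewed as a sorted slice of two-column keys. The name `seek_ge` and the tuple key type are hypothetical and only illustrate the technique; this is not the actual btree code:

```rust
/// Returns the position of the first key that is >= `target`, or `None`
/// if every key on the page is smaller. `keys` must be sorted ascending.
/// Hypothetical helper: real index cells are serialized records, not tuples.
fn seek_ge(keys: &[(i64, i64)], target: (i64, i64)) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, keys.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if keys[mid] < target {
            // Everything up to and including `mid` is too small.
            lo = mid + 1;
        } else {
            // `mid` is a candidate; keep searching to its left.
            hi = mid;
        }
    }
    if lo < keys.len() { Some(lo) } else { None }
}
```

Each seek then costs O(log n) key comparisons per page instead of a linear walk over the cells.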
## Perf comparison
using TPC-H 1GB db for this query:
```sql
limbo> explain select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     7     0                    0   table=sqlite_autoindex_partsupp_1, root=7
3     Rewind             0     14    0                    0   Rewind lineitem
4       Column           0     1     2                    0   r[2]=lineitem.l_partkey
5       IsNull           2     13    0                    0   if (r[2]==NULL) goto 13
6       Column           0     2     3                    0   r[3]=lineitem.l_suppkey
7       IsNull           3     13    0                    0   if (r[3]==NULL) goto 13
8       SeekGE           1     13    2                    0   key=[2..3] <-- index seek here, for every row in lineitem
9         IdxGT          1     13    2                    0   key=[2..3]
10        Integer        1     5     0                    0   r[5]=1
11        AggStep        0     5     4     count          0   accum=r[4] step(r[5])
12      Next             1     9     0                    0
13    Next               0     4     0                    0
14    AggFinal           0     4     0     count          0   accum=r[4]
15    Copy               4     1     0                    0   r[1]=r[4]
16    ResultRow          1     1     0                    0   output=r[1]
17    Halt               0     0     0                    0
18    Transaction        0     0     0                    0   write=false
19    Goto               0     1     0                    0
```
main:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│   6001215 │
└───────────┘
Command stats:
----------------------------
total: 40.292102375 s (this includes parsing/coloring of cli app)
```
PR:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│   6001215 │
└───────────┘
Command stats:
----------------------------
total: 14.021689916 s (this includes parsing/coloring of cli app)
```
almost 3x faster. buzzkill: still 3x slower than sqlite :)

Closes #1387
2025-04-24 10:53:35 +03:00
Jussi Saurio
c88c579154 Merge 'expr.is_nonnull(): return true if col.primary_key || col.notnull' from Jussi Saurio
This avoids redundant `IsNull` instructions during index seeks if the
seek key columns are primary keys of other tables, which they often are.
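
Roughly, the check can be pictured like this. The types below are simplified stand-ins invented for illustration (only the `primary_key` and `notnull` field names come from the PR title), and `needs_isnull_check` is a hypothetical helper:

```rust
/// Hypothetical, simplified column metadata.
struct Column {
    primary_key: bool,
    notnull: bool,
}

/// Hypothetical, heavily reduced expression type.
enum Expr {
    Column(Column),
    /// `None` models a NULL literal.
    Literal(Option<i64>),
}

impl Expr {
    /// True when the expression can be proven to never evaluate to NULL.
    fn is_nonnull(&self) -> bool {
        match self {
            Expr::Column(col) => col.primary_key || col.notnull,
            Expr::Literal(value) => value.is_some(),
        }
    }
}

/// The translator only needs an `IsNull` guard for seek keys that may be NULL.
fn needs_isnull_check(seek_key: &Expr) -> bool {
    !seek_key.is_nonnull()
}
```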

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1388
2025-04-24 10:32:00 +03:00
Jussi Saurio
c09e4d1d38 Merge 'Numeric Types Overhaul' from Levy A.
### Summary
  - Sqlite compatible string to float conversion
    - Accompanied with the new `cast_real` fuzz target
  - `NonNan` wrapper type over `f64`
    - Now we can guarantee that operations that can result in a NaN
need to be handled
  - `Numeric` and `NullableInteger` types that encapsulate all numeric
and bitwise operations
    - This is now guaranteed to be 100% compatible with sqlite with the
`expression` fuzz target (with the exception of the commented out
operation that will be implemented in a later PR)
One thing that might be reworked here is the heavy use of traits and
operator overloading, but looks reasonable to me.
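
To illustrate the `NonNan` idea, here is a minimal sketch assuming a newtype over `f64` whose arithmetic returns an `Option`. The method names and the choice of `Option` as the output type are assumptions made for this example, not necessarily what the PR does:

```rust
use std::ops::Add;

/// An `f64` that is guaranteed not to be NaN.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct NonNan(f64);

impl NonNan {
    /// Rejects NaN at construction time.
    fn new(value: f64) -> Option<Self> {
        if value.is_nan() {
            None
        } else {
            Some(NonNan(value))
        }
    }
}

impl Add for NonNan {
    /// Addition can produce NaN (e.g. inf + -inf), so the result is optional
    /// and the caller is forced by the type system to handle that case.
    type Output = Option<NonNan>;

    fn add(self, rhs: Self) -> Self::Output {
        NonNan::new(self.0 + rhs.0)
    }
}
```

With this shape, `inf + -inf` yields `None` rather than silently propagating a NaN into later operations.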

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1386
2025-04-23 18:34:32 +03:00
Jussi Saurio
9e1f15c679 Merge 'python: add UV project for 'scripts'' from Jussi Saurio
mainly so i don't have to install pygithub every time i want to `uv run
scripts/merge-pr.py`

Closes #1385
2025-04-23 18:33:57 +03:00
Jussi Saurio
a7488496d5 expr.is_nonnull(): return true if col.primary_key || col.notnull 2025-04-23 18:10:33 +03:00
Jussi Saurio
af703110f8 btree: remove extra iter_dir argument that can be derived from seek_op 2025-04-23 17:38:48 +03:00
Jussi Saurio
044339efc7 btree: rename tablebtree_move_to_binsearch -> tablebtree_move_to 2025-04-23 17:35:22 +03:00
Jussi Saurio
8c338438dd btree: use binary search for index interior cell seek 2025-04-23 17:34:32 +03:00
Jussi Saurio
7a133f422f btree: use binary search for index leaves 2025-04-23 17:34:32 +03:00
Jussi Saurio
8743dcd0da btree: extract indexbtree_seek() into a function like tablebtree_seek() 2025-04-23 17:34:32 +03:00
Jussi Saurio
48071b7ad7 tests/fuzz/compound_index_seek: order select cols by definition order 2025-04-23 17:34:32 +03:00
Jussi Saurio
517390a4ea tests/fuzz/compound_index_seek: show which table had failed query 2025-04-23 16:57:43 +03:00
Levy A.
8ff906e353 fix: decrease even more nested operations
this is a worrying trend
2025-04-23 10:15:49 -03:00
Levy A.
613a332e99 doc: add doc for DoubleDouble 2025-04-23 10:13:32 -03:00
Levy A.
2cbb59e3f9 refactor: renaming and better types 2025-04-23 09:53:37 -03:00
Levy A.
ed27f22e2f comment out incompatible operations 2025-04-23 08:34:58 -03:00
Levy A.
f1ee92bf2d numeric types overhaul 2025-04-23 08:34:58 -03:00
Jussi Saurio
3bbd443286 python: add UV project for 'scripts'
mainly so i don't have to install pygithub every time i want to
`uv run scripts/merge-pr.py`
2025-04-23 10:32:38 +03:00
Jussi Saurio
fd2b274556 Merge 'Python script to compare vfs performance' from Preston Thorpe
This PR adds a python script that uses the `TestLimboShell` setup to run
some semi-naive benchmarks/comparisons against `io_uring` and `syscall`
IO back-ends.
### Usage:
```sh
make bench-vfs SQL="insert into products (name, price) values ('testing', randomblob(1024*4));" N=50
```
The script will execute the given `SQL` `N` times with each back-end,
get the average/mean and display them.
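
The script itself is Python; the following is only a sketch of the same measurement loop in Rust, with a hypothetical `mean_duration` helper standing in for "run the statement `N` times and average":

```rust
use std::time::{Duration, Instant};

/// Runs `work` exactly `n` times and returns the mean wall-clock duration.
/// Hypothetical helper; `work` stands in for executing the SQL statement
/// against one IO back-end.
fn mean_duration<F: FnMut()>(n: u32, mut work: F) -> Duration {
    assert!(n > 0, "need at least one iteration");
    let mut total = Duration::ZERO;
    for _ in 0..n {
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    total / n
}
```

Calling it once per back-end with the same closure body gives directly comparable averages.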
![image](https://github.com/user-attachments/assets/b2399196-dbdd-4b98-8210-536e68979edd)
😬

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1377
2025-04-23 10:25:56 +03:00
Preston Thorpe
e1d9bfc792 Merge branch 'main' into bench_vfs 2025-04-22 21:36:07 -04:00
Pekka Enberg
fc5099e2ef antithesis: Enable RUST_BACKTRACE for workload 2025-04-22 13:01:11 +03:00
Pekka Enberg
beaccae664 Merge 'Create an automatic ephemeral index when a nested table scan would otherwise be selected' from Jussi Saurio
Closes #747
- Creates an automatic ephemeral (in-memory) index on the right-side
table of a join if otherwise a nested table scan would be selected.
- This behavior is not hardcoded; instead this PR introduces a (quite
dumb) cost estimator that naturally disincentivizes building ephemeral
indexes where they don't make sense (e.g. the outermost table). I will
probably build this estimator to be smarter in the future when working
on join reordering optimizations.
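
As a rough illustration of why a cost comparison discourages an ephemeral index on the outermost table, here is a toy model. The formulas and function names are assumptions made up for this sketch and are not the estimator added in this PR:

```rust
/// Toy cost of a nested scan: the inner table is scanned once per outer row.
fn nested_scan_cost(outer_rows: f64, inner_rows: f64) -> f64 {
    outer_rows * inner_rows
}

/// Toy cost of an ephemeral index: build it once (n log n), then do one
/// logarithmic seek per outer row.
fn ephemeral_index_cost(outer_rows: f64, inner_rows: f64) -> f64 {
    let log_inner = inner_rows.max(2.0).log2();
    let build = inner_rows * log_inner;
    let seeks = outer_rows * log_inner;
    build + seeks
}

/// Build the index only when it is estimated to be cheaper overall.
fn should_build_ephemeral_index(outer_rows: f64, inner_rows: f64) -> bool {
    ephemeral_index_cost(outer_rows, inner_rows) < nested_scan_cost(outer_rows, inner_rows)
}
```

Under this model a table that is visited only once (`outer_rows = 1.0`, as for the outermost table) never justifies the build cost, while an inner table probed once per outer row quickly does.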
### Example bytecode plans and runtimes (note that this is debug mode)
Example query with no persistent indexes to choose from. Without
ephemeral index it's a nested scan:
```sql
limbo> explain select * from t1 natural join t2;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     13    0                    0   Start at 13
1     OpenRead           0     2     0                    0   table=t1, root=2
2     OpenRead           1     3     0                    0   table=t2, root=3
3     Rewind             0     12    0                    0   Rewind t1
4       Rewind           1     11    0                    0   Rewind t2
5         Column         0     0     2                    0   r[2]=t1.a
6         Column         1     0     3                    0   r[3]=t2.a
7         Ne             2     3     10                   0   if r[2]!=r[3] goto 10
8         Column         0     0     1                    0   r[1]=t1.a
9         ResultRow      1     1     0                    0   output=r[1]
10      Next             1     5     0                    0
11    Next               0     4     0                    0
12    Halt               0     0     0                    0
13    Transaction        0     0     0                    0   write=false
14    Goto               0     1     0                    0

limbo> .timer on
limbo> select * from t1 natural join t2;
┌───┐
│ a │
├───┤
└───┘
Command stats:
----------------------------
total: 953 ms (this includes parsing/coloring of cli app)
```
Same query with autoindexing enabled:
```sql
limbo> explain select * from t1 natural join t2;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     22    0                    0   Start at 22
1     OpenRead           0     2     0                    0   table=t1, root=2
2     OpenRead           1     3     0                    0   table=t2, root=3
3     Rewind             0     21    0                    0   Rewind t1
4       Once             12    0     0                    0   goto 12 # execute block 5-11 only once, on subsequent iters jump straight to 12
5       OpenAutoindex    3     0     0                    0   cursor=3
6       Rewind           1     12    0                    0   Rewind t2 # open source table for ephemeral index
7         Column         1     0     2                    0   r[2]=t2.a
8         RowId          1     3     0                    0   r[3]=t2.rowid
9         MakeRecord     2     2     4                    0   r[4]=mkrec(r[2..3])
10        IdxInsert      3     4     2                    0   key=r[4] # insert stuff to ephemeral index
11      Next             1     7     0                    0
12      Column           0     0     5                    0   r[5]=t1.a
13      IsNull           5     20    0                    0   if (r[5]==NULL) goto 20
14      SeekGE           3     20    5                    0   key=[5..5] # perform seek on ephemeral index
15        IdxGT          3     20    5                    0   key=[5..5]
16        DeferredSeek   3     1     0                    0
17        Column         0     0     1                    0   r[1]=t1.a
18        ResultRow      1     1     0                    0   output=r[1]
19      Next             2     15    0                    0
20    Next               0     4     0                    0
21    Halt               0     0     0                    0
22    Transaction        0     0     0                    0   write=false
23    Goto               0     1     0                    0

limbo> .timer on
limbo> select * from t1 natural join t2;
┌───┐
│ a │
├───┤
└───┘
Command stats:
----------------------------
total: 220 ms (this includes parsing/coloring of cli app)
```

Closes #1356
2025-04-22 13:00:06 +03:00
Pekka Enberg
936365a44e Update README.md 2025-04-22 12:11:23 +03:00
Pekka Enberg
c2cf4756ef Update README.md 2025-04-22 12:10:02 +03:00
Pekka Enberg
d92fb75262 Merge 'Fix incorrect between expression documentation' from Pedro Muniz
I was reading through the `translate_expr` function and `COMPAT.md` to
see what was not implemented yet. I saw that `Expr::Between` was marked
as a `todo!` so I set out to implement it, only to find that it was
being rewritten in the optimizer haha. This PR just adjusts the docs and
adds an `unreachable` in the appropriate locations.
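
For readers unfamiliar with this kind of rewrite, a minimal sketch follows. The `SqlExpr` and `Op` types are invented for the example, and the exact form the optimizer produces is an assumption; the sketch only shows the standard equivalence `x BETWEEN a AND b` = `x >= a AND x <= b`:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    And,
    Or,
    Ge,
    Le,
    Lt,
    Gt,
}

/// Hypothetical, heavily reduced expression tree.
#[derive(Debug, Clone, PartialEq)]
enum SqlExpr {
    Column(String),
    Int(i64),
    Between {
        lhs: Box<SqlExpr>,
        not: bool,
        start: Box<SqlExpr>,
        end: Box<SqlExpr>,
    },
    Binary(Box<SqlExpr>, Op, Box<SqlExpr>),
}

/// Rewrites a top-level BETWEEN into two comparisons; other nodes pass through.
/// Note that `lhs` is duplicated, so it appears twice in the result.
fn rewrite_between(expr: SqlExpr) -> SqlExpr {
    match expr {
        SqlExpr::Between { lhs, not, start, end } => {
            // NOT BETWEEN becomes `x < a OR x > b`.
            let (lower_op, join, upper_op) = if not {
                (Op::Lt, Op::Or, Op::Gt)
            } else {
                (Op::Ge, Op::And, Op::Le)
            };
            let lower = SqlExpr::Binary(lhs.clone(), lower_op, start);
            let upper = SqlExpr::Binary(lhs, upper_op, end);
            SqlExpr::Binary(Box::new(lower), join, Box::new(upper))
        }
        other => other,
    }
}
```

Once such a pass has run, a later translation stage can treat a remaining `Between` node as unreachable.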

Closes #1378
2025-04-22 11:56:01 +03:00
Pekka Enberg
e41bf3993a Merge 'bindings/rust: Add Statement.columns() support' from Timo Kösters
This PR adds the `statement.columns()` function, inspired by rusqlite:
https://docs.rs/rusqlite/latest/rusqlite/struct.Statement.html#method.columns
Note that the rusqlite documentation says
> If associated DB schema can be altered concurrently, you should make
sure that current statement has already been stepped once before calling
this method.
Do we have this requirement as well?
The first commit is just the rust binding. The second commit implements
the column name for the rowid column.

Closes #1376
2025-04-22 10:52:25 +03:00
Pekka Enberg
7308f6d6e8 Merge 'Bump julian_day_converter to 0.4.5' from meteorgan
The previous version of `julian_day_converter` had precision issues,
potentially causing loss of precision when converting between
`julianday` and `datetime`.
![image](https://github.com/user-attachments/assets/84042ca3-28cc-4020-a248-714df6298791)

Reviewed-by: Diego Reis (@diegoreis42)

Closes #1344
2025-04-22 10:48:36 +03:00
Timo Kösters
68d8b86bb7 fix: get name of rowid column 2025-04-22 08:46:37 +02:00
Pekka Enberg
094fd0e211 Add TPC-H instructions to PERF.md 2025-04-22 09:46:16 +03:00
pedrocarlo
1928dcfa10 Correct docs regarding between 2025-04-21 23:05:01 -03:00
PThorpe92
2e33ce6896 Add release build to bench vfs in makefile to ensure there is an exec target 2025-04-21 12:31:38 -04:00
PThorpe92
f180de4d95 Write quick note about vfs benchmark script in PERF.md 2025-04-21 12:24:18 -04:00
PThorpe92
9bbd6a3a7f Add vfs bench to testing pyproject.toml 2025-04-21 12:23:06 -04:00
PThorpe92
2037fbeba5 Add bench-vfs command to makefile 2025-04-21 12:22:40 -04:00
PThorpe92
7f170756ae Add python script to benchmark vfs against each other 2025-04-21 12:22:20 -04:00
Jussi Saurio
f256fb46fd remove print spam from index insert 2025-04-21 14:59:13 +03:00
Jussi Saurio
3b44b269a3 optimizer: try to build ephemeral index to avoid nested table scan 2025-04-21 14:59:13 +03:00
Jussi Saurio
6924424f11 optimizer: add highly unintelligent heuristics-based cost estimation 2025-04-21 14:59:13 +03:00
Jussi Saurio
a50fa03d24 optimizer: allow calling try_extract_index... without any persistent indexes 2025-04-21 14:59:13 +03:00
Jussi Saurio
af21f60887 translate/main_loop: create autoindex when index.ephemeral=true 2025-04-21 14:59:13 +03:00
Jussi Saurio
c1b2dfc32b TableReference: add method column_is_used() 2025-04-21 14:59:13 +03:00
Jussi Saurio
09ad6d8f01 vdbe: resolve labels for Insn::Once 2025-04-21 14:59:13 +03:00