Jussi Saurio
029e5eddde
Fix existing resolve_label() calls to work with new system
2025-04-24 11:05:21 +03:00
Jussi Saurio
e557503091
expr.rs: use constant spans to optimize constant expressions
2025-04-24 11:05:21 +03:00
Jussi Saurio
0f5c791784
vdbe: refactor label resolution to account for insn offsets changing
2025-04-24 11:05:21 +03:00
Jussi Saurio
b4b38bdb3c
vdbe: resolve labels for InitCoroutine::start_offset
2025-04-24 11:05:21 +03:00
Jussi Saurio
47f3f3bda3
vdbe: replace constant_insns with constant_spans
2025-04-24 11:05:21 +03:00
Jussi Saurio
e5bab63522
add expr.is_constant()
2025-04-24 11:05:21 +03:00
Jussi Saurio
5bed331505
add Func::is_deterministic()
2025-04-24 11:05:21 +03:00
Jussi Saurio
b36c898842
rename check_constant() to less confusing name
2025-04-24 11:05:21 +03:00
Jussi Saurio
6ff5ff49b7
Merge 'perf/btree: use binary search for Index seek operations' from Jussi Saurio
...
## Beef
Followup to #1357 which did the same treatment for table btrees only.
After this PR, all of our seeks use binary search for both interior and
leaf pages.
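To illustrate the idea (not the actual btree code), here is a minimal sketch of a `SeekGE`-style lookup over the sorted keys of a single page, done once with a linear scan and once with binary search. The `Key` type and both function names are hypothetical stand-ins; real index cells hold serialized records rather than tuples.

```rust
/// Hypothetical stand-in for the sorted cell keys of one index page,
/// e.g. (ps_partkey, ps_suppkey). Real cells hold serialized records.
type Key = (i64, i64);

/// Linear scan: position of the first key >= target, if any.
fn seek_ge_linear(cells: &[Key], target: Key) -> Option<usize> {
    cells.iter().position(|k| *k >= target)
}

/// Binary search for the same position. `partition_point` returns the
/// index of the first element for which the predicate is false.
fn seek_ge_binary(cells: &[Key], target: Key) -> Option<usize> {
    let idx = cells.partition_point(|k| *k < target);
    (idx < cells.len()).then_some(idx)
}

fn main() {
    let cells: Vec<Key> = vec![(1, 2), (1, 7), (3, 1), (3, 4), (8, 0)];
    for target in [(0, 0), (1, 7), (3, 2), (9, 9)] {
        // Both strategies must agree; only the number of comparisons differs.
        assert_eq!(seek_ge_linear(&cells, target), seek_ge_binary(&cells, target));
    }
    println!("{:?}", seek_ge_binary(&cells, (3, 2)));
}
```

The linear version compares against every cell in the worst case, while the binary version needs a logarithmic number of comparisons per page, which matters when a seek runs once per outer row.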
## Perf comparison
using TPC-H 1GB db for this query:
```sql
limbo> explain select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 18 0 0 Start at 18
1 OpenRead 0 10 0 0 table=lineitem, root=10
2 OpenRead 1 7 0 0 table=sqlite_autoindex_partsupp_1, root=7
3 Rewind 0 14 0 0 Rewind lineitem
4 Column 0 1 2 0 r[2]=lineitem.l_partkey
5 IsNull 2 13 0 0 if (r[2]==NULL) goto 13
6 Column 0 2 3 0 r[3]=lineitem.l_suppkey
7 IsNull 3 13 0 0 if (r[3]==NULL) goto 13
8 SeekGE 1 13 2 0 key=[2..3] <-- index seek here, for every row in lineitem
9 IdxGT 1 13 2 0 key=[2..3]
10 Integer 1 5 0 0 r[5]=1
11 AggStep 0 5 4 count 0 accum=r[4] step(r[5])
12 Next 1 9 0 0
13 Next 0 4 0 0
14 AggFinal 0 4 0 count 0 accum=r[4]
15 Copy 4 1 0 0 r[1]=r[4]
16 ResultRow 1 1 0 0 output=r[1]
17 Halt 0 0 0 0
18 Transaction 0 0 0 0 write=false
19 Goto 0 1 0 0
```
main:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│ 6001215 │
└───────────┘
Command stats:
----------------------------
total: 40.292102375 s (this includes parsing/coloring of cli app)
```
PR:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│ 6001215 │
└───────────┘
Command stats:
----------------------------
total: 14.021689916 s (this includes parsing/coloring of cli app)
```
almost 3x faster. buzzkill: still 3x slower than sqlite :)
Closes #1387
2025-04-24 10:53:35 +03:00
Jussi Saurio
c88c579154
Merge 'expr.is_nonnull(): return true if col.primary_key || col.notnull' from Jussi Saurio
...
This avoids redundant `IsNull` instructions during index seeks if the
seek key columns are primary keys of other tables, which they often are.
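A rough sketch of the check, assuming a simplified `Column` struct with only the two flags named in the title. The struct and the `guarded_key_columns` helper are hypothetical and exist only to show where the `IsNull` guard can be skipped.

```rust
/// Hypothetical, simplified column metadata (the real struct has more fields).
struct Column {
    primary_key: bool,
    notnull: bool,
}

/// A column is treated as non-null if it is a primary key or declared NOT NULL.
fn is_nonnull(col: &Column) -> bool {
    col.primary_key || col.notnull
}

/// Indexes of the seek key columns that still need an `IsNull` guard.
fn guarded_key_columns(cols: &[Column]) -> Vec<usize> {
    cols.iter()
        .enumerate()
        .filter(|(_, col)| !is_nonnull(col))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let key_cols = [
        Column { primary_key: true, notnull: false },
        Column { primary_key: false, notnull: false },
        Column { primary_key: false, notnull: true },
    ];
    // Only the middle column may be NULL, so only it needs a guard.
    println!("{:?}", guarded_key_columns(&key_cols));
}
```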
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1388
2025-04-24 10:32:00 +03:00
Jussi Saurio
c09e4d1d38
Merge 'Numeric Types Overhaul' from Levy A.
...
### Summary
- SQLite-compatible string-to-float conversion
- Accompanied by the new `cast_real` fuzz target
- `NonNan` wrapper type over `f64`
- Now we can guarantee that operations that can result in a NaN
must be handled
- `Numeric` and `NullableInteger` types that encapsulate all numeric
and bitwise operations
- This is now guaranteed to be 100% compatible with sqlite with the
`expression` fuzz target (with the exception of the commented out
operation that will be implemented in a later PR)
One thing that might be reworked here is the heavy use of traits and
operator overloading, but looks reasonable to me.
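As a sketch of the `NonNan` idea only: the field layout, the `new` constructor and the `Add` impl below are assumptions, not the PR's actual API. The point is that an operation which may produce NaN returns an `Option`, so the caller is forced to handle that case.

```rust
use std::ops::Add;

/// Hypothetical wrapper: a float that is known not to be NaN.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct NonNan(f64);

impl NonNan {
    /// Rejects NaN at construction time.
    fn new(value: f64) -> Option<Self> {
        if value.is_nan() {
            None
        } else {
            Some(NonNan(value))
        }
    }
}

/// Addition can produce NaN (inf + -inf), so the result is an Option
/// and the NaN case cannot be silently ignored.
impl Add for NonNan {
    type Output = Option<NonNan>;

    fn add(self, rhs: Self) -> Option<NonNan> {
        NonNan::new(self.0 + rhs.0)
    }
}

fn main() {
    let one = NonNan::new(1.0).unwrap();
    let two = NonNan::new(2.0).unwrap();
    println!("{:?}", one + two);

    let inf = NonNan::new(f64::INFINITY).unwrap();
    let neg_inf = NonNan::new(f64::NEG_INFINITY).unwrap();
    // inf + (-inf) is NaN, so the wrapper yields None here.
    println!("{:?}", inf + neg_inf);
}
```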
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1386
2025-04-23 18:34:32 +03:00
Jussi Saurio
9e1f15c679
Merge 'python: add UV project for 'scripts'' from Jussi Saurio
...
mainly so i don't have to install pygithub every time i want to
`uv run scripts/merge-pr.py`
Closes #1385
2025-04-23 18:33:57 +03:00
Jussi Saurio
a7488496d5
expr.is_nonnull(): return true if col.primary_key || col.notnull
2025-04-23 18:10:33 +03:00
Jussi Saurio
af703110f8
btree: remove extra iter_dir argument that can be derived from seek_op
2025-04-23 17:38:48 +03:00
Jussi Saurio
044339efc7
btree: rename tablebtree_move_to_binsearch -> tablebtree_move_to
2025-04-23 17:35:22 +03:00
Jussi Saurio
8c338438dd
btree: use binary search for index interior cell seek
2025-04-23 17:34:32 +03:00
Jussi Saurio
7a133f422f
btree: use binary search for index leaves
2025-04-23 17:34:32 +03:00
Jussi Saurio
8743dcd0da
btree: extract indexbtree_seek() into a function like tablebtree_seek()
2025-04-23 17:34:32 +03:00
Jussi Saurio
48071b7ad7
tests/fuzz/compound_index_seek: order select cols by definition order
2025-04-23 17:34:32 +03:00
Jussi Saurio
517390a4ea
tests/fuzz/compound_index_seek: show which table had failed query
2025-04-23 16:57:43 +03:00
Levy A.
8ff906e353
fix: decrease even more nested operations
...
this is a worrying trend
2025-04-23 10:15:49 -03:00
Levy A.
613a332e99
doc: add doc for DoubleDouble
2025-04-23 10:13:32 -03:00
Levy A.
2cbb59e3f9
refactor: renaming and better types
2025-04-23 09:53:37 -03:00
Levy A.
ed27f22e2f
comment out incompatible operations
2025-04-23 08:34:58 -03:00
Levy A.
f1ee92bf2d
numeric types overhaul
2025-04-23 08:34:58 -03:00
Jussi Saurio
3bbd443286
python: add UV project for 'scripts'
...
mainly so i don't have to install pygithub every time i want to
`uv run scripts/merge-pr.py`
2025-04-23 10:32:38 +03:00
Jussi Saurio
fd2b274556
Merge 'Python script to compare vfs performance' from Preston Thorpe
...
This PR adds a python script that uses the `TestLimboShell` setup to run
some semi naive benchmarks/comparisons against `io_uring` and `syscall`
IO back-ends.
### Usage:
```sh
make bench-vfs SQL="insert into products (name, price) values ('testing', randomblob(1024*4));" N=50
```
The script will execute the given `SQL` `N` times with each back-end,
get the average/mean and display them.

😬
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1377
2025-04-23 10:25:56 +03:00
Preston Thorpe
e1d9bfc792
Merge branch 'main' into bench_vfs
2025-04-22 21:36:07 -04:00
Pekka Enberg
fc5099e2ef
antithesis: Enable RUST_BACKTRACE for workload
2025-04-22 13:01:11 +03:00
Pekka Enberg
beaccae664
Merge 'Create an automatic ephemeral index when a nested table scan would otherwise be selected' from Jussi Saurio
...
Closes #747
- Creates an automatic ephemeral (in-memory) index on the right-side
table of a join if otherwise a nested table scan would be selected.
- This behavior is not hardcoded; instead this PR introduces a (quite
dumb) cost estimator that naturally disincentivizes building ephemeral
indexes where they don't make sense (e.g. the outermost table). I will
probably build this estimator to be smarter in the future when working
on join reordering optimizations.
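The trade-off can be sketched outside the VDBE as follows. This is an illustration under simplifying assumptions, not the PR's implementation: tables are plain slices of a single integer column, a `BTreeMap` stands in for the ephemeral index, NULL handling is ignored, and both function names are made up.

```rust
use std::collections::BTreeMap;

/// Nested scan: for every row of t1, scan all of t2.
/// Returns (t1 row position, t2 row position) pairs that match on the join column.
fn nested_scan_join(t1: &[i64], t2: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (i, a1) in t1.iter().enumerate() {
        for (j, a2) in t2.iter().enumerate() {
            if a1 == a2 {
                out.push((i, j));
            }
        }
    }
    out
}

/// Autoindex-style join: build an in-memory index over t2 once,
/// then do one lookup per row of t1.
fn autoindex_join(t1: &[i64], t2: &[i64]) -> Vec<(usize, usize)> {
    // Build phase (runs once): key -> row positions in t2.
    let mut index: BTreeMap<i64, Vec<usize>> = BTreeMap::new();
    for (rowid, key) in t2.iter().enumerate() {
        index.entry(*key).or_default().push(rowid);
    }
    // Probe phase: one ordered-map lookup per outer row.
    let mut out = Vec::new();
    for (i, key) in t1.iter().enumerate() {
        if let Some(rowids) = index.get(key) {
            out.extend(rowids.iter().map(|&j| (i, j)));
        }
    }
    out
}

fn main() {
    let t1 = [1, 2, 3, 2];
    let t2 = [2, 2, 5, 1];
    assert_eq!(nested_scan_join(&t1, &t2), autoindex_join(&t1, &t2));
    println!("{:?}", autoindex_join(&t1, &t2));
}
```

Building the index costs one pass over the inner table, so it pays off only when the inner table would otherwise be scanned many times, which is the kind of decision a cost estimator has to make.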
### Example bytecode plans and runtimes (note that this is debug mode)
Example query with no persistent indexes to choose from. Without
ephemeral index it's a nested scan:
```sql
limbo> explain select * from t1 natural join t2;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 13 0 0 Start at 13
1 OpenRead 0 2 0 0 table=t1, root=2
2 OpenRead 1 3 0 0 table=t2, root=3
3 Rewind 0 12 0 0 Rewind t1
4 Rewind 1 11 0 0 Rewind t2
5 Column 0 0 2 0 r[2]=t1.a
6 Column 1 0 3 0 r[3]=t2.a
7 Ne 2 3 10 0 if r[2]!=r[3] goto 10
8 Column 0 0 1 0 r[1]=t1.a
9 ResultRow 1 1 0 0 output=r[1]
10 Next 1 5 0 0
11 Next 0 4 0 0
12 Halt 0 0 0 0
13 Transaction 0 0 0 0 write=false
14 Goto 0 1 0 0
limbo> .timer on
limbo> select * from t1 natural join t2;
┌───┐
│ a │
├───┤
└───┘
Command stats:
----------------------------
total: 953 ms (this includes parsing/coloring of cli app)
```
Same query with autoindexing enabled:
```sql
limbo> explain select * from t1 natural join t2;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 22 0 0 Start at 22
1 OpenRead 0 2 0 0 table=t1, root=2
2 OpenRead 1 3 0 0 table=t2, root=3
3 Rewind 0 21 0 0 Rewind t1
4 Once 12 0 0 0 goto 12 # execute block 5-11 only once, on subsequent iters jump straight to 12
5 OpenAutoindex 3 0 0 0 cursor=3
6 Rewind 1 12 0 0 Rewind t2 # open source table for ephemeral index
7 Column 1 0 2 0 r[2]=t2.a
8 RowId 1 3 0 0 r[3]=t2.rowid
9 MakeRecord 2 2 4 0 r[4]=mkrec(r[2..3])
10 IdxInsert 3 4 2 0 key=r[4] # insert stuff to ephemeral index
11 Next 1 7 0 0
12 Column 0 0 5 0 r[5]=t1.a
13 IsNull 5 20 0 0 if (r[5]==NULL) goto 20
14 SeekGE 3 20 5 0 key=[5..5] # perform seek on ephemeral index
15 IdxGT 3 20 5 0 key=[5..5]
16 DeferredSeek 3 1 0 0
17 Column 0 0 1 0 r[1]=t1.a
18 ResultRow 1 1 0 0 output=r[1]
19 Next 2 15 0 0
20 Next 0 4 0 0
21 Halt 0 0 0 0
22 Transaction 0 0 0 0 write=false
23 Goto 0 1 0 0
limbo> .timer on
limbo> select * from t1 natural join t2;
┌───┐
│ a │
├───┤
└───┘
Command stats:
----------------------------
total: 220 ms (this includes parsing/coloring of cli app)
```
Closes #1356
2025-04-22 13:00:06 +03:00
Pekka Enberg
936365a44e
Update README.md
2025-04-22 12:11:23 +03:00
Pekka Enberg
c2cf4756ef
Update README.md
2025-04-22 12:10:02 +03:00
Pekka Enberg
d92fb75262
Merge 'Fix incorrect between expression documentation' from Pedro Muniz
...
I was reading through the `translate_expr` function and `COMPAT.md` to
see what was not implemented yet. I saw that `Expr::Between` was marked
as a `todo!`, so I set out to implement it, only to find that it was
being rewritten in the optimizer haha. This PR just adjusts the docs and
adds an `unreachable` in the appropriate locations.
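A minimal sketch of that pattern, using a made-up `Expr` enum rather than the project's real AST: a rewrite pass turns `x BETWEEN low AND high` into `x >= low AND x <= high`, so the later translation step can mark the `Between` arm as unreachable. `NOT BETWEEN` and all other expression kinds are left out, and the function names are illustrative.

```rust
/// Hypothetical, heavily reduced expression AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Ge(Box<Expr>, Box<Expr>),
    Le(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Between { lhs: Box<Expr>, low: Box<Expr>, high: Box<Expr> },
}

/// Optimizer-style pass: replace every BETWEEN with two comparisons.
fn rewrite_between(expr: Expr) -> Expr {
    match expr {
        Expr::Between { lhs, low, high } => {
            let lhs = rewrite_between(*lhs);
            let low = rewrite_between(*low);
            let high = rewrite_between(*high);
            Expr::And(
                Box::new(Expr::Ge(Box::new(lhs.clone()), Box::new(low))),
                Box::new(Expr::Le(Box::new(lhs), Box::new(high))),
            )
        }
        Expr::Ge(a, b) => Expr::Ge(Box::new(rewrite_between(*a)), Box::new(rewrite_between(*b))),
        Expr::Le(a, b) => Expr::Le(Box::new(rewrite_between(*a)), Box::new(rewrite_between(*b))),
        Expr::And(a, b) => Expr::And(Box::new(rewrite_between(*a)), Box::new(rewrite_between(*b))),
        leaf => leaf,
    }
}

/// Translation step: by the time it runs, no BETWEEN node should remain.
fn translate(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Int(value) => value.to_string(),
        Expr::Ge(a, b) => format!("({} >= {})", translate(a), translate(b)),
        Expr::Le(a, b) => format!("({} <= {})", translate(a), translate(b)),
        Expr::And(a, b) => format!("({} AND {})", translate(a), translate(b)),
        Expr::Between { .. } => unreachable!("BETWEEN is rewritten before translation"),
    }
}

fn main() {
    let between = Expr::Between {
        lhs: Box::new(Expr::Column("x".to_string())),
        low: Box::new(Expr::Int(1)),
        high: Box::new(Expr::Int(5)),
    };
    println!("{}", translate(&rewrite_between(between)));
}
```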
Closes #1378
2025-04-22 11:56:01 +03:00
Pekka Enberg
e41bf3993a
Merge 'bindings/rust: Add Statement.columns() support' from Timo Kösters
...
This PR adds the statement.columns() function, inspired by Rusqlite:
https://docs.rs/rusqlite/latest/rusqlite/struct.Statement.html#method.columns
Note that the rusqlite documentation says
> If associated DB schema can be altered concurrently, you should make
sure that current statement has already been stepped once before calling
this method.
Do we have this requirement as well?
The first commit is just the rust binding. The second commit implements
the column name for the rowid column.
Closes #1376
2025-04-22 10:52:25 +03:00
Pekka Enberg
7308f6d6e8
Merge 'Bump julian_day_converter to 0.4.5' from meteorgan
...
The previous version of `julian_day_converter` had precision issues,
potentially causing loss of precision when converting between
`julianday` and `datetime`.

Reviewed-by: Diego Reis (@diegoreis42)
Closes #1344
2025-04-22 10:48:36 +03:00
Timo Kösters
68d8b86bb7
fix: get name of rowid column
2025-04-22 08:46:37 +02:00
Pekka Enberg
094fd0e211
Add TPC-H instructions to PERF.md
2025-04-22 09:46:16 +03:00
pedrocarlo
1928dcfa10
Correct docs regarding between
2025-04-21 23:05:01 -03:00
PThorpe92
2e33ce6896
Add release build to bench vfs in makefile to ensure there is an exec target
2025-04-21 12:31:38 -04:00
PThorpe92
f180de4d95
Write quick note about vfs benchmark script in PERF.md
2025-04-21 12:24:18 -04:00
PThorpe92
9bbd6a3a7f
Add vfs bench to testing pyproject.toml
2025-04-21 12:23:06 -04:00
PThorpe92
2037fbeba5
Add bench-vfs command to makefile
2025-04-21 12:22:40 -04:00
PThorpe92
7f170756ae
Add python script to benchmark vfs against each other
2025-04-21 12:22:20 -04:00
Jussi Saurio
f256fb46fd
remove print spam from index insert
2025-04-21 14:59:13 +03:00
Jussi Saurio
3b44b269a3
optimizer: try to build ephemeral index to avoid nested table scan
2025-04-21 14:59:13 +03:00
Jussi Saurio
6924424f11
optimizer: add highly unintelligent heuristics-based cost estimation
2025-04-21 14:59:13 +03:00
Jussi Saurio
a50fa03d24
optimizer: allow calling try_extract_index... without any persistent indexes
2025-04-21 14:59:13 +03:00
Jussi Saurio
af21f60887
translate/main_loop: create autoindex when index.ephemeral=true
2025-04-21 14:59:13 +03:00
Jussi Saurio
c1b2dfc32b
TableReference: add method column_is_used()
2025-04-21 14:59:13 +03:00
Jussi Saurio
09ad6d8f01
vdbe: resolve labels for Insn::Once
2025-04-21 14:59:13 +03:00