Commit Graph

4155 Commits

Author SHA1 Message Date
Pekka Enberg
117dbe6c8c Fix Antithesis Docker file some more 2025-04-24 21:12:40 +03:00
Pekka Enberg
fa5d6dcf6b Fix Antithesis Docker file 2025-04-24 21:03:19 +03:00
Pekka Enberg
31677c9c94 scripts/antithesis: Build Docker image for x86-64 2025-04-24 20:55:30 +03:00
Pekka Enberg
2a5eb8e5bc stress: Make Clippy happy 2025-04-24 20:46:26 +03:00
Pekka Enberg
ebc2e475b6 Merge 'Add Antithesis Tests' from eric-dinh-antithesis
This PR adds 2 autonomous test suites for use with Antithesis: `bank-test`
and `stress-composer`. It also modifies the existing `limbo_stress` test
to run as a singleton and modifies other Antithesis-related configuration
files.
**bank-test**
- `first_setup.py`
  - initializes a DB table with columns accounts and balance
  - generates random balances for each account
  - stores initial state of the table
- `parallel_driver_generate_transaction.py`
  - selects 2 accounts from the table as sender and receiver
  - generates a random value which is subtracted from the sender and added to the receiver
- `anytime/eventually/finally_validate.py`
  - checks that sum of initial balances == sum of current balances
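The invariant that the bank-test validators check can be modelled in a few lines. This is a minimal sketch, not the suite's Python code: the `Bank` type, `transfer`, and the in-memory `Vec` of balances are hypothetical stand-ins for the database table, and concurrency is ignored.

```rust
/// Hypothetical in-memory stand-in for the bank-test table:
/// one balance per account, indexed by account number.
struct Bank {
    balances: Vec<i64>,
}

impl Bank {
    /// Subtract `amount` from the sender and add it to the receiver.
    /// Applied together, the two updates leave the total unchanged.
    fn transfer(&mut self, sender: usize, receiver: usize, amount: i64) {
        self.balances[sender] -= amount;
        self.balances[receiver] += amount;
    }

    /// Sum of all balances, the quantity the validators compare.
    fn total(&self) -> i64 {
        self.balances.iter().sum()
    }
}

fn main() {
    let mut bank = Bank { balances: vec![100, 250, 75] };
    // Recorded once at setup time, like the stored initial state.
    let initial_total = bank.total();

    bank.transfer(0, 2, 40);
    bank.transfer(1, 0, 120);

    // The validation step: sum of initial balances == sum of current balances.
    assert_eq!(bank.total(), initial_total);
    println!("total = {}", bank.total());
}
```

Against a real database the check is only meaningful if each transfer's two updates are applied atomically; a transfer that applied only one of them would break the invariant.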
**stress-composer**
- Breaks `limbo_stress` into component parts
- `first_setup.py`
  - creates up to 10 tables, each with up to 10 columns
  - stores table details in a separate db
- `parallel_driver_insert.py`
  - randomly generates and executes up to 100 insert statements into a single table using random values derived from the table details
- `parallel_driver_update.py`
  - randomly generates and executes up to 100 updates on a single table using random values derived from the table details
- `parallel_driver_delete.py`
  - randomly generates and executes up to 100 deletes from a single table using random values derived from the table details
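The statement generation done by the stress-composer drivers can be sketched as follows. This is an illustration only, assuming integer-valued columns: `TableDetails`, `XorShift`, and `random_insert` are hypothetical names, and a tiny xorshift generator replaces whatever randomness source the Python scripts use so that the sketch needs no external crates.

```rust
/// Hypothetical stand-in for the table details stored by first_setup.py.
struct TableDetails {
    name: String,
    columns: Vec<String>,
}

/// Minimal xorshift64 PRNG (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Build one INSERT with a random integer value for every column.
fn random_insert(table: &TableDetails, rng: &mut XorShift) -> String {
    let values: Vec<String> = table
        .columns
        .iter()
        .map(|_| (rng.next_u64() % 1000).to_string())
        .collect();
    format!(
        "INSERT INTO {} ({}) VALUES ({});",
        table.name,
        table.columns.join(", "),
        values.join(", ")
    )
}

fn main() {
    let table = TableDetails {
        name: "t0".to_string(),
        columns: vec!["c0".to_string(), "c1".to_string()],
    };
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    // Between 1 and 100 statements, mirroring the "up to 100" limit above.
    let count = 1 + rng.next_u64() % 100;
    for _ in 0..count {
        println!("{}", random_insert(&table, &mut rng));
    }
}
```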

Closes #1401
2025-04-24 20:44:42 +03:00
eric-dinh-antithesis
27e15364c4 stress: suppress logfile since it's too big 2025-04-24 12:27:58 -04:00
eric-dinh-antithesis
b8885777dc stress: move sdk setup_complete from limbo_stress to docker-entrypoint 2025-04-24 12:27:05 -04:00
eric-dinh-antithesis
75ae5dbd13 stress: update docker-compose 2025-04-24 12:26:00 -04:00
eric-dinh-antithesis
8390233b99 Dockerfile.antithesis: update limbo_stress build step 2025-04-24 12:25:19 -04:00
eric-dinh-antithesis
5953d32e4d Dockerfile.antithesis: add symbols for rust, cataloging for python, and antithesis tests to image, update entrypoint 2025-04-24 12:24:44 -04:00
eric-dinh-antithesis
62e2745c3c Dockerfile.antithesis: install dependencies 2025-04-24 12:23:22 -04:00
eric-dinh-antithesis
364a78b270 Cargo.toml: add profile for antithesis builds for full debug 2025-04-24 12:22:03 -04:00
eric-dinh-antithesis
f993a22023 antithesis-tests: add all tests 2025-04-24 12:20:41 -04:00
Jussi Saurio
80d39929ad Merge 'types: refactor serialtype again to make it faster' from Jussi Saurio
basically serialtype got slower in #1398, possibly because `enum SerialType`
wasted space by being 16 bytes, so i've now refactored `SerialType` to be
a transparent newtype wrapper over `u64` and introduced a separate
`SerialTypeKind` enum.
at least on my machine the perf regression was nullified, and it may even
be a bit faster than before
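The layout idea can be sketched as below: keep the raw serial type as an 8-byte `u64` and decode a kind only when needed. This is a minimal model based on SQLite's record-format serial type codes; the variant names, `kind()`, and `FatSerialType` are hypothetical and not necessarily limbo's actual definitions.

```rust
use std::mem::size_of;

/// Raw record serial type, stored as a plain u64.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct SerialType(u64);

/// Decoded view, computed on demand instead of being stored.
#[derive(Debug, PartialEq)]
enum SerialTypeKind {
    Null,
    I8, I16, I24, I32, I48, I64,
    F64,
    ConstInt0,
    ConstInt1,
    Reserved,
    Blob(usize), // payload length in bytes
    Text(usize), // payload length in bytes
}

impl SerialType {
    fn kind(self) -> SerialTypeKind {
        match self.0 {
            0 => SerialTypeKind::Null,
            1 => SerialTypeKind::I8,
            2 => SerialTypeKind::I16,
            3 => SerialTypeKind::I24,
            4 => SerialTypeKind::I32,
            5 => SerialTypeKind::I48,
            6 => SerialTypeKind::I64,
            7 => SerialTypeKind::F64,
            8 => SerialTypeKind::ConstInt0,
            9 => SerialTypeKind::ConstInt1,
            10 | 11 => SerialTypeKind::Reserved,
            // Even codes >= 12 are blobs, odd codes >= 13 are text.
            n if n % 2 == 0 => SerialTypeKind::Blob(((n - 12) / 2) as usize),
            n => SerialTypeKind::Text(((n - 13) / 2) as usize),
        }
    }
}

/// For comparison: an enum that carries a u64 payload also needs a tag.
#[allow(dead_code)]
enum FatSerialType {
    Null,
    Blob(u64),
    Text(u64),
}

fn main() {
    // The newtype is exactly as large as the u64 it wraps.
    assert_eq!(size_of::<SerialType>(), 8);
    // The payload-carrying enum needs extra room for its discriminant.
    assert!(size_of::<FatSerialType>() > size_of::<SerialType>());
    println!("{:?}", SerialType(13).kind()); // Text(0)
    println!("{:?}", SerialType(18).kind()); // Blob(3)
}
```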

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1399
2025-04-24 18:59:31 +03:00
Jussi Saurio
7921d7c2e0 types: refactor serialtype again to make it faster 2025-04-24 17:28:31 +03:00
Jussi Saurio
2ffeefe165 Merge 'core/types: remove duplicate serialtype implementation' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1398
2025-04-24 16:17:17 +03:00
Jussi Saurio
04adf8242a faster validate 2025-04-24 16:05:12 +03:00
Jussi Saurio
af6a783f4d core/types: remove duplicate serialtype implementation 2025-04-24 15:38:47 +03:00
Jussi Saurio
0c800524af Merge 'Bugfix: Explain command should display syntax errors in CLI' from Anton Harniakou
Closes #1392

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1396
2025-04-24 15:11:59 +03:00
Anton Harniakou
fdf3dd9796 Bugfix: Explain command should display syntax errors in CLI
Closes #1392
2025-04-24 13:25:00 +03:00
Jussi Saurio
dc3e97887f Merge 'replace vec with array in btree balancing' from Lâm Hoàng Phúc
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1354
2025-04-24 11:22:07 +03:00
Jussi Saurio
2e8042510e Merge 'Pragma page size reading' from Anton Harniakou
1) Fix a bug where cli pretty mode would not print pragma results;
2) Add ability to read page_size using PRAGMA page_size;
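For reference, the value that `PRAGMA page_size` reports is recorded in the database file header. The sketch below reads it directly from the header bytes, following the SQLite file format (offset 16, 2-byte big-endian, where 1 means 65536). `page_size_from_header` is a hypothetical helper, not the code added in this PR, and it does not validate that the value is a power of two.

```rust
/// Read the page size from the first bytes of a SQLite database file.
/// Returns None if the header is too short to contain the field.
fn page_size_from_header(header: &[u8]) -> Option<u32> {
    let raw = header.get(16..18)?;
    match u16::from_be_bytes([raw[0], raw[1]]) {
        // 65536 does not fit in a u16, so the format stores it as 1.
        1 => Some(65_536),
        n => Some(u32::from(n)),
    }
}

fn main() {
    // 100-byte header with only the page-size field set (0x1000 = 4096).
    let mut header = vec![0u8; 100];
    header[16] = 0x10;
    header[17] = 0x00;
    println!("{:?}", page_size_from_header(&header)); // Some(4096)
}
```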

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1394
2025-04-24 11:08:55 +03:00
Jussi Saurio
6ff5ff49b7 Merge 'perf/btree: use binary search for Index seek operations' from Jussi Saurio
## Beef
Follow-up to #1357, which applied the same treatment to table btrees only.
After this PR, all of our seeks use binary search for both interior and
leaf pages.
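The per-page step that replaces a linear scan can be sketched as a lower-bound binary search. This is a simplified model, assuming the cell keys of one page are already decoded into comparable Rust values; real index pages also involve interior-cell child pointers and record comparison, which are omitted. `seek_ge` is a hypothetical name.

```rust
use std::cmp::Ordering;

/// Binary search over the sorted keys of one page: return the index of the
/// first key >= `target` (where a SeekGE lands), or `keys.len()` if none.
/// Uses O(log n) comparisons instead of the O(n) of a linear scan.
fn seek_ge<K: Ord>(keys: &[K], target: &K) -> usize {
    let (mut lo, mut hi) = (0, keys.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match keys[mid].cmp(target) {
            Ordering::Less => lo = mid + 1,
            Ordering::Equal | Ordering::Greater => hi = mid,
        }
    }
    lo
}

fn main() {
    // Compound index keys such as (partkey, suppkey) compare lexicographically.
    let keys = [(1, 1), (1, 3), (2, 2), (5, 0)];
    assert_eq!(seek_ge(&keys, &(1, 2)), 1);
    assert_eq!(seek_ge(&keys, &(2, 2)), 2);
    assert_eq!(seek_ge(&keys, &(9, 9)), keys.len());
}
```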
## Perf comparison
using TPC-H 1GB db for this query:
```sql
limbo> explain select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     7     0                    0   table=sqlite_autoindex_partsupp_1, root=7
3     Rewind             0     14    0                    0   Rewind lineitem
4       Column           0     1     2                    0   r[2]=lineitem.l_partkey
5       IsNull           2     13    0                    0   if (r[2]==NULL) goto 13
6       Column           0     2     3                    0   r[3]=lineitem.l_suppkey
7       IsNull           3     13    0                    0   if (r[3]==NULL) goto 13
8       SeekGE           1     13    2                    0   key=[2..3] <-- index seek here, for every row in lineitem
9         IdxGT          1     13    2                    0   key=[2..3]
10        Integer        1     5     0                    0   r[5]=1
11        AggStep        0     5     4     count          0   accum=r[4] step(r[5])
12      Next             1     9     0                    0
13    Next               0     4     0                    0
14    AggFinal           0     4     0     count          0   accum=r[4]
15    Copy               4     1     0                    0   r[1]=r[4]
16    ResultRow          1     1     0                    0   output=r[1]
17    Halt               0     0     0                    0
18    Transaction        0     0     0                    0   write=false
19    Goto               0     1     0                    0
```
main:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│   6001215 │
└───────────┘
Command stats:
----------------------------
total: 40.292102375 s (this includes parsing/coloring of cli app)
```
PR:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│   6001215 │
└───────────┘
Command stats:
----------------------------
total: 14.021689916 s (this includes parsing/coloring of cli app)
```
almost 3x faster. buzzkill: still 3x slower than sqlite :)

Closes #1387
2025-04-24 10:53:35 +03:00
Anton Harniakou
51fc1773ea Fix missing documentation warning; improve the documentation message 2025-04-24 10:36:23 +03:00
Jussi Saurio
c88c579154 Merge 'expr.is_nonnull(): return true if col.primary_key || col.notnull' from Jussi Saurio
This avoids redundant `IsNull` instructions during index seeks if the
seek key columns are primary keys of other tables, which they often are.
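The rule in the PR title can be illustrated with a toy expression type. The `Column` and `Expr` types below are hypothetical simplifications, not limbo's AST; the sketch only shows how a "can never be NULL" check lets code generation drop the `IsNull` guard.

```rust
/// Hypothetical, simplified column metadata.
struct Column {
    primary_key: bool,
    notnull: bool,
}

/// Hypothetical, simplified expression type.
#[allow(dead_code)]
enum Expr {
    Column(Column),
    Integer(i64),
    Null,
}

impl Expr {
    /// Returns true only if the expression can never evaluate to NULL,
    /// so code generation may skip the IsNull check for it.
    fn is_nonnull(&self) -> bool {
        match self {
            // The rule from the PR title: primary key or NOT NULL column.
            Expr::Column(col) => col.primary_key || col.notnull,
            Expr::Integer(_) => true,
            Expr::Null => false,
        }
    }
}

fn main() {
    let pk = Expr::Column(Column { primary_key: true, notnull: false });
    let plain = Expr::Column(Column { primary_key: false, notnull: false });
    // Only `plain` would still need an IsNull instruction before the seek.
    println!("{} {}", pk.is_nonnull(), plain.is_nonnull()); // true false
}
```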

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1388
2025-04-24 10:32:00 +03:00
Anton Harniakou
0a69ea0138 Support reading db page size using PRAGMA page_size 2025-04-24 10:12:02 +03:00
Jussi Saurio
c09e4d1d38 Merge 'Numeric Types Overhaul' from Levy A.
### Summary
  - SQLite-compatible string-to-float conversion
    - Accompanied by the new `cast_real` fuzz target
  - `NonNan` wrapper type over `f64`
    - Now we can guarantee that operations that can result in a NaN
must be handled
  - `Numeric` and `NullableInteger` types that encapsulate all numeric
and bitwise operations
    - These are now checked to be 100% compatible with SQLite using the
`expression` fuzz target (except for the commented-out operations, which
will be implemented in a later PR)
One thing that might be reworked here is the heavy use of traits and
operator overloading, but it looks reasonable to me.
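The idea behind a NaN-free float wrapper can be sketched as follows: construction rejects NaN, and any operation that could produce NaN (such as `inf + -inf`) returns an `Option`, so the caller is forced to decide what that case means. This is an assumed shape for illustration; limbo's actual `NonNan` API may differ.

```rust
use std::ops::Add;

/// An f64 that is guaranteed not to be NaN.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NonNan(f64);

impl NonNan {
    fn new(value: f64) -> Option<Self> {
        if value.is_nan() {
            None
        } else {
            Some(NonNan(value))
        }
    }
}

impl Add for NonNan {
    // inf + -inf is NaN, so addition has to be allowed to fail.
    type Output = Option<NonNan>;

    fn add(self, rhs: NonNan) -> Option<NonNan> {
        NonNan::new(self.0 + rhs.0)
    }
}

fn main() {
    let a = NonNan::new(1.5).unwrap();
    let b = NonNan::new(2.0).unwrap();
    println!("{:?}", a + b); // Some(NonNan(3.5))

    let inf = NonNan::new(f64::INFINITY).unwrap();
    let neg_inf = NonNan::new(f64::NEG_INFINITY).unwrap();
    assert_eq!(inf + neg_inf, None);
    assert_eq!(NonNan::new(f64::NAN), None);
}
```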

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1386
2025-04-23 18:34:32 +03:00
Jussi Saurio
9e1f15c679 Merge 'python: add UV project for 'scripts'' from Jussi Saurio
mainly so i don't have to install pygithub every time i want to
`uv run scripts/merge-pr.py`

Closes #1385
2025-04-23 18:33:57 +03:00
Jussi Saurio
a7488496d5 expr.is_nonnull(): return true if col.primary_key || col.notnull 2025-04-23 18:10:33 +03:00
Jussi Saurio
af703110f8 btree: remove extra iter_dir argument that can be derived from seek_op 2025-04-23 17:38:48 +03:00
Jussi Saurio
044339efc7 btree: rename tablebtree_move_to_binsearch -> tablebtree_move_to 2025-04-23 17:35:22 +03:00
Jussi Saurio
8c338438dd btree: use binary search for index interior cell seek 2025-04-23 17:34:32 +03:00
Jussi Saurio
7a133f422f btree: use binary search for index leaves 2025-04-23 17:34:32 +03:00
Jussi Saurio
8743dcd0da btree: extract indexbtree_seek() into a function like tablebtree_seek() 2025-04-23 17:34:32 +03:00
Jussi Saurio
48071b7ad7 tests/fuzz/compound_index_seek: order select cols by definition order 2025-04-23 17:34:32 +03:00
Jussi Saurio
517390a4ea tests/fuzz/compound_index_seek: show which table had failed query 2025-04-23 16:57:43 +03:00
Anton Harniakou
5c18c1c57a Draw table if it contains any row
Some tables can be headerless, for example the results of PRAGMA calls
2025-04-23 16:36:43 +03:00
Levy A.
8ff906e353 fix: decrease even more nested operations
this is a worrying trend
2025-04-23 10:15:49 -03:00
Levy A.
613a332e99 doc: add doc for DoubleDouble 2025-04-23 10:13:32 -03:00
Levy A.
2cbb59e3f9 refactor: renaming and better types 2025-04-23 09:53:37 -03:00
Levy A.
ed27f22e2f comment out incompatible operations 2025-04-23 08:34:58 -03:00
Levy A.
f1ee92bf2d numeric types overhaul 2025-04-23 08:34:58 -03:00
Jussi Saurio
3bbd443286 python: add UV project for 'scripts'
mainly so i don't have to install pygithub every time i want to
`uv run scripts/merge-pr.py`
2025-04-23 10:32:38 +03:00
Jussi Saurio
fd2b274556 Merge 'Python script to compare vfs performance' from Preston Thorpe
This PR adds a Python script that uses the `TestLimboShell` setup to run
some semi-naive benchmarks/comparisons of the `io_uring` and `syscall`
IO back-ends.
### Usage:
```sh
make bench-vfs SQL="insert into products (name, price) values ('testing', randomblob(1024*4));" N=50
```
The script will execute the given `SQL` `N` times with each back-end,
compute the mean time for each back-end, and display the results.
![image](https://github.com/user-attachments/assets/b2399196-dbdd-4b98-8210-536e68979edd)
😬

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1377
2025-04-23 10:25:56 +03:00
Preston Thorpe
e1d9bfc792 Merge branch 'main' into bench_vfs 2025-04-22 21:36:07 -04:00
Pekka Enberg
fc5099e2ef antithesis: Enable RUST_BACKTRACE for workload 2025-04-22 13:01:11 +03:00
Pekka Enberg
beaccae664 Merge 'Create an automatic ephemeral index when a nested table scan would otherwise be selected' from Jussi Saurio
Closes #747
- Creates an automatic ephemeral (in-memory) index on the right-side
table of a join when a nested table scan would otherwise be selected.
- This behavior is not hardcoded; instead this PR introduces a (quite
dumb) cost estimator that naturally discourages building ephemeral
indexes where they don't make sense (e.g. on the outermost table). I will
probably make this estimator smarter in the future when working on join
reordering optimizations.
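The difference between the two plans can be sketched with plain Rust collections: a nested scan compares every left row with every right row, while an ephemeral index on the right table is built once and then probed per left row (the role of the `Once`/`OpenAutoindex` block in the bytecode plans in this PR). This is an in-memory illustration, not limbo's implementation; `BTreeMap` stands in for the ephemeral btree, and rows are reduced to their integer join key.

```rust
use std::collections::BTreeMap;

/// Nested-loop join: for every left row, scan the whole right table (O(n * m)).
fn nested_scan_join(left: &[i64], right: &[i64]) -> Vec<i64> {
    let mut out = Vec::new();
    for &l in left {
        for &r in right {
            if l == r {
                out.push(l);
            }
        }
    }
    out
}

/// Build a throwaway index on the right table once, then seek per left row.
fn ephemeral_index_join(left: &[i64], right: &[i64]) -> Vec<i64> {
    // join key -> rowids of matching right-side rows, built a single time
    let mut index: BTreeMap<i64, Vec<usize>> = BTreeMap::new();
    for (rowid, &key) in right.iter().enumerate() {
        index.entry(key).or_default().push(rowid);
    }
    let mut out = Vec::new();
    for &l in left {
        if let Some(rowids) = index.get(&l) {
            // Each rowid would be used to fetch the full right-side row;
            // here we only emit the join key once per match.
            for _ in rowids {
                out.push(l);
            }
        }
    }
    out
}

fn main() {
    let t1 = [1, 2, 3, 2];
    let t2 = [2, 3, 3, 5];
    let a = nested_scan_join(&t1, &t2);
    let b = ephemeral_index_join(&t1, &t2);
    assert_eq!(a, b);
    println!("{:?}", b); // [2, 3, 3, 2]
}
```

Building the index costs roughly O(m log m) once, after which each of the n probes costs O(log m), which is why it pays off for the inner table of a join but not for the outermost one.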
### Example bytecode plans and runtimes (note that this is debug mode)
Example query with no persistent indexes to choose from. Without an
ephemeral index, it's a nested scan:
```sql
limbo> explain select * from t1 natural join t2;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     13    0                    0   Start at 13
1     OpenRead           0     2     0                    0   table=t1, root=2
2     OpenRead           1     3     0                    0   table=t2, root=3
3     Rewind             0     12    0                    0   Rewind t1
4       Rewind           1     11    0                    0   Rewind t2
5         Column         0     0     2                    0   r[2]=t1.a
6         Column         1     0     3                    0   r[3]=t2.a
7         Ne             2     3     10                   0   if r[2]!=r[3] goto 10
8         Column         0     0     1                    0   r[1]=t1.a
9         ResultRow      1     1     0                    0   output=r[1]
10      Next             1     5     0                    0
11    Next               0     4     0                    0
12    Halt               0     0     0                    0
13    Transaction        0     0     0                    0   write=false
14    Goto               0     1     0                    0

limbo> .timer on
limbo> select * from t1 natural join t2;
┌───┐
│ a │
├───┤
└───┘
Command stats:
----------------------------
total: 953 ms (this includes parsing/coloring of cli app)
```
Same query with autoindexing enabled:
```sql
limbo> explain select * from t1 natural join t2;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     22    0                    0   Start at 22
1     OpenRead           0     2     0                    0   table=t1, root=2
2     OpenRead           1     3     0                    0   table=t2, root=3
3     Rewind             0     21    0                    0   Rewind t1
4       Once             12    0     0                    0   goto 12 # execute block 5-11 only once, on subsequent iters jump straight to 12
5       OpenAutoindex    3     0     0                    0   cursor=3
6       Rewind           1     12    0                    0   Rewind t2 # open source table for ephemeral index
7         Column         1     0     2                    0   r[2]=t2.a
8         RowId          1     3     0                    0   r[3]=t2.rowid
9         MakeRecord     2     2     4                    0   r[4]=mkrec(r[2..3])
10        IdxInsert      3     4     2                    0   key=r[4] # insert stuff to ephemeral index
11      Next             1     7     0                    0
12      Column           0     0     5                    0   r[5]=t1.a
13      IsNull           5     20    0                    0   if (r[5]==NULL) goto 20
14      SeekGE           3     20    5                    0   key=[5..5] # perform seek on ephemeral index
15        IdxGT          3     20    5                    0   key=[5..5]
16        DeferredSeek   3     1     0                    0
17        Column         0     0     1                    0   r[1]=t1.a
18        ResultRow      1     1     0                    0   output=r[1]
19      Next             2     15    0                    0
20    Next               0     4     0                    0
21    Halt               0     0     0                    0
22    Transaction        0     0     0                    0   write=false
23    Goto               0     1     0                    0

limbo> .timer on
limbo> select * from t1 natural join t2;
┌───┐
│ a │
├───┤
└───┘
Command stats:
----------------------------
total: 220 ms (this includes parsing/coloring of cli app)
```

Closes #1356
2025-04-22 13:00:06 +03:00
Pekka Enberg
936365a44e Update README.md 2025-04-22 12:11:23 +03:00
Pekka Enberg
c2cf4756ef Update README.md 2025-04-22 12:10:02 +03:00
Pekka Enberg
d92fb75262 Merge 'Fix incorrect between expression documentation' from Pedro Muniz
I was reading through the `translate_expr` function and `COMPAT.md` to
see what was not implemented yet. I saw that `Expr::Between` was marked
as a `todo!`, so I set out to implement it, only to find that it was
being rewritten in the optimizer haha. This PR just adjusts the docs and
adds an `unreachable` in the appropriate locations.
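The PR does not show what `BETWEEN` is rewritten into. One common choice, and the equivalence SQLite documents, is `x BETWEEN y AND z` → `x >= y AND x <= z`. The sketch below assumes that rewrite (with `NOT BETWEEN` becoming `x < y OR x > z`); the `Expr` type and `rewrite_between` are hypothetical and are not limbo's parser AST.

```rust
/// Hypothetical, simplified expression tree.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Integer(i64),
    Between { lhs: Box<Expr>, low: Box<Expr>, high: Box<Expr>, negated: bool },
    Ge(Box<Expr>, Box<Expr>),
    Le(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Replace a top-level BETWEEN with a pair of comparisons.
/// Note: `lhs` is duplicated, so it is evaluated twice after the rewrite.
fn rewrite_between(expr: Expr) -> Expr {
    match expr {
        Expr::Between { lhs, low, high, negated: false } => Expr::And(
            Box::new(Expr::Ge(lhs.clone(), low)),
            Box::new(Expr::Le(lhs, high)),
        ),
        Expr::Between { lhs, low, high, negated: true } => Expr::Or(
            Box::new(Expr::Lt(lhs.clone(), low)),
            Box::new(Expr::Gt(lhs, high)),
        ),
        other => other,
    }
}

fn main() {
    let e = Expr::Between {
        lhs: Box::new(Expr::Column("x".to_string())),
        low: Box::new(Expr::Integer(1)),
        high: Box::new(Expr::Integer(5)),
        negated: false,
    };
    println!("{:?}", rewrite_between(e));
}
```

Because the rewrite duplicates the left-hand side, it is cheapest when that side is a plain column reference, as in the example.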

Closes #1378
2025-04-22 11:56:01 +03:00