Commit Graph

4183 Commits

Author SHA1 Message Date
meteorgan
eabe5e1631 temporarily comment the pragma-page-count-empty test case 2025-04-26 21:45:18 +08:00
meteorgan
f3f09a5b7b Fix pragma page_count 2025-04-26 21:45:18 +08:00
Pekka Enberg
bde2d4f0a3 Fix Antithesis docker-compose.yaml 2025-04-26 09:14:24 +03:00
Jussi Saurio
0d77ea9446 Merge 'Optimization: only initialize Rustyline if we are in a tty' from Pedro Muniz
This is a small nitpick, but it will be useful for #1258. If we are
testing or just piping some SQL through stdin, we can skip
initializing `Rustyline` and save some execution time.
On the `SELECT 1` bench, I got a minor performance bump, but it becomes
less apparent on more complex queries.
<img width="759" alt="image" src="https://github.com/user-attachments/assets/12e22675-e081-4284-a5ed-15d53a9c5579" />
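
A minimal sketch of the idea, assuming the standard library's `std::io::IsTerminal` trait is used for the check. The `InputMode` type and both function names are hypothetical and are not taken from the PR:

```rust
use std::io::IsTerminal;

/// How the CLI should read its input (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum InputMode {
    /// Interactive session: worth paying for a line editor such as Rustyline.
    Interactive,
    /// Piped or redirected stdin: read statements directly and skip the editor.
    Piped,
}

/// Pure decision function, kept separate so it can be tested without a terminal.
pub fn input_mode_for(stdin_is_tty: bool) -> InputMode {
    if stdin_is_tty {
        InputMode::Interactive
    } else {
        InputMode::Piped
    }
}

/// Detects the mode for the current process by asking whether stdin is a tty.
pub fn detect_input_mode() -> InputMode {
    input_mode_for(std::io::stdin().is_terminal())
}
```

With this split, the line editor would only be constructed when `detect_input_mode()` returns `InputMode::Interactive`.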

Closes #1372
2025-04-25 23:02:42 +03:00
Jussi Saurio
454a409cae Merge 'refactor database open_file and open' from meteorgan
reduce redundant code

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1406
2025-04-25 22:03:49 +03:00
Jussi Saurio
3553d05c32 Merge 'Give name to hard-coded page_size values' from Anton Harniakou
Related to #1379
I guess there are more hard-coded values.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1404
2025-04-25 22:03:43 +03:00
meteorgan
6a860e75b8 fix cargo clippy 2025-04-25 22:06:44 +08:00
meteorgan
0202fa3ed0 add back one comment 2025-04-25 21:57:35 +08:00
Jussi Saurio
fe65d6e991 Merge 'Performance: hoist entire expressions out of hot loops if they are constant' from Jussi Saurio
## Problem:
- We have cases where we evaluate expressions in a hot loop even though
they only need to be evaluated once. For example: `CAST('2025-01-01' as
DATETIME)` -- the value of this never changes, so we should only run it
once.
- We have no robust way of doing this right now for entire _expressions_
-- the only existing facility we have is
`program.mark_last_insn_constant()`, which has no concept of how many
instructions translating a given _expression_ takes, and breaks very
easily for this reason.
## Main ideas of this PR:
- Add `expr.is_constant()`, which determines whether the expression is a
compile-time constant. It tries to be conservative and does not deem
something compile-time constant if there is no certainty.
- Whenever we think a compile-time constant expression is about to be
translated into bytecode in `translate_expr()`, start a so-called
`constant span`, which means a range of instructions that are part of a
compile-time constant expression.
- At the end of translating the program, all `constant spans` are
hoisted outside of any table loops so they only get evaluated once.
- The target offsets of any jump instructions (e.g. `Goto`) are moved to
the correct place, taking into account all instructions whose offsets
were shifted due to moving the compile-time constant expressions around.
- An escape hatch wrapper `translate_expr_no_constant_opt()` is added
for cases where we should not hoist constants even if we otherwise
could. Right now the only example of this is cases where we are reusing
the same register(s) in multiple iterations of some kind of loop, e.g.
`VALUES(...)` or in the `coalesce()` function implementation.
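
The ideas above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Expr` and `Insn` types and the `hoist_constant_spans` function are hypothetical and much simpler than the real translator, which works with labels and many more opcodes. Hoisted spans are placed just before the final jump back, mirroring the layout in the "after" plan shown in this PR.

```rust
/// Toy expression tree (hypothetical; not the project's actual AST).
pub enum Expr {
    Literal(i64),
    Column(usize),
    Cast(Box<Expr>),
    /// A function call, with a flag saying whether the function is deterministic.
    Func { deterministic: bool, args: Vec<Expr> },
}

impl Expr {
    /// Conservative check: anything not known to be constant is non-constant.
    pub fn is_constant(&self) -> bool {
        match self {
            Expr::Literal(_) => true,
            Expr::Column(_) => false,
            Expr::Cast(inner) => inner.is_constant(),
            Expr::Func { deterministic, args } => {
                *deterministic && args.iter().all(Expr::is_constant)
            }
        }
    }
}

/// Toy instruction set: `Goto` stands in for every instruction with a jump target.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Insn {
    Op(&'static str),
    Goto(usize),
    Halt,
}

/// Moves every constant span (inclusive index ranges) to just before the final
/// instruction, which is assumed to be the jump back into the main program,
/// and rewrites all jump targets to the shifted offsets.
pub fn hoist_constant_spans(program: &[Insn], spans: &[(usize, usize)]) -> Vec<Insn> {
    assert!(!program.is_empty());
    let last = program.len() - 1;
    let in_span = |i: usize| spans.iter().any(|&(start, end)| start <= i && i <= end);
    assert!(!in_span(last), "the final jump must not be part of a constant span");

    // New order: instructions that stay in place, then hoisted spans, then the final jump.
    let mut order: Vec<usize> = (0..last).filter(|&i| !in_span(i)).collect();
    order.extend((0..last).filter(|&i| in_span(i)));
    order.push(last);

    // Map each old offset to its new offset.
    let mut old_to_new = vec![0usize; program.len()];
    for (new, &old) in order.iter().enumerate() {
        old_to_new[old] = new;
    }

    // A jump that pointed at a hoisted instruction must now land on the next
    // instruction that stayed in place.
    let retarget = |target: usize| -> usize {
        let mut t = target;
        while in_span(t) {
            t += 1;
        }
        old_to_new[t]
    };

    order
        .iter()
        .map(|&old| match &program[old] {
            Insn::Goto(target) => Insn::Goto(retarget(*target)),
            other => other.clone(),
        })
        .collect()
}
```

In this model, a loop-back jump that used to target a hoisted `Cast` is redirected to the first instruction after it, so the loop body no longer re-evaluates the constant.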
## Performance effects
Here is an example of a modified/simplified TPC-H query where the
`CAST()` calls were previously run millions of times in a hot loop, but
now they are optimized out of the loop.
**BYTECODE PLAN BEFORE:**
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     26    0                    0   Start at 26
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     25    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       String8          0     7     0     1995-03-29     0   r[7]='1995-03-29'
7       Function         0     7     6     cast           0   r[6]=func(r[7..8])  <-- CAST() executed millions of times
8       Le               5     6     24                   0   if r[5]<=r[6] goto 24
9       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
10      SeekRowid        1     9     24                   0   if (r[9]!=orders.rowid) goto 24
11      Column           1     4     10                   0   r[10]=orders.o_orderdate
12      String8          0     12    0     1995-03-29     0   r[12]='1995-03-29'
13      Function         0     12    11    cast           0   r[11]=func(r[12..13])
14      Ge               10    11    24                   0   if r[10]>=r[11] goto 24
15      Column           1     1     14                   0   r[14]=orders.o_custkey
16      SeekRowid        2     14    24                   0   if (r[14]!=customer.rowid) goto 24
17      Column           2     6     15                   0   r[15]=customer.c_mktsegment
18      Ne               15    16    24                   0   if r[15]!=r[16] goto 24
19      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
20      Integer          3     2     0                    0   r[2]=3
21      Column           1     4     3                    0   r[3]=orders.o_orderdate
22      Column           1     7     4                    0   r[4]=orders.o_shippriority
23      ResultRow        1     4     0                    0   output=r[1..4]
24    Next               0     5     0                    0
25    Halt               0     0     0                    0
26    Transaction        0     0     0                    0   write=false
27    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
28    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
29    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
30    Goto               0     1     0
```
**BYTECODE PLAN AFTER**:
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     20    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       Le               5     6     19                   0   if r[5]<=r[6] goto 19
7       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
8       SeekRowid        1     9     19                   0   if (r[9]!=orders.rowid) goto 19
9       Column           1     4     10                   0   r[10]=orders.o_orderdate
10      Ge               10    11    19                   0   if r[10]>=r[11] goto 19
11      Column           1     1     14                   0   r[14]=orders.o_custkey
12      SeekRowid        2     14    19                   0   if (r[14]!=customer.rowid) goto 19
13      Column           2     6     15                   0   r[15]=customer.c_mktsegment
14      Ne               15    16    19                   0   if r[15]!=r[16] goto 19
15      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
16      Column           1     4     3                    0   r[3]=orders.o_orderdate
17      Column           1     7     4                    0   r[4]=orders.o_shippriority
18      ResultRow        1     4     0                    0   output=r[1..4]
19    Next               0     5     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0   write=false
22    String8            0     7     0     1995-03-29     0   r[7]='1995-03-29'
23    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
24    Function           1     7     6     cast           0   r[6]=func(r[7..8]) <-- CAST() executed twice
25    String8            0     12    0     1995-03-29     0   r[12]='1995-03-29'
26    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
27    Function           1     12    11    cast           0   r[11]=func(r[12..13])
28    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
29    Integer            3     2     0                    0   r[2]=3
30    Goto               0     1     0                    0
```
**EXECUTION RUNTIME BEFORE:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.633396667 s (this includes parsing/coloring of cli app)
```
**EXECUTION RUNTIME AFTER:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 2.0923475 s (this includes parsing/coloring of cli app)
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1359
2025-04-25 16:55:41 +03:00
meteorgan
f464d15f8b refactor database open_file and open 2025-04-25 21:45:18 +08:00
Anton Harniakou
dd7c0ad1c8 Give name to hard-coded page_size values 2025-04-25 14:15:15 +03:00
Jussi Saurio
7137f4ab3b Merge 'Feature: Composite Primary key constraint' from Pedro Muniz
Closes #1384. This PR implements the Primary Key constraint for inserts. As
can be seen in the issue, if you created an Index with a Primary Key
constraint, an insert could trigger a `Unique Constraint` error but still
insert the record. SQLite uses the opcode `NoConflict` to check whether the
record already exists in the Btree. As we did not have this opcode yet, I
implemented it. It is very similar to `NotFound`, with the difference
that if any value in the Record is Null, it will immediately jump to the
offset. An added benefit of implementing this is that we now fully
support Composite Primary Keys. Also, I think that with the current
implementation, it will be trivial to implement the Unique opcode for
Insert. To support Updates, I need to understand more of the plan
optimizer and find where we are making the Record and opening the
autoindex.
For testing, I have written a test generator that generates many different
tables with a varying number of Primary Keys.
```sql
limbo> CREATE TABLE users (id INT, username TEXT, PRIMARY KEY (id, username));
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> explain INSERT INTO users VALUES (1, 'alice');
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     OpenWrite          0     2     0                    0
2     Integer            1     2     0                    0   r[2]=1
3     String8            0     3     0     alice          0   r[3]='alice'
4     OpenWrite          1     3     0                    0
5     NewRowId           0     1     0                    0
6     Copy               2     5     0                    0   r[5]=r[2]
7     Copy               3     6     0                    0   r[6]=r[3]
8     Copy               1     7     0                    0   r[7]=r[1]
9     MakeRecord         5     3     8                    0   r[8]=mkrec(r[5..7])
10    NoConflict         1     12    5     2              0   key=r[5]
11    Halt               1555  0     0     users.id, users.username  0
12    IdxInsert          1     8     5                    0   key=r[8]
13    MakeRecord         2     2     4                    0   r[4]=mkrec(r[2..3])
14    Insert             0     4     1                    0
15    Halt               0     0     0                    0
16    Transaction        0     1     0                    0   write=true
17    Goto               0     1     0                    0
limbo> INSERT INTO users VALUES (1, 'alice');
  × Runtime error: UNIQUE constraint failed: users.id, users.username (19)
limbo> INSERT INTO users VALUES (1, 'bob');
limbo> INSERT INTO users VALUES (1, 'bob');
  × Runtime error: UNIQUE constraint failed: users.id, users.username (19)
```
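
The `NoConflict` behaviour described above can be sketched as a small predicate. This is an illustrative model only: the index is a `BTreeSet` of integer keys rather than a real B-tree of records, and the function name `no_conflict` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Returns true when execution should jump past the constraint-failure `Halt`,
/// i.e. when inserting `key` cannot violate the unique index.
///
/// Toy model: a key is a list of nullable integers and the index is an ordered set.
pub fn no_conflict(index: &BTreeSet<Vec<i64>>, key: &[Option<i64>]) -> bool {
    // If any field of the key is NULL, jump immediately: NULLs never conflict.
    let concrete: Option<Vec<i64>> = key.iter().copied().collect();
    match concrete {
        None => true,
        // Otherwise jump only if no identical key is already present.
        Some(values) => !index.contains(&values),
    }
}
```

The NULL short-circuit is the part that distinguishes this from a plain "not found" lookup.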

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1393
2025-04-24 23:25:30 +03:00
Pekka Enberg
4d0c40a435 One more fix to Antithesis Dockerfile 2025-04-24 21:17:36 +03:00
Pekka Enberg
117dbe6c8c Fix Antithesis Docker file some more 2025-04-24 21:12:40 +03:00
Pekka Enberg
fa5d6dcf6b Fix Antithesis Docker file 2025-04-24 21:03:19 +03:00
Pekka Enberg
31677c9c94 scripts/antithesis: Build Docker image for x86-64 2025-04-24 20:55:30 +03:00
Pekka Enberg
2a5eb8e5bc stress: Make Clippy happy 2025-04-24 20:46:26 +03:00
Pekka Enberg
ebc2e475b6 Merge 'Add Antithesis Tests' from eric-dinh-antithesis
This PR adds 2 autonomous test suites for use with Antithesis:
`bank-test` and `stress-composer`. It also modifies the existing
`limbo_stress` test to run as a singleton and modifies other
Antithesis-related configuration files.
**bank-test**
- `first_setup.py`
  - initializes a DB table with columns accounts and balance
  - generates random balances for each account
  - stores initial state of the table
- `parallel_driver_generate_transaction.py`
  - selects 2 accounts from the table as sender and receiver
  - generates a random value which is subtracted from sender and added
    to receiver
- `anytime/eventually/finally_validate.py`
  - checks that sum of initial balances == sum of current balances
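
The invariant that the bank test validates can be sketched in a few lines. The actual test scripts are Python files that run against a database; the Rust functions below are hypothetical stand-ins that only model the arithmetic:

```rust
/// Moves `amount` from `sender` to `receiver` (toy in-memory model of one transaction).
pub fn transfer(balances: &mut [i64], sender: usize, receiver: usize, amount: i64) {
    balances[sender] -= amount;
    balances[receiver] += amount;
}

/// Sum of all balances; the validation step compares this against the initial sum.
pub fn total(balances: &[i64]) -> i64 {
    balances.iter().sum()
}
```

Any sequence of transfers should leave `total` unchanged, so a mismatch would point at a lost or duplicated update.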
**stress-composer**
- Breaks `limbo_stress` into component parts
- `first_setup.py`
  - creates up to 10 tables with up to 10 columns
  - stores table details in a separate db
- `parallel_driver_insert.py`
  - randomly generates and executes up to 100 insert statements into a
    single table using random values derived from the table details
- `parallel_driver_update.py`
  - randomly generates and executes up to 100 updates into a single
    table using random values derived from the table details
- `parallel_driver_delete.py`
  - randomly generates and executes up to 100 deletes from a single
    table using random values derived from the table details

Closes #1401
2025-04-24 20:44:42 +03:00
eric-dinh-antithesis
27e15364c4 stress: suppress logfile since it's too big 2025-04-24 12:27:58 -04:00
eric-dinh-antithesis
b8885777dc stress: move sdk setup_complete from limbo_stress to docker-entrypoint 2025-04-24 12:27:05 -04:00
eric-dinh-antithesis
75ae5dbd13 stress: update docker-compose 2025-04-24 12:26:00 -04:00
eric-dinh-antithesis
8390233b99 Dockerfile.antithesis: update limbo_stress build step 2025-04-24 12:25:19 -04:00
eric-dinh-antithesis
5953d32e4d Dockerfile.antithesis: add symbols for rust, cataloging for python, and antithesis tests to image, update entrypoint 2025-04-24 12:24:44 -04:00
eric-dinh-antithesis
62e2745c3c Dockerfile.antithesis: install dependencies 2025-04-24 12:23:22 -04:00
eric-dinh-antithesis
364a78b270 Cargo.toml: add profile for antithesis builds for full debug 2025-04-24 12:22:03 -04:00
eric-dinh-antithesis
f993a22023 antithesis-tests: add all tests 2025-04-24 12:20:41 -04:00
pedrocarlo
2e147b20a8 Adjustments and explicitly just emitting NoConflict on unique indexes 2025-04-24 13:13:39 -03:00
Jussi Saurio
80d39929ad Merge 'types: refactor serialtype again to make it faster' from Jussi Saurio
Basically, serialtype got slower in #1398, maybe because of the wasted
space of `enum SerialType` being 16 bytes, so I've now refactored
`SerialType` to be a transparent newtype wrapper over `u64` and
introduced a separate `SerialTypeKind` enum.
At least on my machine the perf regression was nullified, and performance
may even be a bit better.
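
A rough sketch of what such a newtype could look like. The numeric codes follow the SQLite record format, but the variant names and the `try_new`, `kind` and `size` methods are assumptions made for illustration, not the code from this PR:

```rust
/// Transparent wrapper: the serial type is stored as the raw varint value (8 bytes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct SerialType(u64);

/// Coarse classification, computed on demand instead of being stored.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SerialTypeKind {
    Null, I8, I16, I24, I32, I48, I64, F64, ConstInt0, ConstInt1, Blob, Text,
}

impl SerialType {
    /// Codes 10 and 11 are reserved in the SQLite record format, so reject them.
    pub fn try_new(raw: u64) -> Option<Self> {
        if raw == 10 || raw == 11 { None } else { Some(Self(raw)) }
    }

    pub fn kind(self) -> SerialTypeKind {
        match self.0 {
            0 => SerialTypeKind::Null,
            1 => SerialTypeKind::I8,
            2 => SerialTypeKind::I16,
            3 => SerialTypeKind::I24,
            4 => SerialTypeKind::I32,
            5 => SerialTypeKind::I48,
            6 => SerialTypeKind::I64,
            7 => SerialTypeKind::F64,
            8 => SerialTypeKind::ConstInt0,
            9 => SerialTypeKind::ConstInt1,
            n if n % 2 == 0 => SerialTypeKind::Blob, // even values >= 12
            _ => SerialTypeKind::Text,               // odd values >= 13
        }
    }

    /// Payload size in bytes for a value of this serial type.
    pub fn size(self) -> usize {
        match self.0 {
            0 | 8 | 9 => 0,
            1 => 1,
            2 => 2,
            3 => 3,
            4 => 4,
            5 => 6,
            6 | 7 => 8,
            n if n % 2 == 0 => ((n - 12) / 2) as usize,
            n => ((n - 13) / 2) as usize,
        }
    }
}
```

Keeping only the raw `u64` means the value is 8 bytes wide, and the kind is derived by a cheap match when needed.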

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1399
2025-04-24 18:59:31 +03:00
Jussi Saurio
7921d7c2e0 types: refactor serialtype again to make it faster 2025-04-24 17:28:31 +03:00
Jussi Saurio
2ffeefe165 Merge 'core/types: remove duplicate serialtype implementation' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1398
2025-04-24 16:17:17 +03:00
Jussi Saurio
04adf8242a faster validate 2025-04-24 16:05:12 +03:00
Jussi Saurio
af6a783f4d core/types: remove duplicate serialtype implementation 2025-04-24 15:38:47 +03:00
Jussi Saurio
0c800524af Merge 'Bugfix: Explain command should display syntax errors in CLI' from Anton Harniakou
Closes #1392

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1396
2025-04-24 15:11:59 +03:00
Anton Harniakou
fdf3dd9796 Bugfix: Explain command should display syntax errors in CLI
Closes #1392
2025-04-24 13:25:00 +03:00
Jussi Saurio
dc3e97887f Merge 'replace vec with array in btree balancing' from Lâm Hoàng Phúc
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1354
2025-04-24 11:22:07 +03:00
Jussi Saurio
2e8042510e Merge 'Pragma page size reading' from Anton Harniakou
1) Fix a bug where CLI pretty mode would not print pragma results.
2) Add the ability to read page_size using PRAGMA page_size.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1394
2025-04-24 11:08:55 +03:00
Jussi Saurio
c3441f9685 vdbe: move comments if instructions were moved around in emit_constant_insns() 2025-04-24 11:05:21 +03:00
Jussi Saurio
029e5eddde Fix existing resolve_label() calls to work with new system 2025-04-24 11:05:21 +03:00
Jussi Saurio
e557503091 expr.rs: use constant spans to optimize constant expressions 2025-04-24 11:05:21 +03:00
Jussi Saurio
0f5c791784 vdbe: refactor label resolution to account for insn offsets changing 2025-04-24 11:05:21 +03:00
Jussi Saurio
b4b38bdb3c vdbe: resolve labels for InitCoroutine::start_offset 2025-04-24 11:05:21 +03:00
Jussi Saurio
47f3f3bda3 vdbe: replace constant_insns with constant_spans 2025-04-24 11:05:21 +03:00
Jussi Saurio
e5bab63522 add expr.is_constant() 2025-04-24 11:05:21 +03:00
Jussi Saurio
5bed331505 add Func::is_deterministic() 2025-04-24 11:05:21 +03:00
Jussi Saurio
b36c898842 rename check_constant() to less confusing name 2025-04-24 11:05:21 +03:00
Jussi Saurio
6ff5ff49b7 Merge 'perf/btree: use binary search for Index seek operations' from Jussi Saurio
## Beef
Followup to #1357 which did the same treatment for table btrees only.
After this PR, all of our seeks use binary search for both interior and
leaf pages.
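
To illustrate the difference, here is a sketch of a `SeekGE`-style lookup over the sorted keys of a single page, once as a linear scan and once as a binary search. The function names and the use of `(i64, i64)` tuples as two-column index keys are assumptions for the example; the real code operates on serialized cells:

```rust
/// Linear scan: position of the first key >= target, or None if there is none.
pub fn seek_ge_linear(keys: &[(i64, i64)], target: (i64, i64)) -> Option<usize> {
    keys.iter().position(|k| *k >= target)
}

/// Binary search over the same sorted keys; tuples compare lexicographically,
/// which matches the column-by-column ordering of a composite index key.
pub fn seek_ge_binary(keys: &[(i64, i64)], target: (i64, i64)) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, keys.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if keys[mid] < target {
            lo = mid + 1; // everything up to and including mid is too small
        } else {
            hi = mid; // mid is a candidate; keep searching to its left
        }
    }
    if lo < keys.len() { Some(lo) } else { None }
}
```

Both functions return the same position; the binary version needs a logarithmic number of comparisons per page instead of a linear one.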
## Perf comparison
Using a TPC-H 1GB database for this query:
```sql
limbo> explain select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     7     0                    0   table=sqlite_autoindex_partsupp_1, root=7
3     Rewind             0     14    0                    0   Rewind lineitem
4       Column           0     1     2                    0   r[2]=lineitem.l_partkey
5       IsNull           2     13    0                    0   if (r[2]==NULL) goto 13
6       Column           0     2     3                    0   r[3]=lineitem.l_suppkey
7       IsNull           3     13    0                    0   if (r[3]==NULL) goto 13
8       SeekGE           1     13    2                    0   key=[2..3] <-- index seek here, for every row in lineitem
9         IdxGT          1     13    2                    0   key=[2..3]
10        Integer        1     5     0                    0   r[5]=1
11        AggStep        0     5     4     count          0   accum=r[4] step(r[5])
12      Next             1     9     0                    0
13    Next               0     4     0                    0
14    AggFinal           0     4     0     count          0   accum=r[4]
15    Copy               4     1     0                    0   r[1]=r[4]
16    ResultRow          1     1     0                    0   output=r[1]
17    Halt               0     0     0                    0
18    Transaction        0     0     0                    0   write=false
19    Goto               0     1     0                    0
```
main:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│   6001215 │
└───────────┘
Command stats:
----------------------------
total: 40.292102375 s (this includes parsing/coloring of cli app)
```
PR:
```sql
limbo> select count(1) from lineitem join partsupp on l_partkey = ps_partkey and l_suppkey = ps_suppkey;
┌───────────┐
│ count (1) │
├───────────┤
│   6001215 │
└───────────┘
Command stats:
----------------------------
total: 14.021689916 s (this includes parsing/coloring of cli app)
```
Almost 3x faster. Buzzkill: still 3x slower than SQLite :)

Closes #1387
2025-04-24 10:53:35 +03:00
Anton Harniakou
51fc1773ea Fix missing documentation warning; improve the documentation message 2025-04-24 10:36:23 +03:00
Jussi Saurio
c88c579154 Merge 'expr.is_nonnull(): return true if col.primary_key || col.notnull' from Jussi Saurio
This avoids redundant `IsNull` instructions during index seeks if the
seek key columns are primary keys of other tables, which they often are.
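
A small sketch of the rule in the title and its effect on the emitted instructions. The `Column` struct and `emit_seek_prologue` function are hypothetical simplifications; only the `primary_key || notnull` condition comes from the PR:

```rust
/// Minimal column metadata (hypothetical shape).
pub struct Column {
    pub primary_key: bool,
    pub notnull: bool,
}

/// The rule from the PR title: such a column is treated as never NULL.
pub fn is_nonnull(col: &Column) -> bool {
    col.primary_key || col.notnull
}

/// Emits opcode names for loading the seek key; the `IsNull` guard is only
/// added for columns that might actually hold NULL.
pub fn emit_seek_prologue(key_columns: &[Column]) -> Vec<&'static str> {
    let mut insns = Vec::new();
    for col in key_columns {
        insns.push("Column");
        if !is_nonnull(col) {
            insns.push("IsNull");
        }
    }
    insns.push("SeekGE");
    insns
}
```

For a seek key made only of primary key columns, no `IsNull` instruction is emitted at all.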

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1388
2025-04-24 10:32:00 +03:00
Anton Harniakou
0a69ea0138 Support reading db page size using PRAGMA page_size 2025-04-24 10:12:02 +03:00
pedrocarlo
9dd1ced5ad added tests 2025-04-23 20:38:08 -03:00