Commit Graph

4349 Commits

Author SHA1 Message Date
meteorgan
f3f09a5b7b Fix pragma page_count 2025-04-26 21:45:18 +08:00
Jussi Saurio
46d45a6bf4 don't recompute cell_count 2025-04-26 14:50:42 +03:00
Jussi Saurio
75c6678a06 sqlite3_ondisk: use debug asserts for cell_table_interior_read... funcs 2025-04-26 14:50:42 +03:00
Jussi Saurio
ac1bc17ea4 btree/tablebtree_seek: remove some more useless calls to set_cell_index() 2025-04-26 13:41:30 +03:00
Pekka Enberg
e46c01928c antithesis: Enable Rust backtraces again 2025-04-26 12:59:19 +03:00
Jussi Saurio
5060f1a1fa don't use integer division in binary search halfpoint calculation 2025-04-26 12:38:15 +03:00
Anton Harniakou
6d3c63fb01 Add the .indexes command 2025-04-26 12:27:08 +03:00
Jussi Saurio
23ae5a27c4 btree/tablebtree_move_to: micro-optimizations 2025-04-26 11:32:24 +03:00
Pekka Enberg
e8bc3086f2 antithesis-tests: Fix generate_random_value() on older Python versions 2025-04-26 10:53:31 +03:00
Pekka Enberg
a7537be2b6 antithesis-tests: Fix accounts to be at least one 2025-04-26 09:38:03 +03:00
Pekka Enberg
bde2d4f0a3 Fix Antithesis docker-compose.yaml 2025-04-26 09:14:24 +03:00
Jussi Saurio
0d77ea9446 Merge 'Optimization: only initialize Rustyline if we are in a tty' from Pedro Muniz
This is small nitpick, but it will be useful for #1258. If we are
testing or just piping some sql through stdin, we can just not
initialize `Rustyline` and save some execution time.
On `Select 1` bench, I got a minor performance bump, but it starts to
become less apparent on more complex queries.
<img width="759" alt="image" src="https://github.com/user-
attachments/assets/12e22675-e081-4284-a5ed-15d53a9c5579" />

Closes #1372
2025-04-25 23:02:42 +03:00
Jussi Saurio
454a409cae Merge 'refactor database open_file and open' from meteorgan
reduce redundant code

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1406
2025-04-25 22:03:49 +03:00
Jussi Saurio
3553d05c32 Merge 'Give name to hard-coded page_size values' from Anton Harniakou
Related to #1379
I guess there are more hard-coded values.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1404
2025-04-25 22:03:43 +03:00
meteorgan
6a860e75b8 fix cargo clippy 2025-04-25 22:06:44 +08:00
meteorgan
0202fa3ed0 add back one comment 2025-04-25 21:57:35 +08:00
Jussi Saurio
fe65d6e991 Merge 'Performance: hoist entire expressions out of hot loops if they are constant' from Jussi Saurio
## Problem:
- We have cases where we are evaluating expressions in a hot loop that
could only be evaluated once. For example: `CAST('2025-01-01' as
DATETIME)` -- the value of this never changes, so we should only run it
once.
- We have no robust way of doing this right now for entire _expressions_
-- the only existing facility we have is
`program.mark_last_insn_constant()`, which has no concept of how many
instructions translating a given _expression_ spends, and breaks very
easily for this reason.
## Main ideas of this PR:
- Add `expr.is_constant()` determining whether the expression is
compile-time constant. Tries to be conservative and not deem something
compile-time constant if there is no certainty.
- Whenever we think a compile-time constant expression is about to be
translated into bytecode in `translate_expr()`, start a so called
`constant span`, which means a range of instructions that are part of a
compile-time constant expression.
- At the end of translating the program, all `constant spans` are
hoisted outside of any table loops so they only get evaluated once.
- The target offsets of any jump instructions (e.g. `Goto`) are moved to
the correct place, taking into account all instructions whose offsets
were shifted due to moving the compile-time constant expressions around.
- An escape hatch wrapper `translate_expr_no_constant_opt()` is added
for cases where we should not hoist constants even if we otherwise
could. Right now the only example of this is cases where we are reusing
the same register(s) in multiple iterations of some kind of loop, e.g.
`VALUES(...)` or in the `coalesce()` function implementation.
## Performance effects
Here is an example of a modified/simplified TPC-H query where the
`CAST()` calls were previously run millions of times in a hot loop, but
now they are optimized out of the loop.
**BYTECODE PLAN BEFORE:**
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     26    0                    0   Start at 26
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     25    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       String8          0     7     0     1995-03-29     0   r[7]='1995-03-29'
7       Function         0     7     6     cast           0   r[6]=func(r[7..8])  <-- CAST() executed millions of times
8       Le               5     6     24                   0   if r[5]<=r[6] goto 24
9       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
10      SeekRowid        1     9     24                   0   if (r[9]!=orders.rowid) goto 24
11      Column           1     4     10                   0   r[10]=orders.o_orderdate
12      String8          0     12    0     1995-03-29     0   r[12]='1995-03-29'
13      Function         0     12    11    cast           0   r[11]=func(r[12..13])
14      Ge               10    11    24                   0   if r[10]>=r[11] goto 24
15      Column           1     1     14                   0   r[14]=orders.o_custkey
16      SeekRowid        2     14    24                   0   if (r[14]!=customer.rowid) goto 24
17      Column           2     6     15                   0   r[15]=customer.c_mktsegment
18      Ne               15    16    24                   0   if r[15]!=r[16] goto 24
19      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
20      Integer          3     2     0                    0   r[2]=3
21      Column           1     4     3                    0   r[3]=orders.o_orderdate
22      Column           1     7     4                    0   r[4]=orders.o_shippriority
23      ResultRow        1     4     0                    0   output=r[1..4]
24    Next               0     5     0                    0
25    Halt               0     0     0                    0
26    Transaction        0     0     0                    0   write=false
27    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
28    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
29    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
30    Goto               0     1     0
```
**BYTECODE PLAN AFTER**:
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     20    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       Le               5     6     19                   0   if r[5]<=r[6] goto 19
7       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
8       SeekRowid        1     9     19                   0   if (r[9]!=orders.rowid) goto 19
9       Column           1     4     10                   0   r[10]=orders.o_orderdate
10      Ge               10    11    19                   0   if r[10]>=r[11] goto 19
11      Column           1     1     14                   0   r[14]=orders.o_custkey
12      SeekRowid        2     14    19                   0   if (r[14]!=customer.rowid) goto 19
13      Column           2     6     15                   0   r[15]=customer.c_mktsegment
14      Ne               15    16    19                   0   if r[15]!=r[16] goto 19
15      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
16      Column           1     4     3                    0   r[3]=orders.o_orderdate
17      Column           1     7     4                    0   r[4]=orders.o_shippriority
18      ResultRow        1     4     0                    0   output=r[1..4]
19    Next               0     5     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0   write=false
22    String8            0     7     0     1995-03-29     0   r[7]='1995-03-29'
23    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
24    Function           1     7     6     cast           0   r[6]=func(r[7..8]) <-- CAST() executed twice
25    String8            0     12    0     1995-03-29     0   r[12]='1995-03-29'
26    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
27    Function           1     12    11    cast           0   r[11]=func(r[12..13])
28    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
29    Integer            3     2     0                    0   r[2]=3
30    Goto               0     1     0                    0
```
**EXECUTION RUNTIME BEFORE:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.633396667 s (this includes parsing/coloring of cli app)
```
**EXECUTION RUNTIME AFTER:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 2.0923475 s (this includes parsing/coloring of cli app)
````

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1359
2025-04-25 16:55:41 +03:00
meteorgan
f464d15f8b refactor database open_file and open 2025-04-25 21:45:18 +08:00
Anton Harniakou
dd7c0ad1c8 Give name to hard-coded page_size values 2025-04-25 14:15:15 +03:00
Jussi Saurio
7137f4ab3b Merge 'Feature: Composite Primary key constraint' from Pedro Muniz
Closes #1384 . This PR implements Primary Key constraint for inserts. As
can be seen in the issue, if you created an Index with a Primary Key
constraint, it could trigger `Unique Constraint` error, but still insert
the record. Sqlite uses the opcode `NoConflict` to check if the record
already exists in the Btree. As we did not have this Opcode yet, I
implemented it. It is very similar to `NotFound` with the difference
that if any value in the Record is Null, it will immediately jump to the
offset. The added benefit of implementing this, is that now we fully
support Composite Primary Keys. Also, I think with the current
implementation, it will be trivial to implement the Unique opcode for
Insert. To support Updates, I need to understand more of the plan
optimizer to and find where we are Making the Record and opening the
autoindex.
For testing, I have written a test generator to generate many different
tables that can have a varying numbers of Primary Keys.
```sql
limbo> CREATE TABLE users (id INT, username TEXT, PRIMARY KEY (id, username));
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> explain INSERT INTO users VALUES (1, 'alice');
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     OpenWrite          0     2     0                    0
2     Integer            1     2     0                    0   r[2]=1
3     String8            0     3     0     alice          0   r[3]='alice'
4     OpenWrite          1     3     0                    0
5     NewRowId           0     1     0                    0
6     Copy               2     5     0                    0   r[5]=r[2]
7     Copy               3     6     0                    0   r[6]=r[3]
8     Copy               1     7     0                    0   r[7]=r[1]
9     MakeRecord         5     3     8                    0   r[8]=mkrec(r[5..7])
10    NoConflict         1     12    5     2              0   key=r[5]
11    Halt               1555  0     0     users.id, users.username  0
12    IdxInsert          1     8     5                    0   key=r[8]
13    MakeRecord         2     2     4                    0   r[4]=mkrec(r[2..3])
14    Insert             0     4     1                    0
15    Halt               0     0     0                    0
16    Transaction        0     1     0                    0   write=true
17    Goto               0     1     0                    0
limbo> INSERT INTO users VALUES (1, 'alice');
  × Runtime error: UNIQUE constraint failed: users.id, users.username (19)
limbo> INSERT INTO users VALUES (1, 'bob');
limbo> INSERT INTO users VALUES (1, 'bob');
  × Runtime error: UNIQUE constraint failed: users.id, users.username (19)
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1393
2025-04-24 23:25:30 +03:00
Pekka Enberg
4d0c40a435 One more fix to Antithesis Dockerfile 2025-04-24 21:17:36 +03:00
Pekka Enberg
117dbe6c8c Fix Antithesis Docker file some more 2025-04-24 21:12:40 +03:00
Pekka Enberg
fa5d6dcf6b Fix Antithesis Docker file 2025-04-24 21:03:19 +03:00
Pekka Enberg
31677c9c94 scripts/antithesis: Build Docker image for x86-64 2025-04-24 20:55:30 +03:00
Pekka Enberg
2a5eb8e5bc stress: Make Clippy happy 2025-04-24 20:46:26 +03:00
Pekka Enberg
ebc2e475b6 Merge 'Add Antithesis Tests' from eric-dinh-antithesis
This PR adds 2 autonomous test suites for use with Antithesis: `bank-
test` and `stress-composer`. It also modifies the existing
`limbo_stress` test to run as a singleton and modifies other Antithesis-
related configuration files.
**bank-test**
- `first_setup.py`
  - initializes a DB table with columns accounts and balance
  - generates random balances for each account
  - stores initial state of the table
- `parallel_driver_generate_transaction.py`
  - selects 2 accounts from the table as sender and receiver
  - generates a random value which is subtracted from sender and added
to receiver
- `anytime/eventually/finally_validate.py`
  - checks that sum of initial balances == sum of current balances
**stress-composer**
- Breaks `limbo_stress` into component parts
- `first_setup.py`
  - creates up to 10 tables with up to 10 columns
  - stores table details in a separate db
- `parallel_driver_insert.py`
  - randomly generates and executes up to 100 insert statements into a
single table using random values derived from the table details
- `parallel_driver_update.py`
  - randomly generates and executes up to 100 updates into a single
table using random values derived from the table details
- `parallel_driver_delete.py`
  - randomly generates and executes up to 100 deletes from a single
table using random values derived from the table details

Closes #1401
2025-04-24 20:44:42 +03:00
eric-dinh-antithesis
27e15364c4 stress: suppress logfile since it's too big 2025-04-24 12:27:58 -04:00
eric-dinh-antithesis
b8885777dc stress: move sdk setup_complete from limbo_stress to docker-entrypoint 2025-04-24 12:27:05 -04:00
eric-dinh-antithesis
75ae5dbd13 stress: update docker-compose 2025-04-24 12:26:00 -04:00
eric-dinh-antithesis
8390233b99 Dockerfile.antithesis: update limbo_stress build step 2025-04-24 12:25:19 -04:00
eric-dinh-antithesis
5953d32e4d Dockerfile.antithesis: add symbols for rust, cataloging for python, and antithesis tests to image, update entrypoint 2025-04-24 12:24:44 -04:00
eric-dinh-antithesis
62e2745c3c Dockerfile.antithesis: install dependencies 2025-04-24 12:23:22 -04:00
eric-dinh-antithesis
364a78b270 Cargo.toml: add profile for antithesis builds for full debug 2025-04-24 12:22:03 -04:00
eric-dinh-antithesis
f993a22023 antithesis-tests: add all tests 2025-04-24 12:20:41 -04:00
pedrocarlo
2e147b20a8 Adjustments and explicitely just emitting NoConflict on unique indexes 2025-04-24 13:13:39 -03:00
Jussi Saurio
80d39929ad Merge 'types: refactor serialtype again to make it faster' from Jussi Saurio
basically serialtype got slower in #1398, maybe because of the wasted
space of `enum SerialType` being 16 bytes, so i've now refactored
`SerialType` to be a transparent newtype wrapper over `u64` and
introduced a separate `SerialTypeKind` enum
at least on my machine the perf regression was nullified, if not even a
bit better

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1399
2025-04-24 18:59:31 +03:00
Pere Diaz Bou
3ba5c2349f add corrupt error if no matching record found for idxdelete
a
2025-04-24 16:29:05 +02:00
Jussi Saurio
7921d7c2e0 types: refactor serialtype again to make it faster 2025-04-24 17:28:31 +03:00
Pere Diaz Bou
b7970a286d implement IdxDelete
clippy

revert op_idx_ge changes

fmt

fmt again

rever op_idx_gt changes
2025-04-24 16:23:34 +02:00
Jussi Saurio
2ffeefe165 Merge 'core/types: remove duplicate serialtype implementation' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1398
2025-04-24 16:17:17 +03:00
Jussi Saurio
04adf8242a faster validate 2025-04-24 16:05:12 +03:00
Jussi Saurio
af6a783f4d core/types: remove duplicate serialtype implementation 2025-04-24 15:38:47 +03:00
Jussi Saurio
0c800524af Merge 'Bugfix: Explain command should display syntax errors in CLI' from Anton Harniakou
Closes #1392

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1396
2025-04-24 15:11:59 +03:00
Anton Harniakou
fdf3dd9796 Bugfix: Explain command should display syntax errors in CLI
Closes #1392
2025-04-24 13:25:00 +03:00
Jussi Saurio
dc3e97887f Merge 'replace vec with array in btree balancing' from Lâm Hoàng Phúc
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1354
2025-04-24 11:22:07 +03:00
Jussi Saurio
2e8042510e Merge 'Pragma page size reading' from Anton Harniakou
1) Fix a bug where cli pretty mode would not print pragma results;
2) Add ability to read page_size using PRAGMA page_size;

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1394
2025-04-24 11:08:55 +03:00
Jussi Saurio
c3441f9685 vdbe: move comments if instructions were moved around in emit_constant_insns() 2025-04-24 11:05:21 +03:00
Jussi Saurio
029e5eddde Fix existing resolve_label() calls to work with new system 2025-04-24 11:05:21 +03:00
Jussi Saurio
e557503091 expr.rs: use constant spans to optimize constant expressions 2025-04-24 11:05:21 +03:00
Jussi Saurio
0f5c791784 vdbe: refactor label resolution to account for insn offsets changing 2025-04-24 11:05:21 +03:00