Commit Graph

4223 Commits

Author SHA1 Message Date
jnesss
69965b4eee remove TestTransactions that was being skipped. added back in second PR 2025-05-02 14:44:49 -07:00
jnesss
aed36c5b30 change to address CI failure. libs dir must exist with at least one file during build time. Add libs/.gitkeep as placeholder. Update .gitignore to exclude built libs but keep placeholder. This change is for CI purposes only 2025-05-02 13:02:52 -07:00
jnesss
bcb2f9f307 add documentation for new embedded library feature, including usage instructions and implementation notes 2025-05-02 12:45:30 -07:00
jnesss
1a3c3866b8 Update Windows library loading to prioritize the embedded library while maintaining compatibility with PATH-based lookup 2025-05-02 12:44:46 -07:00
jnesss
1e0b4676dc Update library loading mechanism to first attempt using the embedded library before falling back to traditional LD_LIBRARY_PATH lookup 2025-05-02 12:44:24 -07:00
jnesss
2476d2c6c2 embeds and extracts platform-specific libraries at runtime using Go's embed package 2025-05-02 12:43:36 -07:00
jnesss
322c2859e6 platform-specific build script that generates and organizes library for embedding into Go binaries 2025-05-02 12:42:38 -07:00
jnesss
992324f318 Add .gitignore for generated library files 2025-05-02 12:41:47 -07:00
jnesss
a9b5fc7f63 Add tests for vector operations and date/time functions in Go adapter 2025-05-02 11:30:31 -07:00
Jussi Saurio
6096cfb3d8 Merge 'Add PRAGMA schema_version' from Anton Harniakou
This PR adds `PRAGMA schema_version` to get the value of the schema-
version integer at offset 40 in the database header.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1427
2025-05-01 10:50:02 +03:00
Jussi Saurio
a25f228ea7 Merge 'Fix setting default value for primary key on UPDATE' from Pere Diaz Bou
I noticed when updating a table with a primary key, it would sometimes
set primary key column to null. I believe the problem was due to
incorrect condition that was inconsistent with the comment above: "don't
emit null for pkey of virtual tables."
cc: @PThorpe92

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1422
2025-05-01 10:47:17 +03:00
Jussi Saurio
a525feb7ad Merge 'Fix: allow page_size=65536' from meteorgan
Since `page_size` in `DatabaseHeader` can be 1 representing 65526 bytes,
it can't be used it directly.  Additionally, we should use `u32` instead
of `u16` or `usize` in other contexts.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1411
2025-05-01 10:46:19 +03:00
Jussi Saurio
7643b7666c Merge 'Fix page_count pragma' from meteorgan
This issue was introduced in #819. However, I believe the solution is
suboptimal because `pragma page_count` can never return 1, which is
inconsistent with SQLite.
<img width="442" alt="image" src="https://github.com/user-
attachments/assets/c772eae7-3e9f-4687-a94a-230deb0eb034" />
To align with SQLite's behavior, we should allocate the first page when
the first schema object is created, rather than immediately after
creating database. And it's always preferable to return an accurate page
count.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1407
2025-05-01 10:36:36 +03:00
Pere Diaz Bou
1a2a383635 fix setting default value for primary key on UPDATE
I noticed when updating a table with a primary key, it would sometimes
set primary key column to null. A primary key can be nullified if it
isn't a rowid alias, meaning it isn't a INTEGER PRIMAR KEY.
2025-05-01 09:46:48 +03:00
Anton Harniakou
525b7fdbaa Add PRAGMA schema_version 2025-04-30 09:41:04 +03:00
Pekka Enberg
f60fc26578 Merge 'Support literal-value current_time, current_date and current_timestamp' from meteorgan
I haven't found a way to automate these tests.
<p><img width="361" alt="image" src="https://github.com/user-
attachments/assets/a1563776-97e0-4aa5-844a-b9b23c5273e5" /></p>
<p><img width="279" alt="image" src="https://github.com/user-
attachments/assets/df036951-2649-4835-bffa-f25e6f59bb07" /></p>

Closes #1424
2025-04-29 21:51:33 +03:00
Pekka Enberg
3e6ac7c4a0 Merge 'Save history on exit' from Piotr Rżysko
Before this change, the history was only saved when the shell was
interrupted (e.g., Ctrl-C pressed twice). With this change, history is
now also saved when the `.exit` or `.quit` commands are used.
I attempted to add shell tests to cover the changes introduced in this
PR, but emulating a terminal/TTY that would work cross-platform seems to
require significant changes to `TestLimboShell` and `LimboShell`. For
example, the `pty` module [currently doesn't support
Windows](https://bugs.python.org/issue41663). I'm open to experimenting,
but I’m unsure if complicating these classes is worthwhile, as saving
history doesn't seem to be critical.
Additionally, it might be worth considering a refactor of the CLI so
that exit and cleanup operations are performed in one place.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1414
2025-04-29 21:50:41 +03:00
Pekka Enberg
f3a144638d Merge 'Fix broken fuzz target due to old name' from Levy A.
Closes #1416
2025-04-29 21:50:07 +03:00
Pekka Enberg
6409f347fd Merge 'Add state machine for op_idx_delete + DeleteState simplification' from Pere Diaz Bou
DeleteState had a bit too many unnecessary states so I removed them.
Usually we care about having a different state when I/O is triggered
requiring a state to be stored for later.
Furthermore, there was a bug with op_idx_delete where if balance is
triggered, op_idx_delete wouldn't be re-entrant. So a state machine was
added to prevent that from happening.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1421
2025-04-29 21:49:31 +03:00
meteorgan
51d43074f3 Support literal-value current_time, current_date and current_timestamp 2025-04-29 22:35:26 +08:00
Pere Diaz Bou
a30241ca91 Add state machine for op_idx_delete + DeleteState simplification
DeleteState had a bit too many unnecessary states so I removed them.
Usually we care about having a different state when I/O is triggered
requiring a state to be stored for later.

Furthermore, there was a bug with op_idx_delete where if balance is
triggered, op_idx_delete wouldn't be re-entrant. So a state machine was
added to prevent that from happening.
2025-04-29 14:58:20 +03:00
Levy A.
3e70cc3b68 fix: old name 2025-04-28 11:33:46 +03:00
meteorgan
d1a50f8a69 skip unneccessary conversion 2025-04-28 16:13:07 +08:00
meteorgan
d2dce740f7 fix some issues about page_size 2025-04-28 16:13:07 +08:00
Jussi Saurio
3459c1f7dd Merge 'btree/tablebtree_move_to: micro-optimizations' from Jussi Saurio
```bash
jussi@Jussis-MacBook-Pro limbo % git co main && cargo build --bin limbo --release && hyperfine --shell=none --warmup 5 './target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = '\''FURNITURE'\'' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('\''1995-03-29'\'' as datetime) and l_shipdate > cast('\''1995-03-29'\'' as datetime);"'

...

Benchmark 1: ./target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime);"
  Time (mean ± σ):      2.104 s ±  0.006 s    [User: 1.952 s, System: 0.151 s]
  Range (min … max):    2.094 s …  2.115 s    10 runs

jussi@Jussis-MacBook-Pro limbo % git co move-to-micro-opt && cargo build --bin limbo --release && hyperfine --shell=none --warmup 5 './target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = '\''FURNITURE'\'' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('\''1995-03-29'\'' as datetime) and l_shipdate > cast('\''1995-03-29'\'' as datetime);"'

...

Benchmark 1: ./target/release/limbo TPC-H.db "select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime);"
  Time (mean ± σ):      1.883 s ±  0.012 s    [User: 1.733 s, System: 0.146 s]
  Range (min … max):    1.866 s …  1.908 s    10 runs
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1408
2025-04-28 10:27:59 +03:00
Piotr Rzysko
33d230771f Save history on exit 2025-04-28 08:59:25 +02:00
Pere Diaz Bou
63a94e7c62 Merge 'Emit IdxDelete instruction and some fixes on seek after deletion' from Pere Diaz Bou
Previously `DELETE FROM ...` only emitted deletes for main table, but
this is incorrect as we want to remove entries from index tables as
well.

Closes #1383
2025-04-28 09:13:54 +03:00
Pekka Enberg
ab841c47bc Merge 'Add the .indexes command' from Anton Harniakou
Adds ability to view database indices using the `.indexes ?TABLE?`
command.

Closes #1409
2025-04-27 20:46:02 +03:00
meteorgan
eabe5e1631 temporarily comment the pragma-page-count-empty test case 2025-04-26 21:45:18 +08:00
meteorgan
f3f09a5b7b Fix pragma page_count 2025-04-26 21:45:18 +08:00
Jussi Saurio
46d45a6bf4 don't recompute cell_count 2025-04-26 14:50:42 +03:00
Jussi Saurio
75c6678a06 sqlite3_ondisk: use debug asserts for cell_table_interior_read... funcs 2025-04-26 14:50:42 +03:00
Jussi Saurio
ac1bc17ea4 btree/tablebtree_seek: remove some more useless calls to set_cell_index() 2025-04-26 13:41:30 +03:00
Pekka Enberg
e46c01928c antithesis: Enable Rust backtraces again 2025-04-26 12:59:19 +03:00
Jussi Saurio
5060f1a1fa don't use integer division in binary search halfpoint calculation 2025-04-26 12:38:15 +03:00
Anton Harniakou
6d3c63fb01 Add the .indexes command 2025-04-26 12:27:08 +03:00
Jussi Saurio
23ae5a27c4 btree/tablebtree_move_to: micro-optimizations 2025-04-26 11:32:24 +03:00
Pekka Enberg
e8bc3086f2 antithesis-tests: Fix generate_random_value() on older Python versions 2025-04-26 10:53:31 +03:00
Pekka Enberg
a7537be2b6 antithesis-tests: Fix accounts to be at least one 2025-04-26 09:38:03 +03:00
Pekka Enberg
bde2d4f0a3 Fix Antithesis docker-compose.yaml 2025-04-26 09:14:24 +03:00
Jussi Saurio
0d77ea9446 Merge 'Optimization: only initialize Rustyline if we are in a tty' from Pedro Muniz
This is small nitpick, but it will be useful for #1258. If we are
testing or just piping some sql through stdin, we can just not
initialize `Rustyline` and save some execution time.
On `Select 1` bench, I got a minor performance bump, but it starts to
become less apparent on more complex queries.
<img width="759" alt="image" src="https://github.com/user-
attachments/assets/12e22675-e081-4284-a5ed-15d53a9c5579" />

Closes #1372
2025-04-25 23:02:42 +03:00
Jussi Saurio
454a409cae Merge 'refactor database open_file and open' from meteorgan
reduce redundant code

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1406
2025-04-25 22:03:49 +03:00
Jussi Saurio
3553d05c32 Merge 'Give name to hard-coded page_size values' from Anton Harniakou
Related to #1379
I guess there are more hard-coded values.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1404
2025-04-25 22:03:43 +03:00
meteorgan
6a860e75b8 fix cargo clippy 2025-04-25 22:06:44 +08:00
meteorgan
0202fa3ed0 add back one comment 2025-04-25 21:57:35 +08:00
Jussi Saurio
fe65d6e991 Merge 'Performance: hoist entire expressions out of hot loops if they are constant' from Jussi Saurio
## Problem:
- We have cases where we are evaluating expressions in a hot loop that
could only be evaluated once. For example: `CAST('2025-01-01' as
DATETIME)` -- the value of this never changes, so we should only run it
once.
- We have no robust way of doing this right now for entire _expressions_
-- the only existing facility we have is
`program.mark_last_insn_constant()`, which has no concept of how many
instructions translating a given _expression_ spends, and breaks very
easily for this reason.
## Main ideas of this PR:
- Add `expr.is_constant()` determining whether the expression is
compile-time constant. Tries to be conservative and not deem something
compile-time constant if there is no certainty.
- Whenever we think a compile-time constant expression is about to be
translated into bytecode in `translate_expr()`, start a so called
`constant span`, which means a range of instructions that are part of a
compile-time constant expression.
- At the end of translating the program, all `constant spans` are
hoisted outside of any table loops so they only get evaluated once.
- The target offsets of any jump instructions (e.g. `Goto`) are moved to
the correct place, taking into account all instructions whose offsets
were shifted due to moving the compile-time constant expressions around.
- An escape hatch wrapper `translate_expr_no_constant_opt()` is added
for cases where we should not hoist constants even if we otherwise
could. Right now the only example of this is cases where we are reusing
the same register(s) in multiple iterations of some kind of loop, e.g.
`VALUES(...)` or in the `coalesce()` function implementation.
## Performance effects
Here is an example of a modified/simplified TPC-H query where the
`CAST()` calls were previously run millions of times in a hot loop, but
now they are optimized out of the loop.
**BYTECODE PLAN BEFORE:**
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     26    0                    0   Start at 26
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     25    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       String8          0     7     0     1995-03-29     0   r[7]='1995-03-29'
7       Function         0     7     6     cast           0   r[6]=func(r[7..8])  <-- CAST() executed millions of times
8       Le               5     6     24                   0   if r[5]<=r[6] goto 24
9       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
10      SeekRowid        1     9     24                   0   if (r[9]!=orders.rowid) goto 24
11      Column           1     4     10                   0   r[10]=orders.o_orderdate
12      String8          0     12    0     1995-03-29     0   r[12]='1995-03-29'
13      Function         0     12    11    cast           0   r[11]=func(r[12..13])
14      Ge               10    11    24                   0   if r[10]>=r[11] goto 24
15      Column           1     1     14                   0   r[14]=orders.o_custkey
16      SeekRowid        2     14    24                   0   if (r[14]!=customer.rowid) goto 24
17      Column           2     6     15                   0   r[15]=customer.c_mktsegment
18      Ne               15    16    24                   0   if r[15]!=r[16] goto 24
19      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
20      Integer          3     2     0                    0   r[2]=3
21      Column           1     4     3                    0   r[3]=orders.o_orderdate
22      Column           1     7     4                    0   r[4]=orders.o_shippriority
23      ResultRow        1     4     0                    0   output=r[1..4]
24    Next               0     5     0                    0
25    Halt               0     0     0                    0
26    Transaction        0     0     0                    0   write=false
27    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
28    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
29    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
30    Goto               0     1     0
```
**BYTECODE PLAN AFTER**:
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     20    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       Le               5     6     19                   0   if r[5]<=r[6] goto 19
7       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
8       SeekRowid        1     9     19                   0   if (r[9]!=orders.rowid) goto 19
9       Column           1     4     10                   0   r[10]=orders.o_orderdate
10      Ge               10    11    19                   0   if r[10]>=r[11] goto 19
11      Column           1     1     14                   0   r[14]=orders.o_custkey
12      SeekRowid        2     14    19                   0   if (r[14]!=customer.rowid) goto 19
13      Column           2     6     15                   0   r[15]=customer.c_mktsegment
14      Ne               15    16    19                   0   if r[15]!=r[16] goto 19
15      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
16      Column           1     4     3                    0   r[3]=orders.o_orderdate
17      Column           1     7     4                    0   r[4]=orders.o_shippriority
18      ResultRow        1     4     0                    0   output=r[1..4]
19    Next               0     5     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0   write=false
22    String8            0     7     0     1995-03-29     0   r[7]='1995-03-29'
23    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
24    Function           1     7     6     cast           0   r[6]=func(r[7..8]) <-- CAST() executed twice
25    String8            0     12    0     1995-03-29     0   r[12]='1995-03-29'
26    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
27    Function           1     12    11    cast           0   r[11]=func(r[12..13])
28    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
29    Integer            3     2     0                    0   r[2]=3
30    Goto               0     1     0                    0
```
**EXECUTION RUNTIME BEFORE:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.633396667 s (this includes parsing/coloring of cli app)
```
**EXECUTION RUNTIME AFTER:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 2.0923475 s (this includes parsing/coloring of cli app)
````

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1359
2025-04-25 16:55:41 +03:00
meteorgan
f464d15f8b refactor database open_file and open 2025-04-25 21:45:18 +08:00
Anton Harniakou
dd7c0ad1c8 Give name to hard-coded page_size values 2025-04-25 14:15:15 +03:00
Jussi Saurio
7137f4ab3b Merge 'Feature: Composite Primary key constraint' from Pedro Muniz
Closes #1384 . This PR implements Primary Key constraint for inserts. As
can be seen in the issue, if you created an Index with a Primary Key
constraint, it could trigger `Unique Constraint` error, but still insert
the record. Sqlite uses the opcode `NoConflict` to check if the record
already exists in the Btree. As we did not have this Opcode yet, I
implemented it. It is very similar to `NotFound` with the difference
that if any value in the Record is Null, it will immediately jump to the
offset. The added benefit of implementing this, is that now we fully
support Composite Primary Keys. Also, I think with the current
implementation, it will be trivial to implement the Unique opcode for
Insert. To support Updates, I need to understand more of the plan
optimizer to and find where we are Making the Record and opening the
autoindex.
For testing, I have written a test generator to generate many different
tables that can have a varying numbers of Primary Keys.
```sql
limbo> CREATE TABLE users (id INT, username TEXT, PRIMARY KEY (id, username));
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> explain INSERT INTO users VALUES (1, 'alice');
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     OpenWrite          0     2     0                    0
2     Integer            1     2     0                    0   r[2]=1
3     String8            0     3     0     alice          0   r[3]='alice'
4     OpenWrite          1     3     0                    0
5     NewRowId           0     1     0                    0
6     Copy               2     5     0                    0   r[5]=r[2]
7     Copy               3     6     0                    0   r[6]=r[3]
8     Copy               1     7     0                    0   r[7]=r[1]
9     MakeRecord         5     3     8                    0   r[8]=mkrec(r[5..7])
10    NoConflict         1     12    5     2              0   key=r[5]
11    Halt               1555  0     0     users.id, users.username  0
12    IdxInsert          1     8     5                    0   key=r[8]
13    MakeRecord         2     2     4                    0   r[4]=mkrec(r[2..3])
14    Insert             0     4     1                    0
15    Halt               0     0     0                    0
16    Transaction        0     1     0                    0   write=true
17    Goto               0     1     0                    0
limbo> INSERT INTO users VALUES (1, 'alice');
  × Runtime error: UNIQUE constraint failed: users.id, users.username (19)
limbo> INSERT INTO users VALUES (1, 'bob');
limbo> INSERT INTO users VALUES (1, 'bob');
  × Runtime error: UNIQUE constraint failed: users.id, users.username (19)
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1393
2025-04-24 23:25:30 +03:00
Pekka Enberg
4d0c40a435 One more fix to Antithesis Dockerfile 2025-04-24 21:17:36 +03:00