Commit Graph

3913 Commits

Author SHA1 Message Date
TcMits
ee660187dc fix negative free space after balance-shallower 2025-04-14 14:25:18 +07:00
TcMits
b3c2593980 btree balance-shallower 2025-04-14 12:49:30 +07:00
TcMits
a4a4879f3b fix cargo fmt check 2025-04-11 14:53:10 +07:00
TcMits
9d7a779757 Fix drop empty page in balancing 2025-04-11 14:41:56 +07:00
Pekka Enberg
e3a4400329 Merge 'Multi column indexes + index seek refactor' from Jussi Saurio
# Multi column indexes + index seek refactor
## PR reader guide
I would suggest focusing mainly on `optimizer.rs` and `plan.rs`; the rest
consists of small type changes or, in the case of `main_loop.rs`, logic that
was moved out of that file and rewritten.
## New feature - multi column index seeks
This PR adds proper support for multi-column indexes, i.e. it uses as many
index columns in the seek key as possible. Previously, we used at most one
column per index. I've modified the existing compound index seek fuzz test
to exercise this functionality.
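The "as many columns as possible" rule can be sketched as follows: walk the index columns in order, take an equality constraint for each, and stop after the first column that only has a range constraint (or none at all). The `Constraint`/`Op` types and `seek_key_constraints` below are hypothetical illustrations, not the helper structures this PR adds to `plan.rs`.

```rust
/// Hypothetical comparison operators from WHERE-clause terms.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Eq, Lt, Le, Gt, Ge }

/// Hypothetical single-column constraint such as `b = 6` or `c < 7`.
#[derive(Debug, Clone, PartialEq)]
struct Constraint { column: &'static str, op: Op, value: i64 }

/// Returns the constraints usable in the seek key, in index-column order:
/// the longest prefix of equality constraints, optionally followed by one
/// range constraint on the next index column.
fn seek_key_constraints(index_cols: &[&str], preds: &[Constraint]) -> Vec<Constraint> {
    let mut key = Vec::new();
    for col in index_cols {
        // An equality on this column lets us continue to the next column.
        if let Some(eq) = preds.iter().find(|c| c.column == *col && c.op == Op::Eq) {
            key.push(eq.clone());
            continue;
        }
        // Otherwise a single range constraint (if any) ends the usable prefix.
        if let Some(range) = preds.iter().find(|c| c.column == *col) {
            key.push(range.clone());
        }
        break;
    }
    key
}
```

For `a = 5 AND b = 6 AND c < 7` on index `abc`, all three constraints end up in the key; for `a = 5 AND c = 7` only `a` does, because the gap at `b` stops the prefix.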
## Refactoring of index seek related logic
This PR moves a lot of index seek related logic out of `main_loop.rs`
into `optimizer.rs` and `plan.rs` and introduces several helper
structures to model finding an index and using it to perform a seek + scan.
## Examples
Here are some examples of multi-column seeks:
### Example table setup:
```sql
sqlite> CREATE TABLE t(a,b,c,d,e);
sqlite> CREATE INDEX abc ON t (a,b,c);
-- create 10000 rows with random values between 0-9 for all columns
sqlite> INSERT INTO t SELECT ABS(RANDOM() % 10),ABS(RANDOM() % 10),ABS(RANDOM() % 10),ABS(RANDOM() % 10),ABS(RANDOM() % 10) FROM generate_series(1,10000,1);
```
### Example bytecode plans, results and timings vs main branch:
```sql
limbo> EXPLAIN SELECT * FROM t WHERE a = 5 and b = 6 and c = 7;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     20    0                    0   Start at 20
1     OpenReadAsync      0     2     0                    0   table=t, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     3     0                    0   table=abc, root=3
4     OpenReadAwait      0     0     0                    0
5     Integer            5     6     0                    0   r[6]=5
6     Integer            6     7     0                    0   r[7]=6
7     Integer            7     8     0                    0   r[8]=7
8     SeekGE             1     19    6                    0   key=[6..8]
9       IdxGT            1     19    6                    0   key=[6..8]
10      DeferredSeek     1     0     0                    0
11      Column           0     0     1                    0   r[1]=t.a
12      Column           0     1     2                    0   r[2]=t.b
13      Column           0     2     3                    0   r[3]=t.c
14      Column           0     3     4                    0   r[4]=t.d
15      Column           0     4     5                    0   r[5]=t.e
16      ResultRow        1     5     0                    0   output=r[1..5]
17    NextAsync          1     0     0                    0
18    NextAwait          1     9     0                    0
19    Halt               0     0     0                    0
20    Transaction        0     0     0                    0   write=false
21    Goto               0     1     0                    0

limbo> SELECT * FROM t WHERE a = 5 and b = 6 and c = 7;
5|6|7|9|9
5|6|7|4|7
5|6|7|3|2
5|6|7|3|7
5|6|7|5|2
5|6|7|5|3
5|6|7|9|7

runtime (debug build, this branch): total: 2 ms (this includes parsing/coloring of cli app)
runtime (debug build, main branch): total: 67 ms (this includes parsing/coloring of cli app)

```
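To make the plan above easier to read, here is a small Rust model of what `SeekGE` followed by the `IdxGT` loop guard does on the `abc` index, using a sorted `Vec` as a stand-in for the B-tree. The types and function are hypothetical and not limbo's VDBE; the point is that only entries matching the full `(a, b, c)` key are visited, rather than every entry matching `a` alone.

```rust
/// Stand-in for the `abc` index: entries (a, b, c, rowid), sorted ascending.
type IdxEntry = (i64, i64, i64, i64);

/// Mimics `SeekGE key` + `IdxGT key` for `a = ? AND b = ? AND c = ?`.
fn equality_seek(index: &[IdxEntry], key: (i64, i64, i64)) -> Vec<i64> {
    // SeekGE: position at the first entry whose (a, b, c) prefix is >= key.
    let start = index.partition_point(|e| (e.0, e.1, e.2) < key);
    let mut rowids = Vec::new();
    for e in &index[start..] {
        // IdxGT: stop as soon as the prefix compares greater than the key.
        if (e.0, e.1, e.2) > key {
            break;
        }
        // DeferredSeek would now position the table cursor; we just keep the rowid.
        rowids.push(e.3);
    }
    rowids
}
```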
```sql
limbo> EXPLAIN SELECT * FROM t WHERE a = 5 and b = 6 and c < 7;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenReadAsync      0     2     0                    0   table=t, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     3     0                    0   table=abc, root=3
4     OpenReadAwait      0     0     0                    0
5     Integer            5     6     0                    0   r[6]=5
6     Integer            6     7     0                    0   r[7]=6
7     Null               0     8     0                    0   r[8]=NULL
8     SeekGT             1     20    6                    0   key=[6..8]
9       Integer          7     8     0                    0   r[8]=7
10      IdxGE            1     20    6                    0   key=[6..8]
11      DeferredSeek     1     0     0                    0
12      Column           0     0     1                    0   r[1]=t.a
13      Column           0     1     2                    0   r[2]=t.b
14      Column           0     2     3                    0   r[3]=t.c
15      Column           0     3     4                    0   r[4]=t.d
16      Column           0     4     5                    0   r[5]=t.e
17      ResultRow        1     5     0                    0   output=r[1..5]
18    NextAsync          1     0     0                    0
19    NextAwait          1     10    0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0   write=false
22    Goto               0     1     0                    0

limbo> SELECT * FROM t WHERE a = 5 and b = 6 and c < 7;
5|6|0|0|3
5|6|0|5|1
5|6|0|3|1
5|6|0|6|3
5|6|0|8|1
5|6|0|2|7
5|6|0|9|9
5|6|0|5|3
5|6|0|4|2
5|6|0|4|2
5|6|0|0|2
5|6|0|7|2
5|6|1|8|5
5|6|1|7|5
5|6|1|7|2
5|6|1|1|2
5|6|1|6|5
5|6|1|1|5
5|6|1|5|7
5|6|1|1|9
5|6|1|4|3
5|6|1|1|2
5|6|1|2|2
5|6|1|4|4
5|6|1|9|6
5|6|1|2|5
5|6|1|2|4
5|6|1|7|1
5|6|2|0|9
5|6|2|6|9
5|6|2|4|5
5|6|2|9|3
5|6|2|5|2
5|6|2|9|0
5|6|2|7|1
5|6|3|6|5
5|6|3|8|5
5|6|3|5|4
5|6|3|5|2
5|6|3|1|1
5|6|3|2|0
5|6|3|9|3
5|6|3|6|9
5|6|3|7|6
5|6|3|3|5
5|6|3|0|8
5|6|3|6|4
5|6|4|1|1
5|6|4|9|8
5|6|4|3|7
5|6|4|1|3
5|6|4|8|9
5|6|4|9|7
5|6|4|7|9
5|6|4|8|8
5|6|4|3|1
5|6|4|2|6
5|6|4|5|7
5|6|4|2|6
5|6|4|4|3
5|6|5|2|4
5|6|5|6|7
5|6|5|3|8
5|6|5|7|8
5|6|5|9|6
5|6|5|2|7
5|6|5|1|7
5|6|5|0|6
5|6|6|2|4
5|6|6|9|4
5|6|6|4|9
5|6|6|5|6
5|6|6|2|2
5|6|6|0|6

runtime (debug build, this branch): total: 9 ms (this includes parsing/coloring of cli app)
runtime (debug build, main branch): total: 71 ms (this includes parsing/coloring of cli app)

```
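The `c < 7` plan seeks past `(5, 6, NULL)` and stops at `(5, 6, 7)`. Because NULL sorts before every other value in an index, a `SeekGT` with NULL as the third key column also skips rows where `c IS NULL`, which `c < 7` must exclude anyway. A sketch of the same bounds, modelling NULL as `None` (Rust's derived `Option` ordering puts `None` before any `Some`); the types and function are hypothetical.

```rust
/// Index entries (a, b, c, rowid); `None` in the c slot models SQL NULL.
type RangeEntry = (i64, i64, Option<i64>, i64);

/// Mimics `SeekGT (a, b, NULL)` + `IdxGE (a, b, upper)` for
/// `a = ? AND b = ? AND c < upper`.
fn range_seek(index: &[RangeEntry], a: i64, b: i64, upper: i64) -> Vec<i64> {
    let seek_key = (a, b, None::<i64>);
    // SeekGT: skip every entry whose prefix is <= (a, b, NULL), including NULL c values.
    let start = index.partition_point(|e| (e.0, e.1, e.2) <= seek_key);
    let end_key = (a, b, Some(upper));
    index[start..]
        .iter()
        // IdxGE: stop at the first entry whose prefix is >= (a, b, upper).
        .take_while(|e| (e.0, e.1, e.2) < end_key)
        .map(|e| e.3)
        .collect()
}
```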
```sql
limbo> EXPLAIN SELECT * FROM t WHERE a = 5 and b = 6 and c < 7 ORDER BY a desc, b desc, c desc;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     20    0                    0   Start at 20
1     OpenReadAsync      0     2     0                    0   table=t, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     3     0                    0   table=abc, root=3
4     OpenReadAwait      0     0     0                    0
5     Integer            5     6     0                    0   r[6]=5
6     Integer            6     7     0                    0   r[7]=6
7     Integer            7     8     0                    0   r[8]=7
8     SeekLT             1     19    6                    0   key=[6..8]
9       IdxLT            1     19    6                    0   key=[6..7]
10      DeferredSeek     1     0     0                    0
11      Column           0     0     1                    0   r[1]=t.a
12      Column           0     1     2                    0   r[2]=t.b
13      Column           0     2     3                    0   r[3]=t.c
14      Column           0     3     4                    0   r[4]=t.d
15      Column           0     4     5                    0   r[5]=t.e
16      ResultRow        1     5     0                    0   output=r[1..5]
17    PrevAsync          1     0     0                    0
18    PrevAwait          1     0     0                    0
19    Halt               0     0     0                    0
20    Transaction        0     0     0                    0   write=false
21    Goto               0     1     0                    0

limbo> SELECT * FROM t WHERE a = 5 and b = 6 and c < 7 ORDER BY a desc, b desc, c desc;
5|6|6|0|6
5|6|6|2|2
5|6|6|5|6
5|6|6|4|9
5|6|6|9|4
5|6|6|2|4
5|6|5|0|6
5|6|5|1|7
5|6|5|2|7
5|6|5|9|6
5|6|5|7|8
5|6|5|3|8
5|6|5|6|7
5|6|5|2|4
5|6|4|4|3
5|6|4|2|6
5|6|4|5|7
5|6|4|2|6
5|6|4|3|1
5|6|4|8|8
5|6|4|7|9
5|6|4|9|7
5|6|4|8|9
5|6|4|1|3
5|6|4|3|7
5|6|4|9|8
5|6|4|1|1
5|6|3|6|4
5|6|3|0|8
5|6|3|3|5
5|6|3|7|6
5|6|3|6|9
5|6|3|9|3
5|6|3|2|0
5|6|3|1|1
5|6|3|5|2
5|6|3|5|4
5|6|3|8|5
5|6|3|6|5
5|6|2|7|1
5|6|2|9|0
5|6|2|5|2
5|6|2|9|3
5|6|2|4|5
5|6|2|6|9
5|6|2|0|9
5|6|1|7|1
5|6|1|2|4
5|6|1|2|5
5|6|1|9|6
5|6|1|4|4
5|6|1|2|2
5|6|1|1|2
5|6|1|4|3
5|6|1|1|9
5|6|1|5|7
5|6|1|1|5
5|6|1|6|5
5|6|1|1|2
5|6|1|7|2
5|6|1|7|5
5|6|1|8|5
5|6|0|7|2
5|6|0|0|2
5|6|0|4|2
5|6|0|4|2
5|6|0|5|3
5|6|0|9|9
5|6|0|2|7
5|6|0|8|1
5|6|0|6|3
5|6|0|3|1
5|6|0|5|1
5|6|0|0|3

runtime (debug build, this branch): total: 9 ms (this includes parsing/coloring of cli app)
runtime (debug build, main branch): total: 71 ms (this includes parsing/coloring of cli app)
```
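For the descending plan, the cursor seeks to the last entry below `(5, 6, 7)` and walks backwards with `Prev` until `(a, b)` drops below `(5, 6)`, which is why `IdxLT` only compares the two-column key `[6..7]`. A sketch using a sorted `Vec` iterated in reverse (hypothetical types, not limbo code):

```rust
/// Index entries (a, b, c, rowid), sorted ascending.
type DescEntry = (i64, i64, i64, i64);

/// Mimics `SeekLT (a, b, upper)` + `Prev` + `IdxLT (a, b)` for
/// `a = ? AND b = ? AND c < upper ORDER BY a DESC, b DESC, c DESC`.
fn reverse_range_seek(index: &[DescEntry], a: i64, b: i64, upper: i64) -> Vec<i64> {
    // SeekLT: the cursor lands on the last entry whose prefix is < (a, b, upper).
    let end = index.partition_point(|e| (e.0, e.1, e.2) < (a, b, upper));
    index[..end]
        .iter()
        .rev() // PrevAsync/PrevAwait: walk the index backwards
        // IdxLT on the two-column key: stop once (a, b) drops below the equality prefix.
        .take_while(|e| (e.0, e.1) >= (a, b))
        .map(|e| e.3)
        .collect()
}
```

Reading the index backwards yields rows already in `c DESC` order, so no separate sort step is needed.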

Closes #1288
2025-04-11 09:36:25 +03:00
Pekka Enberg
2752c77cc2 Merge 'simulator: Add Bug Database(BugBase)' from Alperen Keleş
Previously, the simulator used `tempfile` to store the resulting
interaction plans, database file, seeds, and all other relevant information.
As a result, this information was ephemeral, and we could not use the
results of previous runs to optimize future runs. This PR removes the CLI
option `output_dir` and builds the storage infrastructure on top of the
`BugBase` interface.
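A minimal sketch of a persistent bug store, assuming a simple directory-per-seed layout. `BugStore`, its methods, and the `plan.sql` file name are hypothetical; they are not the actual `BugBase` interface from this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical persistent store: each run's artifacts live under `<root>/<seed>/`,
/// so they survive across runs instead of disappearing with a temp directory.
struct BugStore {
    root: PathBuf,
}

impl BugStore {
    fn open(root: impl AsRef<Path>) -> io::Result<Self> {
        fs::create_dir_all(root.as_ref())?;
        Ok(Self { root: root.as_ref().to_path_buf() })
    }

    /// Directory holding all artifacts for one seed (created on demand).
    fn run_dir(&self, seed: u64) -> io::Result<PathBuf> {
        let dir = self.root.join(seed.to_string());
        fs::create_dir_all(&dir)?;
        Ok(dir)
    }

    /// Persists the serialized interaction plan for a seed.
    fn save_plan(&self, seed: u64, plan: &str) -> io::Result<PathBuf> {
        let path = self.run_dir(seed)?.join("plan.sql");
        fs::write(&path, plan)?;
        Ok(path)
    }

    /// Lists seeds recorded by earlier runs, e.g. to re-run known failures first.
    fn known_seeds(&self) -> io::Result<Vec<u64>> {
        let mut seeds = Vec::new();
        for entry in fs::read_dir(&self.root)? {
            if let Ok(seed) = entry?.file_name().to_string_lossy().parse::<u64>() {
                seeds.push(seed);
            }
        }
        seeds.sort_unstable();
        Ok(seeds)
    }
}
```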

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1276
2025-04-11 09:35:09 +03:00
Pekka Enberg
d67e1b604b Merge 'Added 'likelihood' scalar function' from Sachin Kumar Singh
The `likelihood(X,Y)` function returns argument X unchanged. The value Y
in likelihood(X,Y) must be a floating point constant between 0.0 and
1.0, inclusive.
```
sqlite> explain SELECT likelihood(42, 0.0);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     6     0                    0   Start at 6
1     Once           0     3     0                    0
2     Integer        42    2     0                    0   r[2]=42
3     Copy           2     1     0                    0   r[1]=r[2]
4     ResultRow      1     1     0                    0   output=r[1]
5     Halt           0     0     0                    0
6     Goto           0     1     0                    0
```
```
limbo> explain SELECT likelihood(42, 0.0);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     4     0                    0   Start at 4
1     Copy               2     1     0                    0   r[1]=r[2]
2     ResultRow          1     1     0                    0   output=r[1]
3     Halt               0     0     0                    0
4     Integer            42    2     0                    0   r[2]=42
5     Goto               0     1     0                    0
```
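A minimal sketch of the `likelihood(X,Y)` semantics described above, assuming a hypothetical `Value` enum rather than limbo's own value type. X is passed through untouched; Y is only validated. The error text is illustrative.

```rust
/// Hypothetical value type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Returns X unchanged; Y must be a floating-point constant in [0.0, 1.0].
/// An integer such as `1` is rejected because it is not a floating-point constant.
fn likelihood(x: Value, y: &Value) -> Result<Value, String> {
    match y {
        Value::Float(p) if (0.0..=1.0).contains(p) => Ok(x),
        _ => Err("second argument to likelihood() must be a constant between 0.0 and 1.0".to_string()),
    }
}
```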

Closes #1303
2025-04-11 09:34:36 +03:00
Pekka Enberg
13516fd53d Merge 'feat: Add timediff data and time function' from Sachin Kumar Singh
This PR implements the `timediff(A,B)` function, which returns a string
that describes the amount of time that must be added to B in order to
reach time A. I used SQLite's timediff function as the format reference:
https://github.com/sqlite/sqlite/blob/master/src/date.c#L1694
The opcodes look correct:
```
limbo> explain SELECT timediff('12:30:45.123', '12:30:44.987');
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     6     0                    0   Start at 6
1     String8            0     2     0     12:30:45.123   0   r[2]='12:30:45.123'
2     String8            0     3     0     12:30:44.987   0   r[3]='12:30:44.987'
3     Function           0     2     1     timediff       0   r[1]=func(r[2..3])
4     ResultRow          1     1     0                    0   output=r[1]
5     Halt               0     0     0                    0
6     Goto               0     1     0                    0
```
```
sqlite> explain SELECT timediff('12:30:45.123', '12:30:44.987');
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     8     0                    0   Start at 8
1     Once           0     5     0                    0
2     String8        0     3     0     12:30:45.123   0   r[3]='12:30:45.123'
3     String8        0     4     0     12:30:44.987   0   r[4]='12:30:44.987'
4     Function       3     3     2     timediff(2)    0   r[2]=func(r[3..4])
5     Copy           2     1     0                    0   r[1]=r[2]
6     ResultRow      1     1     0                    0   output=r[1]
7     Halt           0     0     0                    0
8     Goto           0     1     0                    0
```
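A sketch of the output format for the simplest case, assuming both inputs are time-of-day strings (`HH:MM:SS` or `HH:MM:SS.SSS`) on the same implicit date, so the year, month, and day fields are always zero. `parse_time_ms` and `timediff_time_only` are hypothetical helpers; the SQLite reference implementation linked above also handles full dates and calendar-aware month and year differences.

```rust
/// Parses "HH:MM:SS" or "HH:MM:SS.SSS" into milliseconds since midnight.
fn parse_time_ms(s: &str) -> Option<i64> {
    let mut parts = s.split(':');
    let h: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let sec: f64 = parts.next()?.parse().ok()?;
    if parts.next().is_some()
        || !(0..24).contains(&h)
        || !(0..60).contains(&m)
        || !(0.0..60.0).contains(&sec)
    {
        return None;
    }
    Some(h * 3_600_000 + m * 60_000 + (sec * 1000.0).round() as i64)
}

/// timediff(A, B) for time-of-day inputs, in the "(+|-)YYYY-MM-DD HH:MM:SS.SSS" layout.
fn timediff_time_only(a: &str, b: &str) -> Option<String> {
    let diff = parse_time_ms(a)? - parse_time_ms(b)?;
    // '+' when A is at or after B, '-' otherwise.
    let sign = if diff < 0 { '-' } else { '+' };
    let ms = diff.abs();
    Some(format!(
        "{sign}0000-00-00 {:02}:{:02}:{:02}.{:03}",
        ms / 3_600_000,
        (ms / 60_000) % 60,
        (ms / 1000) % 60,
        ms % 1000
    ))
}
```

For the query above, `timediff_time_only("12:30:45.123", "12:30:44.987")` yields `+0000-00-00 00:00:00.136`.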
This is my first PR; I followed the [contributing guide](https://github.com/tursodatabase/limbo/blob/main/CONTRIBUTING.md) and started from there.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1302
2025-04-11 09:34:04 +03:00
Sachin Singh
23ab387143 handle formatting issues 2025-04-11 09:59:27 +05:30
Pekka Enberg
2f428b7dcc Merge 'Fix overwrite cell with size less than cell size' from Pere Diaz Bou
We cannot simply paste a new payload over an existing cell whose payload is
larger, because the leftover space must be tracked as fragmentation or free
blocks. Let's keep it simple and only overwrite in place when the sizes are
the same.
Side note: I suspect that update is not re-entrant.
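The rule can be sketched as follows, assuming a hypothetical, simplified page that stores each cell as an `(offset, len)` pair. Real B-tree pages additionally track free blocks and fragmented bytes in the page header, which is exactly what an unequal-size overwrite would have to update.

```rust
/// Hypothetical simplified page: raw bytes plus (offset, len) for each cell.
struct Page {
    data: Vec<u8>,
    cells: Vec<(usize, usize)>,
}

/// What the caller should do after attempting an in-place overwrite.
#[derive(Debug, PartialEq)]
enum Overwrite {
    /// Payload had exactly the old size and was copied over the old bytes.
    InPlace,
    /// Sizes differ: drop the old cell and insert the new payload instead,
    /// so free-space bookkeeping stays correct.
    NeedsReinsert,
}

fn overwrite_cell(page: &mut Page, idx: usize, payload: &[u8]) -> Overwrite {
    let (offset, len) = page.cells[idx];
    if payload.len() != len {
        // A smaller payload would leave untracked bytes behind; a larger one would overrun.
        return Overwrite::NeedsReinsert;
    }
    page.data[offset..offset + len].copy_from_slice(payload);
    Overwrite::InPlace
}
```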

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1301
2025-04-11 07:17:53 +03:00
Sachin Singh
01fa02364d correctly handle edge cases 2025-04-11 08:34:29 +05:30
Sachin Singh
5ffdd42f12 Additional tests 2025-04-11 06:02:07 +05:30
Sachin Singh
482e93bfd0 feat: add likelihood scalar function 2025-04-11 05:54:23 +05:30
Sachin Singh
05b4b7b9f1 edit compat.md 2025-04-11 04:41:59 +05:30
Sachin Singh
ded308ccfa additional tests 2025-04-11 04:40:09 +05:30
Sachin Singh
b7acfa490c feat: add timediff data and time function 2025-04-11 04:30:57 +05:30
Pere Diaz Bou
745c2b92d0 unnecessary dirty set on overwrite 2025-04-10 22:24:15 +02:00
Pere Diaz Bou
038d78f096 overwrite when payload is equal size as current cell only
Previously we would overwrite even when the new payload was smaller than the
cell. This was wrong because it did not update the fragmented byte count or
the free blocks. To be safe, only overwrite when the local size is exactly
the same.
2025-04-10 22:24:15 +02:00
Pere Diaz Bou
506c1a236c find_free_cell fix use of no_offset writes 2025-04-10 22:24:15 +02:00
Pekka Enberg
17b206297e Merge 'Emit ANSI codes only when tracing is outputting to terminal' from Preston Thorpe
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1289
2025-04-10 20:54:21 +03:00
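This merge carries no description, so the following is only a sketch of the general technique, assuming logs are written to stderr: check once whether the stream is a terminal and emit ANSI color codes only in that case. `format_level` and `stderr_wants_color` are hypothetical names; `std::io::IsTerminal` requires Rust 1.70 or later. With `tracing-subscriber`, such a flag would typically be passed to the fmt layer's `with_ansi`.

```rust
use std::io::{self, IsTerminal};

/// Wraps a log level in ANSI color codes only when `use_color` is true.
fn format_level(level: &str, use_color: bool) -> String {
    if !use_color {
        return level.to_string();
    }
    let code = match level {
        "ERROR" => "31", // red
        "WARN" => "33",  // yellow
        _ => "32",       // green
    };
    format!("\x1b[{code}m{level}\x1b[0m")
}

/// Decide once at startup: color only if stderr is attached to a terminal,
/// so redirected log files contain no escape sequences.
fn stderr_wants_color() -> bool {
    io::stderr().is_terminal()
}
```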
Pekka Enberg
ef893da6c7 Merge 'core/btree: Add PageContent::new() helper' from Pekka Enberg
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1294
2025-04-10 20:53:41 +03:00
Pekka Enberg
a27126cd05 Merge 'B-Tree code cleanups' from Pekka Enberg
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1290
2025-04-10 20:53:33 +03:00
Jussi Saurio
4daad0a858 Fix bug: accidentally skipped index selection for other tables except first found 2025-04-10 18:57:14 +03:00
Pekka Enberg
1d748de273 Merge 'btree index selection on rightmost pointer in balance_non_root' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1297
2025-04-10 18:39:51 +03:00
Pekka Enberg
712a4caa22 stress: Fix per-thread query generation 2025-04-10 18:39:20 +03:00
Pere Diaz Bou
62d0febdb6 panic on corruption 2025-04-10 16:01:24 +02:00
Pere Diaz Bou
b35d805a81 tracing lock stress 2025-04-10 16:01:24 +02:00
Pere Diaz Bou
8e93471d00 fix cell index selection while balancing
The cell index doesn't move in `move_to` unless we don't need to check the
next cell. With the rightmost pointer, on the other hand, we advance the cell
index by 1 even though we are moving into that (rightmost) page.
2025-04-10 16:01:24 +02:00
Pere Diaz Bou
4755acb571 init tracing in stress tool 2025-04-10 16:01:24 +02:00
Pere Diaz Bou
0c4e56ecf9 Merge 'Add support to load log file with stress test' from Pere Diaz Bou
Run with `RUST_BACKTRACE=1 cargo run -p limbo_stress -- -t 1 -l`, and then,
to repeat the same plan, run `RUST_BACKTRACE=1 cargo run -p limbo_stress -- -t 1 -L`.
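A sketch of the log-and-replay idea, assuming a plain-text log with one SQL statement per line. The function names and file format are hypothetical; the actual log format used by `limbo_stress` may differ.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Records each generated statement on its own line so a later run can replay
/// the same plan instead of generating a new one. Assumes no embedded newlines.
fn write_plan_log(path: &Path, statements: &[String]) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    for stmt in statements {
        writeln!(out, "{stmt}")?;
    }
    out.flush()
}

/// Reads the log back in order, skipping blank lines.
fn read_plan_log(path: &Path) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut statements = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if !line.trim().is_empty() {
            statements.push(line);
        }
    }
    Ok(statements)
}
```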

Closes #1296
2025-04-10 16:01:11 +02:00
Jussi Saurio
457bded14d optimizer: refactor optimizer to support multicolumn index scans 2025-04-10 15:53:02 +03:00
Jussi Saurio
afad06fb23 vdbe/explain: add key info to Seek/Idx insns 2025-04-10 15:06:45 +03:00
Jussi Saurio
3d1b4c5292 test/fuzz: modify compound index scan fuzz to utilize both pk columns in where clause 2025-04-10 15:06:18 +03:00
Pere Diaz Bou
cdcbcafbdd clipppy 2025-04-10 13:46:40 +02:00
Pere Diaz Bou
f795a9e331 Add support to load log file with stress test 2025-04-10 13:41:10 +02:00
Jussi Saurio
579d04f521 Merge 'io/linux: make syscallio the default (io_uring is really slow)' from Jussi Saurio
context: https://github.com/tursodatabase/limbo/issues/1275

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1295
2025-04-10 13:55:06 +03:00
Jussi Saurio
60a13c129f io/linux: make syscallio the default (io_uring is really slow) 2025-04-10 13:32:26 +03:00
Pekka Enberg
53633e8b6f core/btree: Add PageContent::new() helper 2025-04-10 13:14:38 +03:00
Pekka Enberg
6ffa9cf56a Merge 'Stress improvements' from Pekka Enberg
Closes #1292
2025-04-10 12:18:53 +03:00
Pekka Enberg
277efeb5ee Merge 'VDBE code cleanups' from Pekka Enberg
Closes #1291
2025-04-10 12:10:22 +03:00
Pekka Enberg
3fd378cf9f Fix Antithesis Dockerfile to include JavaScript bindings 2025-04-10 12:08:31 +03:00
Pekka Enberg
441cd637b5 stress: Make database file configurable 2025-04-10 11:59:25 +03:00
Pekka Enberg
c4d983bcfe stress: Log SQL statements to a file 2025-04-10 11:59:25 +03:00
Pekka Enberg
39cee1b146 stress: Increase default number of iterations 2025-04-10 11:59:25 +03:00
Pekka Enberg
f50662205e stress: Fix schema creation 2025-04-10 11:59:25 +03:00
Pekka Enberg
207563208f stress: Add support for INSERT, DELETE, and UPDATE 2025-04-10 11:59:25 +03:00
Pekka Enberg
6aaa105321 stress: Add schema generation support 2025-04-10 11:43:32 +03:00
Pekka Enberg
31f0d174d7 core/vdbe: Move exec_*() funtions to execute.rs 2025-04-10 09:42:03 +03:00
Pekka Enberg
3fd51cdf06 core/vdbe: Move Insn implementation close to struct definition 2025-04-10 09:28:43 +03:00
Pekka Enberg
5906d7971a core/vdbe: Clean up imports 2025-04-10 09:25:15 +03:00