Commit Graph

4082 Commits

Author SHA1 Message Date
PThorpe92
7f170756ae Add python script to benchmark vfs against each other 2025-04-21 12:22:20 -04:00
Pere Diaz Bou
a6dccdd12c Merge 'docs: add Rust to "Getting Started" section' from Timo Kösters
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1374
2025-04-21 12:19:10 +02:00
Pere Diaz Bou
fc4deb2b7b Merge 'btree: avoid reading entire cell when only rowid needed' from Jussi Saurio
This PR is based on #1357 and further improves performance:
```sql
limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.728050958 s (this includes parsing/coloring of cli app)
```

Reviewed-by: Preston Thorpe (@PThorpe92)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1358
2025-04-21 12:14:21 +02:00
Timo Kösters
b945e9b2a0 docs: add Rust to "Getting Started" section 2025-04-21 10:21:21 +02:00
Jussi Saurio
53061f5642 Merge 'Fix bug: left join null flag not being cleared' from Jussi Saurio
In left joins, even if the join condition is not matched, the system
must emit a row for every row of the outer table:
```sql
-- this must return t1.count() rows, with NULLs for all columns of t2
SELECT * FROM t1 LEFT JOIN t2 ON FALSE;
```
To achieve this, we set a "null flag" on the right table cursor which
tells our VDBE to emit NULLs for any columns of that cursor until the
flag is cleared.
Our logic for clearing the null flag was to do it in Next/Prev. However,
this is problematic for a few reasons:
- If the inner table of the left join is using SeekRowid, then Next/Prev
is never called on its cursor, so the null flag doesn't get cleared.
- If the inner table of the left join is using a non-covering index
seek, i.e. it iterates its rows using an index, but seeks to the main
table to fetch data, then Next/Prev is never called on the main table,
and the main table's null flag doesn't get cleared.
What this results in is NULL values incorrectly being emitted for the
inner table after the first correct NULL row, since the null flag is
correctly set to true, but never cleared.
This PR fixes the issue by clearing the null flag whenever seek() is
invoked on the cursor. Hence, the null flag is now cleared on:
- next()
- prev()
- seek()
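
The behavior described above can be modeled with a toy cursor. This is a sketch only: `Cursor`, `set_null_flag`, and `left_join_by_rowid` are hypothetical names for illustration and are not Limbo's actual types or methods.

```python
class Cursor:
    """Toy cursor over a list of (rowid, value) rows; not Limbo's cursor type."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = -1
        self.null_flag = False

    def set_null_flag(self, value):
        self.null_flag = value

    def next(self):
        self.null_flag = False  # cleared on next()
        self.pos += 1
        return self.pos < len(self.rows)

    def prev(self):
        self.null_flag = False  # cleared on prev()
        self.pos -= 1
        return self.pos >= 0

    def seek(self, rowid):
        # The fix described above: seek() also clears the flag.
        self.null_flag = False
        for i, row in enumerate(self.rows):
            if row[0] == rowid:
                self.pos = i
                return True
        return False

    def column(self, idx):
        # While the flag is set, every column of this cursor reads as NULL.
        if self.null_flag:
            return None
        return self.rows[self.pos][idx]


def left_join_by_rowid(outer_rows, inner):
    """Emit one row per outer row; unmatched outer rows get None for the inner column."""
    out = []
    for outer_id, wanted_rowid in outer_rows:
        if not inner.seek(wanted_rowid):
            inner.set_null_flag(True)
        out.append((outer_id, inner.column(1)))
    return out
```

If `seek()` did not clear the flag, every outer row after the first unmatched one would also read `None` from the inner cursor, which is the bug described above.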

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1364
2025-04-19 20:39:30 +03:00
Jussi Saurio
83c509a613 Fix bug: left join null flag not being cleared
In left joins, even if the join condition is not matched, the system
must emit a row for every row of the outer table:

-- this must return t1.count() rows, with NULLs for all columns of t2
SELECT * FROM t1 LEFT JOIN t2 ON FALSE;

Our logic for clearing the null flag was to do it in Next/Prev. However,
this is problematic for a few reasons:

- If the inner table of the left join is using SeekRowid, then Next/Prev
  is never called on its cursor, so the null flag doesn't get cleared.
- If the inner table of the left join is using a non-covering index seek,
  i.e. it iterates its rows using an index, but seeks to the main table
  to fetch data, then Next/Prev is never called on the main table, and the
  main table's null flag doesn't get cleared.

What this results in is NULL values incorrectly being emitted for the
inner table after the first correct NULL row, since the null flag is
correctly set to true, but never cleared.

This PR fixes the issue by clearing the null flag whenever seek() is
invoked on the cursor. Hence, the null flag is now cleared on:

- next()
- prev()
- seek()
2025-04-19 13:56:52 +03:00
Jussi Saurio
017cdb9568 btree: avoid reading entire cell when only rowid needed 2025-04-18 16:52:05 +03:00
Jussi Saurio
ac8ffa645d Merge 'btree: use binary search in seek/move_to for table btrees' from Jussi Saurio
Implements binary search to find the correct cell within a page,
specialized for table btrees only due to lack of energy at 8:30 PM
---
I used a [1GB TPC-H database](https://github.com/lovasoa/TPCH-sqlite/releases/download/v1.0/TPC-H.db)
for benchmarking and ran this query, which does a lot of seeks:
before
```sql
limbo> .timer on
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 16.267797375 s (this includes parsing/coloring of cli app)
```
after
```sql
limbo> .timer on
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 5.20604125 s (this includes parsing/coloring of cli app)
```
BTW sqlite completes this in 600 milliseconds so there's still a lot of
fuckiness somewhere.
---
UPDATE: refactored table btree seek (on leaf pages) to use binary search
too. I also updated the above numbers: I ran each query a few times and
took the lowest time I got for each. This is after binsearch on leaf
pages too:
```sql
limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 4.529645958 s (this includes parsing/coloring of cli app)
```
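
A minimal sketch of the in-page binary search described in this PR, assuming a page's cells are modeled as a sorted Python list of rowids. The names `find_cell` and `pick_child` are hypothetical, and the interior-page rule used here (keys less than or equal to a cell's key live under that cell's left child, with one extra rightmost pointer) follows the SQLite table b-tree layout rather than anything taken from Limbo's code.

```python
def find_cell(rowids, target):
    """Return the index of the first cell whose rowid is >= target.

    Returns len(rowids) when every rowid on the page is smaller than target.
    """
    lo, hi = 0, len(rowids)
    while lo < hi:
        mid = (lo + hi) // 2
        if rowids[mid] < target:
            lo = mid + 1  # target must be to the right of mid
        else:
            hi = mid      # mid is a candidate; keep looking left
    return lo


def pick_child(keys, children, target):
    """Choose the child page to descend into on an interior page.

    `children` holds one left-child pointer per key plus the rightmost pointer,
    so it has exactly len(keys) + 1 entries.
    """
    assert len(children) == len(keys) + 1
    return children[find_cell(keys, target)]
```

On a leaf page the same `find_cell` result is an exact hit only if the index is in range and the rowid at that index equals the target. Compared with a linear scan, this needs O(log n) comparisons per page instead of O(n).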

Closes #1357
2025-04-18 16:44:20 +03:00
Jussi Saurio
3dab59201d Separate both table&index move_to impls into different funcs 2025-04-18 16:27:50 +03:00
Jussi Saurio
0974ba6e71 default to using tablebtree_move_to in all calls to move_to with rowids 2025-04-18 16:11:36 +03:00
Jussi Saurio
12e689b9fc btree: use binary search on table leaf pages too 2025-04-18 16:11:36 +03:00
Jussi Saurio
3f9bdbdf14 btree: use binary search in move_to() for table btrees 2025-04-18 16:11:36 +03:00
Jussi Saurio
1ccc321030 Merge 'Feat: Covering indexes' from Jussi Saurio
Closes #364
A covering index lets the engine read all the necessary data from the
index without using the underlying table at all. This PR adds that
functionality.
This PR can be reviewed commit-by-commit, as the first commits are
enablers for the actual covering index usage functionality.
Example of a scan where a covering index can be used:
```sql
limbo> .schema
CREATE TABLE t(a,b,c,d,e);
CREATE INDEX abc ON t (a,b,c);
limbo> explain select b+1,concat(a, c) from t;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     12    0                    0   Start at 12
1     OpenRead           0     3     0                    0   table=abc, root=3
2     Rewind             0     11    0                    0   Rewind abc
3       Column           0     1     3                    0   r[3]=abc.b
4       Integer          1     4     0                    0   r[4]=1
5       Add              3     4     1                    0   r[1]=r[3]+r[4]
6       Column           0     0     5                    0   r[5]=abc.a
7       Column           0     2     6                    0   r[6]=abc.c
8       Function         0     5     2     concat         0   r[2]=func(r[5..6])
9       ResultRow        1     2     0                    0   output=r[1..2]
10    Next               0     3     0                    0
11    Halt               0     0     0                    0
12    Transaction        0     0     0                    0   write=false
13    Goto               0     1     0                    0
```
Example of a scan where it can't be used:
```sql
limbo> .schema
CREATE TABLE t(a,b,c,d,e);
CREATE INDEX abc ON t (a,b,c);
limbo> explain select a,b,c,d from t limit 5;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     11    0                    0   Start at 11
1     OpenRead           0     2     0                    0   table=t, root=2
2     Rewind             0     10    0                    0   Rewind t
3       Column           0     0     4                    0   r[4]=t.a
4       Column           0     1     5                    0   r[5]=t.b
5       Column           0     2     6                    0   r[6]=t.c
6       Column           0     3     7                    0   r[7]=t.d
7       ResultRow        4     4     0                    0   output=r[4..7]
8       DecrJumpZero     1     10    0                    0   if (--r[1]==0) goto 10
9     Next               0     3     0                    0
10    Halt               0     0     0                    0
11    Transaction        0     0     0                    0   write=false
12    Integer            5     1     0                    0   r[1]=5
13    Integer            0     2     0                    0   r[2]=0
14    OffsetLimit        1     3     2                    0   if r[1]>0 then r[3]=r[1]+max(0,r[2]) else r[3]=(-1)
15    Goto               0     1     0                    0
```
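
The two plans above differ because the first query only references columns `a`, `b`, `c`, all of which are in index `abc`, while the second also needs `d`. A minimal sketch of that check as plain set containment follows. The function name `covers` is hypothetical and this is not the signature or logic of the methods added in the PR, which may consider more than column names.

```python
def covers(index_columns, referenced_columns):
    """Return True if every referenced column can be read from the index alone."""
    return set(referenced_columns) <= set(index_columns)


# Schema from the examples above: CREATE INDEX abc ON t (a,b,c);
ABC_INDEX = ("a", "b", "c")

# select b+1, concat(a, c) from t  -> references a, b, c
first_query_columns = {"a", "b", "c"}

# select a, b, c, d from t limit 5 -> also references d
second_query_columns = {"a", "b", "c", "d"}
```

With these inputs, `covers(ABC_INDEX, first_query_columns)` holds and `covers(ABC_INDEX, second_query_columns)` does not, matching the `table=abc` and `table=t` cursors in the two plans.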

Closes #1351
2025-04-18 15:27:27 +03:00
Jussi Saurio
9d553c50cc Merge 'allow index entry delete' from Pere Diaz Bou
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1341
2025-04-18 15:26:05 +03:00
Jussi Saurio
bf2e198a57 Merge 'Fix out of bounds access on parse_numeric_str' from Levy A.
Fixes #1361.

Closes #1362
2025-04-18 15:24:37 +03:00
Jussi Saurio
8477ff0d3d tests/fuzz: amend compound index key fuzz to include nonindexed columns some of the time 2025-04-18 15:13:13 +03:00
Jussi Saurio
6c73db6fd3 feat: use covering indexes whenever possible 2025-04-18 15:13:09 +03:00
Jussi Saurio
5b71d3a3da eliminate_unnecessary_orderby: add edge case handling 2025-04-18 15:12:06 +03:00
Jussi Saurio
40d880c3b0 TableReference: add resolve_cursors() method 2025-04-18 15:12:06 +03:00
Jussi Saurio
d5a6553e63 TableReference: add open_cursors() 2025-04-18 15:12:06 +03:00
Jussi Saurio
4ab4a3f6c3 TableReference: add index_is_covering() and utilizes_covering_index() 2025-04-18 15:12:06 +03:00
Levy A.
5fd2ed0bae fix: handle empty case 2025-04-17 20:20:57 -03:00
Levy A.
32d59b8c78 refactor+fix: using a more robust pattern matching approach 2025-04-17 20:08:05 -03:00
Jussi Saurio
48bee334cf Merge 'Support xBestIndex in vtab API' from Preston Thorpe
closes #1185
## The Problem:
The underlying schema of virtual tables is hidden from the query
planner, and it currently has no way of optimizing select queries with
vtab table refs by using indexes or removing non-constant predicates.
All vtabs are currently rewound completely each time and any conditional
filtering is done in the vdbe layer instead of in the `VFilter`.
## The Solution:
Add xBestIndex to the vtab module API to let extensions return some
`IndexInfo` that will allow the query planner to make better
optimizations and possibly omit conditionals.
## Examples:
table `t`: vtab: (key, value)
table `t2`: table: (a,b)
### Join where vtab is outer table:
![image](https://github.com/user-attachments/assets/87f4233f-7d32-4a5e-8f95-4bebd3549304)
Properly pushes predicate to VFilter, which receives the idx_str
`key_eq` arg, telling it that there is a useable where clause on the key
"index"
### Join where vtab is inner table:
![image](https://github.com/user-attachments/assets/f8fcf6d3-42bc-41a3-ad86-16e497ec6056)
Constraint is not sent because it is marked as unusable
### Where clause on "indexed" column:
![image](https://github.com/user-attachments/assets/8817cc45-177c-404d-8323-4d33180e280c)
Pushed down and the predicate is omitted from the VDBE layer.
### Where clause on regular column:
![image](https://github.com/user-attachments/assets/85595c7f-920f-4047-8388-a7dddd01778c)
No idx info received from BestIndex, VDBE handles conditional.
## TODO:
OrderBy info needs to be sent to xBestIndex, and it's not in a great
position in `open_loop` currently
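
The four cases above can be summarized with a small model of the planner/extension handshake. This is a Python sketch under stated assumptions, not the Rust extension API: `Constraint`, `IndexInfo`, and `best_index` are hypothetical names, and only the `key_eq` string and the usable/omit concepts come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constraint:
    column: str   # column the WHERE/JOIN term refers to
    op: str       # e.g. "eq"
    usable: bool  # False when the planner cannot supply the value at filter time


@dataclass
class IndexInfo:
    idx_str: Optional[str]  # passed back to the filter step; None means full scan
    omit: List[bool]        # per constraint: may the planner skip re-checking it?


def best_index(constraints: List[Constraint]) -> IndexInfo:
    """Toy module for a (key, value) vtab that can only look up key = ?."""
    omit = [False] * len(constraints)
    for i, c in enumerate(constraints):
        if c.usable and c.column == "key" and c.op == "eq":
            omit[i] = True  # the module handles this predicate itself
            return IndexInfo("key_eq", omit)
    # Nothing the module can use: the VDBE evaluates the predicates.
    return IndexInfo(None, omit)
```

A usable equality on `key` yields `key_eq` and lets the planner omit the predicate. An unusable constraint or a constraint on `value` yields no index info, so the conditional stays in the VDBE layer.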

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1264
2025-04-17 23:17:01 +03:00
Jussi Saurio
bef2058f1c Merge 'Fix post balance validation ' from Pere Diaz Bou
Closes #1360
2025-04-17 22:48:51 +03:00
PThorpe92
d02900294e Remove 2nd shell in vtab tests, fix expr translation in main loop 2025-04-17 14:01:45 -04:00
PThorpe92
a25a02efe1 Improve xBestIndex call site and allow for proper handling of join and where constraints 2025-04-17 14:01:45 -04:00
PThorpe92
245e7f94f6 Store packed field on ConstraintInfo to optimize planning for vfilter 2025-04-17 14:01:45 -04:00
PThorpe92
95a2fdc096 Fix array from ptr in bestindex ffi method in proc macro 2025-04-17 14:01:45 -04:00
PThorpe92
d53c60e071 Prevent double allocations for VFilter args in vdbe 2025-04-17 14:01:45 -04:00
PThorpe92
e17fd7edc4 Add comments and address PR review 2025-04-17 14:01:44 -04:00
PThorpe92
528a9b6c7e Clean up allocations in main loop and fix ext tests 2025-04-17 14:01:44 -04:00
PThorpe92
7d271edf8a Remove unused function in core/util.rs 2025-04-17 14:01:44 -04:00
PThorpe92
6f2c6c6a61 Actually skip omitted predicates in open loop 2025-04-17 14:01:44 -04:00
PThorpe92
de27c2fe4c Properly handle pushing predicates for query optimization from xBestIndex 2025-04-17 14:01:37 -04:00
PThorpe92
0f34a813ff Add can_pushdown_predicate fn to evaluate ast expressions for constness 2025-04-17 13:53:28 -04:00
PThorpe92
853af16946 Implement xBestIndex for virtual table api to improve query planning 2025-04-17 13:53:27 -04:00
Pere Diaz Bou
262c630c16 fix validation with overflow cells 2025-04-17 18:28:42 +02:00
Jussi Saurio
30c488e35d Merge 'Feat: add support for descending indexes' from Jussi Saurio
### Feat:
- Adds support for descending indexes
### Testing:
- Augments existing compound key index seek fuzz test to test for
various combinations of ascending and descending indexed columns. To
illustrate, the test runs 10000 queries like this on 8 different tables
with all different asc/desc permutations of a three-column primary key
index:
```sql
query: SELECT * FROM t WHERE x = 2826  LIMIT 5
query: SELECT * FROM t WHERE x = 671 AND y >= 2447 ORDER BY x ASC, y DESC LIMIT 5
query: SELECT * FROM t WHERE x = 2412 AND y = 589 AND z >= 894 ORDER BY x DESC LIMIT 5
query: SELECT * FROM t WHERE x = 1217 AND y = 1437 AND z <= 265 ORDER BY x ASC, y ASC, z DESC LIMIT 5
query: SELECT * FROM t WHERE x < 138 ORDER BY x DESC LIMIT 5
query: SELECT * FROM t WHERE x = 1312 AND y = 2757 AND z > 39 ORDER BY x DESC, y ASC, z ASC LIMIT 5
query: SELECT * FROM t WHERE x = 1829 AND y >= 1629 ORDER BY x ASC, y ASC LIMIT 5
query: SELECT * FROM t WHERE x = 2047 ORDER BY x DESC LIMIT 5
query: SELECT * FROM t WHERE x = 893 AND y > 432 ORDER BY y DESC LIMIT 5
query: SELECT * FROM t WHERE x = 1865 AND y = 784 AND z <= 785 ORDER BY x DESC, y DESC, z DESC LIMIT 5
query: SELECT * FROM t WHERE x = 213 AND y = 1475 AND z <= 2870 ORDER BY x ASC, y ASC, z ASC LIMIT 5
query: SELECT * FROM t WHERE x >= 1780 ORDER BY x ASC LIMIT 5
query: SELECT * FROM t WHERE x = 1983 AND y = 602 AND z = 485 ORDER BY y ASC, z ASC LIMIT 5
query: SELECT * FROM t WHERE x = 2311 AND y >= 31 ORDER BY y DESC LIMIT 5
query: SELECT * FROM t WHERE x = 81 AND y >= 1037 ORDER BY x ASC, y DESC LIMIT 5
query: SELECT * FROM t WHERE x < 2698 ORDER BY x ASC LIMIT 5
query: SELECT * FROM t WHERE x = 1503 AND y = 554 AND z >= 185 ORDER BY x DESC, y DESC, z DESC LIMIT 5
query: SELECT * FROM t WHERE x = 619 AND y > 1414 ORDER BY x DESC, y ASC LIMIT 5
query: SELECT * FROM t WHERE x >= 865 ORDER BY x DESC LIMIT 5
query: SELECT * FROM t WHERE x = 1596 AND y = 622 AND z = 62 ORDER BY x DESC, z ASC LIMIT 5
query: SELECT * FROM t WHERE x = 1555 AND y = 1257 AND z < 1929 ORDER BY x ASC, y ASC, z ASC LIMIT 5
query: SELECT * FROM t WHERE x > 2598  LIMIT 5
query: SELECT * FROM t WHERE x = 302 AND y = 2476 AND z < 2302 ORDER BY z DESC LIMIT 5
query: SELECT * FROM t WHERE x = 2197 AND y = 2195 AND z > 2089 ORDER BY y ASC, z DESC LIMIT 5
query: SELECT * FROM t WHERE x = 1030 AND y = 1717 AND z < 987  LIMIT 5
query: SELECT * FROM t WHERE x = 2899 AND y >= 382 ORDER BY y DESC LIMIT 5
query: SELECT * FROM t WHERE x = 62 AND y = 2980 AND z < 1109 ORDER BY x DESC, y DESC, z DESC LIMIT 5
query: SELECT * FROM t WHERE x = 550 AND y > 221 ORDER BY y DESC LIMIT 5
query: SELECT * FROM t WHERE x = 376 AND y = 1874 AND z < 206 ORDER BY y DESC, z ASC LIMIT 5
query: SELECT * FROM t WHERE x = 859 AND y = 2157 ORDER BY x DESC LIMIT 5
query: SELECT * FROM t WHERE x = 2166 AND y = 2079 AND z < 301 ORDER BY x DESC, y ASC LIMIT 5
```
The queries are run against both sqlite and limbo.
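
A rough sketch of this style of differential fuzzing, using only Python's standard `sqlite3` module. Since a second engine is not available here, the sketch swaps in a different reference: it compares an indexed table against an unindexed copy inside the same SQLite database, rather than comparing limbo against sqlite as the real test does. The schema, value ranges, and function names are assumptions for illustration, and every query orders by all three key columns so that `LIMIT 5` is deterministic.

```python
import random
import sqlite3


def build_db(seed=0, rows=500):
    """Create an indexed table `t` and an unindexed copy `plain` with the same rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x, y, z, PRIMARY KEY (x ASC, y DESC, z ASC))")
    conn.execute("CREATE TABLE plain (x, y, z)")
    seen = set()
    while len(seen) < rows:
        seen.add((rng.randrange(20), rng.randrange(20), rng.randrange(20)))
    data = sorted(seen)
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)
    conn.executemany("INSERT INTO plain VALUES (?, ?, ?)", data)
    return conn


def random_query_template(rng):
    """Build a query with a random predicate on x and random sort directions."""
    op = rng.choice(["=", "<", "<=", ">", ">="])
    where = f"x {op} {rng.randrange(20)}"
    order = ", ".join(
        f"{col} {rng.choice(['ASC', 'DESC'])}" for col in ("x", "y", "z")
    )
    return "SELECT * FROM {table} WHERE " + where + " ORDER BY " + order + " LIMIT 5"


def run_fuzz(iterations=200, seed=1):
    """Run the same random queries against both tables and compare results."""
    conn = build_db()
    rng = random.Random(seed)
    for _ in range(iterations):
        template = random_query_template(rng)
        indexed = conn.execute(template.format(table="t")).fetchall()
        reference = conn.execute(template.format(table="plain")).fetchall()
        assert indexed == reference, template
    return iterations
```

Seeding the generators keeps any failure reproducible, and the failing query text is attached to the assertion message.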

Reviewed-by: Pere Diaz Bou (@pereman2)
Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1330
2025-04-16 15:38:59 +03:00
Pekka Enberg
7a3fc33592 Limbo 0.0.19 2025-04-16 15:23:02 +03:00
Pekka Enberg
c7935f4fb7 Update CHANGELOG.md 2025-04-16 15:21:48 +03:00
Jussi Saurio
95bc644244 tests/fuzz: make compound key fuzz test a bit stricter with ordering 2025-04-16 14:10:25 +03:00
Pekka Enberg
38dab4c184 Limbo 0.0.19-pre.5 2025-04-16 14:00:17 +03:00
Jussi Saurio
1189b7a288 codegen: add support for descending indexes 2025-04-16 13:58:12 +03:00
Jussi Saurio
b1073da4a5 btree: add support for descending indexes 2025-04-16 13:58:12 +03:00
Jussi Saurio
af09025088 schema: keep track of primary key column sort order 2025-04-16 13:58:12 +03:00
Jussi Saurio
8757510606 test/fuzz: revamp compound key seek fuzz test to include desc indexes and be more efficient 2025-04-16 13:58:12 +03:00
Jussi Saurio
bde808f731 Merge 'Test: write tests for file backed db' from Pedro Muniz
First attempt at closing #1212. Also with this PR, I added the option of
using `with` syntax for `TestLimboShell`. It automatically closes the
shell on error, and facilitates error handling overall. If this is
merged, I can update the other python tests to use `with` as well.
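
For readers unfamiliar with the mechanism, `with` support comes from the context manager protocol. The sketch below uses a hypothetical stand-in class, `DemoShell`, because the actual `TestLimboShell` implementation is not shown here. It only illustrates how `__exit__` can close the shell even when the body raises.

```python
class DemoShell:
    """Stand-in for a shell wrapper; records commands instead of running them."""

    def __init__(self):
        self.closed = False
        self.history = []

    def run(self, sql):
        if self.closed:
            raise RuntimeError("shell is closed")
        self.history.append(sql)

    def quit(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close, whether or not the body raised.
        self.quit()
        return False  # do not swallow the exception


def run_and_fail():
    """Use the shell in a `with` block whose body raises; return the shell."""
    shell = DemoShell()
    try:
        with shell as s:
            s.run("select 1;")
            raise ValueError("simulated test failure")
    except ValueError:
        pass
    return shell
```

After `run_and_fail()` returns, the shell has been closed even though the body raised, and the error still propagated to the caller's `except` clause.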

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1230
2025-04-16 11:16:18 +03:00
Jussi Saurio
1d9c6d6981 Merge 'btree: move some blocks of code to more reasonable places' from Jussi Saurio
Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1343
2025-04-16 11:13:15 +03:00
Jussi Saurio
913367409e Merge 'Parse hex integers 2' from Anton Harniakou
Continuation of #1329

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1347
2025-04-16 11:13:01 +03:00