turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-30 14:34:22 +01:00

Author	SHA1	Message	Date
Preston Thorpe	e1d9bfc792	Merge branch 'main' into bench_vfs	2025-04-22 21:36:07 -04:00
Pekka Enberg	fc5099e2ef	antithesis: Enable RUST_BACKTRACE for workload	2025-04-22 13:01:11 +03:00
Pekka Enberg	beaccae664	Merge 'Create an automatic ephemeral index when a nested table scan would otherwise be selected' from Jussi Saurio Closes #747 - Creates an automatic ephemeral (in-memory) index on the right-side table of a join if a nested table scan would otherwise be selected. - This behavior is not hardcoded; instead this PR introduces a (quite dumb) cost estimator that naturally disincentivizes building ephemeral indexes where they don't make sense (e.g. the outermost table). I will probably make this estimator smarter in the future when working on join reordering optimizations. ### Example bytecode plans and runtimes (note that this is debug mode) Example query with no persistent indexes to choose from. Without an ephemeral index, it's a nested scan: ```sql limbo> explain select * from t1 natural join t2; addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 13 0 0 Start at 13 1 OpenRead 0 2 0 0 table=t1, root=2 2 OpenRead 1 3 0 0 table=t2, root=3 3 Rewind 0 12 0 0 Rewind t1 4 Rewind 1 11 0 0 Rewind t2 5 Column 0 0 2 0 r[2]=t1.a 6 Column 1 0 3 0 r[3]=t2.a 7 Ne 2 3 10 0 if r[2]!=r[3] goto 10 8 Column 0 0 1 0 r[1]=t1.a 9 ResultRow 1 1 0 0 output=r[1] 10 Next 1 5 0 0 11 Next 0 4 0 0 12 Halt 0 0 0 0 13 Transaction 0 0 0 0 write=false 14 Goto 0 1 0 0 limbo> .timer on limbo> select * from t1 natural join t2; ┌───┐ │ a │ ├───┤ └───┘ Command stats: ---------------------------- total: 953 ms (this includes parsing/coloring of cli app) ``` Same query with autoindexing enabled: ```sql limbo> explain select * from t1 natural join t2; addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 22 0 0 Start at 22 1 OpenRead 0 2 0 0 table=t1, root=2 2 OpenRead 1 3 0 0 table=t2, root=3 3 Rewind 0 21 0 0 Rewind t1 4 Once 12 0 0 0 goto 12 # execute block 5-11 only once, on subsequent iters jump straight to 12 5 OpenAutoindex 3 0 0 0 cursor=3 6 Rewind 1 12 0 0 Rewind t2 # open source table for ephemeral index 7 Column 1 0 2 0 r[2]=t2.a 8 RowId 1 3 0 0 r[3]=t2.rowid 9 MakeRecord 2 2 4 0 r[4]=mkrec(r[2..3]) 10 IdxInsert 3 4 2 0 key=r[4] # insert stuff to ephemeral index 11 Next 1 7 0 0 12 Column 0 0 5 0 r[5]=t1.a 13 IsNull 5 20 0 0 if (r[5]==NULL) goto 20 14 SeekGE 3 20 5 0 key=[5..5] # perform seek on ephemeral index 15 IdxGT 3 20 5 0 key=[5..5] 16 DeferredSeek 3 1 0 0 17 Column 0 0 1 0 r[1]=t1.a 18 ResultRow 1 1 0 0 output=r[1] 19 Next 2 15 0 0 20 Next 0 4 0 0 21 Halt 0 0 0 0 22 Transaction 0 0 0 0 write=false 23 Goto 0 1 0 0 limbo> .timer on limbo> select * from t1 natural join t2; ┌───┐ │ a │ ├───┤ └───┘ Command stats: ---------------------------- total: 220 ms (this includes parsing/coloring of cli app) ``` Closes #1356	2025-04-22 13:00:06 +03:00
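The autoindexed plan builds an index over `t2` a single time (the `Once`-guarded block) and then probes it with `SeekGE`/`IdxGT` for each outer row instead of rescanning `t2`. A minimal sketch of the same idea, assuming integer join keys, a `BTreeMap` standing in for the ephemeral index btree, and hypothetical function names that are not limbo's code:

```rust
use std::collections::BTreeMap;

/// (rowid, join-column value); `None` models SQL NULL.
type Row = (i64, Option<i64>);

/// Build an in-memory ordered index: join key -> rowids of the inner table.
fn build_ephemeral_index(inner: &[Row]) -> BTreeMap<i64, Vec<i64>> {
    let mut index: BTreeMap<i64, Vec<i64>> = BTreeMap::new();
    for &(rowid, key) in inner {
        // NULL never compares equal, so NULL keys are left out of the index.
        if let Some(k) = key {
            index.entry(k).or_default().push(rowid);
        }
    }
    index
}

/// Join that probes the index instead of rescanning the inner table.
fn indexed_join(outer: &[Row], inner: &[Row]) -> Vec<(i64, i64)> {
    // Built a single time up front; the plan gets the same effect with `Once`.
    let index = build_ephemeral_index(inner);
    let mut out = Vec::new();
    for &(outer_rowid, key) in outer {
        // Mirrors `IsNull`: a NULL outer key cannot match anything.
        if let Some(k) = key {
            if let Some(rowids) = index.get(&k) {
                for &inner_rowid in rowids {
                    out.push((outer_rowid, inner_rowid));
                }
            }
        }
    }
    out
}

/// The nested-scan plan: O(|outer| * |inner|) comparisons.
fn nested_scan_join(outer: &[Row], inner: &[Row]) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    for &(outer_rowid, outer_key) in outer {
        for &(inner_rowid, inner_key) in inner {
            if outer_key.is_some() && outer_key == inner_key {
                out.push((outer_rowid, inner_rowid));
            }
        }
    }
    out
}
```

Both functions return the same pairs; the indexed version replaces the inner rescan with a logarithmic lookup, at the cost of building the index once.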
Pekka Enberg	936365a44e	Update README.md	2025-04-22 12:11:23 +03:00
Pekka Enberg	c2cf4756ef	Update README.md	2025-04-22 12:10:02 +03:00
Pekka Enberg	d92fb75262	Merge 'Fix incorrect between expression documentation' from Pedro Muniz I was reading through the `translate_expr` function and `COMPAT.md` to see what was not implemented yet. I saw that `Expr::Between` was marked as a `todo!`, so I set about implementing it, only to find that it was already being rewritten in the optimizer, haha. This PR just adjusts the docs and adds an `unreachable` in the appropriate locations. Closes #1378	2025-04-22 11:56:01 +03:00
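The optimizer rewrite referred to above amounts to expanding `BETWEEN` into two comparisons, so the translator never sees `Expr::Between`. A sketch on a hypothetical, minimal `Expr` type (not limbo's AST); it clones the left-hand expression, which is only safe when that expression has no side effects:

```rust
/// Hypothetical, minimal expression tree for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, Op, Box<Expr>),
    Between { lhs: Box<Expr>, not: bool, start: Box<Expr>, end: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Ge, Le, Lt, Gt, And, Or }

fn bin(l: Expr, op: Op, r: Expr) -> Expr {
    Expr::Binary(Box::new(l), op, Box::new(r))
}

/// x BETWEEN a AND b      =>  x >= a AND x <= b
/// x NOT BETWEEN a AND b  =>  x < a OR x > b   (De Morgan; also holds under SQL NULL logic)
fn rewrite_between(expr: Expr) -> Expr {
    match expr {
        Expr::Between { lhs, not, start, end } => {
            let lhs = rewrite_between(*lhs);
            let start = rewrite_between(*start);
            let end = rewrite_between(*end);
            if not {
                bin(bin(lhs.clone(), Op::Lt, start), Op::Or, bin(lhs, Op::Gt, end))
            } else {
                bin(bin(lhs.clone(), Op::Ge, start), Op::And, bin(lhs, Op::Le, end))
            }
        }
        // Recurse so nested BETWEENs are rewritten too.
        Expr::Binary(l, op, r) => bin(rewrite_between(*l), op, rewrite_between(*r)),
        leaf => leaf,
    }
}
```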
Pekka Enberg	e41bf3993a	Merge 'bindings/rust: Add Statement.columns() support' from Timo Kösters This PR adds the `statement.columns()` function, inspired by rusqlite: https://docs.rs/rusqlite/latest/rusqlite/struct.Statement.html#method.columns Note that the rusqlite documentation says: > If associated DB schema can be altered concurrently, you should make sure that current statement has already been stepped once before calling this method. Do we have this requirement as well? The first commit is just the Rust binding. The second commit implements the column name for the rowid column. Closes #1376	2025-04-22 10:52:25 +03:00
Pekka Enberg	7308f6d6e8	Merge 'Bump julian_day_converter to 0.4.5' from meteorgan The previous version of `julian_day_converter` had precision issues that could lose precision when converting between `julianday` and `datetime`. ![image](https://github.com/user-attachments/assets/84042ca3-28cc-4020-a248-714df6298791) Reviewed-by: Diego Reis (@diegoreis42) Closes #1344	2025-04-22 10:48:36 +03:00
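One way to keep a `julianday`/`datetime` round trip exact, sketched here under the assumption that instants are Unix timestamps in milliseconds, is to round back to whole milliseconds when leaving the floating-point Julian-day domain (SQLite's own `date.c` similarly stores the Julian day multiplied by 86,400,000 as an integer). This is an illustration, not the `julian_day_converter` API:

```rust
/// Julian day number of the Unix epoch, 1970-01-01T00:00:00Z.
const UNIX_EPOCH_JD: f64 = 2_440_587.5;
/// Milliseconds in one day.
const MS_PER_DAY: f64 = 86_400_000.0;

/// Unix milliseconds -> fractional Julian day.
fn unix_ms_to_julian_day(unix_ms: i64) -> f64 {
    unix_ms as f64 / MS_PER_DAY + UNIX_EPOCH_JD
}

/// Fractional Julian day -> Unix milliseconds.
/// Near the present, one f64 step of a Julian day is about 40 microseconds,
/// so rounding to the nearest millisecond recovers the original value.
fn julian_day_to_unix_ms(jd: f64) -> i64 {
    ((jd - UNIX_EPOCH_JD) * MS_PER_DAY).round() as i64
}
```

Truncating with `as i64` instead of calling `round()` would turn a result like 122.99 ms into 122 ms, which is the kind of off-by-one-unit drift rounding avoids.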
Timo Kösters	68d8b86bb7	fix: get name of rowid column	2025-04-22 08:46:37 +02:00
Pekka Enberg	094fd0e211	Add TPC-H instructions to PERF.md	2025-04-22 09:46:16 +03:00
pedrocarlo	1928dcfa10	Correct docs regarding between	2025-04-21 23:05:01 -03:00
PThorpe92	2e33ce6896	Add release build to bench vfs in makefile to ensure there is an exec target	2025-04-21 12:31:38 -04:00
PThorpe92	f180de4d95	Write quick note about vfs benchmark script in PERF.md	2025-04-21 12:24:18 -04:00
PThorpe92	9bbd6a3a7f	Add vfs bench to testing pyproject.toml	2025-04-21 12:23:06 -04:00
PThorpe92	2037fbeba5	Add bench-vfs command to makefile	2025-04-21 12:22:40 -04:00
PThorpe92	7f170756ae	Add python script to benchmark vfs against each other	2025-04-21 12:22:20 -04:00
Jussi Saurio	f256fb46fd	remove print spam from index insert	2025-04-21 14:59:13 +03:00
Jussi Saurio	3b44b269a3	optimizer: try to build ephemeral index to avoid nested table scan	2025-04-21 14:59:13 +03:00
Jussi Saurio	6924424f11	optimizer: add highly unintelligent heuristics-based cost estimation	2025-04-21 14:59:13 +03:00
Jussi Saurio	a50fa03d24	optimizer: allow calling try_extract_index... without any persistent indexes	2025-04-21 14:59:13 +03:00
Jussi Saurio	af21f60887	translate/main_loop: create autoindex when index.ephemeral=true	2025-04-21 14:59:13 +03:00
Jussi Saurio	c1b2dfc32b	TableReference: add method column_is_used()	2025-04-21 14:59:13 +03:00
Jussi Saurio	09ad6d8f01	vdbe: resolve labels for Insn::Once	2025-04-21 14:59:13 +03:00
Jussi Saurio	d0da7307be	Index: add new field ephemeral: bool	2025-04-21 14:59:13 +03:00
Timo Kösters	1c82752473	feat: Statement::columns function for Rust bindings	2025-04-21 13:17:50 +02:00
Pere Diaz Bou	a6dccdd12c	Merge 'docs: add Rust to "Getting Started" section' from Timo Kösters Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1374	2025-04-21 12:19:10 +02:00
Pere Diaz Bou	fc4deb2b7b	Merge 'btree: avoid reading entire cell when only rowid needed' from Jussi Saurio This PR is based on #1357 and further improves performance: ```sql limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); ┌────────────┬─────────┬─────────────┬────────────────┐ │ l_orderkey │ revenue │ o_orderdate │ o_shippriority │ ├────────────┼─────────┼─────────────┼────────────────┤ └────────────┴─────────┴─────────────┴────────────────┘ Command stats: ---------------------------- total: 3.728050958 s (this includes parsing/coloring of cli app) ``` Reviewed-by: Preston Thorpe (@PThorpe92) Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1358	2025-04-21 12:14:21 +02:00
Timo Kösters	b945e9b2a0	docs: add Rust to "Getting Started" section	2025-04-21 10:21:21 +02:00
Jussi Saurio	53061f5642	Merge 'Fix bug: left join null flag not being cleared' from Jussi Saurio In left joins, even if the join condition is not matched, the system must emit a row for every row of the outer table: ``` -- this must return t1.count() rows, with NULLs for all columns of t2 SELECT * FROM t1 LEFT JOIN t2 ON FALSE; ``` To achieve this, we set a "null flag" on the right table cursor which tells our VDBE to emit NULLs for any columns of that cursor until the flag is cleared. Our logic for clearing the null flag was to do it in Next/Prev. However, this is problematic for a few reasons: - If the inner table of the left join is using SeekRowid, then Next/Prev is never called on its cursor, so the null flag doesn't get cleared. - If the inner table of the left join is using a non-covering index seek, i.e. it iterates its rows using an index, but seeks to the main table to fetch data, then Next/Prev is never called on the main table, and the main table's null flag doesn't get cleared. What this results in is NULL values incorrectly being emitted for the inner table after the first correct NULL row, since the null flag is correctly set to true, but never cleared. This PR fixes the issue by clearing the null flag whenever seek() is invoked on the cursor. Hence, the null flag is now cleared on: - next() - prev() - seek() Reviewed-by: Preston Thorpe (@PThorpe92) Closes #1364	2025-04-19 20:39:30 +03:00
Jussi Saurio	83c509a613	Fix bug: left join null flag not being cleared In left joins, even if the join condition is not matched, the system must emit a row for every row of the outer table: -- this must return t1.count() rows, with NULLs for all columns of t2 SELECT * FROM t1 LEFT JOIN t2 ON FALSE; Our logic for clearing the null flag was to do it in Next/Prev. However, this is problematic for a few reasons: - If the inner table of the left join is using SeekRowid, then Next/Prev is never called on its cursor, so the null flag doesn't get cleared. - If the inner table of the left join is using a non-covering index seek, i.e. it iterates its rows using an index, but seeks to the main table to fetch data, then Next/Prev is never called on the main table, and the main table's null flag doesn't get cleared. What this results in is NULL values incorrectly being emitted for the inner table after the first correct NULL row, since the null flag is correctly set to true, but never cleared. This PR fixes the issue by clearing the null flag whenever seek() is invoked on the cursor. Hence, the null flag is now cleared on: - next() - prev() - seek()	2025-04-19 13:56:52 +03:00
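The fix can be pictured with a toy cursor whose hypothetical `null_flag` forces column reads to return NULL. Every repositioning method (`next`, `prev`, `seek_rowid`) clears the flag, so an inner loop driven only by `SeekRowid` no longer keeps emitting NULLs after the first unmatched outer row. This is a sketch, not limbo's `BTreeCursor`:

```rust
/// Toy cursor over (rowid, value) rows sorted by rowid.
struct Cursor {
    rows: Vec<(i64, i64)>,
    pos: Option<usize>,
    /// Set for the inner table of a LEFT JOIN when no row matched.
    null_flag: bool,
}

impl Cursor {
    fn new(rows: Vec<(i64, i64)>) -> Self {
        Cursor { rows, pos: None, null_flag: false }
    }

    fn set_null_flag(&mut self) {
        self.null_flag = true;
    }

    fn next(&mut self) -> bool {
        self.null_flag = false; // cleared on next()
        let candidate = self.pos.map_or(0, |p| p + 1);
        if candidate < self.rows.len() {
            self.pos = Some(candidate);
            true
        } else {
            self.pos = None;
            false
        }
    }

    fn prev(&mut self) -> bool {
        self.null_flag = false; // cleared on prev()
        match self.pos {
            Some(p) if p > 0 => {
                self.pos = Some(p - 1);
                true
            }
            _ => {
                self.pos = None;
                false
            }
        }
    }

    fn seek_rowid(&mut self, rowid: i64) -> bool {
        self.null_flag = false; // the fix: also cleared on seek()
        match self.rows.binary_search_by_key(&rowid, |&(r, _)| r) {
            Ok(i) => {
                self.pos = Some(i);
                true
            }
            Err(_) => {
                self.pos = None;
                false
            }
        }
    }

    /// Column read: NULL (None) while the null flag is set.
    fn column(&self) -> Option<i64> {
        if self.null_flag {
            return None;
        }
        self.pos.map(|p| self.rows[p].1)
    }
}
```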
Jussi Saurio	017cdb9568	btree: avoid reading entire cell when only rowid needed	2025-04-18 16:52:05 +03:00
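Reading only the rowid is possible because, in the SQLite file format, a table b-tree leaf cell begins with two varints (the payload size, then the rowid) before the payload itself. A minimal sketch of extracting the rowid without touching the payload; the function names are illustrative, not limbo's:

```rust
/// Decode a SQLite varint: big-endian groups of 7 bits, high bit set on all
/// but the last byte; a 9th byte, if reached, contributes all 8 of its bits.
/// Returns (value, bytes consumed), or None if the buffer is truncated.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for i in 0..9 {
        let byte = *buf.get(i)?;
        if i == 8 {
            value = (value << 8) | byte as u64;
            return Some((value, 9));
        }
        value = (value << 7) | (byte & 0x7f) as u64;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // not reached: the loop always returns by the 9th byte
}

/// Rowid of a table-btree leaf cell: skip the payload-size varint,
/// decode the rowid varint, and never look at the payload bytes.
fn leaf_cell_rowid(cell: &[u8]) -> Option<i64> {
    let (_payload_size, n) = read_varint(cell)?;
    let (rowid, _) = read_varint(&cell[n..])?;
    Some(rowid as i64)
}
```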
Jussi Saurio	ac8ffa645d	Merge 'btree: use binary search in seek/move_to for table btrees' from Jussi Saurio Implements binary search to find the correct cell within a page, specialized for table btrees only due to lack of energy at 8:30 PM. --- I used a [1GB TPC-H database](https://github.com/lovasoa/TPCH-sqlite/releases/download/v1.0/TPC-H.db) for benchmarking and ran this query, which does a lot of seeks: before ```sql limbo> .timer on limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); ┌────────────┬─────────┬─────────────┬────────────────┐ │ l_orderkey │ revenue │ o_orderdate │ o_shippriority │ ├────────────┼─────────┼─────────────┼────────────────┤ └────────────┴─────────┴─────────────┴────────────────┘ Command stats: ---------------------------- total: 16.267797375 s (this includes parsing/coloring of cli app) ``` after ```sql limbo> .timer on limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); ┌────────────┬─────────┬─────────────┬────────────────┐ │ l_orderkey │ revenue │ o_orderdate │ o_shippriority │ ├────────────┼─────────┼─────────────┼────────────────┤ └────────────┴─────────┴─────────────┴────────────────┘ Command stats: ---------------------------- total: 5.20604125 s (this includes parsing/coloring of cli app) ``` BTW, SQLite completes this in 600 milliseconds, so there's still a lot of fuckiness somewhere. --- UPDATE: refactored table btree seek (on leaf pages) to use binary search too. I also updated the numbers above: I ran each query a few times and took the lowest time I got for each. This is after binsearch on leaf pages too: ```sql limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); ┌────────────┬─────────┬─────────────┬────────────────┐ │ l_orderkey │ revenue │ o_orderdate │ o_shippriority │ ├────────────┼─────────┼─────────────┼────────────────┤ └────────────┴─────────┴─────────────┴────────────────┘ Command stats: ---------------------------- total: 4.529645958 s (this includes parsing/coloring of cli app) ``` Closes #1357	2025-04-18 16:44:20 +03:00
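The per-page binary search described above can be sketched as a lower-bound search over cell keys, assuming cells are already decoded into sorted rowids (a real page would decode each probed cell through its cell-pointer array). On a table interior page, the child to descend into is the left child of the first cell whose key is >= the target, or the rightmost pointer if there is none; on a leaf page, the same search finds the matching cell. All names are hypothetical:

```rust
/// Decoded table-interior cell. In a table b-tree, the left child
/// holds rowids <= `rowid`.
struct InteriorCell {
    left_child: u32,
    rowid: i64,
}

/// Index of the first element whose key is >= `target`, or `cells.len()`
/// if none: O(log n) comparisons instead of a linear scan of the page.
fn lower_bound<T>(cells: &[T], target: i64, key: impl Fn(&T) -> i64) -> usize {
    let (mut lo, mut hi) = (0, cells.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if key(&cells[mid]) < target {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Child page to descend into when seeking `target` on an interior page.
fn interior_child(cells: &[InteriorCell], rightmost: u32, target: i64) -> u32 {
    let i = lower_bound(cells, target, |c| c.rowid);
    if i < cells.len() { cells[i].left_child } else { rightmost }
}

/// Cell index holding exactly `target` on a leaf page, if present.
fn leaf_find(rowids: &[i64], target: i64) -> Option<usize> {
    let i = lower_bound(rowids, target, |&r| r);
    if i < rowids.len() && rowids[i] == target { Some(i) } else { None }
}
```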
Jussi Saurio	3dab59201d	Separate both table&index move_to impls into different funcs	2025-04-18 16:27:50 +03:00
Jussi Saurio	0974ba6e71	default to using tablebtree_move_to in all calls to move_to with rowids	2025-04-18 16:11:36 +03:00
Jussi Saurio	12e689b9fc	btree: use binary search on table leaf pages too	2025-04-18 16:11:36 +03:00
Jussi Saurio	3f9bdbdf14	btree: use binary search in move_to() for table btrees	2025-04-18 16:11:36 +03:00
Jussi Saurio	1ccc321030	Merge 'Feat: Covering indexes' from Jussi Saurio Closes #364 Covering indexes mean being able to read all the necessary data from an index instead of using the underlying table at all. This PR adds that functionality. This PR can be reviewed commit-by-commit as the first commits are enablers for the actual covering index usage functionality Example of a scan where covering index can be used: ```sql limbo> .schema CREATE TABLE t(a,b,c,d,e); CREATE INDEX abc ON t (a,b,c); limbo> explain select b+1,concat(a, c) from t; addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 12 0 0 Start at 12 1 OpenRead 0 3 0 0 table=abc, root=3 2 Rewind 0 11 0 0 Rewind abc 3 Column 0 1 3 0 r[3]=abc.b 4 Integer 1 4 0 0 r[4]=1 5 Add 3 4 1 0 r[1]=r[3]+r[4] 6 Column 0 0 5 0 r[5]=abc.a 7 Column 0 2 6 0 r[6]=abc.c 8 Function 0 5 2 concat 0 r[2]=func(r[5..6]) 9 ResultRow 1 2 0 0 output=r[1..2] 10 Next 0 3 0 0 11 Halt 0 0 0 0 12 Transaction 0 0 0 0 write=false 13 Goto 0 1 0 0 ``` Example of a scan where it can't be used: ```sql limbo> .schema CREATE TABLE t(a,b,c,d,e); CREATE INDEX abc ON t (a,b,c); limbo> explain select a,b,c,d from t limit 5; addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 11 0 0 Start at 11 1 OpenRead 0 2 0 0 table=t, root=2 2 Rewind 0 10 0 0 Rewind t 3 Column 0 0 4 0 r[4]=t.a 4 Column 0 1 5 0 r[5]=t.b 5 Column 0 2 6 0 r[6]=t.c 6 Column 0 3 7 0 r[7]=t.d 7 ResultRow 4 4 0 0 output=r[4..7] 8 DecrJumpZero 1 10 0 0 if (--r[1]==0) goto 10 9 Next 0 3 0 0 10 Halt 0 0 0 0 11 Transaction 0 0 0 0 write=false 12 Integer 5 1 0 0 r[1]=5 13 Integer 0 2 0 0 r[2]=0 14 OffsetLimit 1 3 2 0 if r[1]>0 then r[3]=r[1]+max(0,r[2]) else r[3]=(-1) 15 Goto 0 1 0 0 ``` Closes #1351	2025-04-18 15:27:27 +03:00
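The covering decision in the two plans above comes down to whether every column the query references on the table is stored in the index; SQLite-style index entries also carry the rowid. A hypothetical sketch (it shares its name with the `index_is_covering()` method listed in this log, but not its signature):

```rust
use std::collections::HashSet;

/// True if the index alone can answer the query for this table:
/// every referenced column is an index column (or the implicit rowid).
fn index_is_covering(index_columns: &[&str], used_columns: &HashSet<&str>) -> bool {
    used_columns
        .iter()
        .all(|c| *c == "rowid" || index_columns.contains(c))
}
```

With `CREATE INDEX abc ON t (a,b,c)`, `select b+1, concat(a, c) from t` references {a, b, c} and is covered, while `select a,b,c,d from t` also needs `d` and must read `t`.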
Jussi Saurio	9d553c50cc	Merge 'allow index entry delete' from Pere Diaz Bou Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #1341	2025-04-18 15:26:05 +03:00
Jussi Saurio	bf2e198a57	Merge 'Fix out of bounds access on `parse_numeric_str`' from Levy A. Fixes #1361. Closes #1362	2025-04-18 15:24:37 +03:00
Jussi Saurio	8477ff0d3d	tests/fuzz: amend compound index key fuzz to include nonindexed columns some of the time	2025-04-18 15:13:13 +03:00
Jussi Saurio	6c73db6fd3	feat: use covering indexes whenever possible	2025-04-18 15:13:09 +03:00
Jussi Saurio	5b71d3a3da	eliminate_unnecessary_orderby: add edge case handling	2025-04-18 15:12:06 +03:00
Jussi Saurio	40d880c3b0	TableReference: add resolve_cursors() method	2025-04-18 15:12:06 +03:00
Jussi Saurio	d5a6553e63	TableReference: add open_cursors()	2025-04-18 15:12:06 +03:00
Jussi Saurio	4ab4a3f6c3	TableReference: add index_is_covering() and utilizes_covering_index()	2025-04-18 15:12:06 +03:00
Levy A.	5fd2ed0bae	fix: handle empty case	2025-04-17 20:20:57 -03:00
Levy A.	32d59b8c78	refactor+fix: using a more robust pattern matching approach	2025-04-17 20:08:05 -03:00
Jussi Saurio	48bee334cf	Merge 'Support xBestIndex in vtab API' from Preston Thorpe closes #1185 ## The Problem: The underlying schema of virtual tables is hidden from the query planner, and it currently has no way of optimizing select queries with vtab table refs by using indexes or removing non-constant predicates. All vtabs are currently rewound completely each time, and any conditional filtering is done in the VDBE layer instead of in the `VFilter`. ## The solution: Add xBestIndex to the vtab module API to let extensions return some `IndexInfo` that will allow the query planner to make better optimizations and possibly omit conditionals. ## Examples: table `t`: vtab: (key, value) table `t2`: table: (a,b) ### Join where vtab is outer table: ![image](https://github.com/user-attachments/assets/87f4233f-7d32-4a5e-8f95-4bebd3549304) Properly pushes the predicate to VFilter, which receives the idx_str `key_eq` arg, telling it that there is a usable WHERE clause on the key "index". ### Join where vtab is inner table: ![image](https://github.com/user-attachments/assets/f8fcf6d3-42bc-41a3-ad86-16e497ec6056) The constraint is not sent because it is marked as unusable. ### Where clause on "indexed" column: ![image](https://github.com/user-attachments/assets/8817cc45-177c-404d-8323-4d33180e280c) Pushed down, and the predicate is omitted from the VDBE layer. ### Where clause on regular column: ![image](https://github.com/user-attachments/assets/85595c7f-920f-4047-8388-a7dddd01778c) No idx info received from BestIndex; the VDBE handles the conditional. ## TODO: OrderBy info needs to be sent to xBestIndex, and it's not in a great position in `open_loop` currently. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #1264	2025-04-17 23:17:01 +03:00
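The four cases above can be sketched with simplified stand-ins for SQLite's `xBestIndex` structures: each constraint carries a `usable` flag, and the module answers with an `idx_str`, a cost, and per-constraint usage telling the planner which value to pass to `VFilter` (`argv_index`, 1-based as in SQLite) and whether it may `omit` the predicate. All type and field names below, and the choice of column 0 as `key`, are assumptions rather than limbo's extension API:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConstraintOp { Eq, Gt, Lt }

/// One WHERE-clause term the planner offers to the vtab.
#[derive(Debug, Clone, Copy)]
struct Constraint { column: usize, op: ConstraintOp, usable: bool }

/// Module's answer for one constraint.
#[derive(Debug, Clone, PartialEq)]
struct ConstraintUsage { argv_index: Option<usize>, omit: bool }

#[derive(Debug, Clone, PartialEq)]
struct IndexInfo { idx_str: Option<String>, estimated_cost: f64, usage: Vec<ConstraintUsage> }

/// Assumed: the vtab's `key` column is column 0.
const KEY_COLUMN: usize = 0;

fn best_index(constraints: &[Constraint]) -> IndexInfo {
    let mut usage = vec![ConstraintUsage { argv_index: None, omit: false }; constraints.len()];
    for (i, c) in constraints.iter().enumerate() {
        // Only a usable equality on the "indexed" key column is pushed down.
        if c.usable && c.column == KEY_COLUMN && c.op == ConstraintOp::Eq {
            // Pass the value as VFilter's 1st argument and let the module
            // enforce it, so the VDBE can drop the predicate.
            usage[i] = ConstraintUsage { argv_index: Some(1), omit: true };
            return IndexInfo { idx_str: Some("key_eq".into()), estimated_cost: 1.0, usage };
        }
    }
    // Nothing usable: full scan, and the VDBE evaluates every predicate.
    IndexInfo { idx_str: None, estimated_cost: 1_000_000.0, usage }
}
```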
Jussi Saurio	bef2058f1c	Merge 'Fix post balance validation ' from Pere Diaz Bou Closes #1360	2025-04-17 22:48:51 +03:00
PThorpe92	d02900294e	Remove 2nd shell in vtab tests, fix expr translation in main loop	2025-04-17 14:01:45 -04:00


4107 Commits