turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-20 01:44:19 +01:00

Author	SHA1	Message	Date
Jussi Saurio	3f633247f7	perf/stmt: avoid checking for SchemaUpdated errors if it's impossible	2025-08-05 15:10:55 +03:00
pedrocarlo	d2019e95f3	pass schema to epilogue for schema_version checking + do not Pragma Schema Version in `open_with_flags` to avoid infinite loop in reprepare. Just access the database header directly	2025-08-04 12:32:34 -03:00
pedrocarlo	736748cdf7	Simplify program epilogue by tracking the transaction mode and rollback status in the ProgramBuilder and then calling epilogue just once	2025-08-04 12:32:34 -03:00
pedrocarlo	c567636deb	Adjust Transaction OpCode to accept schema cookie + check if cookie changed	2025-08-04 12:32:34 -03:00
pedrocarlo	54636241c2	store Sql String inside `Program` for reprepare	2025-08-04 12:32:34 -03:00
Pere Diaz Bou	752a876f9a	change every Rc to Arc in schema internals	2025-07-28 10:51:17 +02:00
bit-aloo	9a54ef214e	parser: Distinguish quoted identifiers and unify Id into Name enum This commit replaces the `Name(pub String)` struct with a `Name` enum that explicitly models how the name appeared in the source either as an unquoted identifier (`Ident`) or a quoted string (`Quoted`). In the process, the separate `Id` wrapper type has been coalesced into the `Name` enum, simplifying the AST and reducing duplication in identifier handling logic. While this increases the size of some AST nodes (notably `yyStackEntry`), it improves correctness and makes source structure more explicit for later phases.	2025-07-24 14:40:19 +05:30
Glauber Costa	65312baee6	fix opcodes missing a database register Two of the opcodes we implement (OpenRead and Transaction) should have an opcode specifying the database to use, but they don't. Add it, and for now always use 0 (the main database).	2025-07-20 12:27:26 -05:00
pedrocarlo	c15f1e02d3	make most instrumentation levels to be Debug or Trace instead. Span creation in debug mode is very slow and impacts our ability to run the Simulator fast enough	2025-07-17 16:48:24 -03:00
Nils Koch	828d4f5016	fix clippy errors for rust 1.88.0 (auto fix)	2025-07-12 18:58:41 +03:00
Pekka Enberg	341f963a8e	Merge 'Fix infinite loops, rollback problems, and other bugs found by I/O fault injection' from Pedro Muniz Was running the sim with I/O faults enabled and fixed some nasty bugs. Now, there are some more nasty bugs to fix as well. This is the command that I use to run the simulator `cargo run -p limbo_sim -- --minimum-tests 10 --maximum-tests 1000` This PR mainly fixes the following bugs: - Not decrementing in flight write counter when `pwrite` fails - not rolling back the transaction on `step` error - not rolling back the transaction on `run_once` error - some functions were just being unwrapped when they could suffer io errors - Only change max_frame after wal sync's Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1946	2025-07-07 21:31:26 +03:00
pedrocarlo	b85687658d	change instrumentation level to INFO	2025-07-07 11:53:45 -03:00
Nikita Sivukhin	a988bbaffe	allow to specify table in the capture_data_changes PRAGMA	2025-07-06 22:19:32 +04:00
Nikita Sivukhin	04f2efeaa4	small renames	2025-07-06 21:16:57 +04:00
Nikita Sivukhin	cf7ae031c7	add ProgramBuilderFlags to the builder	2025-07-06 21:16:25 +04:00
Nikita Sivukhin	c9c5ef4e25	remove query_mode from ProgramBuilderOpts and from function arguments - mode never changes and ProgramBuilder is already created with the proper mode set	2025-07-02 13:24:12 +04:00
Levy A.	ffd6844b5b	refactor: remove `PseudoTable` from `Table` the only reason for `PseudoTable` to exist, is to provide column information for `PseudoCursor` creation. this should not be part of the schema.	2025-06-30 14:31:58 -03:00
Pekka Enberg	725c3e4ddc	Rename `limbo_sqlite3_parser` crate to `turso_sqlite3_parser`	2025-06-29 12:34:46 +03:00
Pere Diaz Bou	d66c683a4c	implement rollback translation	2025-06-25 13:45:32 +02:00
Jussi Saurio	cc2e14b11c	Read page 1 from pager always, no separate db_header	2025-06-24 14:41:49 -03:00
Nils Koch	2827b86917	chore: fix clippy warnings	2025-06-23 19:52:13 +01:00
Pekka Enberg	90c1e3fc06	Switch Connection to use Arc instead of Rc Connection needs to be Arc so that bindings can wrap it with `Mutex` for multi-threading.	2025-06-16 10:43:19 +03:00
Levy A.	01a680b69e	feat(fuzz)+fix: add schema fuzz testing and fix some bugs	2025-06-11 14:19:06 -03:00
Levy A.	41cb13aa74	fix: ignore non-constants	2025-06-11 14:18:41 -03:00
Levy A.	15e0cab8d8	refactor+fix: precompute default values from schema	2025-06-11 14:18:39 -03:00
Levy A.	6945c0c09e	fix+refactor: incorrect label placement also added a `cursor_loop` helper on `ProgramBuilder` to avoid making this mistake in the future. this is zero-cost, and will be optimized to the same thing (hopefully).	2025-06-11 14:17:36 -03:00
Anton Harniakou	d802075ea9	Resolve merge conflict: Add columns names to result set for pragma statement output	2025-06-09 10:40:04 +03:00
pedrocarlo	bc563266b3	add instrumentation to more functions for debugging + adjust how cursors are opened	2025-05-30 20:35:50 -03:00
Jussi Saurio	819a6138d0	Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio Again found when fuzzing nested where clause subqueries: Aggregate registers need to be NULLed at the start because the same registers might be reused on another invocation of a subquery, and if they are not NULLed, the 2nd invocation of the same subquery will have values left over from the first invocation. Reviewed-by: Preston Thorpe (@PThorpe92) Closes #1614	2025-05-30 09:39:37 +03:00
Jussi Saurio	f8257df77b	Fix: aggregate regs must be initialized as NULL at the start	2025-05-29 18:44:53 +03:00
Jussi Saurio	cc405dea7e	Use new TableReferences struct everywhere	2025-05-29 11:44:56 +03:00
Jussi Saurio	592ba41137	Add assertion forbidding duplicate cursor keys	2025-05-29 01:04:45 +03:00
Jussi Saurio	77ce4780d9	Fix ProgramBuilder::cursor_ref not having unique keys Currently we have this: `program.alloc_cursor_id(Option<String>, CursorType)` where the String is the table's name or alias ('users' or 'u' in the query). This is problematic because this can happen: `SELECT * FROM t WHERE EXISTS (SELECT * FROM t)` There are two cursors, both with identifier 't'. This causes a bug where the program will use the same cursor for both the main query and the subquery, since they are keyed by 't'. Instead introduce `CursorKey`, which is a combination of: 1. `TableInternalId`, and 2. index name (`Option<String>` -- in case of index cursors). This should provide key uniqueness for cursors: `SELECT * FROM t WHERE EXISTS (SELECT * FROM t)` here the first 't' will have a different `TableInternalId` than the second 't', so there is no clash.	2025-05-29 00:59:24 +03:00
pedrocarlo	e3fd1e589e	support using a INSERT SELECT that references the same table in both statements	2025-05-25 19:15:28 -03:00
Jussi Saurio	7c07c09300	Add stable internal_id property to TableReference Currently our "table id"/"table no"/"table idx" references always use the direct index of the `TableReference` in the plan, e.g. in `SelectPlan::table_references`. For example: ```rust Expr::Column { table: 0, column: 3, .. } ``` refers to the 0'th table in the `table_references` list. This is a fragile approach because it assumes the table_references list is stable for the lifetime of the query processing. This has so far been the case, but there exist certain query transformations, e.g. subquery unnesting, that may fold new table references from a subquery (which has its own table ref list) into the table reference list of the parent. If such a transformation is made, then potentially all of the Expr::Column references to tables will become invalid. Consider this example: ```sql -- Assume tables: users(id, age), orders(user_id, amount) -- Get total amount spent per user on orders over $100 SELECT u.id, sub.total FROM users u JOIN (SELECT user_id, SUM(amount) as total FROM orders o WHERE o.amount > 100 GROUP BY o.user_id) sub WHERE u.id = sub.user_id -- Before subquery unnesting: -- Main query table_references: [users, sub] -- u.id refers to table 0, column 0 -- sub.total refers to table 1, column 1 -- -- Subquery table_references: [orders] -- o.user_id refers to table 0, column 0 -- o.amount refers to table 0, column 1 -- -- After unnesting and folding subquery tables into main query, -- the query might look like this: SELECT u.id, SUM(o.amount) as total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.amount > 100 GROUP BY u.id; -- Main query table_references: [users, orders] -- u.id refers to table index 0 (correct) -- o.amount refers to table index 0 (incorrect, should be 1) -- o.user_id refers to table index 0 (incorrect, should be 1) ``` We could ofc traverse every expression in the subquery and rewrite the table indexes to be correct, but if we instead use stable identifiers for each table reference, then all the column references will continue to be correct. Hence, this PR introduces a `TableInternalId` used in `TableReference` as well as `Expr::Column` and `Expr::Rowid` so that this kind of query transformations can happen with less pain.	2025-05-25 20:26:17 +03:00
pedrocarlo	53bf5d5ef5	adjust translate functions to take a program instead of `Option<ProgramBuilder>` + remove any Init emission in translate functions + use epilogue in all places necessary	2025-05-21 16:41:10 -03:00
pedrocarlo	1c12535d9f	push prologue to top-level translate function	2025-05-21 15:50:43 -03:00
pedrocarlo	3090dd91fa	push translate_ctx creation outside of prologue	2025-05-21 13:06:25 -03:00
pedrocarlo	f5d6d11d16	extract prologue and epilogue to program builder	2025-05-21 12:47:51 -03:00
pedrocarlo	517c7c81cd	refactor to include optional program builder argument	2025-05-21 12:47:51 -03:00
Pekka Enberg	e102cd0be5	Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio Reviewable commit by commit. CI failures are not related. Adds support for e.g. `select first_name, sum(distinct age), count(distinct age), avg(distinct age) from users group by 1` Implementation details: - Creates an ephemeral index per distinct aggregate, and jumps over the accumulation step if a duplicate is found Closes #1507	2025-05-20 13:58:57 +03:00
pedrocarlo	5b15d6aa32	Get the table correctly from the connection instead of table_references + test to confirm unique constraint	2025-05-19 15:22:55 -03:00
pedrocarlo	a818b6924c	Removed repeated binary expression translation. Adjusted the set_collation to capture additional context of whether it was set by a Collate expression or not. Added some tests to prove those modifications were necessary.	2025-05-19 15:22:14 -03:00
pedrocarlo	d0a63429a6	Naive implementation of collate for queries. Not implemented for column constraints	2025-05-19 15:22:14 -03:00
Jussi Saurio	8d66347729	vdbe: add Insn::Found	2025-05-17 15:33:55 +03:00
Jussi Saurio	fe65d6e991	Merge 'Performance: hoist entire expressions out of hot loops if they are constant' from Jussi Saurio ## Problem: - We have cases where we are evaluating expressions in a hot loop that could only be evaluated once. For example: `CAST('2025-01-01' as DATETIME)` -- the value of this never changes, so we should only run it once. - We have no robust way of doing this right now for entire _expressions_ -- the only existing facility we have is `program.mark_last_insn_constant()`, which has no concept of how many instructions translating a given _expression_ spends, and breaks very easily for this reason. ## Main ideas of this PR: - Add `expr.is_constant()` determining whether the expression is compile-time constant. Tries to be conservative and not deem something compile-time constant if there is no certainty. - Whenever we think a compile-time constant expression is about to be translated into bytecode in `translate_expr()`, start a so called `constant span`, which means a range of instructions that are part of a compile-time constant expression. - At the end of translating the program, all `constant spans` are hoisted outside of any table loops so they only get evaluated once. - The target offsets of any jump instructions (e.g. `Goto`) are moved to the correct place, taking into account all instructions whose offsets were shifted due to moving the compile-time constant expressions around. - An escape hatch wrapper `translate_expr_no_constant_opt()` is added for cases where we should not hoist constants even if we otherwise could. Right now the only example of this is cases where we are reusing the same register(s) in multiple iterations of some kind of loop, e.g. `VALUES(...)` or in the `coalesce()` function implementation. ## Performance effects Here is an example of a modified/simplified TPC-H query where the `CAST()` calls were previously run millions of times in a hot loop, but now they are optimized out of the loop. 
BYTECODE PLAN BEFORE: ```sql limbo> explain select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 26 0 0 Start at 26 1 OpenRead 0 10 0 0 table=lineitem, root=10 2 OpenRead 1 9 0 0 table=orders, root=9 3 OpenRead 2 8 0 0 table=customer, root=8 4 Rewind 0 25 0 0 Rewind lineitem 5 Column 0 10 5 0 r[5]=lineitem.l_shipdate 6 String8 0 7 0 1995-03-29 0 r[7]='1995-03-29' 7 Function 0 7 6 cast 0 r[6]=func(r[7..8]) <-- CAST() executed millions of times 8 Le 5 6 24 0 if r[5]<=r[6] goto 24 9 Column 0 0 9 0 r[9]=lineitem.l_orderkey 10 SeekRowid 1 9 24 0 if (r[9]!=orders.rowid) goto 24 11 Column 1 4 10 0 r[10]=orders.o_orderdate 12 String8 0 12 0 1995-03-29 0 r[12]='1995-03-29' 13 Function 0 12 11 cast 0 r[11]=func(r[12..13]) 14 Ge 10 11 24 0 if r[10]>=r[11] goto 24 15 Column 1 1 14 0 r[14]=orders.o_custkey 16 SeekRowid 2 14 24 0 if (r[14]!=customer.rowid) goto 24 17 Column 2 6 15 0 r[15]=customer.c_mktsegment 18 Ne 15 16 24 0 if r[15]!=r[16] goto 24 19 Column 0 0 1 0 r[1]=lineitem.l_orderkey 20 Integer 3 2 0 0 r[2]=3 21 Column 1 4 3 0 r[3]=orders.o_orderdate 22 Column 1 7 4 0 r[4]=orders.o_shippriority 23 ResultRow 1 4 0 0 output=r[1..4] 24 Next 0 5 0 0 25 Halt 0 0 0 0 26 Transaction 0 0 0 0 write=false 27 String8 0 8 0 DATETIME 0 r[8]='DATETIME' 28 String8 0 13 0 DATETIME 0 r[13]='DATETIME' 29 String8 0 16 0 FURNITURE 0 r[16]='FURNITURE' 30 Goto 0 1 0 ``` BYTECODE PLAN AFTER: ```sql limbo> explain select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > 
cast('1995-03-29' as datetime); addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 21 0 0 Start at 21 1 OpenRead 0 10 0 0 table=lineitem, root=10 2 OpenRead 1 9 0 0 table=orders, root=9 3 OpenRead 2 8 0 0 table=customer, root=8 4 Rewind 0 20 0 0 Rewind lineitem 5 Column 0 10 5 0 r[5]=lineitem.l_shipdate 6 Le 5 6 19 0 if r[5]<=r[6] goto 19 7 Column 0 0 9 0 r[9]=lineitem.l_orderkey 8 SeekRowid 1 9 19 0 if (r[9]!=orders.rowid) goto 19 9 Column 1 4 10 0 r[10]=orders.o_orderdate 10 Ge 10 11 19 0 if r[10]>=r[11] goto 19 11 Column 1 1 14 0 r[14]=orders.o_custkey 12 SeekRowid 2 14 19 0 if (r[14]!=customer.rowid) goto 19 13 Column 2 6 15 0 r[15]=customer.c_mktsegment 14 Ne 15 16 19 0 if r[15]!=r[16] goto 19 15 Column 0 0 1 0 r[1]=lineitem.l_orderkey 16 Column 1 4 3 0 r[3]=orders.o_orderdate 17 Column 1 7 4 0 r[4]=orders.o_shippriority 18 ResultRow 1 4 0 0 output=r[1..4] 19 Next 0 5 0 0 20 Halt 0 0 0 0 21 Transaction 0 0 0 0 write=false 22 String8 0 7 0 1995-03-29 0 r[7]='1995-03-29' 23 String8 0 8 0 DATETIME 0 r[8]='DATETIME' 24 Function 1 7 6 cast 0 r[6]=func(r[7..8]) <-- CAST() executed twice 25 String8 0 12 0 1995-03-29 0 r[12]='1995-03-29' 26 String8 0 13 0 DATETIME 0 r[13]='DATETIME' 27 Function 1 12 11 cast 0 r[11]=func(r[12..13]) 28 String8 0 16 0 FURNITURE 0 r[16]='FURNITURE' 29 Integer 3 2 0 0 r[2]=3 30 Goto 0 1 0 0 ``` EXECUTION RUNTIME BEFORE: ```sql limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); ┌────────────┬─────────┬─────────────┬────────────────┐ │ l_orderkey │ revenue │ o_orderdate │ o_shippriority │ ├────────────┼─────────┼─────────────┼────────────────┤ └────────────┴─────────┴─────────────┴────────────────┘ Command stats: ---------------------------- total: 
3.633396667 s (this includes parsing/coloring of cli app) ``` EXECUTION RUNTIME AFTER: ```sql limbo> select l_orderkey, 3 as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < cast('1995-03-29' as datetime) and l_shipdate > cast('1995-03-29' as datetime); ┌────────────┬─────────┬─────────────┬────────────────┐ │ l_orderkey │ revenue │ o_orderdate │ o_shippriority │ ├────────────┼─────────┼─────────────┼────────────────┤ └────────────┴─────────┴─────────────┴────────────────┘ Command stats: ---------------------------- total: 2.0923475 s (this includes parsing/coloring of cli app) ```` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1359	2025-04-25 16:55:41 +03:00
Jussi Saurio	c3441f9685	vdbe: move comments if instructions were moved around in emit_constant_insns()	2025-04-24 11:05:21 +03:00
Jussi Saurio	0f5c791784	vdbe: refactor label resolution to account for insn offsets changing	2025-04-24 11:05:21 +03:00
Jussi Saurio	b4b38bdb3c	vdbe: resolve labels for InitCoroutine::start_offset	2025-04-24 11:05:21 +03:00
Jussi Saurio	47f3f3bda3	vdbe: replace constant_insns with constant_spans	2025-04-24 11:05:21 +03:00

120 Commits