turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-20 01:44:19 +01:00

Author	SHA1	Message	Date
bit-aloo	32e59614c7	remove unnecessary copy instr in likelihood, likely and unlikely	2025-08-14 09:08:32 +05:30
bit-aloo	eda3a82306	strip unlikely and just translate the inner value	2025-08-13 22:46:31 +05:30
bit-aloo	e72097e2b7	strip likely and just translate the inner value	2025-08-13 22:46:22 +05:30
bit-aloo	3581895f18	add unlikely scalar method	2025-08-12 16:27:23 +05:30
Mikaël Francoeur	2cf4e4fe96	handle single, double and unquoted strings in values clause	2025-08-08 09:03:38 -04:00
Piotr Rzysko	718598eab8	Introduce scan type Different scan parameters are required for different table types. Currently, index and iteration direction are only used by B-tree tables, while the remaining table types don’t require any parameters. Planning access to virtual tables, however, will require passing additional information from the planner, such as the virtual table index (distinct from a B-tree index) and the constraints that must be forwarded to the `filter` method.	2025-08-04 20:27:22 +02:00
Glauber Costa	9d41fa4489	implement IN patterns for non-conditional SELECT queries Extracts the core logic of IN from the conditional version, and uses the conditional metadata to determine the jump. Then uses the AddImm operator we just added to force the integer conversion at the end (like SQLite does).	2025-07-31 08:11:41 -05:00
Glauber Costa	4bd1582e7d	Implement the Cast opcode Our compat matrix mentions a couple of opcodes: ToInt, ToBlob, etc. Those opcodes do not exist. Instead, there is a single Cast opcode, that takes the affinity as a parameter. Currently we just call a function when we need to cast. This PR fixes the compat file, implements the cast opcode, and in at least one instance, when explicitly using the CAST keyword, uses that opcode instead of a function in the generated bytecode.	2025-07-30 20:44:54 -05:00
bit-aloo	a5dce2b50b	add subvector execution flow	2025-07-30 09:51:08 +05:30
bit-aloo	e4d79a6516	add vec_concat execution flow	2025-07-30 06:07:03 +05:30
Diego Reis	bab10909c3	Disable extension loading for wasm We should enable it later when wasm becomes more mature	2025-07-28 14:49:07 -03:00
Glauber Costa	5d8d08d1b6	Implement the Returning statement for inserts and updates They are very similar. DELETE is very different, so that one we'll do it later.	2025-07-26 09:01:09 -05:00
Pekka Enberg	669b231714	Merge 'parser: Distinguish quoted identifiers and unify Id into Name enum' from bit-aloo Closes: #1947 This PR replaces the `Name(pub String)` struct with a `Name` enum that explicitly models how the name appeared in the source either as an unquoted identifier (`Ident`) or a quoted string (`Quoted`). In the process, the separate `Id` wrapper type has been coalesced into the `Name` enum, simplifying the AST and reducing duplication in identifier handling logic. While this increases the size of some AST nodes (notably `yyStackEntry`), it improves correctness and makes source structure more explicit for later phases. cc: @levydsa Reviewed-by: Levy A. (@levydsa) Reviewed-by: Preston Thorpe (@PThorpe92) Closes #2251	2025-07-25 12:08:54 +03:00
Glauber Costa	988b16f962	Support ATTACH (read only) Support for attaching databases. The main difference from SQLite is that we support an arbitrary number of attached databases, and we are not bound to just 100ish. We for now only support read-only databases. We open them as read-only, but also, to keep things simple, we don't patch any of the insert machinery to resolve foreign tables. So if an insert is tried on an attached database, it will just fail with a "no such table" error - this is perfect for now. The code in core/translate/attach.rs is written by Claude, who also played a key part in the boilerplate for stuff like the .databases command and extending the pragma database_list, and also aided me in the test cases.	2025-07-24 19:19:48 -05:00
bit-aloo	9a54ef214e	parser: Distinguish quoted identifiers and unify Id into Name enum This commit replaces the `Name(pub String)` struct with a `Name` enum that explicitly models how the name appeared in the source either as an unquoted identifier (`Ident`) or a quoted string (`Quoted`). In the process, the separate `Id` wrapper type has been coalesced into the `Name` enum, simplifying the AST and reducing duplication in identifier handling logic. While this increases the size of some AST nodes (notably `yyStackEntry`), it improves correctness and makes source structure more explicit for later phases.	2025-07-24 14:40:19 +05:30
Pekka Enberg	c2a8a6f178	Merge 'improve handling of double quotes' from Glauber Costa I ended up hitting #1974 today and wanted to fix it. I worked with Claude to generate a more comprehensive set of queries that could fail aside from just the insert query described in the issue. He got most of them right - lots of cases were indeed failing. The ones that were gibberish, he told me I was absolutely right for pointing out they were bad. But alas. With the test cases generated, we can work on fixing it. The place where the assertion was hit, all we need to do there is return true (but we assert that this is indeed a string literal, it shouldn't be anything else at this point). There are then just a couple of places where we need to make sure we handle double quotes correctly. We already tested for single quotes in a couple of places, but never for double quotes. There is one funny corner case where you can just select "col" from tbl, and if there is no column "col" on the table, that is treated as a string literal. We handle that too. Fixes #1974 Closes #2152	2025-07-18 20:55:37 +03:00
Glauber Costa	cbdd5c5fc7	improve handling of double quotes I ended up hitting #1974 today and wanted to fix it. I worked with Claude to generate a more comprehensive set of queries that could fail aside from just the insert query described in the issue. He got most of them right - lots of cases were indeed failing. The ones that were gibberish, he told me I was absolutely right for pointing out they were bad. But alas. With the test cases generated, we can work on fixing it. The place where the assertion was hit, all we need to do there is return true (but we assert that this is indeed a string literal, it shouldn't be anything else at this point). There are then just a couple of places where we need to make sure we handle double quotes correctly. We already tested for single quotes in a couple of places, but never for double quotes. There is one funny corner case where you can just select "col" from tbl, and if there is no column "col" on the table, that is treated as a string literal. We handle that too. Fixes #1974	2025-07-18 10:39:02 -05:00
pedrocarlo	c15f1e02d3	make most instrumentation levels to be Debug or Trace instead. Span creation in debug mode is very slow and impacts our ability to run the Simulator fast enough	2025-07-17 16:48:24 -03:00
Nikita Sivukhin	c018b06bf5	fix bug in concat_ws translation	2025-07-16 00:48:17 +04:00
Nikita Sivukhin	be0a607ba8	rename amount -> extra_amount	2025-07-16 00:46:17 +04:00
Pekka Enberg	1653b0883a	Merge 'core/vector: Euclidean distance support for vector search' from KarinaMilet This PR provides Euclidean distance support for limbo's vector search. At the same time, some type abstractions are introduced, such as `DistanceCalculator`, etc. This is because I hope to unify the current vector module in the future to make it more structured, clearer, and more extensible. While practicing Euclidean distance for Limbo, I discovered that many checks could be done using the type system or in advance, rather than waiting until the distance is calculated. By building these checks into the type system or doing them ahead of time, this would allow us to explore more efficient computations, such as automatic vectorization or SIMD acceleration, which is future work. Reviewed-by: Nikita Sivukhin (@sivukhin) Closes #1986	2025-07-14 13:07:20 +03:00
Nikita Sivukhin	e94ebbad04	remove unwanted changes	2025-07-14 11:27:51 +04:00
Nikita Sivukhin	81cd04dd65	add bin_record_json_object and table_columns_json_array functions	2025-07-14 11:19:45 +04:00
Pekka Enberg	f24e254ec6	core/translate: Fix "misuse of aggregate function" error message ``` sqlite> CREATE TABLE test1(f1, f2); sqlite> SELECT SUM(min(f1)) FROM test1; Parse error: misuse of aggregate function min() SELECT SUM(min(f1)) FROM test1; ^--- error here ``` Spotted by SQLite TCL tests.	2025-07-10 14:29:59 +03:00
KaguraMilet	9d6ae78786	Merge branch 'tursodatabase:main' into distance	2025-07-10 19:15:08 +08:00
Pekka Enberg	3f10427f52	core: Fix resolve_function() error messages We need to return the original function name, not normalized one to be compatible with SQLite. Spotted by SQLite TCL tests.	2025-07-09 15:30:57 +03:00
pedrocarlo	b85687658d	change instrumentation level to INFO	2025-07-07 11:53:45 -03:00
pedrocarlo	5559c45011	more instrumentation + write counter should decrement if pwrite fails	2025-07-07 11:50:21 -03:00
pedrocarlo	897426a662	add error tracing to relevant functions + rollback transaction in step_end_write_txn + make move_to_root return result	2025-07-07 11:50:21 -03:00
KaguraMilet	ac95758f76	feat(vector): integrate euclidean distance into limbo	2025-07-07 21:11:51 +08:00
Levy A.	ffd6844b5b	refactor: remove `PseudoTable` from `Table` the only reason for `PseudoTable` to exist, is to provide column information for `PseudoCursor` creation. this should not be part of the schema.	2025-06-30 14:31:58 -03:00
Pekka Enberg	725c3e4ddc	Rename `limbo_sqlite3_parser` crate to `turso_sqlite3_parser`	2025-06-29 12:34:46 +03:00
Piotr Rzysko	116df2ec86	Fix evaluation of ISNULL/NOTNULL in OR expressions Previously, the `jump_if_condition_is_true` flag was not respected. As a result, for expressions like <`ISNULL`/`NOTNULL`> `OR` <rhs>, the <rhs> expression was evaluated even when the left-hand side was true, and its value was incorrectly used as the final result.	2025-06-27 08:21:40 +02:00
Nils Koch	2827b86917	chore: fix clippy warnings	2025-06-23 19:52:13 +01:00
Piotr Rzysko	64a0333119	Fix missing column references in non-aggregate expressions Previously, queries like: ``` SELECT CASE WHEN c0 != 'x' THEN group_concat(c1, ',') ELSE 'x' END FROM t0 GROUP BY c0; ``` would return incorrect results because c0 was not copied during the aggregation loop into a register accessible to the logic processing the grouped results (e.g., the CASE WHEN expression in this example). The same issue applied to expressions in the HAVING and ORDER BY clauses.	2025-06-20 06:19:16 +02:00
Levy A.	b88cb99ff0	fix warnings and some refactoring	2025-06-11 14:19:06 -03:00
Levy A.	49a6ddad97	wip	2025-06-11 14:19:04 -03:00
Levy A.	15e0cab8d8	refactor+fix: precompute default values from schema	2025-06-11 14:18:39 -03:00
Krishna Vishal	0d5cbc4f1d	Add affinity check as a function as `ast::Operator` impl	2025-06-11 00:33:48 +05:30
Krishna Vishal	712c94537c	Add affinity flags to `IS` and `IS NOT` operators	2025-06-11 00:33:48 +05:30
krishvishal	5837f7329f	clean up	2025-06-11 00:33:47 +05:30
krishvishal	7bd1589615	Added affinity inference and conversion for comparison ops. Added affinity helper function for `CmpInsFlags`	2025-06-11 00:33:44 +05:30
pedrocarlo	80c480517a	incorrect placeholder label in where clause translation	2025-06-10 12:00:19 -03:00
Jussi Saurio	cc405dea7e	Use new TableReferences struct everywhere	2025-05-29 11:44:56 +03:00
Jussi Saurio	77ce4780d9	Fix ProgramBuilder::cursor_ref not having unique keys Currently we have this: `program.alloc_cursor_id(Option<String>, CursorType)` where the String is the table's name or alias ('users' or 'u' in the query). This is problematic because this can happen: `SELECT * FROM t WHERE EXISTS (SELECT * FROM t)` There are two cursors, both with identifier 't'. This causes a bug where the program will use the same cursor for both the main query and the subquery, since they are keyed by 't'. Instead introduce `CursorKey`, which is a combination of: 1. `TableInternalId`, and 2. index name (`Option<String>`), in case of index cursors. This should provide key uniqueness for cursors: `SELECT * FROM t WHERE EXISTS (SELECT * FROM t)` here the first 't' will have a different `TableInternalId` than the second 't', so there is no clash.	2025-05-29 00:59:24 +03:00
Jussi Saurio	51605ad2a4	Use lifetimes in walk_expr() to guarantee that child expr has same lifetime as parent expr	2025-05-28 10:56:30 +03:00
Jussi Saurio	7c07c09300	Add stable internal_id property to TableReference Currently our "table id"/"table no"/"table idx" references always use the direct index of the `TableReference` in the plan, e.g. in `SelectPlan::table_references`. For example: ```rust Expr::Column { table: 0, column: 3, .. } ``` refers to the 0'th table in the `table_references` list. This is a fragile approach because it assumes the table_references list is stable for the lifetime of the query processing. This has so far been the case, but there exist certain query transformations, e.g. subquery unnesting, that may fold new table references from a subquery (which has its own table ref list) into the table reference list of the parent. If such a transformation is made, then potentially all of the Expr::Column references to tables will become invalid. Consider this example: ```sql -- Assume tables: users(id, age), orders(user_id, amount) -- Get total amount spent per user on orders over $100 SELECT u.id, sub.total FROM users u JOIN (SELECT user_id, SUM(amount) as total FROM orders o WHERE o.amount > 100 GROUP BY o.user_id) sub WHERE u.id = sub.user_id -- Before subquery unnesting: -- Main query table_references: [users, sub] -- u.id refers to table 0, column 0 -- sub.total refers to table 1, column 1 -- -- Subquery table_references: [orders] -- o.user_id refers to table 0, column 0 -- o.amount refers to table 0, column 1 -- -- After unnesting and folding subquery tables into main query, -- the query might look like this: SELECT u.id, SUM(o.amount) as total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.amount > 100 GROUP BY u.id; -- Main query table_references: [users, orders] -- u.id refers to table index 0 (correct) -- o.amount refers to table index 0 (incorrect, should be 1) -- o.user_id refers to table index 0 (incorrect, should be 1) ``` We could of course traverse every expression in the subquery and rewrite the table indexes to be correct, but if we instead use stable identifiers for each table reference, then all the column references will continue to be correct. Hence, this PR introduces a `TableInternalId` used in `TableReference` as well as `Expr::Column` and `Expr::Rowid` so that this kind of query transformation can happen with less pain.	2025-05-25 20:26:17 +03:00
Jussi Saurio	40a4d162bc	Introduce walker expressions for ast::Expr	2025-05-23 15:56:27 +03:00
Jussi Saurio	c4548b51f1	Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio ```sql -- This PR does effectively this transformation: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); -- Same query with common conjuncts (ANDs) extracted: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where p_partkey = l_partkey and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' and ( ( p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 ) or ( p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 ) or ( p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 ) ); ``` This allows Limbo's optimizer to 1. recognize `p_partkey=l_partkey` as an index constraint on `part`, and 2. filter out `lineitem` rows before joining. With this optimization, Limbo completes TPC-H `19.sql` nearly as fast as SQLite on my machine. Without it, Limbo takes forever. This branch: `939ms` Main: `uh, i started running it a few minutes ago and it hasnt finished, and i dont feel like waiting i guess` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1520	2025-05-20 14:33:49 +03:00
Jussi Saurio	6790b7479c	Optimization: lift common subexpressions from OR terms ```sql -- This PR does effectively this transformation: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); -- Same query with common conjuncts (ANDs) extracted: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where p_partkey = l_partkey and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' and ( ( p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 ) or ( p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 ) or ( p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 ) ); ```	2025-05-20 14:25:15 +03:00

382 Commits