turso

mirror of https://github.com/aljazceru/turso.git synced 2026-01-23 01:44:33 +01:00

Author	SHA1	Message	Date
Pekka Enberg	7257fb8aae	Merge 'core: move pragma statement bytecode generator to its own file.' from Sonny What? - no logic change - refactored and moved pragma statement bytecode generation to its own package for better structure. Closes #871	2025-02-03 09:10:33 +02:00
Pekka Enberg	6c34737240	Merge 'Fix rowid generation' from Nikita Sivukhin

Fix a panic that occurs when a table has a row with rowid equal to `-1` (`=u64::max`)

```sql
limbo> CREATE TABLE t(x INTEGER PRIMARY KEY)
limbo> INSERT INTO t VALUES (-1)
limbo> INSERT INTO t VALUES (NULL);
thread 'main' panicked at core/vdbe/mod.rs:2499:21:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #868	2025-02-03 09:09:12 +02:00
Pekka Enberg	662d629666	Rename JoinAwareConditionExpr to WhereTerm We transform all JOIN conditions into WHERE clause terms in the query planner. The JoinAwareConditionExpr name tries to make that point, but I think it makes things more confusing. Let's call it WhereTerm (suggested by Jussi).	2025-02-03 07:46:51 +02:00
Pekka Enberg	bbf73da28f	Merge 'core/translate: refactor query planner again to be simpler' from Jussi Saurio

## Simplify bookkeeping of referenced tables in the query planner

This PR refactors the way we track referenced tables and associated planner operations related to them (scans, index searches, etc).

## The problem with what we currently have:

- We have a tree data structure called `SourceOperator` which is either a `Join`, `Scan`, `Search`, `Subquery` or `Nothing`.

```rust
/**
  A SourceOperator is a Node in the query plan that reads data from a table.
*/
#[derive(Clone, Debug)]
pub enum SourceOperator {
    // Join operator
    // This operator is used to join two source operators.
    // It takes a left and right source operator, a list of predicates to evaluate,
    // and a boolean indicating whether it is an outer join.
    Join {
        id: usize,
        left: Box<SourceOperator>,
        right: Box<SourceOperator>,
        predicates: Option<Vec<ast::Expr>>,
        outer: bool,
        using: Option<ast::DistinctNames>,
    },
    // Scan operator
    // This operator is used to scan a table.
    // It takes a table to scan and an optional list of predicates to evaluate.
    // The predicates are used to filter rows from the table.
    // e.g. SELECT * FROM t1 WHERE t1.foo = 5
    // The iter_dir is used to indicate the direction of the iterator.
    // The use of Option for iter_dir is aimed at implementing a conservative optimization strategy: it only pushes
    // iter_dir down to Scan when iter_dir is None, to prevent potential result set errors caused by multiple
    // assignments. for more detailed discussions, please refer to https://github.com/penberg/limbo/pull/376
    Scan {
        id: usize,
        table_reference: TableReference,
        predicates: Option<Vec<ast::Expr>>,
        iter_dir: Option<IterationDirection>,
    },
    // Search operator
    // This operator is used to search for a row in a table using an index
    // (i.e. a primary key or a secondary index)
    Search {
        id: usize,
        table_reference: TableReference,
        search: Search,
        predicates: Option<Vec<ast::Expr>>,
    },
    Subquery {
        id: usize,
        table_reference: TableReference,
        plan: Box<SelectPlan>,
        predicates: Option<Vec<ast::Expr>>,
    },
    // Nothing operator
    // This operator is used to represent an empty query.
    // e.g. SELECT * from foo WHERE 0 will eventually be optimized to Nothing.
    Nothing {
        id: usize,
    },
}
```

- Logically joins are a tree, but this is at least marginally bad for performance because each `Join` has two boxed child operators, and so e.g. for a 3-table query you have, for example, 3 `Scan` nodes and then 2 `Join` nodes.
- There are other bigger problems too, though, related to code structure. We have been carrying around a separate vector of `referenced_tables` that columns can refer to by index:

```rust
/// A query plan has a list of TableReference objects, each of which represents a table or subquery.
#[derive(Clone, Debug)]
pub struct TableReference {
    /// Table object, which contains metadata about the table, e.g. columns.
    pub table: Table,
    /// The name of the table as referred to in the query, either the literal name or an alias e.g. "users" or "u"
    pub table_identifier: String,
    /// The index of this reference in the list of TableReference objects in the query plan
    /// The reference at index 0 is the first table in the FROM clause, the reference at index 1 is the second table in the FROM clause, etc.
    /// So, the index is relevant for determining when predicates (WHERE, ON filters etc.) should be evaluated.
    pub table_index: usize,
    /// The type of the table reference, either BTreeTable or Subquery
    pub reference_type: TableReferenceType,
}
```

- `referenced_tables` is used because SQLite joins are an `n^tables_len` nested loop, and we need to figure out during which loop to evaluate a condition expression. A lot of plumbing in the current code exists for this, e.g. "pushing predicates" in `optimizer` even though "predicate pushdown" as a query planner concept is an _optimization_, but in our current system the "pushdown" is really a _necessity_ to move the condition expressions to the correct `SourceOperator::predicates` vector so that they are evaluated at the right point.
- `referenced_tables` is also used to map identifiers in the query to the correct table, e.g. 'foo' in `SELECT foo FROM users` becomes an `ast::Expr::Column { table: 0, .. }` if `users` is the first table in `referenced_tables`.

In addition to this, we ALSO had a `TableReferenceType` separately for checking whether the upper-level query is reading from a BTree table or a Subquery.

```rust
/// The type of the table reference, either BTreeTable or Subquery
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TableReferenceType {
    /// A BTreeTable is a table that is stored on disk in a B-tree index.
    BTreeTable,
    /// A subquery.
    Subquery {
        /// The index of the first register in the query plan that contains the result columns of the subquery.
        result_columns_start_reg: usize,
    },
}
```

...even though we already have an `Operator::Subquery` that should be able to encode this information, but doesn't, because it's a tree and we refer to things by index in `referenced_tables`.

### Why this is especially stupid

`SourceOperator` and `TableReference` are basically just two representations of the same thing, one in tree format and another in vector format. `SourceOperator` even carries around its own copy of `TableReference`, even though we ALSO have `referenced_tables: Vec<TableReference>` 🤡

Note that I'm allowed to call the existing code stupid because I wrote it.

## What we can do instead

Basically, we can just fold the concerns from `SourceOperator` into `TableReference` and have a list of those in the query plan, one per table, in loop order (outermost loop is 0, and so on).

Funnily enough, when Limbo had very very few features we used to have a Vec of LoopInfos similarly, obviously with a lot less information than now, but for SQLite it's probably the right abstraction. :)

```rust
pub struct SelectPlan {
    /// List of table references in loop order, outermost first.
    pub table_references: Vec<TableReference>,
    // ...etc...
}

/// A table reference in the query plan.
/// For example, SELECT * FROM users u JOIN products p JOIN (SELECT * FROM users) sub
/// has three table references:
/// 1. operation=Scan, table=users, table_identifier=u, reference_type=BTreeTable, join_info=None
/// 2. operation=Scan, table=products, table_identifier=p, reference_type=BTreeTable, join_info=Some(JoinInfo { outer: false, using: None }),
/// 3. operation=Subquery, table=users, table_identifier=sub, reference_type=Subquery, join_info=None
#[derive(Debug, Clone)]
pub struct TableReference {
    /// The operation that this table reference performs.
    pub op: Operation,
    /// Table object, which contains metadata about the table, e.g. columns.
    pub table: Table,
    /// The name of the table as referred to in the query, either the literal name or an alias e.g. "users" or "u"
    pub identifier: String,
    /// The join info for this table reference if it is the right side of a join (which all except the first table reference have)
    pub join_info: Option<JoinInfo>,
}
```

And we keep the "operation" part from the "operator", but in a simple form:

```rust
pub enum Operation {
    Scan {
        iter_dir: Option<IterationDirection>,
    },
    Search(Search),
    Subquery {
        plan: Box<SelectPlan>,
        result_columns_start_reg: usize,
    },
}
```

Now we don't need to carry around both the operator tree and `Vec<TableReference>`, because they are the same thing. If something refers to the `n'th table`, it is just `plan.table_references[n]`.

We also don't need to recurse through the operator tree and usually we can just loop from outermost table to innermost table.

---

### Handling the "where to evaluate a condition expression" problem

You can see I've also removed the `predicates` vector from `Scan` and friends. Previously each `SourceOperator` had a vector of `predicates` so that we knew at which loop depth to evaluate a condition. Now we align more with what SQLite does -- it puts all the conditions, even the join conditions, in the `WHERE` clause and adds extra metadata to them:

```rust
/// In a query plan, WHERE clause conditions and JOIN conditions are all folded into a vector of JoinAwareConditionExpr.
/// This is done so that we can evaluate the conditions at the correct loop depth.
/// We also need to keep track of whether the condition came from an OUTER JOIN. Take this example:
/// SELECT * FROM users u LEFT JOIN products p ON u.id = 5.
/// Even though the condition only refers to 'u', we CANNOT evaluate it at the users loop, because we need to emit NULL
/// values for the columns of 'p', for EVERY row in 'u', instead of completely skipping any rows in 'u' where the condition is false.
#[derive(Debug, Clone)]
pub struct JoinAwareConditionExpr {
    /// The original condition expression.
    pub expr: ast::Expr,
    /// Is this condition originally from an OUTER JOIN?
    /// If so, we need to evaluate it at the loop of the right table in that JOIN,
    /// regardless of which tables it references.
    /// We also cannot e.g. short circuit the entire query in the optimizer if the condition is statically false.
    pub from_outer_join: bool,
    /// The loop index where to evaluate the condition.
    /// For example, in `SELECT * FROM u JOIN p WHERE u.id = 5`, the condition can already be evaluated at the first loop (idx 0),
    /// because that is the rightmost table that it references.
    pub eval_at_loop: usize,
}
```

### Final notes

I've been wanting to make this refactor for a long time now, but the last straw was when I was making a PR trying to reduce some of the massive amount of allocations happening in the read path currently, and I got stuck because of this Operator + referenced_tables shit getting in the way constantly. So I decided I wanted to get it over with now.

The PR is again very big, but I've artificially split it up into commits that don't individually compile but at least separate the changes a bit for you, dear reader.

Closes #853	2025-02-03 07:45:58 +02:00
sonhmai	2d4bf2eb62	core: move pragma statement bytecode generator to its own file.	2025-02-03 09:21:14 +07:00
Nikita Sivukhin	2b9220992d	fix attempt to add with overflow crash in case of rowid auto-generation	2025-02-02 20:10:58 +04:00
Nikita Sivukhin	e63d84ed50	refine assertions	2025-02-02 20:10:38 +04:00
Nikita Sivukhin	6cc1b778b4	add test with rowid=-1 - currently limbo attempts to add with overflow and panics in this case	2025-02-02 20:02:59 +04:00
Pekka Enberg	593febd9a4	Add Limbo internals doc	2025-02-02 11:42:56 +02:00
Jussi Saurio	c18c6ad64d	Marginal changes to use new data structures and field names	2025-02-02 10:18:13 +02:00
Jussi Saurio	82a2850de9	subquery.rs: use iteration instead of recursion and simplify	2025-02-02 10:18:13 +02:00
Jussi Saurio	98439cd936	optimizer.rs: refactor to use new data structures and remove unnecessary stuff We don't need `push_predicates()` because that never REALLY was a predicate pushdown optimization -- it just pushed WHERE clause condition expressions into the correct SourceOperator nodes in the tree. Now that we don't have a SourceOperator tree anymore and we keep the conditions in the WHERE clause instead, we don't need to "push" anything anymore. Leaves room for ACTUAL predicate pushdown optimizations later :) We also don't need any weird bitmask stuff anymore, and perhaps we never did, to determine where conditions should be evaluated.	2025-02-02 10:18:13 +02:00
Jussi Saurio	89fba9305a	main_loop.rs: use iteration instead of recursion Now that we do not have a tree of SourceOperators but rather a Vec of TableReferences, we can just use loops instead of recursion for handling the main query loop.	2025-02-02 10:18:13 +02:00
Jussi Saurio	09b6bad0af	delete.rs: use new data structures when parsing delete	2025-02-02 10:18:13 +02:00
Jussi Saurio	2ddac4bf21	select.rs: use new data structures when parsing select	2025-02-02 10:18:13 +02:00
Jussi Saurio	16a97d3b98	planner.rs: refactor from/join + where parsing logic - use new TableReference and JoinAwareConditionExpr - add utilities for determining at which loop depth a WHERE condition should be evaluated, now that "operators" do not carry condition expressions inside them anymore.	2025-02-02 10:18:13 +02:00
Jussi Saurio	e63256f657	Change Display implementation of Plan to work with new data structures	2025-02-02 10:18:13 +02:00
Jussi Saurio	390d0e673f	plan.rs: refactor data structures - Get rid of SourceOperator tree - Make plan have a Vec of TableReference, and TableReference now contains the information from the old SourceOperator. - Remove `predicates` (conditions) from Table References -- put everything in the WHERE clause like SQLite, and attach metadata to the where clause expressions with JoinAwareConditionExpr struct. - Refactor select_star() to be simpler now that we use a vec, not a tree	2025-02-02 10:18:13 +02:00
Pekka Enberg	dbb7d1a6ba	Merge 'Pagecount' from Glauber Costa This PR implements the Pagecount pragma, as well as its associated bytecode opcode Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #819	2025-02-02 09:32:18 +02:00
Pekka Enberg	635c45a087	Merge 'Fix null expr codegen' from Nikita Sivukhin This PR adjusts the emitted instructions for expressions which include `IS` / `IS NOT` operators (support for them in conditions was added in #847) Reviewed-by: Glauber Costa (@glommer) Closes #857	2025-02-02 09:32:05 +02:00
Pekka Enberg	650b56e203	Merge 'Fix null cmp codegen' from Nikita Sivukhin This PR removes the manual null-comparison optimization (introduced in #847) which replaces `Eq`/`Ne` instructions with explicit `IsNull`/`NotNull`. There are a few reasons for this change: 1. The manual optimization was incorrect because it ignored the `jump_if_condition_is_true` flag, which is important to properly build logical condition evaluation 2. The manual optimization covered all scenarios in the test cases, and scenarios where both sides are non-trivial expressions were not covered by tests 3. Limbo already marks literals in the initially emitted bytecode as constants and will evaluate and store them only once - so the performance difference from the manual optimization seems very minor to me (but I am wrong with high probability) 4. I think (but again, I am wrong with high probability) that such a replacement can be done in a generic optimizer layer instead of manually encoding it in the first emit phase Fixes #850 Reviewed-by: Glauber Costa (@glommer) Closes #856	2025-02-02 09:32:00 +02:00
Glauber Costa	a3387cfd5f	implement the pragma page_count To do that, we also have to implement the vdbe opcode Pagecount.	2025-02-01 19:39:46 -05:00
Nikita Sivukhin	1bd8b4ef7a	pass null_eq flag for instructions generated for expressions (not in the conditions)	2025-02-02 02:51:51 +04:00
Nikita Sivukhin	4a9292f657	add tests for previously broken case	2025-02-02 02:42:06 +04:00
Nikita Sivukhin	c7aed22e39	null_eq flag disables the effect of the jump_if_null flag - so it makes no sense to set them both	2025-02-02 02:29:02 +04:00
Nikita Sivukhin	478ee6be8d	remove null optimization which didn't check for jump_if_condition_is_true flag - limbo already stores constants only once, and cleverer optimizations are better done in a generic optimizer than manually	2025-02-02 02:28:07 +04:00
Pekka Enberg	20d3399c71	Merge 'implement is and is not where constraints' from Glauber Costa

The main difference between `IS`/`IS NOT` and `=`/`!=` is how null values are handled. SQLite passes a flag "NULLEQ" to Eq and Ne to disambiguate that. In the presence of that flag, NULL = NULL.

Some prep work is done to make sure we can pass a flag instead of a boolean to Eq and Ne. I looked into the bitflags crate but got a bit scared with the list of dependencies.

Warning: The following query produces a different result for Limbo:

```
select * from demo where value is null or id == 2;
```

I strongly suspect the issue is with the OR implementation, though. The bytecode generated is quite different.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #847	2025-02-01 17:24:11 +02:00
Pekka Enberg	83f9290394	Merge 'Remove labeler 😥' from Kim Seon Woo Let's add when we can figure out how to use GH_TOKEN well Closes #852	2025-02-01 17:23:41 +02:00
김선우	45e0e86516	Remove labeler 😥	2025-02-02 00:04:49 +09:00
Glauber Costa	3c77797811	also mark IS DISTINCT FROM as supported This seems to really be just an alias for IS: "The IS NOT DISTINCT FROM operator is an alternative spelling for the IS operator. Likewise, the IS DISTINCT FROM operator means the same thing as IS NOT. Standard SQL does not support the compact IS and IS NOT notation. Those compact forms are an SQLite extension. You have to use the prolix and much less readable IS NOT DISTINCT FROM and IS DISTINCT FROM operators on other SQL database engines."	2025-02-01 09:30:06 -05:00
Glauber Costa	c04260ab54	rename Flags to a less ambiguous name Those Flags in SQLite are global, but it doesn't mean it has to be the case for us as well.	2025-02-01 08:09:06 -05:00
Pekka Enberg	51f0c9e8a3	Merge 'Full flake overhaul' from Levy A. Improvements: - Use [rust-overlay](https://github.com/oxalica/rust-overlay), better maintained than fenix and allows for: - Use `rust-toolchain.toml` as the source of truth for the current rust version, instead of tracking with stable. Preventing conflicting versions with non-nix users. - Add flake checks, could be useful for CI in the future, together with crane and cachix. - Add package, allow people to add limbo as a regular nix package. Now we can `nix build .#`, `nix run .#` and `nix shell .#` (this one adds `limbo` to the current `PATH`) - Use [new `apple-sdk` pattern](https://discourse.nixos.org/t/the-darwin-sdks-have-been-updated/55295), no need to declare each framework now. Closes #835	2025-02-01 10:34:21 +02:00
Pekka Enberg	a450b5cd39	Update README.md	2025-02-01 09:46:21 +02:00
Pekka Enberg	8c4ef098ef	Update README.md	2025-02-01 09:42:13 +02:00
Pekka Enberg	e7f18c4736	Merge 'bindings/go: Progress on Go driver, add sync primitives, prevent crashing on concurrent connections' from Preston Thorpe This PR continues work on the Go bindings. - Register all symbols from the library at load time to prevent any repeated `dlsym` calls. - Add locks to prevent multiple concurrent FFI calls to functions that act on the same state. - Adds documentation/example in the go module `README`. - Fixes memory access issue causing segfault due to passing pointer to array of strings, that is difficult to work with in Go without the right primitives. In place, simply return the amount of ResultColumns and Go can provide the index to receive the column name, similar to `rowsGetValue` On next limbo release, I'll add the example to the main `README` next to the other language examples. Until then, `go get github.com/tursodatabase/limbo` will not work so the example will remain in the bindings readme. Closes #845	2025-02-01 09:25:52 +02:00
Pekka Enberg	43d6c2760d	Merge 'update compat list' from Glauber Costa Those two expr seem to be supported Closes #846	2025-02-01 09:24:27 +02:00
Pekka Enberg	db29f43d5c	Merge 'Simplify bytecode emitters' from Glauber Costa Instead of always having the caller specify all instructions, this work introduces convenience functions into the program builder, making the code a lot cleaner. Draft for now, as this is done on top of #841 Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #844	2025-02-01 09:24:11 +02:00
Pekka Enberg	76535d1224	Merge 'github: Configure labeler workflow environment' from Pekka Enberg This fix was suggested by @seonWKim. Closes #848	2025-02-01 09:23:53 +02:00
Pekka Enberg	a3ecc69bbb	github: Configure labeler workflow environment This fix was suggested by @seonWKim.	2025-02-01 09:22:17 +02:00
Glauber Costa	96987db6ca	implement is and is not where constraints The main difference between `IS`/`IS NOT` and `=`/`!=` is how null values are handled. SQLite passes a flag "NULLEQ" to Eq and Ne to disambiguate that. In the presence of that flag, NULL = NULL. Some prep work is done to make sure we can pass a flag instead of a boolean to Eq and Ne. I looked into the bitflags crate but got a bit scared with the list of dependencies.	2025-01-31 23:01:49 -05:00
PThorpe92	7ee52fca4d	bindings/go: update readme with example, change module name	2025-01-31 19:22:21 -05:00
Glauber Costa	f300d2c8e8	rename register for IsNull opcode Now it has the same name as NotNull, so it is easier to write macros	2025-01-31 19:09:01 -05:00
Glauber Costa	7e8b190b9a	update compat list Those two expr seem to be supported	2025-01-31 16:56:19 -05:00
PThorpe92	8d93130809	bindings/go: enable multiple connections, register all symbols at library load	2025-01-31 13:28:05 -05:00
PThorpe92	950f29daab	bindings/go: Adjust tests for multiple concurrent connections	2025-01-31 13:28:05 -05:00
Pekka Enberg	98579ab2e4	Merge 'Implement Noop bytecode' from Pedro Muniz This PR implements Noop. I really don't know what else to say. This bytecode according to sqlite does: _Do nothing. Continue downward to the next opcode._ I advanced the program counter to account for continuing to the next instruction. Closes #795	2025-01-31 18:49:54 +02:00
Pekka Enberg	44e5402464	Merge branch 'main' into feature/noop	2025-01-31 18:49:39 +02:00
Glauber Costa	7aa3cc26ad	simplify the writing of bytecode programs Instead of always having the caller specify all instructions, this work introduces convenience functions into the program builder, making the code a lot cleaner.	2025-01-31 11:35:51 -05:00
Glauber Costa	b37317f68b	avoid allocations during pragma_list If we keep the pragma list sorted when declaring it, we can avoid a vector allocation when printing the pragma_list.	2025-01-31 11:35:51 -05:00
Pekka Enberg	d8a9c57d3a	Merge 'Fix table with single column PRIMARY KEY to not create extra btree' from Krishna Vishal

The error is due to comparing the PRIMARY KEY's type name to INTEGER when it was all in lowercase. This was causing `needs_auto_index` to be set to `true`.

After the fix:

```
/limbo /tmp/sc2-limbo.db
Limbo v0.0.13
Enter ".help" for usage hints.
limbo> CREATE TABLE temp (t1 integer, primary key (t1));

hexdump -s 28 -n 4 /tmp/sc2-limbo.db
000001c 0000 0200 -- matches SQLite
0000020
```

Closes https://github.com/tursodatabase/limbo/issues/824

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #830	2025-01-31 18:33:28 +02:00
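
The fix in d8a9c57d3a comes down to comparing the declared type name case-insensitively. The following is only a minimal sketch of that idea, not the code from the PR: the function signature, its slice argument, and the handling of composite keys are assumptions made for illustration. Only the name `needs_auto_index` comes from the commit message.

```rust
/// Hypothetical helper (not Limbo's actual implementation): decides whether a
/// PRIMARY KEY needs its own automatically created index btree.
/// `pk_column_types` holds the declared type name of each PRIMARY KEY column.
fn needs_auto_index(pk_column_types: &[&str]) -> bool {
    match pk_column_types {
        // No PRIMARY KEY at all: nothing to index.
        [] => false,
        // A single INTEGER column is an alias for the rowid, so no extra
        // btree is needed. The comparison must ignore case, otherwise
        // `integer` is not recognized and an extra btree is created.
        [ty] => !ty.trim().eq_ignore_ascii_case("INTEGER"),
        // Assumption for this sketch: composite keys always need an index.
        _ => true,
    }
}
```

With this sketch, `needs_auto_index(&["integer"])` and `needs_auto_index(&["INTEGER"])` both return `false`, while `needs_auto_index(&["text"])` returns `true`.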

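The planner refactor merged in bbf73da28f stores an `eval_at_loop` index on every WHERE term. As a rough sketch of how such an index could be derived from the two rules stated in that PR description (outer-join conditions run at the loop of the join's right table; other conditions run at the rightmost table they reference), consider the following. The `Condition` struct and its fields are hypothetical simplifications, and evaluating constant conditions at loop 0 is an assumption, not something the PR states.

```rust
/// Hypothetical, simplified stand-in for a WHERE/ON condition. Instead of an
/// AST it only records the loop indices of the tables the condition references.
struct Condition {
    /// Loop indices (0 = outermost) of all tables referenced by the condition.
    referenced_tables: Vec<usize>,
    /// `Some(idx)` if the condition came from the ON clause of an OUTER JOIN
    /// whose right-hand table sits at loop index `idx`.
    outer_join_right_table: Option<usize>,
}

/// Returns the loop index at which the condition should be evaluated.
fn eval_at_loop(cond: &Condition) -> usize {
    // Outer-join conditions must wait for the right table's loop, regardless
    // of which tables they reference, so that NULL rows can still be emitted.
    if let Some(right) = cond.outer_join_right_table {
        return right;
    }
    // Otherwise evaluate as early as possible: at the rightmost (innermost)
    // table the condition references. Assumption: a condition that references
    // no tables is evaluated at the outermost loop.
    cond.referenced_tables.iter().copied().max().unwrap_or(0)
}
```

For `SELECT * FROM u JOIN p WHERE u.id = 5` the condition references only table 0 and is evaluated at loop 0. For `SELECT * FROM users u LEFT JOIN products p ON u.id = 5` the same reference set yields loop 1, because the condition belongs to the outer join whose right table is `p`.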

2340 Commits