turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-24 19:44:21 +01:00

Author	SHA1	Message	Date
Jussi Saurio	d41dfd0c5d	Merge 'Fix rowid search codegen' from Nikita Sivukhin This PR fixes a bug where index search used the wrong operator when the index column was on the right-hand side ("rhs") of the expression rather than the usual left-hand side ("lhs"), e.g. `SELECT * FROM t WHERE 1 < rowid_alias`. Reviewed-by: Jussi Saurio (@jussisaurio) Closes #870	2025-02-03 12:38:04 +02:00
Pekka Enberg	482dd78f27	Merge 'bindings/go: Add error propagation from core' from Preston Thorpe This PR adds error propagation from limbo, allowing the correct error messages to be displayed, and adds a couple tests 👍 Closes #855	2025-02-03 09:23:34 +02:00
Nikita Sivukhin	10e868bc4b	add test for index search with "opposite" operator	2025-02-03 11:23:05 +04:00
Nikita Sivukhin	5a3587f7a2	use opposite operator for search if WHERE condition is swapped (e.g. 1 > x instead of x < 1)	2025-02-03 11:23:04 +04:00
Pekka Enberg	9458d1ed14	Merge 'Fix shr instruction' from Nikita Sivukhin This PR fixes the implementation of the binary shift right/left instructions. Previously there was a minor incompatibility between the limbo and sqlite implementations when the second argument of a right shift was more than 64 and the first argument was negative. Because sqlite's binary right shift is sign-extending, `-1` should be returned in that case, whereas limbo returned zero. This PR fixes the bug and also introduces a fuzz test for arithmetic expressions. The fuzz test was written with the help of `GrammarGenerator`, which makes it easy to define a probabilistic context-free grammar and then sample random strings from it. Closes #867	2025-02-03 09:22:39 +02:00
Pekka Enberg	fc5f2c7897	Merge 'bindings/java: Change logger dependency ' from Kim Seon Woo # The purpose of this PR - The current implementation forces users to use logback as their logging framework # Changes - Only add an abstraction layer for logging (slf4j in this case) - In tests, use logback to log messages (this doesn't affect users) # References https://github.com/tursodatabase/limbo/issues/615 Closes #863	2025-02-03 09:21:26 +02:00
Pekka Enberg	ba24b45185	Merge 'bindings/java: Load native library from jar ' from Kim Seon Woo # Purpose of this PR - When loading native libraries, search the jar path as well - When packaging into a jar file, add native libraries under `/lib` ## How sqlite-jdbc packages their jar file > Our SQLiteJDBC library requires no configuration since native libraries for major OSs, including Windows, macOS, Linux etc., are assembled into a single JAR (Java Archive) file. => sqlite-jdbc also packages the native libraries for all major OSs into a single JAR # Changes - Add build commands in Makefile to build and publish locally - Load native libraries from `/lib` under the jar path - Add java example under `/example` # TODO - Publish to maven central. I might need some help from maintainers. - We can do better by not adding the native libraries for every OS into the jar (things can get big, though compared to the JVM it's relatively small). We can build for each OS independently, upload the native libraries somewhere, and let users download them and place them under a system path (which their java apps can discover). - Or maybe introduce AOT (not sure how to) # Reference - [Issue](https://github.com/tursodatabase/limbo/issues/615) - Now we can do something like this ![image](https://github.com/user-attachments/assets/c6711e97-3fa1-47f5-a0bf-e9d4cf8dba88) Closes #862	2025-02-03 09:20:47 +02:00
Pekka Enberg	7257fb8aae	Merge 'core: move pragma statement bytecode generator to its own file.' from Sonny What? - no logic change - refactored and moved pragma statement bytecode generation to its own package for better structure. Closes #871	2025-02-03 09:10:33 +02:00
Pekka Enberg	6c34737240	Merge 'Fix rowid generation' from Nikita Sivukhin Fix a panic when a table has a row with rowid equal to `-1` (`= u64::MAX`) ```sql limbo> CREATE TABLE t(x INTEGER PRIMARY KEY) limbo> INSERT INTO t VALUES (-1) limbo> INSERT INTO t VALUES (NULL); thread 'main' panicked at core/vdbe/mod.rs:2499:21: attempt to add with overflow note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #868	2025-02-03 09:09:12 +02:00
Pekka Enberg	662d629666	Rename JoinAwareConditionExpr to WhereTerm We transform all JOIN conditions into WHERE clause terms in the query planner. The JoinAwareConditionExpr name tries to make that point, but I think it makes things more confusing. Let's call it WhereTerm (suggested by Jussi).	2025-02-03 07:46:51 +02:00
Pekka Enberg	bbf73da28f	Merge 'core/translate: refactor query planner again to be simpler' from Jussi Saurio ## Simplify bookkeeping of referenced tables in the query planner This PR refactors the way we track referenced tables and associated planner operations related to them (scans, index searches, etc). ## The problem with what we currently have: - We have a tree data structure called `SourceOperator` which is either a `Join`, `Scan`, `Search`, `Subquery` or `Nothing`. ```rust /** A SourceOperator is a Node in the query plan that reads data from a table. */ #[derive(Clone, Debug)] pub enum SourceOperator { // Join operator // This operator is used to join two source operators. // It takes a left and right source operator, a list of predicates to evaluate, // and a boolean indicating whether it is an outer join. Join { id: usize, left: Box<SourceOperator>, right: Box<SourceOperator>, predicates: Option<Vec<ast::Expr>>, outer: bool, using: Option<ast::DistinctNames>, }, // Scan operator // This operator is used to scan a table. // It takes a table to scan and an optional list of predicates to evaluate. // The predicates are used to filter rows from the table. // e.g. SELECT * FROM t1 WHERE t1.foo = 5 // The iter_dir is used to indicate the direction of the iterator. // The use of Option for iter_dir is aimed at implementing a conservative optimization strategy: it only pushes // iter_dir down to Scan when iter_dir is None, to prevent potential result set errors caused by multiple // assignments. For more detailed discussion, please refer to https://github.com/penberg/limbo/pull/376 Scan { id: usize, table_reference: TableReference, predicates: Option<Vec<ast::Expr>>, iter_dir: Option<IterationDirection>, }, // Search operator // This operator is used to search for a row in a table using an index // (i.e.
a primary key or a secondary index) Search { id: usize, table_reference: TableReference, search: Search, predicates: Option<Vec<ast::Expr>>, }, Subquery { id: usize, table_reference: TableReference, plan: Box<SelectPlan>, predicates: Option<Vec<ast::Expr>>, }, // Nothing operator // This operator is used to represent an empty query. // e.g. SELECT * from foo WHERE 0 will eventually be optimized to Nothing. Nothing { id: usize, }, } ``` - Logically joins are a tree, but this is at least marginally bad for performance because each `Join` has two boxed child operators, and so e.g. for a 3-table query you have, for example, 3 `Scan` nodes and then 2 `Join` nodes. - There are other bigger problems too, though, related to code structure. We have been carrying around a separate vector of `referenced_tables` that columns can refer to by index: ```rust /// A query plan has a list of TableReference objects, each of which represents a table or subquery. #[derive(Clone, Debug)] pub struct TableReference { /// Table object, which contains metadata about the table, e.g. columns. pub table: Table, /// The name of the table as referred to in the query, either the literal name or an alias e.g. "users" or "u" pub table_identifier: String, /// The index of this reference in the list of TableReference objects in the query plan /// The reference at index 0 is the first table in the FROM clause, the reference at index 1 is the second table in the FROM clause, etc. /// So, the index is relevant for determining when predicates (WHERE, ON filters etc.) should be evaluated. pub table_index: usize, /// The type of the table reference, either BTreeTable or Subquery pub reference_type: TableReferenceType, } ``` - `referenced_tables` is used because SQLite joins are an `n^tables_len` nested loop, and we need to figure out during which loop to evaluate a condition expression. A lot of plumbing in the current code exists for this, e.g. 
"pushing predicates" in `optimizer` even though "predicate pushdown" as a query planner concept is an _optimization_, but in our current system the "pushdown" is really a _necessity_ to move the condition expressions to the correct `SourceOperator::predicates` vector so that they are evaluated at the right point. - `referenced_tables` is also used to map identifiers in the query to the correct table, e.g. 'foo' `SELECT foo FROM users` becomes an `ast::Expr::Column { table: 0, .. }` if `users` is the first table in `referenced_tables`. In addition to this, we ALSO had a `TableReferenceType` separately for checking whether the upper-level query is reading from a BTree table or a Subquery. ```rust /// The type of the table reference, either BTreeTable or Subquery #[derive(Clone, Debug, PartialEq, Eq)] pub enum TableReferenceType { /// A BTreeTable is a table that is stored on disk in a B-tree index. BTreeTable, /// A subquery. Subquery { /// The index of the first register in the query plan that contains the result columns of the subquery. result_columns_start_reg: usize, }, } ``` ...even though we already have an `Operator::Subquery` that should be able to encode this information, but doesn't, because it's a tree and we refer to things by index in `referenced_tables`. ### Why this is especially stupid `SourceOperator` and `TableReference` are basically just two representations of the same thing, one in tree format and another in vector format. `SourceOperator` even carries around its own copy of `TableReference`, even though we ALSO have `referenced_tables: Vec<TableReference>` 🤡 Note that I'm allowed to call the existing code stupid because I wrote it. ## What we can do instead Basically, we can just fold the concerns from `SourceOperator` into `TableReference` and have a list of those in the query plan, one per table, in loop order (outermost loop is 0, and so on). 
Funnily enough, when Limbo had very very few features we used to have a Vec of LoopInfos similarly, obviously with a lot less information than now, but for SQLite it's probably the right abstraction. :) ```rust pub struct SelectPlan { /// List of table references in loop order, outermost first. pub table_references: Vec<TableReference>, ...etc... } /// A table reference in the query plan. /// For example, SELECT * FROM users u JOIN products p JOIN (SELECT * FROM users) sub /// has three table references: /// 1. operation=Scan, table=users, table_identifier=u, reference_type=BTreeTable, join_info=None /// 2. operation=Scan, table=products, table_identifier=p, reference_type=BTreeTable, join_info=Some(JoinInfo { outer: false, using: None }), /// 3. operation=Subquery, table=users, table_identifier=sub, reference_type=Subquery, join_info=None #[derive(Debug, Clone)] pub struct TableReference { /// The operation that this table reference performs. pub op: Operation, /// Table object, which contains metadata about the table, e.g. columns. pub table: Table, /// The name of the table as referred to in the query, either the literal name or an alias e.g. "users" or "u" pub identifier: String, /// The join info for this table reference if it is the right side of a join (which all except the first table reference have) pub join_info: Option<JoinInfo>, } ``` And we keep the "operation" part from the "operator", but in a simple form: ```rust pub enum Operation { Scan { iter_dir: Option<IterationDirection>, }, Search(Search), Subquery { plan: Box<SelectPlan>, result_columns_start_reg: usize }, } ``` Now we don't need to carry around both the operator tree and `Vec<TableReference>`, because they are the same thing. If something refers to the `n'th table`, it is just `plan.table_references[n]`. We also don't need to recurse through the operator tree and usually we can just loop from outermost table to innermost table. 
--- ### Handling the "where to evaluate a condition expression" problem You can see I've also removed the `predicates` vector from `Scan` and friends. Previously each `SourceOperator` had a vector of `predicates` so that we knew at which loop depth to evaluate a condition. Now we align more with what SQLite does -- it puts all the conditions, even the join conditions, in the `WHERE` clause and adds extra metadata to them: ```rust /// In a query plan, WHERE clause conditions and JOIN conditions are all folded into a vector of JoinAwareConditionExpr. /// This is done so that we can evaluate the conditions at the correct loop depth. /// We also need to keep track of whether the condition came from an OUTER JOIN. Take this example: /// SELECT * FROM users u LEFT JOIN products p ON u.id = 5. /// Even though the condition only refers to 'u', we CANNOT evaluate it at the users loop, because we need to emit NULL /// values for the columns of 'p', for EVERY row in 'u', instead of completely skipping any rows in 'u' where the condition is false. #[derive(Debug, Clone)] pub struct JoinAwareConditionExpr { /// The original condition expression. pub expr: ast::Expr, /// Is this condition originally from an OUTER JOIN? /// If so, we need to evaluate it at the loop of the right table in that JOIN, /// regardless of which tables it references. /// We also cannot e.g. short circuit the entire query in the optimizer if the condition is statically false. pub from_outer_join: bool, /// The loop index where to evaluate the condition. /// For example, in `SELECT * FROM u JOIN p WHERE u.id = 5`, the condition can already be evaluated at the first loop (idx 0), /// because that is the rightmost table that it references. 
pub eval_at_loop: usize, } ``` ### Final notes I've been wanting to make this refactor for a long time now, but the last straw was when I was making a PR trying to reduce some of the massive amount of allocations happening in the read path currently, and I got stuck because of this Operator + referenced_tables shit getting in the way constantly. So I decided I wanted to get it over with now. The PR is again very big, but I've artificially split it up into commits that don't individually compile but at least separate the changes a bit for you, dear reader. Closes #853	2025-02-03 07:45:58 +02:00
sonhmai	2d4bf2eb62	core: move pragma statement bytecode generator to its own file.	2025-02-03 09:21:14 +07:00
Nikita Sivukhin	41419ab11a	add env logger and fix range	2025-02-02 20:12:56 +04:00
Nikita Sivukhin	2b9220992d	fix attempt to add with overflow crash in case of rowid auto-generation	2025-02-02 20:10:58 +04:00
Nikita Sivukhin	e63d84ed50	refine assertions	2025-02-02 20:10:38 +04:00
Nikita Sivukhin	6cc1b778b4	add test with rowid=-1 - now limbo attempts to add with overflow and panics in this case	2025-02-02 20:02:59 +04:00
Nikita Sivukhin	3ff76e657e	allow a bit of dead code for now	2025-02-02 19:55:04 +04:00
Nikita Sivukhin	8d513b229f	add simple tcl tests	2025-02-02 19:43:13 +04:00
Nikita Sivukhin	300f278ff3	use TempDatabase from commons in tests/	2025-02-02 19:34:15 +04:00
Nikita Sivukhin	43c9fc3c5c	fix binary shift implementation	2025-02-02 19:24:22 +04:00
Nikita Sivukhin	9cc6cc99d4	add examples found by fuzzer	2025-02-02 18:42:40 +04:00
Nikita Sivukhin	91fcb67b06	rewrite grammar generator and add fuzz test for arithmetic expressions	2025-02-02 18:39:24 +04:00
PThorpe92	1493d499e5	bindings/go: Add error propagation from bindings lib	2025-02-02 07:40:28 -05:00
김선우	997f12426f	Add example project	2025-02-02 20:10:29 +09:00
Nikita Sivukhin	f716919b10	setup basic playground for fuzzing against sqlite	2025-02-02 14:12:12 +04:00
Nikita Sivukhin	2c958d7e2d	derive Debug trait for limbo step result	2025-02-02 14:11:41 +04:00
Pekka Enberg	593febd9a4	Add Limbo internals doc	2025-02-02 11:42:56 +02:00
Jussi Saurio	c18c6ad64d	Marginal changes to use new data structures and field names	2025-02-02 10:18:13 +02:00
Jussi Saurio	82a2850de9	subquery.rs: use iteration instead of recursion and simplify	2025-02-02 10:18:13 +02:00
Jussi Saurio	98439cd936	optimizer.rs: refactor to use new data structures and remove unnecessary stuff We don't need `push_predicates()` because that never REALLY was a predicate pushdown optimization -- it just pushed WHERE clause condition expressions into the correct SourceOperator nodes in the tree. Now that we don't have a SourceOperator tree anymore and we keep the conditions in the WHERE clause instead, we don't need to "push" anything anymore. Leaves room for ACTUAL predicate pushdown optimizations later :) We also don't need any weird bitmask stuff anymore, and perhaps we never did, to determine where conditions should be evaluated.	2025-02-02 10:18:13 +02:00
Jussi Saurio	89fba9305a	main_loop.rs: use iteration instead of recursion Now that we do not have a tree of SourceOperators but rather a Vec of TableReferences, we can just use loops instead of recursion for handling the main query loop.	2025-02-02 10:18:13 +02:00
Jussi Saurio	09b6bad0af	delete.rs: use new data structures when parsing delete	2025-02-02 10:18:13 +02:00
Jussi Saurio	2ddac4bf21	select.rs: use new data structures when parsing select	2025-02-02 10:18:13 +02:00
Jussi Saurio	16a97d3b98	planner.rs: refactor from/join + where parsing logic - use new TableReference and JoinAwareConditionExpr - add utilities for determining at which loop depth a WHERE condition should be evaluated, now that "operators" do not carry condition expressions inside them anymore.	2025-02-02 10:18:13 +02:00
Jussi Saurio	e63256f657	Change Display implementation of Plan to work with new data structures	2025-02-02 10:18:13 +02:00
Jussi Saurio	390d0e673f	plan.rs: refactor data structures - Get rid of SourceOperator tree - Make plan have a Vec of TableReference, and TableReference now contains the information from the old SourceOperator. - Remove `predicates` (conditions) from Table References -- put everything in the WHERE clause like SQLite, and attach metadata to the where clause expressions with JoinAwareConditionExpr struct. - Refactor select_star() to be simpler now that we use a vec, not a tree	2025-02-02 10:18:13 +02:00
김선우	5343f0a813	Update README.md on how to use limbo jdbc	2025-02-02 17:02:38 +09:00
Pekka Enberg	dbb7d1a6ba	Merge 'Pagecount' from Glauber Costa This PR implements the Pagecount pragma, as well as its associated bytecode opcode Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #819	2025-02-02 09:32:18 +02:00
Pekka Enberg	635c45a087	Merge 'Fix null expr codegen' from Nikita Sivukhin This PR adjusts the emitted instructions for expressions which include `IS` / `IS NOT` operators (support for them in conditions was added in #847) Reviewed-by: Glauber Costa (@glommer) Closes #857	2025-02-02 09:32:05 +02:00
Pekka Enberg	650b56e203	Merge 'Fix null cmp codegen' from Nikita Sivukhin This PR removes the manual null-comparison optimization (introduced in #847) which replaces `Eq`/`Ne` instructions with explicit `IsNull`/`NotNull`. There are a few reasons for this change: 1. The manual optimization was incorrect because it ignored the `jump_if_condition_is_true` flag, which is important for building logical condition evaluation properly 2. The manual optimization covered all scenarios in the test cases, so scenarios where both sides are non-trivial expressions were not covered by tests 3. Limbo already marks literals in the initially emitted bytecode as constants and evaluates and stores them only once, so the performance difference from the manual optimization seems very minor to me (but I am wrong with high probability) 4. I think (but again, I am wrong with high probability) that such a replacement can be done in a generic optimizer layer instead of being manually encoded in the first emit phase Fixes #850 Reviewed-by: Glauber Costa (@glommer) Closes #856	2025-02-02 09:32:00 +02:00
김선우	985c5139be	Let's not force users to use specific logging framework	2025-02-02 13:00:03 +09:00
김선우	5d5261637b	Fix debugger to print out INFO messages by default	2025-02-02 12:54:06 +09:00
김선우	3332381f6a	Remove unused	2025-02-02 12:38:55 +09:00
김선우	80adeb520a	Update LimboDB.java to load libraries from /lib	2025-02-02 12:35:34 +09:00
김선우	6168ad2f6e	Add maven-publish plugins to publish jar using gradle	2025-02-02 12:24:28 +09:00
김선우	94dff512c9	Add makefile commands to build for mac and windows	2025-02-02 12:24:04 +09:00
Glauber Costa	a3387cfd5f	implement the pragma page_count To do that, we also have to implement the vdbe opcode Pagecount.	2025-02-01 19:39:46 -05:00
Nikita Sivukhin	1bd8b4ef7a	pass null_eq flag for instructions generated for expressions (not in the conditions)	2025-02-02 02:51:51 +04:00
Nikita Sivukhin	4a9292f657	add tests for previously broken case	2025-02-02 02:42:06 +04:00
Nikita Sivukhin	c7aed22e39	null_eq flag disables the effect of the jump_if_null flag - so it makes no sense to set them both	2025-02-02 02:29:02 +04:00
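
The query planner refactor listed above (bbf73da28f) folds JOIN and WHERE conditions into one list and records the loop index at which each condition is evaluated. A minimal sketch of that rule follows, assuming a simplified stand-in type: `WhereTermSketch`, its fields, and `eval_at_loop()` are hypothetical names for illustration, not Limbo's actual data structures or planner code.

```rust
/// Hypothetical, simplified stand-in for a WHERE term. Not Limbo's actual type.
#[derive(Debug)]
pub struct WhereTermSketch {
    /// Indexes (in loop order, outermost = 0) of the tables the condition references.
    pub referenced_tables: Vec<usize>,
    /// If the condition came from an OUTER JOIN, the loop index of the
    /// right-hand table of that join; otherwise None.
    pub outer_join_right_table: Option<usize>,
}

/// Returns the loop index at which the condition can be evaluated.
pub fn eval_at_loop(term: &WhereTermSketch) -> usize {
    // A condition from an OUTER JOIN is evaluated at the loop of the join's
    // right table, regardless of which tables it references, so that NULL
    // rows can still be emitted for the right side.
    if let Some(right) = term.outer_join_right_table {
        return right;
    }
    // Otherwise evaluate as early as possible: at the innermost (rightmost)
    // table the condition references. A condition that references no table
    // is assumed here to be evaluated at the outermost loop.
    term.referenced_tables.iter().copied().max().unwrap_or(0)
}
```

With this sketch, `u.id = 5` in `SELECT * FROM u JOIN p WHERE u.id = 5` yields loop 0, while the same condition in `SELECT * FROM users u LEFT JOIN products p ON u.id = 5` yields loop 1, matching the two examples in the PR description.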


2365 Commits
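
The 'Fix shr instruction' entry (9458d1ed14) describes sign-extended right shifts. The sketch below illustrates that behavior as SQLite is understood to implement it: a negative shift amount reverses the direction, and a shift of 64 or more bits yields `0`, or `-1` for a right shift of a negative value. The function names are hypothetical and this is not the code from the PR.

```rust
/// Hypothetical helper, not Limbo's actual code: SQLite-style `a >> b` on i64.
pub fn shift_right(a: i64, b: i64) -> i64 {
    if b < 0 {
        // A negative amount reverses the direction. Clamp to 64 so that
        // negating i64::MIN cannot overflow.
        let n = if b > -64 { -b } else { 64 };
        return shift_left(a, n);
    }
    if b >= 64 {
        // Sign-extending: every bit becomes a copy of the sign bit.
        return if a < 0 { -1 } else { 0 };
    }
    // `>>` on a signed integer is an arithmetic (sign-extending) shift in Rust.
    a >> b
}

/// Hypothetical helper, not Limbo's actual code: SQLite-style `a << b` on i64.
pub fn shift_left(a: i64, b: i64) -> i64 {
    if b < 0 {
        let n = if b > -64 { -b } else { 64 };
        return shift_right(a, n);
    }
    if b >= 64 {
        // All bits are shifted out.
        return 0;
    }
    // Shift the raw bit pattern to avoid signed-overflow checks.
    ((a as u64) << b) as i64
}
```

For example, `shift_right(-1, 100)` returns `-1` here, which is the case the commit message says previously returned zero.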