turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-17 16:44:19 +01:00

Author	SHA1	Message	Date
PThorpe92	ae975afe49	Remove unnecessary FK resolution on schema parsing	2025-10-07 16:45:16 -04:00
Glauber Costa	e2694ff88b	implement is null / is not null tests for mview filter It was just an oversight on our side that they were not generated before.	2025-10-06 21:22:30 -05:00
Jussi Saurio	35b584f050	Merge 'core: change root_page to i64' from Pere Diaz Bou Closes #3454	2025-09-30 12:50:23 +03:00
Pere Diaz Bou	0f631101df	core: change page idx type from usize to i64 MVCC is like the annoying younger cousin (I know because I was him) that needs to be treated differently. MVCC requires us to use root_pages that might not be allocated yet, and the plan is to use negative root_pages for that case. Therefore, we need i64 in order to fit this change.	2025-09-29 18:38:43 +02:00
Preston Thorpe	cdab174350	Merge 'Fix column fetch in joins' from Glauber Costa In comparisons for joins, we were assuming that the left column belonged to the left join (and vice-versa). That is incorrect, because you can write the comparison condition in any order. Fixes #3368 Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3400	2025-09-29 12:34:45 -04:00
Nikita Sivukhin	86a95e813d	Merge branch 'main' into quoting-fix-attempt-2	2025-09-29 10:58:51 +04:00
Glauber Costa	78ee8b8627	Fix column fetch in joins In comparisons for joins, we were assuming that the left column belonged to the left join (and vice-versa). That is incorrect, because you can write the comparison condition in any order. Fixes #3368	2025-09-27 12:08:47 -03:00
Glauber Costa	3ee97ddf36	Make sure complex expressions in filters go through Project We had code for this, but the code had a fatal flaw: it tried to detect a complex operation (an operation that needs projection), and return false (no need for projection) for the others. This is the exact opposite of what we should do: we should identify the simple operations, and then return true (needs projection) for the rest. CAST is a special beast, since it is not a function, but rather, a special opcode. Everything else above holds true just the same. But for CAST, we have to do the extra work to capture it in the logical plan and pass it down. Fixes #3372 Fixes #3370 Fixes #3369	2025-09-27 07:21:03 -03:00
Nikita Sivukhin	fdf8ca88fd	introduce exact(...) function - because enum variant will disappear	2025-09-26 13:01:49 +04:00
Pekka Enberg	9461e22c06	Merge 'Improve DBSP view serialization' from Glauber Costa Improve serialization for DBSP views. The serialization code was written organically, without much forward thinking about stability as we evolved the table and operator format. Now that this is done, we are at a point where we can actually make it suck less and take a considerable step towards making this production ready. We also add a simple version check (in the table name, because that is much easier than reading contents in parse_schema_row) to prevent views from being used if we had to do anything to evolve the format of the circuit (including the operators). Closes #3351	2025-09-26 09:18:45 +03:00
Glauber Costa	1b5e74060a	make sure that we are able to prevent views from being corrupted As we make changes to the way materialized views are generated (think adding new operators, changing the id of existing operators, etc), we will need to persist the topology of the circuit itself. This is a change that I believe to be premature. For now, it is enough to reserve the first operator id for it, and add a version number to the table name. We can just detect that something changed, and ask the user to drop the view. We can get away with it due to the fact that the views are experimental.	2025-09-25 22:52:08 -03:00
Pere Diaz Bou	91cff65e44	Merge 'Autoincrement' from Pavan Nambi
fixes #1976 and #1605
```zsh
turso> DROP TABLE IF EXISTS t; CREATE TABLE t ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT );
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t    │ 1   │
└──────┴─────┘
turso> DROP TABLE IF EXISTS t; CREATE TABLE t ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT );
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t    │ 1   │
└──────┴─────┘
turso> INSERT INTO t (name) VALUES ('A'); SELECT * FROM sqlite_sequence;
┌──────┬─────┐
│ name │ seq │
├──────┼─────┤
│ t    │ 2   │
└──────┴─────┘
turso>
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #2983	2025-09-25 18:57:24 +02:00
Pekka Enberg	a50771fe38	core: Wrap Connection::query_only with AtomicBool	2025-09-24 19:23:13 +03:00
Pavan-Nambi	49d5141f2d	Merge remote-tracking branch 'origin/main' into cdc_fail_autoincrement	2025-09-24 18:06:02 +05:30
Pekka Enberg	fa8065ca52	core: Wrap Connection::autocommit in AtomicBool	2025-09-23 13:18:49 +03:00
Pavan Nambi	f1ac855441	Merge branch 'main' into cdc_fail_autoincrement	2025-09-22 21:11:26 +05:30
Pekka Enberg	aa454a6637	core: Wrap Connection::pager in RwLock	2025-09-22 17:02:08 +03:00
Pekka Enberg	0144ea8059	Merge 'Support UNION queries in DBSP-based Materialized Views' from Glauber Costa UNION queries, while useful on their own, are a cornerstone of recursive CTEs. This PR implements: * the merge operator, required to merge both sides of a union query. * the circuitry necessary to issue the Merge operator. * extraction of tables mentioned in union and CTE expressions, so we can correctly populate tables that contain them. Closes #3234	2025-09-22 11:33:19 +03:00
Glauber Costa	2627ad44de	support union statements in the DBSP circuit compiler	2025-09-21 21:00:27 -03:00
Pavan-Nambi	51cf410b56	add has_autoincrement to all test tables from main branch	2025-09-21 16:10:45 +05:30
Pavan Nambi	47194d7658	Merge branch 'tursodatabase:main' into cdc_fail_autoincrement	2025-09-21 16:03:38 +05:30
Glauber Costa	13260349b0	Return a parse error for a non-equality join We currently don't handle non-equality joins, but end up just returning a bogus result. Let's return a parse error instead.	2025-09-20 20:35:10 -03:00
Glauber Costa	832a4d7034	generate projection nodes inside filter clauses We are currently not able to properly compute things like WHERE a+b=2. Let's generate a projection node inside a filter when needed.	2025-09-19 03:59:28 -05:00
Glauber Costa	627f61aa81	support column comparisons in the filter operator We currently only support column / literal comparisons in the filter operator. But with JOINs, comparisons are usually against two columns. Do the work to support it.	2025-09-19 03:59:28 -05:00
Glauber Costa	47097fbec6	Add tests for project operator working with ambiguous columns Unlike the other operators, project works just fine with ambiguous columns, because it works with compiled expressions. We don't need to patch it, but let's make sure it keeps working by writing a test.	2025-09-19 03:59:28 -05:00
Glauber Costa	e80dd8e5e1	move the filter operator to accept indexes instead of names We already did similarly for the AggregateOperator: for joins you can have the same column name in many tables. And passing schema information to the operator is a layering violation (the operator may be operating on the result of a previous node, and at that point there is no more "schema"). Therefore we pass indexes into the column set the operator has. The FilterOperator has a complication: we are using it to generate the SQL for the populate statement, and that needs column names. However, we should not be using the FilterOperator for that, and that is a relic from the time where we had operator information directly inside the IncrementalView. To enable moving the FilterOperator to index-based, we rework that code. For joins, we'll need to populate many tables anyway, so we take the time to do that work here.	2025-09-19 03:59:28 -05:00
Glauber Costa	f149b40e75	Implement JOINs in the DBSP circuit This PR improves the DBSP circuit so that it handles the JOIN operator. The JOIN operator exposes a weakness of our current model: we usually pass a list of columns between operators, and find the right column by name when needed. But with JOINs, many tables can have the same columns. The operators will then find the wrong column (same name, different table), and produce incorrect results. To fix this, we must do two things: 1) Change the Logical Plan: it needs to track table provenance. 2) Fix the aggregators: they need to operate on indexes, not names. For the aggregators, note that table provenance is the wrong abstraction. The aggregator is likely working with a logical table that is the result of previous nodes in the circuit. So we just need to be able to tell it which index in the column array it should use.	2025-09-19 03:59:28 -05:00
Pavan-Nambi	020921f803	Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement	2025-09-18 19:27:19 +05:30
Pekka Enberg	17e9f05ea4	core: Convert Rc<Pager> to Arc<Pager>	2025-09-17 09:32:49 +03:00
Glauber Costa	6bee6bb785	implement min/max
We have not implemented them before because they require the raw elements to be kept. It is easy to see why in the following example:
current_min = 3;
insert(2) => current_min = 2 // can be done without state
delete(2) => needs to look at the state to determine new min!
The aggregator state was a very simple key-value structure. To accommodate min/max, we will make it into a more complex table, where we can encode a more complex structure. The key insight is that we can use a primary key composed of:
1) storage_id
2) zset_id
3) element
The storage_id and zset_id are our previous key, except they are now exploded to support a larger range of storage_id. With more bits available in the storage_id, we can encode information about which column we are storing. For aggregations in multiple columns, we will need to keep a different list of values for min/max! The element is just the values of the columns. Because this is a primary key, the data will be sorted in the btree. We can then just do a prefix search in the first two components of the key and easily find the min/max when needed. This new format is also adequate for joins. Joins will just have a new storage_id which encodes two "columns" (left side, right side).	2025-09-15 22:30:48 -05:00
Glauber Costa	3565e7978a	Add an index to the dbsp internal table And also change the schema of the main table. I have come to see the current key-value schema as inadequate for non-aggregate operators. Calculating Min/Max, for example, doesn't fit in this schema because we have to be able to track existing values and index them. Another alternative is to keep one table per operator type, but this quickly leads to an explosion of tables.	2025-09-15 22:30:48 -05:00
Pavan-Nambi	fdb4f98e11	Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement	2025-09-13 07:17:18 +05:30
Preston Thorpe	b09dcceeef	Merge 'Fixes views' from Glauber Costa This is a collection of fixes for materialized views ahead of adding support for JOINs. It is mostly issues with how we assume there is a single table, with a single delta, but we have to send more than one. Those are things that are just objectively wrong, so I am sending it separately to make the JOIN PR smaller. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3009	2025-09-12 07:43:32 -04:00
Pavan-Nambi	7191f1cc1c	Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement	2025-09-12 15:17:12 +05:30
Glauber Costa	874047276e	views: pass a DeltaSet for merge_delta A DeltaSet is a collection of Deltas, one per table. We'll need that for joins. The populate step for now will still generate a single set. That will be our next step to fix.	2025-09-11 05:30:46 -07:00
Glauber Costa	e6008e532a	Add a second delta to the EvalState, Commit We will assert that the second one is always empty for the existing operators - as they should be! But joins will need both.	2025-09-11 05:30:46 -07:00
Glauber Costa	1fd345f382	unify code used for persistence. We have code written for BTree (ZSet) persistence in both compiler.rs and operator.rs, because there are minor differences between them. With joins coming, it is time to unify this code.	2025-09-11 05:30:46 -07:00
Jussi Saurio	e3bd00883b	Fix creation of automatic indexes indexes with the naming scheme "sqlite_autoindex_<tblname>_<number>" are automatically created when a table is created with UNIQUE or PRIMARY KEY definitions. these indexes must map to the table definition SQL in definition order, i.e. sqlite_autoindex_foo_1 must be the first instance of UNIQUE or PRIMARY KEY and so on. this commit fixes our autoindex creation / parsing so that this invariant is upheld.	2025-09-11 14:11:30 +03:00
Pavan-Nambi	e5d3594fa2	fmt	2025-09-10 07:35:20 +05:30
Pavan-Nambi	6728384b47	Merge remote-tracking branch 'origin/main' into cdc_fail_autoincrement	2025-09-10 07:30:22 +05:30
Pavan-Nambi	b833e71c20	inserting ain't working hell yeah concurrency tests passing now woosh finally write tests passed Most of the cdc tests are passing yay autoincrement draft remove shared schema code that broke transactions sequence table should reset if table is dropped fmt fmt fmt	2025-09-09 20:07:52 +05:30
Glauber Costa	08b2e685d5	Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized views. It is hard to split because of all the interdependencies between components, so it is one big thing. This commit message will at least try to go into details about the basic architecture.
Materialized Views as tables
============================
Materialized views are now a normal table - whereas before they were a virtual table. By making a materialized view a table, we can reuse all the infrastructure for dealing with tables (cursors, etc). One of the advantages of doing this is that we can create indexes on view columns. Later, we should also be able to write those views to separate files with ATTACH write.
Materialized Views as Zsets
===========================
The contents of the table are a ZSet: rowid, values, weight. Readers will notice that because of this, the usage of the ZSet data structure dwindles throughout the codebase. The main difference between our materialized ZSet and the standard DBSP ZSet is that ours is backed by a BTree, not a Hash (since SQLite tables are BTrees).
Aggregator State
================
In DBSP, the aggregator nodes also have state. To store that state, there is a second table. The table holds all aggregators in the view, and there is one table per view. That is __turso_internal_dbsp_state_{view_name}. The format of that table is similar to a ZSet: rowid, serialized_values, weight. We serialize the values because there will be many aggregators in the table. We can't rely on a particular format for the values.
The Materialized View Cursor
============================
Reading from a Materialized View essentially means reading from the persisted ZSet, and enhancing that with data that exists within the transaction. Transaction data is ephemeral, so we do not materialize this anywhere: we have a carefully crafted implementation of seek that takes care of merging weights and stitching the two sets together.	2025-09-05 07:04:33 -05:00
Glauber Costa	29b93e3e58	add DBSP circuit compiler The next step is to adapt the view code to use circuits instead of listing the operators manually.	2025-08-27 14:21:32 -05:00
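
The min/max commit in the log above (6bee6bb785) argues that MIN and MAX cannot be maintained incrementally without keeping the raw elements, because a delete of the current minimum forces a lookup of the next one. The following is a minimal in-memory sketch of that argument, not Turso's actual aggregator state: `MinState` and its methods are hypothetical names, and a `BTreeMap` stands in for the sorted btree table keyed by (storage_id, zset_id, element) that the commit describes.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch, not Turso's aggregator: every live element is kept
/// with its weight (multiplicity), sorted by value, so the minimum can be
/// recomputed after a delete.
#[derive(Default)]
struct MinState {
    // element value -> weight (number of live copies)
    elements: BTreeMap<i64, i64>,
}

impl MinState {
    /// Apply a weighted change: +1 for an insert, -1 for a delete.
    fn apply(&mut self, value: i64, weight: i64) {
        let remaining = {
            let entry = self.elements.entry(value).or_insert(0);
            *entry += weight;
            *entry
        };
        if remaining <= 0 {
            // No live copies remain, so drop the element entirely.
            self.elements.remove(&value);
        }
    }

    /// The smallest live element is the first key in sorted order.
    fn min(&self) -> Option<i64> {
        self.elements.keys().next().copied()
    }
}
```

Starting from a state that holds 3, `apply(2, 1)` lowers the minimum to 2 and `apply(2, -1)` restores it to 3. The second step only works because 3 was retained in the sorted state, which is the point the commit message makes.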

43 Commits
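
The persistence commit (08b2e685d5) describes reading a materialized view as combining the persisted ZSet with the changes made inside the current transaction by merging weights. The sketch below illustrates only the weight arithmetic under simplifying assumptions: the `ZSet` alias and `visible_rows` are hypothetical names, rows are held in memory, and the merged result is built eagerly, whereas the commit says the real cursor stitches the two sets together during seek without materializing transaction data.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory ZSet: (rowid, values) -> weight.
type ZSet = BTreeMap<(i64, Vec<String>), i64>;

/// Sum the weights of the persisted rows and the transaction-local delta,
/// then keep only rows whose combined weight is positive.
fn visible_rows(persisted: &ZSet, txn_delta: &ZSet) -> Vec<(i64, Vec<String>)> {
    let mut combined = persisted.clone();
    for (row, weight) in txn_delta {
        // A delete carries weight -1 and cancels a persisted +1.
        *combined.entry(row.clone()).or_insert(0) += *weight;
    }
    combined
        .into_iter()
        .filter(|(_, weight)| *weight > 0)
        .map(|(row, _)| row)
        .collect()
}
```

With persisted rows 1 and 2, and a transaction that deletes row 2 and inserts row 3, the function returns rows 1 and 3 in rowid order.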