Commit Graph

31 Commits

Author SHA1 Message Date
Glauber Costa
9f54f60d45 make sure that complex select statements are captured by MV populate
The population code extracts table information from the select statement
so it can populate the materialized view. But the code, as written
today, is naive. It doesn't capture table information correctly if there
is more than one select statement (such as in the case of a union query).
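
A minimal sketch of the fix's shape, using a hypothetical, simplified AST rather than the real parser types: walk every arm of a compound SELECT and collect the tables each one references, instead of looking only at the first statement.

```
// Hypothetical, simplified AST: a compound SELECT is a list of "arms"
// (e.g. the two sides of a UNION), each with its own FROM list.
struct SelectArm {
    from_tables: Vec<String>,
}

struct Select {
    arms: Vec<SelectArm>, // one arm for a plain SELECT, several for UNION/UNION ALL
}

/// Collect every table referenced anywhere in the statement, deduplicated,
/// instead of only the tables of the first arm.
fn referenced_tables(select: &Select) -> Vec<String> {
    let mut seen = std::collections::HashSet::new();
    let mut tables = Vec::new();
    for arm in &select.arms {
        for table in &arm.from_tables {
            if seen.insert(table.clone()) {
                tables.push(table.clone());
            }
        }
    }
    tables
}

fn main() {
    // Something like: SELECT a FROM t1 UNION SELECT a FROM t2
    let stmt = Select {
        arms: vec![
            SelectArm { from_tables: vec!["t1".into()] },
            SelectArm { from_tables: vec!["t2".into()] },
        ],
    };
    assert_eq!(referenced_tables(&stmt), vec!["t1", "t2"]);
}
```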
2025-09-21 21:00:27 -03:00
Glauber Costa
f2f7f817e4 populate all tables in IncrementalView
For joins to work, we have to populate all referenced tables when we
create the view.
2025-09-19 03:59:28 -05:00
Glauber Costa
e5a106d8d6 enable joins in IncrementalView 2025-09-19 03:59:28 -05:00
Glauber Costa
e80dd8e5e1 move the filter operator to accept indexes instead of names
We already did something similar for the AggregateOperator: with joins,
the same column name can appear in many tables. And passing schema
information to the operator is a layering violation (the operator may be
operating on the result of a previous node, and at that point there is
no more "schema"). Therefore we pass indexes into the column set the
operator has.

The FilterOperator has a complication: we are using it to generate the
SQL for the populate statement, and that needs column names. However,
we should *not* be using the FilterOperator for that, and that is a
relic from the time when we had operator information directly inside
the IncrementalView.

To enable moving the FilterOperator to an index-based representation, we rework that code.
For joins, we'll need to populate many tables anyway, so we take the
time to do that work here.
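
A rough sketch of the index-based shape (types and names are hypothetical, not the actual operator API): the predicate refers to positions in the incoming row, so the operator needs no schema at all.

```
enum Value {
    Integer(i64),
    Text(String),
}

/// Hypothetical filter predicate: compare the value at a column *index*
/// against a constant, so no column names (and no schema) are needed.
enum Predicate {
    GtInt { column_index: usize, value: i64 },
}

struct FilterOperator {
    predicate: Predicate,
}

impl FilterOperator {
    fn keep(&self, row: &[Value]) -> bool {
        match &self.predicate {
            Predicate::GtInt { column_index, value } => {
                matches!(&row[*column_index], Value::Integer(v) if v > value)
            }
        }
    }
}

fn main() {
    // WHERE b > 2, where b happens to be column 1 of the incoming row.
    let filter = FilterOperator {
        predicate: Predicate::GtInt { column_index: 1, value: 2 },
    };
    let row = vec![Value::Text("x".into()), Value::Integer(3)];
    assert!(filter.keep(&row));
}
```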
2025-09-19 03:59:28 -05:00
Glauber Costa
5b4a6e5c2d view: catch all tables mentioned, instead of just one.
Ahead of the implementation of JOINs, we need to evolve the
IncrementalView, which currently only accepts a single base table,
to keep a list of tables mentioned in the statement.
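
Conceptually the change is just the following (struct and field names are hypothetical):

```
// Before: the view could only track a single base table.
struct SingleTableView {
    base_table: String,
}

// After: the view keeps every table mentioned in its SELECT statement,
// which is what JOINs will require.
struct MultiTableView {
    referenced_tables: Vec<String>,
}

fn main() {
    let _old = SingleTableView { base_table: "t".into() };
    let _new = MultiTableView { referenced_tables: vec!["t1".into(), "t2".into()] };
}
```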
2025-09-19 03:57:11 -05:00
Glauber Costa
0b3317d449 extract columns from all tables in case of joins.
Our view code needs to extract the list of columns used in the view.
We currently extract only from "the base table", but once we have joins,
we need a more complex structure that keeps the mapping of
(tables, columns).

This actually affects both views and materialized views: for views, the
queries with joins work just fine, because views are just aliases for
a query. But the list of columns returned by pragma table_info on the
view is incorrect. We add a test to make sure it is fixed.

For materialized views, we add extensive tests to make sure that the
columns are extracted correctly.
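
A sketch of that structure (hypothetical names): instead of a flat column list taken from one base table, keep a per-table mapping.

```
use std::collections::BTreeMap;

/// Hypothetical column description, roughly what pragma table_info reports.
#[derive(Debug)]
struct ColumnInfo {
    name: String,
    ty: String,
}

/// Columns used by the view, grouped by the table they come from.
type ViewColumns = BTreeMap<String, Vec<ColumnInfo>>;

fn main() {
    // For something like: SELECT t1.a, t2.b FROM t1 JOIN t2 ON ...
    let mut columns: ViewColumns = BTreeMap::new();
    columns
        .entry("t1".into())
        .or_default()
        .push(ColumnInfo { name: "a".into(), ty: "INTEGER".into() });
    columns
        .entry("t2".into())
        .or_default()
        .push(ColumnInfo { name: "b".into(), ty: "TEXT".into() });

    for (table, cols) in &columns {
        println!("{table}: {cols:?}");
    }
}
```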
2025-09-19 03:57:11 -05:00
Pekka Enberg
c2b8bb0a2f core/incremental: Wrap ViewTransactionState in Arc
Make it Send.
2025-09-17 12:23:29 +03:00
Pekka Enberg
17e9f05ea4 core: Convert Rc<Pager> to Arc<Pager> 2025-09-17 09:32:49 +03:00
Glauber Costa
3565e7978a Add an index to the dbsp internal table
And also change the schema of the main table. I have come to see the
current key-value schema as inadequate for non-aggregate operators.
Calculating Min/Max, for example, doesn't fit in this schema because
we have to be able to track existing values and index them.

Another alternative is to keep one table per operator type, but this
quickly leads to an explosion of tables.
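
The Min/Max point can be illustrated with a small sketch (illustrative only, not the internal table code): a running SUM needs a single accumulator, but recomputing MIN after a retraction needs all surviving values, kept ordered.

```
use std::collections::BTreeMap;

/// Add a weighted value to the ordered multiset, dropping entries whose
/// weight reaches zero. Illustrative only.
fn apply(values: &mut BTreeMap<i64, i64>, value: i64, weight: i64) {
    let new_weight = values.get(&value).copied().unwrap_or(0) + weight;
    if new_weight == 0 {
        values.remove(&value);
    } else {
        values.insert(value, new_weight);
    }
}

fn main() {
    // MIN(a) cannot be maintained from a single accumulator the way SUM can:
    // when the current minimum is retracted we must find the next smallest
    // surviving value, so all values have to be tracked and kept ordered.
    let mut values = BTreeMap::new();
    for v in [5, 2, 9] {
        apply(&mut values, v, 1); // insertions carry weight +1
    }
    assert_eq!(values.keys().next(), Some(&2));

    apply(&mut values, 2, -1); // delete the current minimum
    assert_eq!(values.keys().next(), Some(&5));
}
```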
2025-09-15 22:30:48 -05:00
PThorpe92
5849819a59 Fix tests for views 2025-09-12 08:20:40 -04:00
Glauber Costa
874047276e views: pass a DeltaSet for merge_delta
A DeltaSet is a collection of Deltas, one per table.
We'll need that for joins. The populate step for now will still generate
a single set. That will be our next step to fix.
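
A minimal sketch of what "a collection of Deltas, one per table" can look like (types are hypothetical):

```
use std::collections::HashMap;

/// One change to a table's rows: (rowid, values, weight),
/// where weight is +1 for an insert and -1 for a delete.
#[derive(Debug, Default)]
struct Delta {
    changes: Vec<(i64, Vec<String>, i64)>,
}

/// A DeltaSet groups one Delta per table, which is what joins need:
/// a single transaction can touch several of the view's source tables.
#[derive(Debug, Default)]
struct DeltaSet {
    per_table: HashMap<String, Delta>,
}

impl DeltaSet {
    fn delta_for(&mut self, table: &str) -> &mut Delta {
        self.per_table.entry(table.to_string()).or_default()
    }
}

fn main() {
    let mut set = DeltaSet::default();
    set.delta_for("t1").changes.push((1, vec!["a".into()], 1));
    set.delta_for("t2").changes.push((7, vec!["b".into()], -1));
    println!("{set:?}");
}
```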
2025-09-11 05:30:46 -07:00
Glauber Costa
841de334b7 view: catch all tables mentioned, instead of just one.
Ahead of the implementation of JOINs, we need to evolve the
IncrementalView, which currently only accepts a single base table,
to keep a list of tables mentioned in the statement.
2025-09-11 05:30:46 -07:00
Glauber Costa
08b2e685d5 Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized views.
It is hard to split because of all the interdependencies between components,
so it is one big thing. This commit message will at least try to go into
detail about the basic architecture.

Materialized Views as tables
============================

Materialized views are now normal tables, whereas before they were virtual
tables. By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc.).

One of the advantages of doing this is that we can create indexes on view
columns.  Later, we should also be able to write those views to separate files
with ATTACH write.

Materialized Views as Zsets
===========================

The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet is, obviously, that ours is backed by a BTree, not a hash
(since SQLite tables are BTrees).
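
In sketch form (illustrative stand-ins, not the actual storage code), a BTree-backed ZSet keyed by rowid could look like this:

```
use std::collections::BTreeMap;

/// Illustrative only: the persisted ZSet maps rowid -> (values, weight),
/// and the BTree ordering mirrors the rowid order of a SQLite table BTree.
#[derive(Default)]
struct MaterializedZSet {
    rows: BTreeMap<i64, (Vec<String>, i64)>,
}

impl MaterializedZSet {
    /// Apply a change by summing weights; a row whose weight reaches zero disappears.
    fn apply(&mut self, rowid: i64, values: Vec<String>, weight: i64) {
        let new_weight = self.rows.get(&rowid).map(|(_, w)| *w).unwrap_or(0) + weight;
        if new_weight == 0 {
            self.rows.remove(&rowid);
        } else {
            self.rows.insert(rowid, (values, new_weight));
        }
    }
}

fn main() {
    let mut zset = MaterializedZSet::default();
    zset.apply(1, vec!["hello".into()], 1);  // insert
    zset.apply(1, vec!["hello".into()], -1); // delete retracts the row
    assert!(zset.rows.is_empty());
}
```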

Aggregator State
================

In DBSP, the aggregator nodes also have state. To store that state, there is a
second table, __turso_internal_dbsp_state_{view_name}: it holds all the
aggregators in the view, and there is one such table per view. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table and we can't
rely on a particular format for the values.

The Materialized View Cursor
============================

Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
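
A heavily simplified sketch of that merge (in-memory stand-ins for the real cursor): persisted and in-transaction weights are summed per rowid, and rows whose combined weight is not positive are skipped.

```
use std::collections::BTreeMap;

/// Illustrative stand-ins: `persisted` is the on-disk ZSet, `txn` holds the
/// ephemeral weights accumulated by the current transaction.
fn visible_rows(
    persisted: &BTreeMap<i64, (String, i64)>,
    txn: &BTreeMap<i64, (String, i64)>,
) -> Vec<(i64, String)> {
    let mut out = Vec::new();
    // Walk the union of rowids in order, summing weights from both sides.
    let mut rowids: Vec<i64> = persisted.keys().chain(txn.keys()).copied().collect();
    rowids.sort_unstable();
    rowids.dedup();
    for rowid in rowids {
        let (value, base) = persisted
            .get(&rowid)
            .map(|(v, w)| (v.clone(), *w))
            .unwrap_or_default();
        let (txn_value, delta) = txn
            .get(&rowid)
            .map(|(v, w)| (v.clone(), *w))
            .unwrap_or_default();
        if base + delta > 0 {
            // Prefer the transaction's version of the row if it has one.
            let v = if txn.contains_key(&rowid) { txn_value } else { value };
            out.push((rowid, v));
        }
    }
    out
}

fn main() {
    let mut persisted = BTreeMap::new();
    persisted.insert(1, ("old".to_string(), 1));
    persisted.insert(2, ("gone".to_string(), 1));
    let mut txn = BTreeMap::new();
    txn.insert(2, ("gone".to_string(), -1)); // deleted inside the transaction
    txn.insert(3, ("new".to_string(), 1));   // inserted inside the transaction
    assert_eq!(
        visible_rows(&persisted, &txn),
        vec![(1, "old".to_string()), (3, "new".to_string())]
    );
}
```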
2025-09-05 07:04:33 -05:00
TcMits
33a04fbaf7 resolve conflict 2025-09-02 17:30:10 +07:00
Glauber Costa
565c2a698a adjust views to use circuits 2025-08-27 14:21:32 -05:00
Glauber Costa
898c0260f3 move operator to eval / commit pattern
We need a read-only phase and a commit phase; otherwise we will never
be able to roll back changes properly. We currently do that, but we
do it in the view. Before we move to circuits, this needs to be
internalized by the operator.
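
A sketch of the pattern (trait and names hypothetical): eval computes the output without mutating operator state, so a rollback can simply discard the result; commit is the only place where state changes.

```
/// Hypothetical illustration of the eval / commit split.
trait IncrementalOperator {
    type Delta;
    /// Read-only phase: compute the output delta without touching state.
    fn eval(&self, input: &Self::Delta) -> Self::Delta;
    /// Commit phase: fold the delta into the operator's state.
    fn commit(&mut self, output: Self::Delta);
}

struct CountOperator {
    count: i64,
}

impl IncrementalOperator for CountOperator {
    type Delta = i64; // net change in row count
    fn eval(&self, input: &i64) -> i64 {
        *input // nothing to recompute for a plain count
    }
    fn commit(&mut self, output: i64) {
        self.count += output;
    }
}

fn main() {
    let mut op = CountOperator { count: 0 };
    let out = op.eval(&3); // a rollback would just drop `out` here
    op.commit(out);
    assert_eq!(op.count, 3);
}
```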
2025-08-27 14:21:32 -05:00
TcMits
4ddfdb2a62 finish 2025-08-27 14:58:35 +07:00
Pekka Enberg
e3ffc82a1d core/incremental: Fix expression compiler to use new parser 2025-08-25 17:48:20 +03:00
Glauber Costa
097510216e implement the projector operator for DBSP
My goal with this patch is to be able to implement the ProjectOperator
for DBSP circuits using VDBE for expression evaluation.

*Not* doing so is dangerous for the following reason: we would end up
with subtly different, incompatible behavior for SQLite expressions
depending on whether they are used inside or outside of views.

In fact, even our prototype had them: our projection tests, which
used to pass, were actually wrong =) (SQLite would return something
different if those functions were executed outside the view context).

For optimization reasons, we single out trivial expressions: they don't
have to go through VDBE. Trivial expressions are expressions that only
involve Columns, Literals, and simple operators on elements of the same
type. Even type coercion takes this out of the realm of trivial.
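
A sketch of that classification over a hypothetical, simplified AST (the real check also has to consider operand types, since coercion disqualifies an expression):

```
/// Hypothetical, simplified expression AST.
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    FunctionCall(String, Vec<Expr>),
}

/// Trivial expressions only involve columns, literals, and simple operators.
/// Anything else (function calls, coercions, ...) takes the VDBE path.
fn is_trivial(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => true,
        Expr::Add(lhs, rhs) => is_trivial(lhs) && is_trivial(rhs),
        Expr::FunctionCall(..) => false,
    }
}

fn main() {
    // a + 1 is trivial; upper(a) has to go through VDBE.
    let trivial = Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Literal(1)));
    let not_trivial = Expr::FunctionCall("upper".into(), vec![Expr::Column(0)]);
    assert!(is_trivial(&trivial));
    assert!(!is_trivial(&not_trivial));
}
```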

Everything that is not trivial is then translated with translate_expr,
in the same way SQLite would, and then compiled with VDBE.

We can, over time, make this process much better. There are essentially
infinite opportunities for optimization here. But for now, the main
warts are:
* VDBE execution needs a connection.
* There is no good way in VDBE to pass parameters to a program.
* It is almost trivial to pollute the original connection. For example,
  we need to issue HALT for the program to stop, but that HALT would
  usually cause the original program to halt as well.

Subprograms, like the ones we use in triggers, are a possible solution,
but they are much more expensive to execute, especially given that we
would essentially need a program with no other role than to wrap the
subprogram.

Therefore, what I am doing is:
* There is an in-memory database inside the projection operator (an
  obvious optimization is to share it with *all* projection operators).
* We obtain a connection to that database when the operator is created.
* We use that connection to execute our VDBE, which offers a clean, safe,
  and isolated way to execute the expression.
* We feed the values to the program manually by editing the registers
  directly.
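
A sketch of the ownership shape only, with hypothetical stand-ins rather than the real engine types: the projection operator owns its own in-memory database and connection, so running the compiled expression cannot disturb the connection that triggered the update.

```
/// Hypothetical stand-ins for the real engine types; this only illustrates
/// the ownership shape, not the actual turso API.
struct InMemoryDatabase;
struct Connection;
struct Program;

impl InMemoryDatabase {
    fn new() -> Self {
        InMemoryDatabase
    }
    fn connect(&self) -> Connection {
        Connection
    }
}

impl Connection {
    /// Run a compiled expression program with the input row pre-loaded into
    /// its registers, keeping any HALT isolated from the user's connection.
    fn run(&self, _program: &Program, registers: &[i64]) -> i64 {
        registers.iter().sum() // placeholder for "evaluate the expression"
    }
}

/// The projection operator owns a private in-memory database so expression
/// evaluation cannot pollute the connection that issued the write.
struct ProjectOperator {
    _db: InMemoryDatabase,
    conn: Connection,
    program: Program,
}

impl ProjectOperator {
    fn new() -> Self {
        let db = InMemoryDatabase::new();
        let conn = db.connect();
        ProjectOperator { _db: db, conn, program: Program }
    }
    fn project(&self, row: &[i64]) -> i64 {
        self.conn.run(&self.program, row)
    }
}

fn main() {
    let op = ProjectOperator::new();
    assert_eq!(op.project(&[2, 3]), 5);
}
```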
2025-08-25 17:48:17 +03:00
Levy A.
ee12ef9fb5 remove unnecessary Box<ast::Select> 2025-08-21 17:20:25 -03:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
Levy A.
186e2f5d8e switch to new parser 2025-08-21 15:19:16 -03:00
PThorpe92
2c526c4c37 Add io_yield_x macros to reduce boilerplate 2025-08-16 16:14:00 -04:00
pedrocarlo
82b75330bc adjust types.rs util.rs view.rs and mvcc to bubble io 2025-08-13 10:24:55 +03:00
Glauber Costa
770f86e490 move our dbsp-based views to materialized views
We will implement normal SQLite-style view-as-an-alias for
compatibility, and will call our incremental views materialized views.
2025-08-12 14:19:17 -05:00
Pekka Enberg
db54c953bd Merge 'Implement Aggregations for DBSP views' from Glauber Costa
```
turso> create table t(a, b);
turso> insert into t(a,b) values (2,2), (3,3);
turso> insert into t(a,b) values (6,6), (7,7);
turso> insert into t(a,b) values (6,6), (7,7);
turso> create view tt as select b, sum(a) from t where b > 2 group by b;
turso> select * from tt;
┌───┬─────────┐
│ b │ sum (a) │
├───┼─────────┤
│ 3 │       3 │
├───┼─────────┤
│ 6 │      12 │
├───┼─────────┤
│ 7 │      14 │
└───┴─────────┘
turso> insert into t(a,b) values (1,3);
turso> select * from tt;
┌───┬─────────┐
│ b │ sum (a) │
├───┼─────────┤
│ 3 │       4 │
├───┼─────────┤
│ 6 │      12 │
├───┼─────────┤
│ 7 │      14 │
└───┴─────────┘
turso>
```

Closes #2547
2025-08-12 09:52:22 +03:00
Glauber Costa
333c5c435b unify populate
Populate currently has its own code path to apply changes to the view. That
was okay until now because all we did was filter. But now that we are also
applying aggregations, we would end up with two disjoint code paths.

A better approach is to just turn the results of our select into the
delta set, and apply that.
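
In sketch form (hypothetical names): populate expresses the SELECT results as the same kind of delta that incremental maintenance produces, and both go through a single apply path.

```
/// Hypothetical illustration: the populate step and the incremental step
/// both end up calling the same apply_delta function.
#[derive(Debug, Default)]
struct Delta {
    rows: Vec<(i64, Vec<String>, i64)>, // (rowid, values, weight)
}

#[derive(Debug, Default)]
struct View {
    materialized: Vec<(i64, Vec<String>)>,
}

impl View {
    /// Single code path: both populate and incremental maintenance feed here.
    fn apply_delta(&mut self, delta: Delta) {
        for (rowid, values, weight) in delta.rows {
            if weight > 0 {
                self.materialized.push((rowid, values));
            } else {
                self.materialized.retain(|(id, _)| *id != rowid);
            }
        }
    }

    /// Populate: run the view's SELECT once and express the result as a delta.
    fn populate(&mut self, select_results: Vec<(i64, Vec<String>)>) {
        let delta = Delta {
            rows: select_results
                .into_iter()
                .map(|(rowid, values)| (rowid, values, 1))
                .collect(),
        };
        self.apply_delta(delta);
    }
}

fn main() {
    let mut view = View::default();
    view.populate(vec![(1, vec!["a".into()]), (2, vec!["b".into()])]);
    assert_eq!(view.materialized.len(), 2);
}
```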
2025-08-11 15:06:57 -05:00
Glauber Costa
27c22a64b3 views: implement aggregations
Hook up the AggregateOperator. Also wire up the tracker, allowing us to
verify how much work was done.
2025-08-11 15:06:57 -05:00
Jussi Saurio
a50c799e05 stop silently ignoring unsupported features in incremental view WHERE clauses 2025-08-11 17:44:41 +03:00
Pekka Enberg
87322ad1e4 core/incremental: Evaluate view expressions
...tests were failing because we are testing with expressions, but
didn't support them.
2025-08-11 08:27:10 +03:00
Glauber Costa
145d6eede7 Implement very basic views using DBSP
This is just the bare minimum that I needed to convince myself that this
approach will work. The only views that we support are slices of the
main table: no aggregations, no joins, no projections.

Drop view is implemented.
View population is implemented.
Deletes, inserts, and updates are implemented.

Much like indexes before, a flag must be passed to enable views.
2025-08-10 23:34:04 -05:00