turso

mirror of https://github.com/aljazceru/turso.git synced 2026-01-20 00:24:20 +01:00

Author	SHA1	Message	Date
Glauber Costa	9f3d119a5a	move hashable row tests to dbsp.rs The operator.rs file was so huge, that we didn't even notice there was a test block in the middle of the file that was testing things that were long moved to dbsp.rs (the HashableRow). Move the tests there now.	2025-09-19 03:59:28 -05:00
Glauber Costa	e2f0e372a1	move the join operator to its own file. The code is becoming impossible to reason about with everything in operator.rs	2025-09-19 03:59:28 -05:00
Glauber Costa	aa8fcdbe54	move the aggregate operator to its own file. The code is becoming impossible to reason about with everything in operator.rs	2025-09-19 03:59:24 -05:00
Glauber Costa	7178d8d31c	move the project operator to its own file. The code is becoming impossible to reason about with everything in operator.rs	2025-09-19 03:57:11 -05:00
Glauber Costa	ee914fc543	move the filter operator to its own file. The code is becoming impossible to reason about with everything in operator.rs	2025-09-19 03:57:11 -05:00
Glauber Costa	9747d6c6b6	move the input operator to its own file. The code is becoming impossible to reason about with everything in operator.rs	2025-09-19 03:57:11 -05:00
Glauber Costa	6be5eb74d9	Implement the Join Operator The join operator is also a stateful operator. It keeps the input deltas stored in the state, for both the left and right branches of the join. JOINs extract a join key, which is the values that were used in the join's equality statement. That key is now our zset_id, and it points to a collection of rows.	2025-09-19 03:57:11 -05:00
Glauber Costa	5b4a6e5c2d	view: catch all tables mentioned, instead of just one. Ahead of the implementation of JOINs, we need to evolve the IncrementalView, which currently only accepts a single base table, to keep a list of tables mentioned in the statement.	2025-09-19 03:57:11 -05:00
Glauber Costa	0b3317d449	extract columns from all tables in case of joins. Our code for view needs to extract the list of columns used in the view. We currently extract only from "the base table", but once we have joins, we need a more complex structure, that keeps the mapping of (tables, columns). This actually affects both views and materialized views: for views, the queries with joins work just fine, because views are just aliases for a query. But the list of columns returned by pragma table_info on the view is incorrect. We add a test to make sure it is fixed. For materialized views, we add extensive tests to make sure that the columns are extracted correctly.	2025-09-19 03:57:11 -05:00
Pekka Enberg	3f35267b7c	core/mvcc: Kill noop storage We don't need it for anything.	2025-09-19 08:52:57 +03:00
Pekka Enberg	0ce6469a4b	Merge 'Fix some Rust compilation warnings' from Samuel Marks Nothing fancy yet, assuming you merge this I'll do this one next: ``` warning: function pointer comparisons do not produce meaningful results since their addresses are not guaranteed to be unique --> core/types.rs:403:5 | 398 | #[derive(Debug, Clone, PartialEq)] | --------- in this derive macro expansion ... 402 | pub step_fn: StepFunction, | ^^^^^^^^^^^^^^^^^^^^^^^^^ 403 | pub finalize_fn: FinalizeFunction, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: the address of the same function can vary between different codegen units = note: furthermore, different functions could have the same address after being merged together = note: for more information visit <https://doc.rust-lang.org/nightly/core/ptr/fn.fn_addr_eq.html> ``` And fix a test failure that I resolved in Python (specific to macOS hosts). Basically this PR is putting my toe in the water to see how open you are to contribs! Closes #3211	2025-09-19 08:28:53 +03:00
Samuel Marks	e333f151ba	[*.rs] Resolve warnings (mostly "hiding a lifetime that's elided elsewhere is confusing")	2025-09-18 22:47:43 -05:00
Pere Diaz Bou	ff3c79d5d7	remove mvccmode and set logical log as default	2025-09-18 18:22:25 +02:00
Pere Diaz Bou	e2824835dc	fix all open_file use cases for mvcc mode	2025-09-18 18:22:05 +02:00
Pere Diaz Bou	de8a975a0b	core/mvcc: introduce MvccMode Logical Log	2025-09-18 18:21:04 +02:00
Pavan-Nambi	020921f803	Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement	2025-09-18 19:27:19 +05:30
Jussi Saurio	91ef4e5e9d	Merge 'Introduce instruction VTABLE' from Lâm Hoàng Phúc This PR improves the `prepare` benchmark by 3-6% without slowing down the others. After this PR we don't have to store `InsnFunction` in `Program` and `ProgramBuilder` anymore, because `to_function` will return the result without matching. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3098	2025-09-18 09:18:48 +03:00
Pekka Enberg	c2b8bb0a2f	core/incremental: Wrap ViewTransactionState in Arc Make it Send.	2025-09-17 12:23:29 +03:00
Jussi Saurio	9a2797963a	Merge 'Remove LimboResult enum and InsnFunctionStepResult::Busy variant' from Jussi Saurio We can just use `LimboError::Busy` for both of these. Reviewed-by: Pekka Enberg <penberg@iki.fi> Closes #3170	2025-09-17 12:06:54 +03:00
TcMits	668f1f721c	resolve conflict	2025-09-17 15:25:58 +07:00
Jussi Saurio	b9ceacc356	Remove InsnFunctionStepResult::Busy we don't need all these busy variants, let's just handle LimboError::Busy	2025-09-17 11:22:49 +03:00
Pekka Enberg	17e9f05ea4	core: Convert Rc<Pager> to Arc<Pager>	2025-09-17 09:32:49 +03:00
rajajisai	e605aff31b	Merge branch 'main' into enc-page-1	2025-09-16 10:06:00 -04:00
rajajisai	89caa868f9	Encryption support for database header page	2025-09-16 10:04:30 -04:00
Glauber Costa	6bee6bb785	implement min/max We have not implemented them before because they require the raw elements to be kept. It is easy to see why in the following example: current_min = 3; insert(2) => current_min = 2 // can be done without state delete(2) => needs to look at the state to determine new min! The aggregator state was a very simple key-value structure. To accommodate min/max, we will make it into a more complex table, where we can encode a more complex structure. The key insight is that we can use a primary key composed of: 1) storage_id, 2) zset_id, 3) element. The storage_id and zset_id are our previous key, except they are now exploded to support a larger range of storage_id. With more bits available in the storage_id, we can encode information about which column we are storing. For aggregations in multiple columns, we will need to keep a different list of values for min/max! The element is just the values of the columns. Because this is a primary key, the data will be sorted in the btree. We can then just do a prefix search in the first two components of the key and easily find the min/max when needed. This new format is also adequate for joins. Joins will just have a new storage_id which encodes two "columns" (left side, right side).	2025-09-15 22:30:48 -05:00
Glauber Costa	3565e7978a	Add an index to the dbsp internal table And also change the schema of the main table. I have come to see the current key-value schema as inadequate for non-aggregate operators. Calculating Min/Max, for example, doesn't fit in this schema because we have to be able to track existing values and index them. Another alternative is to keep one table per operator type, but this quickly leads to an explosion of tables.	2025-09-15 22:30:48 -05:00
TcMits	cab0c7b545	perf tuning	2025-09-14 18:53:53 +07:00
TcMits	01da48fde9	introduce instruction virtual table	2025-09-13 16:35:17 +07:00
Pavan-Nambi	0afae0db20	update tests after merging	2025-09-13 07:33:43 +05:30
Pavan-Nambi	fdb4f98e11	Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement	2025-09-13 07:17:18 +05:30
PThorpe92	5849819a59	Fix tests for views	2025-09-12 08:20:40 -04:00
Preston Thorpe	b09dcceeef	Merge 'Fixes views' from Glauber Costa This is a collection of fixes for materialized views ahead of adding support for JOINs. It is mostly issues with how we assume there is a single table, with a single delta, but we have to send more than one. Those are things that are just objectively wrong, so I am sending it separately to make the JOIN PR smaller. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3009	2025-09-12 07:43:32 -04:00
Pavan-Nambi	7191f1cc1c	Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement	2025-09-12 15:17:12 +05:30
Glauber Costa	874047276e	views: pass a DeltaSet for merge_delta A DeltaSet is a collection of Deltas, one per table. We'll need that for joins. The populate step for now will still generate a single set. That will be our next step to fix.	2025-09-11 05:30:46 -07:00
Glauber Costa	841de334b7	view: catch all tables mentioned, instead of just one. Ahead of the implementation of JOINs, we need to evolve the IncrementalView, which currently only accepts a single base table, to keep a list of tables mentioned in the statement.	2025-09-11 05:30:46 -07:00
Glauber Costa	c15ac87a3c	fix cursor validation We are validating that the weights on the materialized view table are -1, 0, and 1. This is only true for the aggregator operator. For DBSP in general, any number will do. Our algorithm, however, would have deleted anything from the BTree that is <= 0. So we don't expect them here.	2025-09-11 05:30:46 -07:00
Glauber Costa	e6008e532a	Add a second delta to the EvalState, Commit We will assert that the second one is always empty for the existing operators - as they should be! But joins will need both.	2025-09-11 05:30:46 -07:00
Glauber Costa	6541a43670	move hashable_row to dbsp.rs There will be a new type for joins, so it makes less sense to have a separate file just for it. dbsp.rs is good.	2025-09-11 05:30:46 -07:00
Glauber Costa	1fd345f382	unify code used for persistence. We have code written for BTree (ZSet) persistence in both compiler.rs and operator.rs, because there are minor differences between them. With joins coming, it is time to unify this code.	2025-09-11 05:30:46 -07:00
Jussi Saurio	e3bd00883b	Fix creation of automatic indexes indexes with the naming scheme "sqlite_autoindex_<tblname>_<number>" are automatically created when a table is created with UNIQUE or PRIMARY KEY definitions. these indexes must map to the table definition SQL in definition order, i.e. sqlite_autoindex_foo_1 must be the first instance of UNIQUE or PRIMARY KEY and so on. this commit fixes our autoindex creation / parsing so that this invariant is upheld.	2025-09-11 14:11:30 +03:00
Pavan-Nambi	e5d3594fa2	fmt	2025-09-10 07:35:20 +05:30
Pavan-Nambi	6728384b47	Merge remote-tracking branch 'origin/main' into cdc_fail_autoincrement	2025-09-10 07:30:22 +05:30
Pavan-Nambi	b833e71c20	inserting ain't working hell yeah concurrency tests passing now woosh finally write tests passed Most of the cdc tests are passing yay autoincrement draft remove shared schema code that broke transactions sequence table should reset if table is dropped fmt fmt fmt	2025-09-09 20:07:52 +05:30
Pekka Enberg	832e0dee81	core/incremental: Fix typos in cursor.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-09-05 15:40:45 +03:00
Glauber Costa	08b2e685d5	Persistence for DBSP-based materialized views This fairly long commit implements persistence for materialized view. It is hard to split because of all the interdependencies between components, so it is a one big thing. This commit message will at least try to go into details about the basic architecture. Materialized Views as tables ============================ Materialized views are now a normal table - whereas before they were a virtual table. By making a materialized view a table, we can reuse all the infrastructure for dealing with tables (cursors, etc). One of the advantages of doing this is that we can create indexes on view columns. Later, we should also be able to write those views to separate files with ATTACH write. Materialized Views as Zsets =========================== The contents of the table are a ZSet: rowid, values, weight. Readers will notice that because of this, the usage of the ZSet data structure dwindles throughout the codebase. The main difference between our materialized ZSet and the standard DBSP ZSet, is that obviously ours is backed by a BTree, not a Hash (since SQLite tables are BTrees) Aggregator State ================ In DBSP, the aggregator nodes also have state. To store that state, there is a second table. The table holds all aggregators in the view, and there is one table per view. That is __turso_internal_dbsp_state_{view_name}. The format of that table is similar to a ZSet: rowid, serialized_values, weight. We serialize the values because there will be many aggregators in the table. We can't rely on a particular format for the values. The Materialized View Cursor ============================ Reading from a Materialized View essentially means reading from the persisted ZSet, and enhancing that with data that exists within the transaction. Transaction data is ephemeral, so we do not materialize this anywhere: we have a carefully crafted implementation of seek that takes care of merging weights and stitching the two sets together.	2025-09-05 07:04:33 -05:00
Pekka Enberg	1de647758f	Merge 'refactor parser fmt' from Lâm Hoàng Phúc @penberg this PR tries to clean up `turso_parser`'s `fmt` code. - `get_table_name` and `get_column_name` should return None when table/column does not exist. ```rust /// Context to be used in ToSqlString pub trait ToSqlContext { /// Given an id, get the table name /// First Option indicates whether the table exists /// /// Currently not considering aliases fn get_table_name(&self, _id: TableInternalId) -> Option<&str> { None } /// Given a table id and a column index, get the column name /// First Option indicates whether the column exists /// Second Option indicates whether the column has a name fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> { None } // helper function to handle missing table/column names fn get_table_and_column_names( &self, table_id: TableInternalId, col_idx: usize, ) -> (String, String) { let table_name = self .get_table_name(table_id) .map(|s| s.to_owned()) .unwrap_or_else(|| format!("t{}", table_id.0)); let column_name = self .get_column_name(table_id, col_idx) .map(|opt| { opt.map(|s| s.to_owned()) .unwrap_or_else(|| format!("c{col_idx}")) }) .unwrap_or_else(|| format!("c{col_idx}")); (table_name, column_name) } } ``` - remove `FmtTokenStream` because it is the same as `WriteTokenStream` - remove useless functions and simplify `ToTokens` ```rust /// Generate token(s) from AST node /// Also implements Display to make sure devs won't forget Display pub trait ToTokens: Display { /// Send token(s) to the specified stream with context fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>( &self, s: &mut S, context: &C, ) -> Result<(), S::Error>; // Return displayer representation with context fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self> where Self: Sized, { SqlDisplayer::new(ctx, self) } } ``` Closes #2748	2025-09-02 18:35:43 +03:00
TcMits	33a04fbaf7	resolve conflict	2025-09-02 17:30:10 +07:00
TcMits	37f33dc45f	add eq/contains/starts_with/ends_with_ignore_ascii_case	2025-08-31 16:18:42 +07:00
Glauber Costa	565c2a698a	adjust views to use circuits	2025-08-27 14:21:32 -05:00
Glauber Costa	29b93e3e58	add DBSP circuit compiler The next step is to adapt the view code to use circuits instead of listing the operators manually.	2025-08-27 14:21:32 -05:00
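
Note on commit 6bee6bb785 (implement min/max): the message describes a sorted key of (storage_id, zset_id, element) and a prefix search over the first two components. Below is a minimal sketch of that idea, not the code from the commit. It assumes an in-memory `BTreeMap` in place of the persisted btree table, `u64` ids, and a single `i64` element. `MinMaxState`, `apply`, and `replay_commit_example` are hypothetical names used only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the state table: a sorted map keyed by
/// (storage_id, zset_id, element), with the Z-set weight as the value.
struct MinMaxState {
    rows: BTreeMap<(u64, u64, i64), i64>,
}

impl MinMaxState {
    fn new() -> Self {
        Self { rows: BTreeMap::new() }
    }

    /// Apply a weighted change: +1 for an insert, -1 for a delete.
    /// Entries whose weight drops to zero or below are removed.
    fn apply(&mut self, storage_id: u64, zset_id: u64, element: i64, weight: i64) {
        let key = (storage_id, zset_id, element);
        let new_weight = self.rows.get(&key).copied().unwrap_or(0) + weight;
        if new_weight > 0 {
            self.rows.insert(key, new_weight);
        } else {
            self.rows.remove(&key);
        }
    }

    /// Smallest element under the (storage_id, zset_id) prefix.
    fn min(&self, storage_id: u64, zset_id: u64) -> Option<i64> {
        self.rows
            .range((storage_id, zset_id, i64::MIN)..=(storage_id, zset_id, i64::MAX))
            .next()
            .map(|(key, _)| key.2)
    }

    /// Largest element under the (storage_id, zset_id) prefix.
    fn max(&self, storage_id: u64, zset_id: u64) -> Option<i64> {
        self.rows
            .range((storage_id, zset_id, i64::MIN)..=(storage_id, zset_id, i64::MAX))
            .next_back()
            .map(|(key, _)| key.2)
    }
}

/// Replays the example from the commit message: the minimum is 3,
/// then 2 is inserted, then 2 is deleted again.
/// Returns (min after the insert, min after the delete).
fn replay_commit_example() -> (Option<i64>, Option<i64>) {
    let mut state = MinMaxState::new();
    state.apply(1, 7, 3, 1);
    state.apply(1, 7, 2, 1);
    let after_insert = state.min(1, 7);
    state.apply(1, 7, 2, -1);
    let after_delete = state.min(1, 7);
    (after_insert, after_delete)
}
```

Because the raw elements stay in sorted order, the delete in the example does not need a full recomputation: the new minimum is simply the first remaining key under the same prefix.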

126 Commits