turso

mirror of https://github.com/aljazceru/turso.git synced 2026-01-06 17:54:20 +01:00

Author	SHA1	Message	Date
Jussi Saurio	c930f28643	Handle case where null flag is set in op_column	2025-09-09 00:00:19 +03:00
Pekka Enberg	0c6398c935	core/vdbe: Fix apply_affinity_char() text parsing We need strict parsing in apply_affinity_char() to avoid transforming non-numeric values (for example, "1a") into numeric values.	2025-09-08 18:49:13 +03:00
Pekka Enberg	f88f39082a	core/vdbe: Fix MakeRecord affinity handling The MakeRecord instruction now accepts an optional affinity_str parameter that applies column-specific type conversions before creating records. When provided, the affinity string is applied character-by-character to each register using the existing apply_affinity_char() function, matching SQLite's behavior. Fixes #2040 Fixes #2041	2025-09-08 18:49:13 +03:00
Nikita Sivukhin	87d49cd039	cargo fmt after rebase	2025-09-07 20:08:10 +04:00
Nikita Sivukhin	db7c6b3370	try to speed up count(*) where 1 = 1	2025-09-07 19:55:42 +04:00
Pekka Enberg	9c24b8d088	Merge 'Remove RefCell from Cursor' from Pedro Muniz Closes #2944	2025-09-06 15:03:23 +03:00
pedrocarlo	e6344db5b1	remove Refcell from Cursor	2025-09-06 01:46:21 -03:00
PThorpe92	03d5598cfb	Use sieve algorithm in page cache in place of full LRU	2025-09-05 16:13:26 -04:00
Jussi Saurio	a0613ef781	Avoid allocating and then immediately fallbacking errors in affinity On the syscall IO backend, on TPC-H query 12, the _dominating_ part of the stack trace is trying to construct affinities from a character, failing, allocating an error&string, and then immediately falling back to Blob affinity and dropping the error&string. Since I'm on vacation I won't spend cycles on figuring out why we are passing an incorrect affinity in `flags.get_affinity()` and instead make this lazy PR just to improve performance and stop doing silly things :]	2025-09-05 18:34:23 +03:00
Pekka Enberg	811c5a7ce0	Merge 'Fix float formatting and comparison + Blob concat' from Levy A. These changes can be verified with the expression fuzzer. Fixes https://github.com/tursodatabase/turso/issues/2881. - Compatible float formatting - Compatible integer-float comparisons - Blob concatenation Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2929	2025-09-05 17:02:51 +03:00
Pekka Enberg	b2664e12c2	cargo fmt	2025-09-05 16:12:12 +03:00
Pekka Enberg	5dcffadad6	core/vdbe: Remove empty loop Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-09-05 16:03:25 +03:00
Levy A.	a7b60e6b00	fix: return NULL for negative base or input on exec_math_log	2025-09-05 10:00:59 -03:00
Glauber Costa	08b2e685d5	Persistence for DBSP-based materialized views This fairly long commit implements persistence for materialized views. It is hard to split because of all the interdependencies between components, so it is one big thing. This commit message will at least try to go into details about the basic architecture. Materialized Views as tables ============================ Materialized views are now a normal table - whereas before they were a virtual table. By making a materialized view a table, we can reuse all the infrastructure for dealing with tables (cursors, etc). One of the advantages of doing this is that we can create indexes on view columns. Later, we should also be able to write those views to separate files with ATTACH write. Materialized Views as Zsets =========================== The contents of the table are a ZSet: rowid, values, weight. Readers will notice that because of this, the usage of the ZSet data structure dwindles throughout the codebase. The main difference between our materialized ZSet and the standard DBSP ZSet is that ours is backed by a BTree, not a Hash (since SQLite tables are BTrees). Aggregator State ================ In DBSP, the aggregator nodes also have state. To store that state, there is a second table. The table holds all aggregators in the view, and there is one table per view. That is __turso_internal_dbsp_state_{view_name}. The format of that table is similar to a ZSet: rowid, serialized_values, weight. We serialize the values because there will be many aggregators in the table. We can't rely on a particular format for the values. The Materialized View Cursor ============================ Reading from a Materialized View essentially means reading from the persisted ZSet, and enhancing that with data that exists within the transaction. Transaction data is ephemeral, so we do not materialize this anywhere: we have a carefully crafted implementation of seek that takes care of merging weights and stitching the two sets together.	2025-09-05 07:04:33 -05:00
Levy A.	73e901010c	fix: float formatting and float comparison	2025-09-05 02:35:03 -03:00
Pekka Enberg	ecbcd1ecd3	Merge ' core/mvcc: make commit_txn return on I/O ' from Pere Diaz Bou `commit_txn` in MVCC was hacking its way through I/O until now. After adding this and the test for concurrent writers we now see `busy` errors returning as expected because there is no `commit` queueing happening yet until next PR I open. Closes #2895	2025-09-04 21:24:10 +03:00
Pekka Enberg	44357f93a2	Merge branch 'main' into 2025-08-21-make-limit-and-offset-expr	2025-09-04 09:54:45 +03:00
Pere Diaz Bou	8db5cead07	core/mvcc: only commit if there is a txn	2025-09-03 14:12:48 +02:00
Pere Diaz Bou	b8f83e1fc0	clippy and fmt stuff because if not pekka will tweet	2025-09-03 12:47:55 +02:00
TcMits	b0f4dd49d5	use match_ignore_ascii_case macro	2025-09-03 12:01:52 +07:00
Pekka Enberg	1de647758f	Merge 'refactor parser fmt' from Lâm Hoàng Phúc @penberg this PR tries to clean up `turso_parser`'s `fmt` code. - `get_table_name` and `get_column_name` should return None when table/column does not exist. ```rust /// Context to be used in ToSqlString pub trait ToSqlContext { /// Given an id, get the table name /// First Option indicates whether the table exists /// /// Currently not considering aliases fn get_table_name(&self, _id: TableInternalId) -> Option<&str> { None } /// Given a table id and a column index, get the column name /// First Option indicates whether the column exists /// Second Option indicates whether the column has a name fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> { None } // help function to handle missing table/column names fn get_table_and_column_names( &self, table_id: TableInternalId, col_idx: usize, ) -> (String, String) { let table_name = self .get_table_name(table_id) .map(|s| s.to_owned()) .unwrap_or_else(|| format!("t{}", table_id.0)); let column_name = self .get_column_name(table_id, col_idx) .map(|opt| { opt.map(|s| s.to_owned()) .unwrap_or_else(|| format!("c{col_idx}")) }) .unwrap_or_else(|| format!("c{col_idx}")); (table_name, column_name) } } ``` - remove `FmtTokenStream` because it is the same as `WriteTokenStream` - remove useless functions and simplify `ToTokens` ```rust /// Generate token(s) from AST node /// Also implements Display to make sure devs won't forget Display pub trait ToTokens: Display { /// Send token(s) to the specified stream with context fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>( &self, s: &mut S, context: &C, ) -> Result<(), S::Error>; // Return displayer representation with context fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self> where Self: Sized, { SqlDisplayer::new(ctx, self) } } ``` Closes #2748	2025-09-02 18:35:43 +03:00
Pere Diaz Bou	13c505109a	core/mvcc: make commit_txn return on I/O	2025-09-02 17:07:38 +02:00
TcMits	bfff05faba	merge main	2025-09-02 18:25:20 +07:00
TcMits	33a04fbaf7	resolve conflict	2025-09-02 17:30:10 +07:00
Pekka Enberg	87d3f74e6e	Merge 'Evict page from cache if page is unlocked and unloaded' from Pedro Muniz Because we can abort a read_page completion, this means a page can be in the cache but be unloaded and unlocked. However, if we do not evict that page from the page cache, we will return an unloaded page later which will trigger assertions later on. This is worsened by the fact that page cache is not per `Statement`, so you can abort a completion in one Statement, and trigger some error in the next one if we don't evict the page in these circumstances. Also, to propagate IO errors we need to return the Error from IOCompletions on step. Closes #2785	2025-09-02 09:08:12 +03:00
Pekka Enberg	d959319b42	Merge 'Use u64 for file offsets in I/O and calculate such offsets in u64' from Preston Thorpe Using `usize` to compute file offsets caps us at ~4 GiB on 32-bit systems. For example, with 4 KiB pages we can only address up to 1048576 pages; attempting the next page overflows a 32-bit usize and can wrap the write offset, corrupting data. Switching our I/O APIs and offset math to u64 avoids this overflow on 32-bit targets. Closes #2791	2025-09-02 09:06:49 +03:00
Pekka Enberg	cfaba4ab10	Merge 'Implement libSQL's `ALTER COLUMN` extension' from Levy A. Implement `ALTER COLUMN` as described here: https://github.com/tursodatabase/libsql/blob/main/libsql-sqlite3/doc/libsql_extensions.md#altering-columns - [x] Add `ALTER COLUMN` to parser - [x] Implement `Insn::AlterColumn` - [x] Add tests Closes #2814	2025-09-02 09:06:03 +03:00
PThorpe92	e9b50b63fb	Return sqlite_version() without being initialized	2025-09-01 13:36:41 -04:00
pedrocarlo	53cfae1db4	return Error from step if IO failed	2025-09-01 11:10:39 -03:00
TcMits	37f33dc45f	add eq/contains/starts_with/ends_with_ignore_ascii_case	2025-08-31 16:18:42 +07:00
Levy A.	293865c2d6	feat+fix: add tests and restrict altering some constraints	2025-08-30 03:43:31 -03:00
Levy A.	ad639b2b23	fix: reintroduce rename we don't store the parsed column to replace just the name, this will be refactored later with a more general approach	2025-08-30 03:10:39 -03:00
Levy A.	5b378e3730	feat: add `AlterColumn` instruction also refactor `RenameColumn` to reuse the logic from `AlterColumn`	2025-08-30 03:10:39 -03:00
themixednuts	eb93e4edc9	remove to_upper_case in favor of eq_ignore_ascii_case	2025-08-29 20:24:43 -05:00
themixednuts	6ffbdb4908	fix: column case sensitivity on strict table	2025-08-29 20:24:43 -05:00
Pekka Enberg	9fc5947fa6	core/vdbe: Micro-optimize "zero_or_null" opcode It's a hot instruction for TPC-H, for example, so worth optimizing. Reduces op_zero_or_null() from 5.6% to 2.4% in CPU flamegraph for TPC-H Q1.	2025-08-29 14:38:50 +03:00
PThorpe92	0a56d23402	Use u64 for file offsets in IO and calculate such offsets in u64	2025-08-28 09:44:00 -04:00
Pekka Enberg	2ea4354afe	Merge 'Improve integrity check' from Nikita Sivukhin - check free list trunk and pages - use shared hash map to check for duplicate references for pages - properly check overflow pages Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2816	2025-08-28 16:06:15 +03:00
Pere Diaz Bou	48e5ad7a55	core/schema: get_dependent_materialized_views_unnormalized If we get a table name for in memory structure, it's safe to assume it's already normalized.	2025-08-28 13:11:40 +02:00
Nikita Sivukhin	ae705445bf	improve integrity check - check free list trunk and pages - use shared hash map to check for duplicate references for pages - properly check overflow pages	2025-08-27 23:14:21 +04:00
TcMits	4ddfdb2a62	finish	2025-08-27 14:58:35 +07:00
bit-aloo	ea3ab2a9c7	add ifNeg op_code	2025-08-26 19:55:42 +05:30
Glauber Costa	097510216e	implement the projector operator for DBSP My goal with this patch is to be able to implement the ProjectOperator for DBSP circuits using VDBE for expression evaluation. Not doing so is dangerous for the following reason: we will end up with different, subtle, and incompatible behavior between SQLite expressions if they are used in views versus outside of views. In fact, even our prototype had them: our projection tests, which used to pass, were actually wrong =) (sqlite would return something different if those functions were executed outside the view context) For optimization reasons, we single out trivial expressions: they don't have to go through VDBE. Trivial expressions are expressions that only involve Columns, Literals, and simple operators on elements of the same type. Even type coercion takes this out of the realm of trivial. Everything that is not trivial is then translated with translate_expr - in the same way SQLite will, and then compiled with VDBE. We can, over time, make this process much better. There are essentially infinite opportunities for optimization here. But for now, the main warts are: * VDBE execution needs a connection * There is no good way in VDBE to pass parameters to a program. * It is almost trivial to pollute the original connection. For example, we need to issue HALT for the program to stop, but seeing that halt will usually cause the program to try and halt the original program. Subprograms, like the ones we use in triggers are a possible solution, but they are much more expensive to execute, especially given that our execution would essentially have to have a program with no other role than to wrap the subprogram. Therefore, what I am doing is: * There is an in-memory database inside the projection operator (an obvious optimization is to share it with all projection operators). * We obtain a connection to that database when the operator is created * We use that connection to execute our VDBE, which offers a clean, safe and isolated way to execute the expression. * We feed the values to the program manually by editing the registers directly.	2025-08-25 17:48:17 +03:00
Nikita Sivukhin	f7ad55b680	remove unnecessary argument	2025-08-25 12:24:39 +04:00
Jussi Saurio	54ff656c9d	Do not clear txn state inside nested statement If a connection does e.g. CREATE TABLE, it will start a "child statement" to reparse the schema. That statement does not start its own transaction, and so should not try to end the existing one either. We had a logic bug where these steps would happen: - `CREATE TABLE` executed successfully - pread fault happens inside `ParseSchema` child stmt - `handle_program_error()` is called - `pager.end_tx()` returns immediately because `is_nested_stmt` is true and we correctly no-op it. - however, crucially: `handle_program_error()` then sets tx state to None - parent statement now catches error from nested stmt and calls `handle_program_error()`, which calls `pager.end_tx()` again, and since txn state is None, when it calls `rollback()` we panic on the assertion `"dirty pages should be empty for read txn"` Solution: Do not do _any_ error processing in `handle_program_error()` inside a nested stmt. This means that the parent write txn is still active when it processes the error from the child and we avoid this panic.	2025-08-25 08:49:22 +03:00
Pekka Enberg	9d2f26bb04	sqlite3: Implement sqlite3_clear_bindings()	2025-08-24 19:33:18 +03:00
Levy A.	4ba1304fb9	complete parser integration	2025-08-21 15:23:59 -03:00
Levy A.	186e2f5d8e	switch to new parser	2025-08-21 15:19:16 -03:00
Jussi Saurio	dd2e0ea596	Fix: always emit rowid when column is rowid alias SQLite does not store the rowid alias column in the record at all when it is a rowid alias, because the rowid is always stored anyway in the record header.	2025-08-21 16:40:10 +03:00
Pekka Enberg	1dc6fb97c0	Merge 'core/mvcc: store txid in conn and reset transaction state on commit ' from Pere Diaz Bou We were storing `txid` in `ProgramState`, this meant it was impossible to track interactive transactions. This was extracted to `Connection` instead. Moreover, transaction state for mvcc now is reset on commit. Closes #2689	2025-08-20 16:51:41 +03:00


1211 Commits