turso

mirror of https://github.com/aljazceru/turso.git synced 2025-12-28 05:24:22 +01:00

Author	SHA1	Message	Date
Jussi Saurio	b1cb897216	Merge 'Fix another "should have been rewritten" translation panic' from Jussi Saurio Closes #2158 Closes #3702	2025-10-15 09:25:01 +03:00
Preston Thorpe	74bbb0d5a3	Merge 'Allow using indexes to iterate rows in UPDATE statements' from Jussi Saurio

Closes #2600

## Problem

Every btree has a key it is sorted by - this is the integer `rowid` for tables and an arbitrary-sized, potentially multi-column key for indexes.

Executing an UPDATE in a loop is not safe if the update modifies any part of the key of the btree that is used for iterating the rows in said loop. For example:

- Using the table itself to iterate rows is not safe if the UPDATE modifies the rowid (or rowid alias) of a row, because doing so changes the iteration order itself and may cause rows to be skipped:

```sql
CREATE TABLE t(x INTEGER PRIMARY KEY, y);
INSERT <something>
UPDATE t SET y = RANDOM() where x > 100; -- safe to iterate 't', 'y' is not being modified
UPDATE t SET x = RANDOM() where x > 100; -- not safe to iterate 't', 'x' is being modified
```

- Using an index to iterate rows is not safe if the UPDATE modifies any of the columns in the index key:

```sql
CREATE TABLE t(x, y, z);
CREATE INDEX txy ON t (x,y);
INSERT <something>
UPDATE t SET z = RANDOM() where x = 100 and y > 0; -- safe to iterate txy, neither x nor y is being modified
UPDATE t SET x = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'x' is being modified
UPDATE t SET y = RANDOM() where x = 100 and y > 0; -- not safe to iterate txy, 'y' is being modified
```

## Current solution in tursodb

Our current `main` code recognizes this issue and adopts this pseudocode algorithm from SQLite:

- open a table or index for reading the rows of the source table,
- for each row that matches the condition in the UPDATE statement, write the row into a temporary table,
- then use that temporary table for iteration in the UPDATE loop.

This guarantees that the iteration order will not be affected by the UPDATEs because the ephemeral table is not under modification.

## Problem with current solution

Our `main` code special-cases the ephemeral table solution to rowids / rowid aliases only. Using indexes for UPDATE iteration was disabled in an earlier PR (#2599) due to the safety issue mentioned above, which means that many UPDATE statements become full table scans:

```sql
turso> create table t(x PRIMARY KEY);
turso> insert into t select value from generate_series(1,10000);
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4                 p5  comment
----  -----------------  ----  ----  ----  -------------      --  -------
0     Init               0     28    0                        0   Start at 28
1     OpenWrite          0     2     0                        0   root=2; iDb=0
2     OpenWrite          1     3     0                        0   root=3; iDb=0
-- scan entire 't' despite very narrow update range!
3     Rewind             0     27    0                        0   Rewind table t
...
```

## Solution

We move the ephemeral table logic to _after_ the optimizer has selected the best access path for the table, and then, if the UPDATE modifies the key of the chosen access path (table or index; whichever was selected by the optimizer), we change the plan to include the ephemeral table prepopulation.

Hence, the same query from above becomes:

```sql
turso> explain update t set x = x + 100000 where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4                 p5  comment
----  -----------------  ----  ----  ----  -------------      --  -------
0     Init               0     35    0                        0   Start at 35
1     OpenEphemeral      0     1     0                        0   cursor=0 is_table=true
2     OpenRead           1     3     0                        0   index=sqlite_autoindex_t_1, root=3, iDb=0
3     Integer            50    2     0                        0   r[2]=50
-- index seek on PRIMARY KEY index
4     SeekGT             1     10    2                        0   key=[2..2]
5     Integer            60    2     0                        0   r[2]=60
6     IdxGE              1     10    2                        0   key=[2..2]
7     IdxRowId           1     1     0                        0   r[1]=cursor 1 for index sqlite_autoindex_t_1.rowid
8     Insert             0     3     1     ephemeral_scratch  2   intkey=r[1] data=r[3]
9     Next               1     6     0                        0
10    OpenWrite          2     2     0                        0   root=2; iDb=0
11    OpenWrite          3     3     0                        0   root=3; iDb=0
-- only scan rows that were inserted to ephemeral index
12    Rewind             0     34    0                        0   Rewind table ephemeral_scratch
13    RowId              0     5     0                        0   r[5]=ephemeral_scratch.rowid
```

Note that an ephemeral index does not have to be used if the index is not affected:

```sql
turso> create table t(x PRIMARY KEY, data);
turso> explain update t set data = 'some_data' where x > 50 and x < 60;
addr  opcode             p1    p2    p3    p4                 p5  comment
----  -----------------  ----  ----  ----  -------------      --  -------
0     Init               0     15    0                        0   Start at 15
1     OpenWrite          0     2     0                        0   root=2; iDb=0
2     OpenWrite          1     3     0                        0   root=3; iDb=0
3     Integer            50    1     0                        0   r[1]=50
-- direct index seek
4     SeekGT             1     14    1                        0   key=[1..1]
```

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3728	2025-10-14 16:11:25 -04:00
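The safety issue described in the commit above can be illustrated with a toy model. The sketch below is not tursodb code: it assumes a sorted Python list stands in for the btree, and a cursor that always advances to the first key greater than the last key it visited. The function names `update_in_place` and `update_via_scratch` are hypothetical.

```python
import bisect


def update_in_place(keys, lo, hi, delta):
    """Toy model: iterate the sorted keys while rewriting them in place.

    Emulates `UPDATE ... SET x = x + delta WHERE x > lo AND x < hi`
    using the same sorted structure for iteration and modification.
    """
    keys = sorted(keys)
    updates = 0
    pos = lo  # last key the cursor stood on
    while True:
        i = bisect.bisect_right(keys, pos)  # first key strictly greater than pos
        if i == len(keys) or keys[i] >= hi:
            break
        pos = keys[i]
        new_key = keys.pop(i) + delta
        bisect.insort(keys, new_key)  # the row moves ahead of the cursor
        updates += 1
    return keys, updates


def update_via_scratch(keys, lo, hi, delta):
    """Toy model of the ephemeral-table approach: collect matches first, then update."""
    keys = sorted(keys)
    scratch = [k for k in keys if lo < k < hi]  # stand-in for the ephemeral table
    for k in scratch:
        keys.remove(k)
        bisect.insort(keys, k + delta)
    return keys, len(scratch)


print(update_in_place([52, 56], 50, 60, 3))     # rows are revisited and updated repeatedly
print(update_via_scratch([52, 56], 50, 60, 3))  # each matching row is updated once
```

In this model the in-place variant keeps re-encountering rows it has already moved, whereas the scratch variant touches each matching row exactly once. Real btree cursors behave differently in detail (rows can also be skipped, as the commit message notes), so treat this only as an intuition aid.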
Pere Diaz Bou	1a464664a7	Merge 'increment Changes() only once conditionally ' from Pavan Nambi closes #3688 Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #3692	2025-10-14 20:26:04 +02:00
Pere Diaz Bou	a2097188f0	Merge 'make comparison case sensitive' from Pavan Nambi closes https://github.com/tursodatabase/turso/issues/3672 Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #3686	2025-10-14 20:20:02 +02:00
Jussi Saurio	b3b07252dc	Add TCL smoke tests for UPDATEs affecting indexes	2025-10-14 16:25:05 +03:00
Pekka Enberg	07b94faeb7	Merge 'Add test case for vector() format crash' from Pedro Muniz Added test to close #1454. The Go code incorrectly did not quote the vector array. Closes #3716	2025-10-14 09:37:11 +03:00
Jussi Saurio	4e34c6be51	Merge 'names shall not be shared between tables,indexs,vtabs,views' from Pavan Nambi closes #3675 Closes #3681	2025-10-14 07:30:37 +03:00
Jussi Saurio	bd15fee1f8	Merge 'Get aliases to where shall they be used' from Pavan Nambi closes #3678 Closes #3680	2025-10-14 07:28:09 +03:00
Jussi Saurio	cce2bf9328	Merge 'Add correct unique constraint test for tcl' from Pedro Muniz Closes #1710 We should use `do_execsql_test_in_memory_error_content` to test errors instead. Closes #3718	2025-10-14 07:26:48 +03:00
pedrocarlo	2e722af93c	proof issue 1710	2025-10-13 20:51:21 -03:00
pedrocarlo	83dde9b55c	fix backwards compatible rowid alias behaviour	2025-10-13 20:41:45 -03:00
pedrocarlo	2798fafa6c	proof issue 1454	2025-10-13 16:14:29 -03:00
Pavan-Nambi	57a06835bf	add test and fmt and clippy i was stupid remove comment	2025-10-13 18:07:51 +05:30
Jussi Saurio	c54e150a52	Merge 'Fix: Table entry is not removed from `sqlite_schema` when a table is dropped' from Fixes #3682. Ignore case of table name when dropping table. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3683	2025-10-13 15:02:29 +03:00
Jussi Saurio	523b155df1	Fix another "should have been rewritten" translation panic Closes #2158	2025-10-13 11:02:42 +03:00
Pavan-Nambi	295612feea	add test	2025-10-12 23:11:28 +05:30
Pavan-Nambi	e1f23aeb2c	fmt and add tests	2025-10-12 22:23:04 +05:30
Pavan-Nambi	7e8dabaee5	make comparison case sensitive	2025-10-12 18:02:03 +05:30
Pavan-Nambi	36bf88119f	add tests clippy expect err to make clippy happy cleanup	2025-10-12 16:38:12 +05:30
Pavan-Nambi	bd9ce7c485	add test	2025-10-12 15:58:10 +05:30
rajajisai	9061024fad	add test	2025-10-11 21:39:46 -04:00
Jussi Saurio	74e04634aa	Fix incorrectly using an equality constraint twice for index seek Prevents something like `WHERE x = 5 AND x = 5` from becoming a two component index key. Closes #3656	2025-10-10 13:19:19 +03:00
Diego Reis	da323fa0c4	Some clean ups and correctly working on WHERE clauses	2025-10-09 11:57:15 -03:00
Jussi Saurio	7948259d37	Merge 'optimizer: optimize range scans to use upper and lower bounds more efficiently' from Jussi Saurio

Made a new PR based on @sivukhin's PR #2869 that had a lot of conflicts. You can check out the PR description from there.

## The main idea is:

Before, if we had an index on `x` and had a query like `WHERE x > 100 and x < 200`, the plan would be something like:

```
- Seek to first row where x > 100
- Then, for every row, discard the row if x >= 200
```

This is highly wasteful in cases where there are a lot of rows where `x >= 200`. Since our index is sorted on `x`, we know that once we hit the _first_ row where `x >= 200`, we can stop iterating entirely. So, the new plan is:

```
- Seek to first row where x > 100
- Then, iterate rows until x >= 200, and then stop
```

This also improves the situation for multi-column indexes. Imagine index on `(x,y)` and a condition like `WHERE x = 100 and y > 100 and y < 200`. Before, the plan was:

```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and discard the row if y >= 200
- Stop when x > 100
```

This also suffers from a problem where if there are a lot of rows where `x=100` and `y >= 200`, we go through those rows unnecessarily. The new plan is:

```
- Seek to first row where x=100 and y > 100
- Then, iterate rows while x = 100 and y < 200
- Stop when either x > 100 or y >= 200
```

Which prevents us from iterating rows like `x=100, y = 666` unnecessarily because we know the index is sorted on `(x,y)` - once we hit any row where `x>100` OR `x=100, y >= 200`, we can stop.

Closes #3644	2025-10-09 14:47:15 +03:00
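The difference between the old and new single-column plans described above can be sketched on a sorted Python list standing in for the index on `x`. This is an illustration only, not the optimizer's code; `scan_filter` and `scan_bounded` are hypothetical names, and "visited" simply counts the rows the loop touches.

```python
import bisect


def scan_filter(index, lo, hi):
    """Old plan: seek to the first x > lo, then visit every remaining row
    and discard those with x >= hi."""
    start = bisect.bisect_right(index, lo)
    visited, out = 0, []
    for x in index[start:]:
        visited += 1
        if x < hi:
            out.append(x)
    return out, visited


def scan_bounded(index, lo, hi):
    """New plan: seek to the first x > lo, then stop at the first row with x >= hi."""
    start = bisect.bisect_right(index, lo)
    visited, out = 0, []
    for x in index[start:]:
        visited += 1
        if x >= hi:
            break  # the index is sorted, so no later row can match
        out.append(x)
    return out, visited


index = list(range(1000))  # sorted keys 0..999
rows_old, visited_old = scan_filter(index, 100, 200)
rows_new, visited_new = scan_bounded(index, 100, 200)
print(len(rows_old), visited_old)
print(len(rows_new), visited_new)
```

Both functions return the same rows; only the number of rows visited differs, which is the saving the commit message describes.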
Jussi Saurio	e726803ab4	Merge 'translate: make bind_and_rewrite_expr() reject unbound identifiers if no referenced tables exist' from Jussi Saurio Before, we just skipped evaluating `Id`, `Qualified` and `DoublyQualified` if `referenced_tables` was `None`, leading to shit like #3621. Let's eagerly return `"No such column"` parse errors in these cases instead, and punch exceptions for cases where that doesn't cleanly work. Top tip: use the `Hide whitespace` toggle when inspecting the diff of this PR. Closes #3621 Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3626	2025-10-09 12:45:16 +03:00
Nikita Sivukhin	4313f57ecb	Optimize range scans	2025-10-09 11:47:41 +03:00
Pavan-Nambi	f0d9ead19f	add more tests refactor and use sort_unstable_by_key	2025-10-09 08:28:59 +05:30
Pavan-Nambi	f138448da2	don't allow duplicate col names in create table	2025-10-09 08:09:31 +05:30
PThorpe92	a232e3cc7a	Implement proper handling of deferred foreign keys	2025-10-07 16:45:23 -04:00
PThorpe92	f56f37fae5	Add more tests for self-referencing FKs and remove unneeded FkIfZero checks/labels in emitter	2025-10-07 16:45:23 -04:00
PThorpe92	99ae96c5f6	Fix self-referential FK relationships and validation of FKs	2025-10-07 16:45:22 -04:00
PThorpe92	ae975afe49	Remove unnecessary FK resolution on schema parsing	2025-10-07 16:45:16 -04:00
Jussi Saurio	a343dacaaf	translate: make bind_and_rewrite_expr() reject identifiers if no referenced tables exist	2025-10-07 23:34:26 +03:00
PThorpe92	16d19fd39e	Add tcl tests for foreign keys	2025-10-07 16:28:04 -04:00
Pekka Enberg	dacb8e3350	Merge 'Fix attach I/O error with in-memory databases' from Preston Thorpe closes #3540 Closes #3602	2025-10-07 09:00:02 +03:00
PThorpe92	addb9ef65b	Add regression test for #3540 attach issue	2025-10-06 21:33:42 -04:00
Glauber Costa	beb44e8e8c	fix mviews with re-insertion of data with the same key There is currently a bug found in our materialized view implementation that happens when we delete a row, and then re-insert another row with the same primary key. Our insert code needs to detect updates and generate a DELETE + INSERT. But in this case, after the initial DELETE, the fresh insert generates another delete. We ended up with the wrong response for aggregations (and I am pretty sure even filter-only views would manifest the bug as well), where groups that should still be present just disappeared because of the extra delete. A new test case is added that fails without the fix.	2025-10-06 20:12:49 -05:00
Pekka Enberg	b063d0d41a	Merge 'Don't panic if doing INSERT INTO ... SELECT rowid' from Jussi Saurio Backport: 0.2 Closes #3567 Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3572	2025-10-04 10:11:09 +03:00
Pekka Enberg	50607607fa	Merge 'Actually enforce uniqueness in create unique index' from Jussi Saurio we just weren't doing it 🤡 Backport: 0.2 Closes #3568 Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3571	2025-10-04 10:07:44 +03:00
Jussi Saurio	81b437c690	Don't panic if doing INSERT INTO ... SELECT rowid Closes #3567	2025-10-03 23:12:24 +03:00
Jussi Saurio	8dac1ba21a	Fix: actually enforce uniqueness in CREATE UNIQUE INDEX ...we just didn't do it	2025-10-03 22:58:42 +03:00
Pekka Enberg	1b42f77300	Merge 'Add short writes to unreliable-libc' from FamHaggs Add short writes in the faulty_libc. As @PThorpe92 stated in #3209, this should be implemented here instead of in the memory IO in the simulator. Running this in the stress test, I caught a logic bug in `try_pwritev_raw`; I will create a PR for that small fix. I will close #3209 in favor of this PR. Closes #3569	2025-10-03 21:52:47 +03:00
FHaggs	dd6e092a5c	Add short writes to pwritev in faulty_libc.	2025-10-03 18:35:03 +02:00
Jussi Saurio	d2f5e67b25	Merge 'Fix COLLATE' from Jussi Saurio

Fixes the following problems with COLLATE:

- Fix: incorrectly used e.g. `x COLLATE NOCASE = 'fOo'` as index constraint on an index whose column was not case-insensitively collated
- Fix: various ephemeral indexes (in GROUP BY, ORDER BY, DISTINCT) and subqueries did not retain proper collation information of columns
- Fix: collation of a given expression was not determined properly according to SQLite's rules

Adds TCL tests and fuzz test

Closes #3476
Closes #1524
Closes #3305

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3538	2025-10-03 09:34:24 +03:00
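The first COLLATE fix above concerns queries of the following shape. As a reference point, this sketch runs the query against SQLite through Python's standard `sqlite3` module (not against tursodb); the table name `c` and index name `cx` are made up for the example. The index column uses the default binary collation, so a case-insensitive comparison must not be answered by a plain seek on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE c(x TEXT);
    CREATE INDEX cx ON c(x);  -- index column uses the default BINARY collation
    INSERT INTO c VALUES ('foo'), ('FOO'), ('bar');
    """
)

# Explicit NOCASE collation on the comparison: both spellings should match.
nocase_rows = conn.execute(
    "SELECT x FROM c WHERE x COLLATE NOCASE = 'fOo' ORDER BY rowid"
).fetchall()

# Default BINARY comparison: no row is spelled exactly 'fOo'.
binary_rows = conn.execute("SELECT x FROM c WHERE x = 'fOo'").fetchall()

print(nocase_rows)
print(binary_rows)
```

Using the binary-collated index as an equality constraint for the first query would miss rows, which is the class of bug the commit message describes.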
Jussi Saurio	ce7fe54841	Collate: add more TCL tests	2025-10-02 21:49:33 +03:00
PThorpe92	361bd70a26	Add regression test for rowid affinity	2025-10-02 14:31:22 -04:00
Pekka Enberg	dc1463c70d	Merge 'Improve error handling for cyclic views' from Duy Dang The cycle is detected by marking each view as seen; if a seen view is processed again, that's a cycle and we throw an error. Close #3404 Closes #3467	2025-10-02 19:33:12 +03:00
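The seen-marking idea in the commit above can be sketched as a depth-first expansion. This is an assumption-laden illustration, not the actual implementation: `views` is a hypothetical mapping from view name to the names it selects from, the error text is invented, and this sketch unmarks a view once it has been fully expanded so that a view referenced twice without a cycle is not reported as one.

```python
def expand_view(name, views, in_progress=None):
    """Return the base tables a view ultimately reads from.

    Raises ValueError if a view is reached again while it is still being expanded.
    """
    if in_progress is None:
        in_progress = set()
    if name not in views:
        return [name]  # not a view, so treat it as a base table
    if name in in_progress:
        raise ValueError(f"cyclic view definition involving {name!r}")
    in_progress.add(name)  # mark as seen for the duration of the expansion
    tables = []
    for dep in views[name]:
        tables.extend(expand_view(dep, views, in_progress))
    in_progress.discard(name)  # finished; later references are not cycles
    return tables


acyclic = {"v1": ["t", "v2"], "v2": ["t"]}
print(expand_view("v1", acyclic))

cyclic = {"a": ["b"], "b": ["a"]}
try:
    expand_view("a", cyclic)
except ValueError as err:
    print("error:", err)
```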
FHaggs	095e72eac9	Add short write to pwrite in faulty_libc.	2025-10-02 16:11:15 +02:00
Jussi Saurio	30e6524c4e	Fix: JOIN USING should pick columns from left table, not right Closes #3468 Closes #3479	2025-10-02 06:56:52 +03:00
Jussi Saurio	c0da38e24a	Merge 'Clear WhereTerm 'from_outer_join' state when LEFT JOIN is optimized to INNER JOIN' from Jussi Saurio

Closes #3470

## Background

In a query like `SELECT * FROM t LEFT JOIN s ON t.a=s.a WHERE s.a = 'foo'` we can remove the LEFT JOIN and replace it with an `INNER JOIN` because NULL values will never be equal to 'foo'. Rewriting as `INNER JOIN` allows the optimizer to also reorder the table join order to come up with a more efficient query plan. In fact, we have this optimization already.

## Problem

However, there is a dumb bug where `WhereTerm`s involving this join still retain their `from_outer_join` state, resulting in forcing the evaluation of those terms at the original join index, which results in completely wrong bytecode if the join optimizer decides to reorder the join as `s JOIN t` instead. Effectively it will evaluate `t.a=s.a` after table `s` is open but table `t` is not open yet.

## Fix

This PR fixes that issue by clearing `from_outer_join` properly from the relevant `WhereTerm`s.

Closes #3475	2025-10-02 06:56:07 +03:00
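The equivalence that the Background section above relies on can be checked with a small experiment. The sketch below uses SQLite through Python's standard `sqlite3` module rather than tursodb, and the sample rows are invented for the example; the query text follows the one quoted in the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t(a);
    CREATE TABLE s(a);
    INSERT INTO t VALUES ('foo'), ('bar'), (NULL);
    INSERT INTO s VALUES ('foo'), ('baz');
    """
)

# LEFT JOIN followed by a WHERE term that can never be true for NULL-extended rows.
left_rows = conn.execute(
    "SELECT t.a, s.a FROM t LEFT JOIN s ON t.a = s.a WHERE s.a = 'foo'"
).fetchall()

# The same query written as an INNER JOIN.
inner_rows = conn.execute(
    "SELECT t.a, s.a FROM t JOIN s ON t.a = s.a WHERE s.a = 'foo'"
).fetchall()

# Without the WHERE term the LEFT JOIN keeps unmatched rows of t.
unfiltered_rows = conn.execute(
    "SELECT t.a, s.a FROM t LEFT JOIN s ON t.a = s.a"
).fetchall()

print(left_rows, inner_rows, len(unfiltered_rows))
```

The filtered LEFT JOIN and the INNER JOIN return the same rows because every NULL-extended row fails `s.a = 'foo'`, which is what makes the rewrite (and the subsequent join reordering) legal.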


1039 Commits