turso

mirror of https://github.com/aljazceru/turso.git synced 2026-01-06 17:54:20 +01:00

Author	SHA1	Message	Date
Jussi Saurio	f3ea9a603a	add support for SELECT DISTINCT	2025-05-22 16:51:03 +03:00
Jussi Saurio	b0c3483e94	Allocate ephemeral index for SELECT DISTINCT	2025-05-22 16:51:03 +03:00
Jussi Saurio	76227ec274	Rename to Distinctness + add distinctness information to SelectPlan	2025-05-22 16:51:03 +03:00
Jussi Saurio	533a00eae3	Fix bug in op_decr_jump_zero()	2025-05-22 11:40:49 +03:00
Jussi Saurio	8bec75d804	Merge 'Initial Support for Nested Translation' from Pedro Muniz This PR introduces some modifications to the Program Builder to allow us to use nested parsing. By focusing the emission of Init and the last Goto (prologue and epilogue), inside the ProgramBuilder, we can just not emit them if we are parsing/translating in a nested context. For this PR, I only migrated insert to use these functions as I need them to support Insert statements that use `SELECT FROM` syntax. Nested parsing overall enables code reuse for us and arguably is one of the only ways to parse deeply nested queries without a lot of code duplication. #1528 Closes #1543	2025-05-22 10:52:00 +03:00
Jussi Saurio	c7f984c5c8	Merge 'Page cache fixes' from Pere Diaz Bou This PR builds on top of https://github.com/tursodatabase/limbo/pull/1368 and adds few things like allowing inserting pages with the same page key, fix fuzz tests by adding transactions and some minor improvements to cacheflush. Closes #1523	2025-05-22 10:12:56 +03:00
Jussi Saurio	fc150b12c9	Merge 'CSV virtual table extension' from Piotr Rżysko This PR adds a port of [SQLite's CSV virtual table extension](https://www.sqlite.org/csv.html). Planned follow-ups: * Pass detailed error messages from `VTabModule::create`, not just `ResultCode`s. * Address the TODO in `VTabModuleImpl::create_schema`. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #1544	2025-05-22 09:48:53 +03:00
pedrocarlo	53bf5d5ef5	adjust translate functions to take a program instead of `Option<ProgramBuilder>` + remove any Init emission in translate functions + use epilogue in all places necessary	2025-05-21 16:41:10 -03:00
pedrocarlo	1c12535d9f	push prologue to top-level translate function	2025-05-21 15:50:43 -03:00
pedrocarlo	8084d54c26	lift pragma statement handling as it cannot be created in a nested context	2025-05-21 14:13:28 -03:00
pedrocarlo	d21229d4a3	create inner translate function to enable calling it from a nested context	2025-05-21 14:08:02 -03:00
pedrocarlo	3090dd91fa	push translate_ctx creation outside of prologue	2025-05-21 13:06:25 -03:00
pedrocarlo	fc08f786fc	use prologue and epilogue in insert	2025-05-21 12:47:51 -03:00
pedrocarlo	f5d6d11d16	extract prologue and epilogue to program builder	2025-05-21 12:47:51 -03:00
pedrocarlo	517c7c81cd	refactor to include optional program builder argument	2025-05-21 12:47:51 -03:00
Pere Diaz Bou	b135bf449f	reduce attempts for fuzz_long overflow	2025-05-21 15:40:42 +02:00
Pere Diaz Bou	7143e43dd4	clippy	2025-05-21 15:27:15 +02:00
Pere Diaz Bou	a69f85be84	cacheflush clear cache	2025-05-21 14:20:11 +02:00
Pere Diaz Bou	4704cdd24f	validate_btree pin pages	2025-05-21 14:20:11 +02:00
Pere Diaz Bou	ddb166f0f0	custom hashmap for page cache	2025-05-21 14:19:56 +02:00
Pere Diaz Bou	c365d79cb1	minimum capacity 10 in page cache	2025-05-21 14:19:56 +02:00
Pere Diaz Bou	b76961ce35	balance mark dirty from start	2025-05-21 14:19:56 +02:00
Pere Diaz Bou	591c674e86	Introduce PageRef wrapper `BTreePage`. One problem with PageRef is that the page it references can be unloaded. If we then read the page again instead of loading it onto the same reference, we end up with a split brain of references. To solve this, we wrap PageRef in `BTreePage` so that if a page is seen as unloaded, we replace BTreePage::page with the newest version of the page.	2025-05-21 14:19:41 +02:00
Pere Diaz Bou	35f7317724	add default page cache	2025-05-21 14:11:21 +02:00
Pere Diaz Bou	15d24bd818	Start transactions in fuzz tests to flush pages Previously, fuzz tests increased the size of the page cache indefinitely, so there was no problem of reaching the capacity of the page cache. By adding transactions to fuzz tests we allow pages to remove dirty flags once insert is finished.	2025-05-21 14:11:20 +02:00
Pere Diaz Bou	adf72f2bf8	allow updating a page id in page cache	2025-05-21 14:09:39 +02:00
Pere Diaz Bou	35e2088b7e	cacheflush move dirty page to new snapshot After inserting a page into the wal, we dispose of the modified page. This is unnecessary as we can simply move new page to the newest snapshot where this page can be read.	2025-05-21 14:09:39 +02:00
Pere Diaz Bou	9677997c63	fix page cache fuzz: to test whether a key is in the cache, we must use peek without touching the value, so that we do not promote it and change the order of values in the LRU cache	2025-05-21 14:09:39 +02:00
Pere Diaz Bou	04323f95a5	increase cache size in empty_btree	2025-05-21 14:09:39 +02:00
Pere Diaz Bou	67e260ff71	allow delete of dirty page in cacheflush Dirty pages can be deleted in `cacheflush`. Furthermore, there could be multiple live references in the stack of a cursor so let's allow them to exist while deleting.	2025-05-21 14:09:39 +02:00
Alecco	e2f99a1ad2	page_cache: implement resize	2025-05-21 14:09:39 +02:00
Alecco	e808a28c98	WIP (squash) adapt pager and btree to page cache error handling	2025-05-21 14:09:39 +02:00
Alecco	4ef3c1d04d	page_cache: fix insert and evict logic insert() fails if key exists (there shouldn't be two) and panics if it's different pages, and also fails if it can't make room for the page. Replaced the limited pop_if_not_dirty() function with make_room_for(). It tries to evict as many pages as the requested spare capacity. It should come in handy later for resize() and Pager. make_room_for() tries to make room or fails if it can't evict enough entries. For make_room_for() I also tried an all-or-nothing approach, so that if, say, a query requests a lot more room than it is possible to make, it doesn't evict a bunch of pages from the cache that might be useful. But implementing this approach got very complicated since it needs to keep exclusive PageRefs and collecting this caused segfaults. Might be worth trying again in the future. But beware the rabbit hole. Updated page cache test logic for new insert rules. Updated Pager.allocate_page() to handle failure logic but needs further work. This is to show new cache insert handling. There are many places to update. Left comments on callers of pager and page cache needing to update error handling, for now.	2025-05-21 14:09:39 +02:00
Alecco	bdf427c329	page_cache: proper error handling for deletions Add error handling and results for insert(), delete(), _delete(), _detach(), pop_if_not_dirty(), and clear. Now these functions fail if a page is dirty, locked, or has other references. insert() makes room with pop_if_not_dirty() beforehand to handle cache full and un-evictable, else it would evict this page silently. _delete() returns Ok when key is not present in cache and it tries first to detach the cache entry and clean its page before removing the entry from the map. detach() checks first if it's possible to evict the page and if there are no other references to the page before taking its contents. test_detach_via_delete() and test_detach_via_insert() fixed by properly checking before and after dropping the page reference. test_page_cache_fuzz() fixed by reordering and moving reference to the page into insert. Other page cache tests fixed to check new function results. All page cache tests pass. Error handling and test fixes for Pager and BTree will be added in a subsequent commit.	2025-05-21 14:09:39 +02:00
Alecco	c8beddab09	page_cache: split unlink() out of detach() The unlink function removes an entry from the LRU. The detach function removes an entry in the cache and clears page contents.	2025-05-21 14:09:39 +02:00
Alecco	6763aa0cd5	page_cache: tests: helper functions and more tests test_detach_via_insert fails as it repros insert not removing duplicate page entries with same cache key (id, frame) issue #1348	2025-05-21 14:09:39 +02:00
Alecco	7e898eb8ca	page_cache: tests: move helper function up	2025-05-21 14:09:39 +02:00
Jussi Saurio	696c98877c	Merge 'btree: Remove assumption that all btrees have a rowid' from Jussi Saurio For example, implementing `SELECT DISTINCT` (#1517) and `UNION` (#1545) require that we are able to create indexes without a rowid column present. Similarly, `WITHOUT ROWID` tables require this. I implemented this by replacing the `rowid` and `empty_record` properties in `BtreeCursor` with ```rust /// Whether the cursor is currently pointing to a record. #[derive(Debug, Clone, Copy, PartialEq)] enum CursorHasRecord { Yes { rowid: Option<u64>, // not all indexes and btrees have rowids, so this is optional. }, No, } ``` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1518	2025-05-21 14:53:00 +03:00
Pekka Enberg	580d55f255	Merge 'bindings/rust: Add pragma methods' from Diego Reis I tried to be the most similar to rusqlite as possible. The only thing that's bothering me is `Vec<Vec<Value>>` which I think can be improved but not so sure how, any inputs on this are welcomed. Closes #1536	2025-05-21 12:38:34 +03:00
Piotr Rzysko	ad9d044a04	Add CSV extension	2025-05-21 09:22:59 +02:00
Piotr Rzysko	9c1dca72db	Introduce VTable This allows storing table arguments parsed in the VTabModule::create method.	2025-05-21 08:33:17 +02:00
Piotr Rzysko	6b454ea36f	Normalize column names when creating virtual tables Ensures consistent handling of column names between virtual and regular tables and allows the use of quoted column names.	2025-05-21 08:30:26 +02:00
Diego Reis	44541cb0d5	wip: Add more pragma methods	2025-05-20 09:50:05 -03:00
Jussi Saurio	c4548b51f1	Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio ```sql -- This PR does effectively this transformation: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); -- Same query with common conjuncts (ANDs) extracted: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where p_partkey = l_partkey and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' and ( ( p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 ) or ( p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 ) or ( p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 ) ); ``` This allows Limbo's optimizer to 1. recognize `p_partkey=l_partkey` as an index constraint on `part`, and 2. filter out `lineitem` rows before joining. 
With this optimization, Limbo completes TPC-H `19.sql` nearly as fast as SQLite on my machine. Without it, Limbo takes forever. This branch: `939ms` Main: `uh, i started running it a few minutes ago and it hasnt finished, and i dont feel like waiting i guess` Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1520	2025-05-20 14:33:49 +03:00
Jussi Saurio	14058357ad	Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio Previously the Operation enum consisted of: - Operation::Scan - Operation::Search - Operation::Subquery Which was always a dumb hack because what we really are doing is an Operation::Scan on a "virtual"/"pseudo" table (overloaded names...) derived from a subquery appearing in the FROM clause. Hence, refactor the relevant data structures so that the Table enum now contains a new variant: Table::FromClauseSubquery And the Operation enum only consists of Scan and Search. ``` SELECT * FROM (SELECT ...) sub; -- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo, -- with a lot of special handling for Operation::Subquery in different code paths -- now it's an Operation::Scan on a Table::FromClauseSubquery ``` No functional changes (intended, at least!) Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #1529	2025-05-20 14:31:42 +03:00
Jussi Saurio	63457bda14	Adjust logic not to delete WhereTerms, since 'consumed' property was introduced	2025-05-20 14:28:05 +03:00
Jussi Saurio	6790b7479c	Optimization: lift common subexpressions from OR terms ```sql -- This PR does effectively this transformation: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); -- Same query with common conjuncts (ANDs) extracted: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where p_partkey = l_partkey and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' and ( ( p_brand = 'Brand#22' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 ) or ( p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 ) or ( p_brand = 'Brand#12' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 ) ); ```	2025-05-20 14:25:15 +03:00
Jussi Saurio	0f2bd1f3a2	Doc comment for IndexKeyInfo (thanks copilot) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-05-20 14:22:17 +03:00
Jussi Saurio	42dc824794	Fix: make OpenEphemeral use new_index() instead of new()	2025-05-20 14:22:17 +03:00
Jussi Saurio	e4334dcfdf	Add enum CursorHasRecord to remove assumption that all btrees have rowid	2025-05-20 14:22:17 +03:00
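
The `CursorHasRecord` enum quoted in the 'btree: Remove assumption that all btrees have a rowid' merge above can be read as follows. This is a minimal sketch, not the code in the repository: the enum is copied from the commit message, while `has_record` and `rowid_of` are hypothetical helpers added only to show how a caller might tell "no record" apart from "record without a rowid".

```rust
/// Whether the cursor is currently pointing to a record.
/// (Copied from the commit message; everything below it is illustrative.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorHasRecord {
    Yes {
        rowid: Option<u64>, // not all indexes and btrees have rowids, so this is optional.
    },
    No,
}

/// Hypothetical helper: true when the cursor points at some record,
/// regardless of whether that record carries a rowid.
fn has_record(state: CursorHasRecord) -> bool {
    matches!(state, CursorHasRecord::Yes { .. })
}

/// Hypothetical helper: the rowid, if the cursor points at a record
/// and that record has one.
fn rowid_of(state: CursorHasRecord) -> Option<u64> {
    match state {
        CursorHasRecord::Yes { rowid } => rowid,
        CursorHasRecord::No => None,
    }
}
```

With a single `Option<u64>` rowid, a cursor positioned on an index entry without a rowid would look the same as a cursor with no record at all; keeping the two states in separate variants avoids that ambiguity.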


2633 Commits
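
The 'Optimization: lift common subexpressions from OR terms' commit above describes pulling conjuncts that appear in every branch of an OR out in front of it. The sketch below illustrates that idea only, under loud assumptions: predicates are plain strings compared by exact text, and `lift_common_conjuncts` is a hypothetical name, not a function from the repository, which works on parsed expressions rather than strings.

```rust
/// Hypothetical helper. Each element of `branches` is one OR branch,
/// given as the list of predicates AND-ed together inside it.
/// Returns (conjuncts common to every branch, what is left of each branch).
fn lift_common_conjuncts<'a>(
    branches: &[Vec<&'a str>],
) -> (Vec<&'a str>, Vec<Vec<&'a str>>) {
    let Some((first, rest)) = branches.split_first() else {
        return (Vec::new(), Vec::new());
    };

    // A predicate is common if it occurs in the first branch and in all others.
    let mut common: Vec<&'a str> = Vec::new();
    for &pred in first {
        if !common.contains(&pred) && rest.iter().all(|b| b.contains(&pred)) {
            common.push(pred);
        }
    }

    // Everything that was not lifted stays inside its OR branch.
    let remaining = branches
        .iter()
        .map(|b| b.iter().copied().filter(|p| !common.contains(p)).collect())
        .collect();

    (common, remaining)
}
```

Applied to the three branches of the TPC-H query in the commit message, this would lift `p_partkey = l_partkey`, the `l_shipmode` test and the `l_shipinstruct` test, leaving the brand, container, quantity and size predicates under the OR. A branch that ends up empty means the OR as a whole is always true, so a real implementation would need to handle that case explicitly.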