Commit Graph

4624 Commits

Author SHA1 Message Date
Pere Diaz Bou
adf72f2bf8 allow updating a page id in page cache 2025-05-21 14:09:39 +02:00
Pere Diaz Bou
35e2088b7e cacheflush move dirty page to new snapshot
After inserting a page into the WAL, we dispose of the modified page.
This is unnecessary, as we can simply move the new page to the newest
snapshot where it can be read.
2025-05-21 14:09:39 +02:00
Pere Diaz Bou
9677997c63 fix page cache fuzz
To test whether a key is in the cache, we must use peek without touching
the value, so as not to promote it and change the order of values in the
LRU cache.
2025-05-21 14:09:39 +02:00
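For illustration, a toy LRU showing the peek-vs-get distinction this fix relies on (names and structure are assumptions, not the actual Limbo page cache):
```rust
use std::collections::HashMap;

/// Toy LRU keyed by page id; index 0 of `order` = most recently used.
struct ToyLru {
    map: HashMap<usize, String>,
    order: Vec<usize>,
    capacity: usize,
}

impl ToyLru {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), order: Vec::new(), capacity }
    }

    /// Promotes the key to most-recently-used and returns its value.
    fn get(&mut self, key: usize) -> Option<&String> {
        if self.map.contains_key(&key) {
            self.order.retain(|k| *k != key);
            self.order.insert(0, key);
        }
        self.map.get(&key)
    }

    /// Membership check that does NOT touch the recency order -- what a
    /// fuzz test should use so it doesn't perturb the cache it is testing.
    fn peek(&self, key: usize) -> Option<&String> {
        self.map.get(&key)
    }

    fn insert(&mut self, key: usize, value: String) {
        if self.map.len() == self.capacity && !self.map.contains_key(&key) {
            if let Some(evicted) = self.order.pop() {
                self.map.remove(&evicted);
            }
        }
        self.map.insert(key, value);
        self.order.retain(|k| *k != key);
        self.order.insert(0, key);
    }
}

fn main() {
    let mut cache = ToyLru::new(2);
    cache.insert(1, "page 1".into());
    cache.insert(2, "page 2".into());
    assert!(cache.peek(1).is_some()); // peek: page 1 stays least recently used
    cache.insert(3, "page 3".into()); // evicts page 1, the true LRU entry
    assert!(cache.peek(1).is_none());
    assert!(cache.get(2).is_some()); // get: promotes page 2
}
```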
Pere Diaz Bou
04323f95a5 increase cache size in empty_btree 2025-05-21 14:09:39 +02:00
Pere Diaz Bou
67e260ff71 allow delete of dirty page in cacheflush
Dirty pages can be deleted in `cacheflush`. Furthermore, there could be
multiple live references in the stack of a cursor, so let's allow them
to exist while deleting.
2025-05-21 14:09:39 +02:00
Alecco
e2f99a1ad2 page_cache: implement resize 2025-05-21 14:09:39 +02:00
Alecco
e808a28c98 WIP (squash) adapt pager and btree to page cache error handling 2025-05-21 14:09:39 +02:00
Alecco
4ef3c1d04d page_cache: fix insert and evict logic
insert() fails if the key already exists (there shouldn't be two) and
panics if the existing entry is a different page; it also fails if it
can't make room for the page.

Replaced the limited pop_if_not_dirty() function with make_room_for(),
which tries to evict as many pages as the requested spare capacity
needs. It should come in handy later for resize() and the Pager.
make_room_for() tries to make room and fails if it can't evict enough
entries.

For make_room_for() I also tried an all-or-nothing approach, so that
if, say, a query requests far more room than can possibly be made, it
doesn't evict a bunch of pages from the cache that might still be
useful. But implementing this approach got very complicated since it
needs to keep exclusive PageRefs, and collecting them caused segfaults.
Might be worth trying again in the future, but beware the rabbit hole.

Updated page cache test logic for new insert rules.

Updated Pager.allocate_page() to handle the failure logic, but it needs
further work; it is included to show the new cache insert handling.
There are many more places to update.

For now, left comments on the callers of the pager and page cache that
still need updated error handling.
2025-05-21 14:09:39 +02:00
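A rough sketch of the shape described above, under the assumption that eviction simply skips dirty or referenced pages in LRU order (names and behavior are illustrative, not the actual Limbo code):
```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum CacheError {
    Full, // could not evict enough entries to make the requested room
}

struct Entry {
    key: usize,
    dirty: bool,
    pinned: bool, // stand-in for "has live references"
}

struct PageCache {
    capacity: usize,
    entries: VecDeque<Entry>, // front = most recently used, back = LRU
}

impl PageCache {
    /// Evict clean, unpinned entries from the LRU end until at least `n`
    /// slots are spare, or fail. Note this is not all-or-nothing: entries
    /// evicted before a failure stay evicted.
    fn make_room_for(&mut self, n: usize) -> Result<(), CacheError> {
        let mut idx = self.entries.len();
        while self.capacity.saturating_sub(self.entries.len()) < n && idx > 0 {
            idx -= 1;
            if !self.entries[idx].dirty && !self.entries[idx].pinned {
                self.entries.remove(idx);
            }
        }
        if self.capacity.saturating_sub(self.entries.len()) >= n {
            Ok(())
        } else {
            Err(CacheError::Full)
        }
    }

    /// insert() makes room up front instead of silently evicting later.
    fn insert(&mut self, key: usize, dirty: bool) -> Result<(), CacheError> {
        self.make_room_for(1)?;
        self.entries.push_front(Entry { key, dirty, pinned: false });
        Ok(())
    }
}

fn main() {
    let mut cache = PageCache { capacity: 2, entries: VecDeque::new() };
    cache.insert(1, false).unwrap();
    cache.insert(2, true).unwrap(); // dirty page, cannot be evicted
    assert_eq!(cache.insert(3, false), Ok(())); // evicts clean page 1
    assert_eq!(cache.insert(4, true), Ok(())); // evicts clean page 3
    assert_eq!(cache.insert(5, false), Err(CacheError::Full)); // only dirty pages left
}
```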
Alecco
bdf427c329 page_cache: proper error handling for deletions
Add error handling and results for insert(), delete(), _delete(),
_detach(), pop_if_not_dirty(), and clear().

Now these functions fail if a page is dirty, locked, or has other
references.

insert() makes room with pop_if_not_dirty() beforehand to handle the
case where the cache is full and un-evictable; otherwise it would evict
the page silently.

_delete() returns Ok when the key is not present in the cache, and it
first tries to detach the cache entry and clean its page *before*
removing the entry from the map.

detach() first checks whether it is possible to evict the page and
whether there are no other references to it before taking its
contents.

test_detach_via_delete() and test_detach_via_insert() are fixed by
properly checking before and after dropping the page reference.

test_page_cache_fuzz() is fixed by reordering and moving the reference
to the page into insert().

Other page cache tests fixed to check new function results.

All page cache tests pass.

Error handling and test fixes for Pager and BTree will be added in
a subsequent commit.
2025-05-21 14:09:39 +02:00
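The deletion rules read roughly like this in miniature (error variants and struct layout are assumptions for illustration only):
```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum CacheError {
    Dirty,      // page has unflushed changes
    Locked,     // page is locked by I/O
    ActiveRefs, // someone else still holds a reference
}

#[derive(Default)]
struct Page {
    dirty: bool,
    locked: bool,
    refs: usize,
}

struct PageCache {
    map: HashMap<usize, Page>,
}

impl PageCache {
    /// Returns Ok(()) when the key is absent; otherwise detaches and clears
    /// the page *before* removing the map entry, failing early if the page
    /// cannot safely be evicted.
    fn delete(&mut self, key: usize) -> Result<(), CacheError> {
        let Some(page) = self.map.get(&key) else {
            return Ok(()); // deleting a missing key is not an error
        };
        if page.dirty {
            return Err(CacheError::Dirty);
        }
        if page.locked {
            return Err(CacheError::Locked);
        }
        if page.refs > 0 {
            return Err(CacheError::ActiveRefs);
        }
        // detach/clear would happen here, then the entry is removed
        self.map.remove(&key);
        Ok(())
    }
}

fn main() {
    let mut cache = PageCache { map: HashMap::new() };
    cache.map.insert(1, Page { dirty: true, ..Default::default() });
    assert_eq!(cache.delete(1), Err(CacheError::Dirty));
    assert_eq!(cache.delete(42), Ok(())); // absent key: Ok by design
}
```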
Alecco
c8beddab09 page_cache: split unlink() out of detach()
The unlink function removes an entry from the LRU.

The detach function removes an entry from the cache and clears the page
contents.
2025-05-21 14:09:39 +02:00
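A skeletal illustration of the split, with hypothetical fields and signatures:
```rust
use std::collections::HashMap;

struct PageCache {
    map: HashMap<usize, Vec<u8>>, // page contents
    lru: Vec<usize>,              // recency order
}

impl PageCache {
    /// unlink: only remove the entry from the LRU ordering.
    fn unlink(&mut self, key: usize) {
        self.lru.retain(|k| *k != key);
    }

    /// detach: remove the entry from the cache and clear its page
    /// contents, reusing unlink for the LRU bookkeeping.
    fn detach(&mut self, key: usize) {
        self.unlink(key);
        if let Some(mut contents) = self.map.remove(&key) {
            contents.clear(); // clear page contents
        }
    }
}

fn main() {
    let mut cache = PageCache { map: HashMap::new(), lru: vec![1] };
    cache.map.insert(1, vec![0u8; 4096]);
    cache.detach(1);
    assert!(cache.map.is_empty() && cache.lru.is_empty());
}
```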
Alecco
6763aa0cd5 page_cache: tests: helper functions and more tests
test_detach_via_insert fails because it reproduces insert() not removing
duplicate page entries with the same cache key (id, frame); see issue #1348
2025-05-21 14:09:39 +02:00
Alecco
7e898eb8ca page_cache: tests: move helper function up 2025-05-21 14:09:39 +02:00
Jussi Saurio
696c98877c Merge 'btree: Remove assumption that all btrees have a rowid' from Jussi Saurio
For example, implementing `SELECT DISTINCT` (#1517) and `UNION` (#1545)
requires that we be able to create indexes without a rowid column
present. Similarly, `WITHOUT ROWID` tables require this.
I implemented this by replacing the `rowid` and `empty_record`
properties in `BtreeCursor` with:
```rust
/// Whether the cursor is currently pointing to a record.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorHasRecord {
    Yes {
        rowid: Option<u64>, // not all indexes and btrees have rowids, so this is optional.
    },
    No,
}
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1518
2025-05-21 14:53:00 +03:00
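The enum above is quoted from the PR; a hypothetical caller-side sketch of how the optional rowid might be consumed:
```rust
/// Whether the cursor is currently pointing to a record.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CursorHasRecord {
    Yes {
        rowid: Option<u64>, // not all indexes and btrees have rowids
    },
    No,
}

// Hypothetical caller: fetch the rowid only when the btree actually has one.
fn current_rowid(state: CursorHasRecord) -> Option<u64> {
    match state {
        CursorHasRecord::Yes { rowid } => rowid,
        CursorHasRecord::No => None,
    }
}

fn main() {
    assert_eq!(current_rowid(CursorHasRecord::Yes { rowid: Some(7) }), Some(7));
    assert_eq!(current_rowid(CursorHasRecord::Yes { rowid: None }), None); // index / WITHOUT ROWID btree
    assert_eq!(current_rowid(CursorHasRecord::No), None);
}
```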
Pekka Enberg
580d55f255 Merge 'bindings/rust: Add pragma methods' from Diego Reis
I tried to stay as similar to rusqlite as possible. The only thing
that's bothering me is `Vec<Vec<Value>>`, which I think can be improved,
but I'm not sure how; any input on this is welcome.

Closes #1536
2025-05-21 12:38:34 +03:00
Jussi Saurio
c800de4304 Merge 'Output rust backtrace in python tests' from Preston Thorpe
and any other env vars needed

Closes #1537
2025-05-20 17:59:07 +03:00
Jussi Saurio
28cd14ff1c Merge 'Fix labeler' from Jussi Saurio
Closes #1538
2025-05-20 16:34:16 +03:00
Jussi Saurio
1dc7518551 Fix labeler: checkout repo and add issues:write perm 2025-05-20 16:32:53 +03:00
PThorpe92
c62d3e464f Output rust backtrace in python tests 2025-05-20 09:20:59 -04:00
Diego Reis
44541cb0d5 wip: Add more pragma methods 2025-05-20 09:50:05 -03:00
Jussi Saurio
c4548b51f1 Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
This allows Limbo's optimizer to (1) recognize `p_partkey = l_partkey`
as an index constraint on `part`, and (2) filter out `lineitem` rows
before joining. With this optimization, Limbo completes TPC-H `19.sql`
nearly as fast as SQLite on my machine. Without it, Limbo takes forever.
This branch: `939ms`
Main: `uh, i started running it a few minutes ago and it hasnt finished,
and i dont feel like waiting i guess`

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1520
2025-05-20 14:33:49 +03:00
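The rewrite itself can be sketched on a toy representation of the WHERE clause (string predicates standing in for real expression nodes; not the optimizer's actual data structures): intersect the conjunct sets of the OR branches and hoist the shared ones.
```rust
use std::collections::HashSet;

/// Toy WHERE clause: an OR of branches, each branch an AND of string predicates.
type Branch = Vec<String>;

/// Returns (lifted common conjuncts, OR branches with those conjuncts removed).
fn lift_common_conjuncts(branches: &[Branch]) -> (Vec<String>, Vec<Branch>) {
    let mut common: HashSet<String> = match branches.first() {
        Some(b) => b.iter().cloned().collect(),
        None => return (vec![], vec![]),
    };
    // Keep only the conjuncts that appear in every OR branch.
    for branch in &branches[1..] {
        let set: HashSet<String> = branch.iter().cloned().collect();
        common.retain(|c| set.contains(c));
    }
    // Remove the lifted conjuncts from each branch.
    let residual: Vec<Branch> = branches
        .iter()
        .map(|b| b.iter().filter(|p| !common.contains(*p)).cloned().collect())
        .collect();
    let mut lifted: Vec<String> = common.into_iter().collect();
    lifted.sort();
    (lifted, residual)
}

fn main() {
    let branches = vec![
        vec!["p_partkey = l_partkey".to_string(), "p_brand = 'Brand#22'".to_string()],
        vec!["p_partkey = l_partkey".to_string(), "p_brand = 'Brand#23'".to_string()],
    ];
    let (lifted, residual) = lift_common_conjuncts(&branches);
    assert_eq!(lifted, vec!["p_partkey = l_partkey".to_string()]);
    assert_eq!(residual[0], vec!["p_brand = 'Brand#22'".to_string()]);
}
```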
Jussi Saurio
14058357ad Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio
Previously the Operation enum consisted of:
- Operation::Scan
- Operation::Search
- Operation::Subquery
This was always a dumb hack, because what we are really doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.
Hence, refactor the relevant data structures so that the Table enum now
contains a new variant:
Table::FromClauseSubquery
And the Operation enum only consists of Scan and Search.
```
SELECT * FROM (SELECT ...) sub;

-- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo,
-- with a lot of special handling for Operation::Subquery in different code paths
-- now it's an Operation::Scan on a Table::FromClauseSubquery
```
No functional changes (intended, at least!)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1529
2025-05-20 14:31:42 +03:00
Jussi Saurio
63457bda14 Adjust logic not to delete WhereTerms, since 'consumed' property was introduced 2025-05-20 14:28:05 +03:00
Jussi Saurio
6790b7479c Optimization: lift common subexpressions from OR terms
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
2025-05-20 14:25:15 +03:00
Jussi Saurio
0f2bd1f3a2 Doc comment for IndexKeyInfo (thanks copilot)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-20 14:22:17 +03:00
Jussi Saurio
42dc824794 Fix: make OpenEphemeral use new_index() instead of new() 2025-05-20 14:22:17 +03:00
Jussi Saurio
e4334dcfdf Add enum CursorHasRecord to remove assumption that all btrees have rowid 2025-05-20 14:22:17 +03:00
Jussi Saurio
35350a2368 Add IndexKeyInfo to btree 2025-05-20 14:22:17 +03:00
Jussi Saurio
a7b33b1509 schema: add Index::has_rowid 2025-05-20 14:22:17 +03:00
Jussi Saurio
9d3aca6e8f Fix compile error after merge 2025-05-20 14:19:32 +03:00
Jussi Saurio
57d8f20135 Merge 'Add collation column to Index struct' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1532
2025-05-20 14:18:17 +03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found

Closes #1507
2025-05-20 13:58:57 +03:00
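The dedup idea in miniature, with a `HashSet` standing in for the per-aggregate ephemeral index (assumed shapes, not the engine's actual code):
```rust
use std::collections::HashSet;

/// sum(DISTINCT x): accumulate each distinct value exactly once.
/// The HashSet plays the role of the per-aggregate ephemeral index;
/// a duplicate lookup means "jump over the accumulation step".
fn sum_distinct(values: &[i64]) -> i64 {
    let mut seen = HashSet::new();
    let mut sum = 0;
    for &v in values {
        if seen.insert(v) {
            sum += v; // only accumulate on first sight
        }
    }
    sum
}

fn main() {
    // ages 30, 30, 40 -> sum(distinct age) = 70
    let ages = [30, 30, 40];
    assert_eq!(sum_distinct(&ages), 70);
}
```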
Jussi Saurio
3121c6cdd3 Replace Operation::Subquery with Table::FromClauseSubquery
Previously the Operation enum consisted of:

- Operation::Scan
- Operation::Search
- Operation::Subquery

This was always a dumb hack, because what we are really doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.

Hence, refactor the relevant data structures so that the Table enum
now contains a new variant:

Table::FromClauseSubquery

And the Operation enum only consists of Scan and Search.

No functional changes (intended, at least!)
2025-05-20 12:56:30 +03:00
Jussi Saurio
9c710b5292 Add collation column to Index struct 2025-05-20 12:52:54 +03:00
Jussi Saurio
32aac8e9ef Merge 'Feature: Collate' from Pedro Muniz
I was implementing `ALTER TABLE .. RENAME TO`, and I noticed that
`COLLATE` was necessary for it to work.
This is a relatively big PR: to properly implement `COLLATE`, I needed
to add a field to a couple of instructions that are emitted frequently,
and there is a lot of boilerplate required when you make such a change.
My main source of reference was this site from SQLite:
https://sqlite.org/datatype3.html#collation. It gives a good description
of the precedence of collation in certain expressions.
I did write a couple of tests that I thought caught the edge cases of
`COLLATE`, but honestly, I may have missed a few. I would appreciate
some help later writing more tests.
`Collate` basically just compares two `TEXT` values according to some
comparison function. If the values are not both `TEXT`, we just fall
back to the normal comparison we are already doing. `Collate` happens
in four main places:
- `Collate` Expression modifier
- `Binary` Expression
- `Column` Expression
- `Order By` and `Group By`
In `Binary`, `Order By`, and `Group By` expressions, the collation
sequence for the comparisons can be derived explicitly with the use of
the `COLLATE` keyword, or implicitly if there is a `COLLATE` definition
in `CREATE TABLE`. If neither is present, it defaults to `Binary`
collation.
For the `Column` expression, it tries to use the collation from the
`CREATE TABLE` column definition; if none is present, it defaults to
`Binary` collation.
Lastly, there was some repetition in how the `Binary` expression was
being translated, so I removed that part. As mentioned in `COMPAT.md`,
I did not implement custom collation sequences yet, as that would have
deterred me from implementing the core properly. I have some ideas for
extending my current implementation to support that with FFI, but I
think that is best served by a different PR.

Closes #1367
2025-05-20 10:52:11 +03:00
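A minimal sketch of the precedence described above (types and names are illustrative, not the PR's actual API): an explicit `COLLATE` wins, then a column's declared collation, then the `Binary` default.
```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Collation {
    Binary,
    NoCase,
    Rtrim,
}

/// One side of a binary comparison, reduced to the collation facts we need.
struct Operand {
    explicit: Option<Collation>, // from a COLLATE expression modifier
    column: Option<Collation>,   // from the CREATE TABLE column definition
}

/// Precedence: explicit COLLATE (left side first) > column definition
/// (left side first) > Binary default.
fn comparison_collation(lhs: &Operand, rhs: &Operand) -> Collation {
    lhs.explicit
        .or(rhs.explicit)
        .or(lhs.column)
        .or(rhs.column)
        .unwrap_or(Collation::Binary)
}

fn main() {
    // a NOCASE column compared against a plain literal -> NOCASE
    let col = Operand { explicit: None, column: Some(Collation::NoCase) };
    let lit = Operand { explicit: None, column: None };
    assert_eq!(comparison_collation(&col, &lit), Collation::NoCase);

    // an explicit COLLATE RTRIM overrides the column's collation
    let forced = Operand { explicit: Some(Collation::Rtrim), column: None };
    assert_eq!(comparison_collation(&forced, &col), Collation::Rtrim);
}
```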
Diego Reis
4766c9c286 bind/rust: Fix lifetime issue with pragma_query
Shallow cloning in Row ended up invalidating the pointer
to the value
2025-05-19 21:29:07 -03:00
pedrocarlo
52533cab40 only pass collations for index in cursor + adhere to order of columns in index 2025-05-19 15:22:55 -03:00
pedrocarlo
22b6b88f68 fix rebase type errors 2025-05-19 15:22:55 -03:00
pedrocarlo
819fd0f496 use any error method instead, as limbo and sqlite error message differ slightly 2025-05-19 15:22:55 -03:00
pedrocarlo
5b15d6aa32 Get the table correctly from the connection instead of table_references + test to confirm unique constraint 2025-05-19 15:22:55 -03:00
pedrocarlo
4a3119786e refactor BtreeCursor and Sorter to accept Vec of collations 2025-05-19 15:22:55 -03:00
pedrocarlo
f28ce2b757 add collations to btree cursor 2025-05-19 15:22:55 -03:00
pedrocarlo
5bd47d7462 post-rebase adjustments to accommodate new instructions that were created before the merge conflicts 2025-05-19 15:22:15 -03:00
pedrocarlo
cc86c789d6 Correct Rtrim 2025-05-19 15:22:15 -03:00
pedrocarlo
6d7a73fd60 More tests 2025-05-19 15:22:15 -03:00
pedrocarlo
bf1fe9e0b3 Actually fixed group by and order by collation 2025-05-19 15:22:15 -03:00
pedrocarlo
0df6c87f07 Fixed Group By collation 2025-05-19 15:22:14 -03:00
pedrocarlo
bba9689674 Fixed matching bug for defining collation context to use 2025-05-19 15:22:14 -03:00
pedrocarlo
a818b6924c Removed repeated binary expression translation. Adjusted the set_collation to capture additional context of whether it was set by a Collate expression or not. Added some tests to prove those modifications were necessary. 2025-05-19 15:22:14 -03:00
pedrocarlo
f8854f180a Added collation to create table columns 2025-05-19 15:22:14 -03:00
pedrocarlo
d0a63429a6 Naive implementation of collate for queries. Not implemented for column constraints 2025-05-19 15:22:14 -03:00