Commit Graph

907 Commits

Author SHA1 Message Date
Jussi Saurio
c4548b51f1 Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
This allows Limbo's optimizer to 1. recognize `p_partkey = l_partkey` as
an index constraint on `part`, and 2. filter out `lineitem` rows before
joining. With this optimization, Limbo completes TPC-H `19.sql` nearly
as fast as SQLite on my machine. Without it, Limbo had not finished after several minutes.
This branch: `939ms`
Main: `uh, i started running it a few minutes ago and it hasn't finished,
and i don't feel like waiting i guess`
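The OR-lifting transformation shown in the SQL above can be sketched roughly as follows. This is a minimal Rust illustration, not Limbo's implementation: conjuncts are compared as plain strings (an assumption; a real optimizer would compare expression trees), and `lift_common_conjuncts` and `Lifted` are hypothetical names.

```rust
/// Conjuncts are modeled as plain strings here (hypothetical); a real
/// optimizer would compare expression trees structurally.
type Conjunct = String;

/// `common` holds the conjuncts present in every OR branch. `remaining`
/// holds the leftover branches, or `None` if some branch became empty:
/// that branch is then TRUE, so the whole OR is TRUE and can be dropped.
#[derive(Debug, PartialEq)]
struct Lifted {
    common: Vec<Conjunct>,
    remaining: Option<Vec<Vec<Conjunct>>>,
}

/// (A AND B) OR (A AND C)  =>  A AND (B OR C)
fn lift_common_conjuncts(disjuncts: &[Vec<Conjunct>]) -> Lifted {
    // An empty OR has nothing to lift; return it unchanged.
    let Some(first) = disjuncts.first() else {
        return Lifted { common: Vec::new(), remaining: Some(Vec::new()) };
    };
    // A conjunct is common if every branch (including the first) contains it.
    let mut common: Vec<Conjunct> = Vec::new();
    for c in first {
        if disjuncts.iter().all(|d| d.contains(c)) && !common.contains(c) {
            common.push(c.clone());
        }
    }
    // Strip the lifted conjuncts out of each branch.
    let remaining: Vec<Vec<Conjunct>> = disjuncts
        .iter()
        .map(|d| d.iter().filter(|c| !common.contains(*c)).cloned().collect::<Vec<_>>())
        .collect();
    // A branch with no conjuncts left is TRUE, which makes the whole OR TRUE.
    let remaining = if remaining.iter().any(|d| d.is_empty()) {
        None
    } else {
        Some(remaining)
    };
    Lifted { common, remaining }
}
```

Once `p_partkey = l_partkey` ends up in `common`, it is a top-level conjunct of the WHERE clause again, which is what lets the optimizer treat it as an index constraint.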

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1520
2025-05-20 14:33:49 +03:00
Jussi Saurio
14058357ad Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio
Previously the Operation enum consisted of:
- Operation::Scan
- Operation::Search
- Operation::Subquery
This was always a dumb hack, because what we are really doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.
Hence, refactor the relevant data structures so that the Table enum now
contains a new variant:
Table::FromClauseSubquery
And the Operation enum only consists of Scan and Search.
```sql
SELECT * FROM (SELECT ...) sub;

-- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo,
-- with a lot of special handling for Operation::Subquery in different code paths
-- now it's an Operation::Scan on a Table::FromClauseSubquery
```
No functional changes (intended, at least!)
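A rough picture of the data-structure change: a minimal Rust sketch in which the table kind and the access method are separate enums, so a FROM-clause subquery is just another table that gets scanned. These are hypothetical, heavily simplified stand-ins; only the names `Table::FromClauseSubquery`, `Operation::Scan` and `Operation::Search` come from the message above, and `Table::BTree`, `TableReference` and `describe` are assumptions for illustration.

```rust
/// Hypothetical, heavily simplified stand-ins for the planner types named
/// in the commit message; the real definitions carry much more data.
#[derive(Debug, Clone, PartialEq)]
enum Table {
    BTree { name: String },
    /// A "virtual" table produced by a subquery in the FROM clause.
    FromClauseSubquery { alias: String },
}

/// After the refactor, the access method has only two variants.
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    Scan,
    Search { index: String },
}

struct TableReference {
    table: Table,
    op: Operation,
}

fn describe(r: &TableReference) -> String {
    // The table kind only affects naming; no subquery-specific operation arm.
    let source = match &r.table {
        Table::BTree { name } => name.clone(),
        Table::FromClauseSubquery { alias } => format!("subquery {alias}"),
    };
    match &r.op {
        Operation::Scan => format!("scan {source}"),
        Operation::Search { index } => format!("search {source} using {index}"),
    }
}
```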

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1529
2025-05-20 14:31:42 +03:00
Jussi Saurio
63457bda14 Adjust logic not to delete WhereTerms, since 'consumed' property was introduced 2025-05-20 14:28:05 +03:00
Jussi Saurio
6790b7479c Optimization: lift common subexpressions from OR terms
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
2025-05-20 14:25:15 +03:00
Jussi Saurio
9d3aca6e8f Fix compile error after merge 2025-05-20 14:19:32 +03:00
Jussi Saurio
57d8f20135 Merge 'Add collation column to Index struct' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1532
2025-05-20 14:18:17 +03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found
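The per-aggregate deduplication step can be sketched like this. A `BTreeSet` stands in for the ephemeral index (an assumption made for illustration), values are assumed to be non-NULL integers, and `DistinctAccumulator` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Hypothetical accumulator for SUM/COUNT/AVG(DISTINCT x). Each distinct
/// aggregate owns its own set, standing in for its ephemeral index.
#[derive(Default)]
struct DistinctAccumulator {
    seen: BTreeSet<i64>,
    sum: i64,
    count: i64,
}

impl DistinctAccumulator {
    fn step(&mut self, value: i64) {
        // `insert` returns false when the value is already present: a
        // duplicate, so jump over the accumulation step.
        if !self.seen.insert(value) {
            return;
        }
        self.sum += value;
        self.count += 1;
    }

    /// Clear state at a group boundary in a GROUP BY query.
    fn reset(&mut self) {
        *self = Self::default();
    }

    /// AVG over zero rows is NULL, modeled here as `None`.
    fn avg(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum as f64 / self.count as f64)
        }
    }
}
```

Feeding ages `30, 25, 30, 40, 25` leaves `sum = 95` and `count = 3`, since the repeated 30 and 25 are skipped.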

Closes #1507
2025-05-20 13:58:57 +03:00
Jussi Saurio
3121c6cdd3 Replace Operation::Subquery with Table::FromClauseSubquery
Previously the Operation enum consisted of:

- Operation::Scan
- Operation::Search
- Operation::Subquery

This was always a dumb hack, because what we are really doing is
an Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.

Hence, refactor the relevant data structures so that the Table enum
now contains a new variant:

Table::FromClauseSubquery

And the Operation enum only consists of Scan and Search.

No functional changes (intended, at least!)
2025-05-20 12:56:30 +03:00
Jussi Saurio
9c710b5292 Add collation column to Index struct 2025-05-20 12:52:54 +03:00
pedrocarlo
4a3119786e refactor BtreeCursor and Sorter to accept Vec of collations 2025-05-19 15:22:55 -03:00
pedrocarlo
5bd47d7462 post rebase adjustments to accommodate new instructions that were created before the merge conflicts 2025-05-19 15:22:15 -03:00
pedrocarlo
cc86c789d6 Correct Rtrim 2025-05-19 15:22:15 -03:00
pedrocarlo
bf1fe9e0b3 Actually fixed group by and order by collation 2025-05-19 15:22:15 -03:00
pedrocarlo
0df6c87f07 Fixed Group By collation 2025-05-19 15:22:14 -03:00
pedrocarlo
bba9689674 Fixed matching bug when determining which collation context to use 2025-05-19 15:22:14 -03:00
pedrocarlo
a818b6924c Removed repeated binary expression translation. Adjusted the set_collation to capture additional context of whether it was set by a Collate expression or not. Added some tests to prove those modifications were necessary. 2025-05-19 15:22:14 -03:00
pedrocarlo
f8854f180a Added collation to create table columns 2025-05-19 15:22:14 -03:00
pedrocarlo
d0a63429a6 Naive implementation of collate for queries. Not implemented for column constraints 2025-05-19 15:22:14 -03:00
pedrocarlo
b5b1010e7c set binary collation as default 2025-05-19 15:22:14 -03:00
pedrocarlo
510c70e919 Create CollationSeq enum and functions. Move strum to workspace dependency to avoid version mismatch with Parser 2025-05-19 15:22:14 -03:00
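For reference, SQLite defines three built-in collating sequences: BINARY, NOCASE and RTRIM. A minimal Rust sketch of how a `CollationSeq`-style enum could compare text under each, with BINARY as the default (matching "set binary collation as default" above); the variant names and the `compare` method are assumptions, not Limbo's actual definitions.

```rust
use std::cmp::Ordering;

/// Hypothetical enum mirroring SQLite's built-in collations.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum CollationSeq {
    #[default]
    Binary,
    NoCase,
    Rtrim,
}

impl CollationSeq {
    fn compare(self, lhs: &str, rhs: &str) -> Ordering {
        match self {
            // BINARY: plain byte-wise comparison.
            CollationSeq::Binary => lhs.as_bytes().cmp(rhs.as_bytes()),
            // NOCASE: fold ASCII letters only, then compare byte-wise.
            CollationSeq::NoCase => lhs
                .bytes()
                .map(|b| b.to_ascii_lowercase())
                .cmp(rhs.bytes().map(|b| b.to_ascii_lowercase())),
            // RTRIM: like BINARY, but trailing spaces are ignored.
            CollationSeq::Rtrim => lhs
                .trim_end_matches(' ')
                .as_bytes()
                .cmp(rhs.trim_end_matches(' ').as_bytes()),
        }
    }
}
```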
Jussi Saurio
d2b1be8af7 Merge 'optimizer: fix order by removal logic' from Jussi Saurio
1. `group_by_contains_all` was incorrect: instead of checking that all
ORDER BY columns are in the GROUP BY, it was checking that all GROUP BY
columns are in the ORDER BY, which is absolutely incorrect for the
intended purpose.
2. Remove the ORDER BY clause if the GROUP BY clause already sorts the
rows in the same way.
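The difference between the two directions of the containment check in item 1 can be sketched in Rust. Columns are identified by name here, which is an assumption for illustration; both functions are hypothetical, not Limbo's code. Containment alone is only a necessary condition: deciding that the GROUP BY sort makes the ORDER BY redundant (item 2) presumably also has to consider column order and sort direction.

```rust
/// Corrected direction: every ORDER BY column must also appear in GROUP BY.
fn group_by_contains_all(group_by: &[&str], order_by: &[&str]) -> bool {
    order_by.iter().all(|col| group_by.contains(col))
}

/// Buggy direction described above: every GROUP BY column appears in
/// ORDER BY. This wrongly accepts GROUP BY a with ORDER BY a, b.
fn buggy_group_by_contains_all(group_by: &[&str], order_by: &[&str]) -> bool {
    group_by.iter().all(|col| order_by.contains(col))
}
```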
Test failures are not related

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1511
2025-05-19 11:29:17 +03:00
Jussi Saurio
b7b4f6a390 Merge 'Mark WHERE terms as consumed instead of deleting them' from Jussi Saurio
We've run into trouble in multiple places due to the fact that we delete
terms from the where clause (e.g. when a constant condition is removed,
or the term becomes part of an index seek key).
A simpler solution is to add a flag indicating that the term is consumed
(used), so that it is not translated in the main loop anymore when WHERE
clause terms are evaluated.
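The consumed-flag approach can be sketched as follows. `WhereTerm` with a plain-string expression, `mark_consumed` and `terms_to_translate` are hypothetical simplifications. One plausible benefit, assumed here rather than stated in the message, is that positions in the term list stay stable because nothing is removed.

```rust
/// Hypothetical, simplified WHERE clause term.
struct WhereTerm {
    expr: String,
    consumed: bool,
}

/// E.g. when a term becomes part of an index seek key: flag it instead
/// of deleting it, so the term list keeps its length and order.
fn mark_consumed(terms: &mut [WhereTerm], idx: usize) {
    terms[idx].consumed = true;
}

/// The main loop only translates terms that were not consumed elsewhere.
fn terms_to_translate(terms: &[WhereTerm]) -> Vec<&str> {
    terms
        .iter()
        .filter(|t| !t.consumed)
        .map(|t| t.expr.as_str())
        .collect()
}
```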
note: CI failures are unrelated

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1477
2025-05-19 11:28:09 +03:00
pedrocarlo
7f081c1ac9 remove transmute. Just iterate over columns. No need for unsafe 2025-05-18 12:32:49 -03:00
pedrocarlo
0e6ef1f478 removed some clones, slightly simplified the logic, and also inserted the column name of the current column being iterated, not only the last column contained in PrimaryKeyDefinitionType::Simple 2025-05-17 15:32:58 -03:00
pedrocarlo
166dc2184e fix autoindex creation not detecting if column existed in created table declaration using transmute to avoid cloning 2025-05-17 12:58:00 -03:00
Jussi Saurio
93d88527c3 optimizer: remove order by if group by already sorts the result properly 2025-05-17 17:42:52 +03:00
Jussi Saurio
ce8b2722cf optimizer: fix incorrect logic in group_by_contains_all 2025-05-17 17:28:29 +03:00
Jussi Saurio
d584a1879b Mark WHERE terms as consumed instead of deleting them
We've run into trouble in multiple places due to the fact that
we delete terms from the where clause (e.g. when a constant condition
is removed, or the term becomes part of an index seek key).

A simpler solution is to add a flag indicating that the term is
consumed (used), so that it is not translated in the main loop
anymore when WHERE clause terms are evaluated.
2025-05-17 15:44:12 +03:00
Jussi Saurio
51c75c6014 Support distinct aggregates in GROUP BY 2025-05-17 15:33:55 +03:00
Jussi Saurio
653a3a7e13 Support distinct aggregates in non-GROUPBY context 2025-05-17 15:33:55 +03:00
Jussi Saurio
415c4ee624 Allocate ephemeral index cursors for DISTINCT aggregates 2025-05-17 15:33:55 +03:00
Jussi Saurio
368c45e025 Add distinctness information to Aggregate struct 2025-05-17 15:33:55 +03:00
Pere Diaz Bou
74328f2617 fix allocation of indices BTreeCreate registers
For some reason we always allocated one more index than required when we
had `total_indices>1`.
2025-05-16 10:37:04 +02:00
Pere Diaz Bou
ff524d037d fix autoindex of primary key marked as unique
Primary keys that are also marked with a unique constraint do not need
separate indexes; one is enough. If the primary key is an integer, and
therefore a rowid alias, we still need an index to satisfy the unique
constraint.
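The rule in this commit message can be written as a small decision function. This is a hypothetical Rust helper for a single-column PRIMARY KEY, not Limbo's code; the case of a rowid-alias key without UNIQUE needing no index reflects SQLite's documented behavior for INTEGER PRIMARY KEY.

```rust
/// Hypothetical helper: automatic indexes needed for a single-column
/// PRIMARY KEY, depending on whether it is a rowid alias and whether it
/// is also marked UNIQUE.
fn autoindexes_for_primary_key(is_rowid_alias: bool, also_unique: bool) -> usize {
    match (is_rowid_alias, also_unique) {
        // The rowid itself enforces key uniqueness: no index needed...
        (true, false) => 0,
        // ...but an explicit UNIQUE constraint still needs one index.
        (true, true) => 1,
        // A non-rowid key needs one index, which UNIQUE can share.
        (false, _) => 1,
    }
}
```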
2025-05-16 10:37:04 +02:00
Pekka Enberg
e3f71259d8 Rename OwnedValue -> Value
We have not had enough merge conflicts for a while so let's do a
tree-wide rename.
2025-05-15 09:59:46 +03:00
pedrocarlo
72cc0fcdcb fixes and comments 2025-05-14 13:30:39 -03:00
pedrocarlo
b2615d7739 add CursorValidState and only save context in delete when rebalancing 2025-05-14 13:30:39 -03:00
pedrocarlo
814508981c fixing more rebase issues and cleaning up code. Save cursor context when calling delete for later use when needed 2025-05-14 13:30:39 -03:00
pedrocarlo
c69f503eac rebase adjustments 2025-05-14 13:30:39 -03:00
pedrocarlo
05f4ca28cc btree rewind and next fix. Keep track of rowids seen to avoid infinite loop 2025-05-14 13:30:39 -03:00
pedrocarlo
6588004f80 fix incorrectly detecting if user provided row_id_alias to set clause 2025-05-14 13:30:39 -03:00
pedrocarlo
482634b598 adjust null opcode emission based on rowid_alias 2025-05-14 13:30:39 -03:00
pedrocarlo
758dfff2fe modified tests as we do not have rollback yet. Also correctly raise a constraint error on primary keys only 2025-05-14 13:30:39 -03:00
pedrocarlo
3aaf4206b7 altered constraint tests to create bad update statements. Tests caught a bug where I was copying the wrong values from the registers 2025-05-14 13:30:39 -03:00
pedrocarlo
cf7f60b8f5 changed from resolve_label to preassign_label 2025-05-14 13:30:39 -03:00
pedrocarlo
6457d7675a instruction emitted should be correct, but having an infinite loop bug 2025-05-14 13:30:39 -03:00
pedrocarlo
60a99851f8 emit NoConflict and Halt. Already detects unique constraints 2025-05-14 13:30:39 -03:00
pedrocarlo
5f2216cf8e modify explain for MakeRecord to show index name 2025-05-14 13:30:39 -03:00
pedrocarlo
9aebfa7b5d open cursors for write only once 2025-05-14 13:30:39 -03:00
pedrocarlo
5bae32fe3f modified OpenWrite to include index or table name in explain 2025-05-14 13:30:39 -03:00