Commit Graph

4599 Commits

Author SHA1 Message Date
Jussi Saurio
28cd14ff1c Merge 'Fix labeler' from Jussi Saurio
Closes #1538
2025-05-20 16:34:16 +03:00
Jussi Saurio
1dc7518551 Fix labeler: checkout repo and add issues:write perm 2025-05-20 16:32:53 +03:00
Jussi Saurio
c4548b51f1 Merge 'Optimization: lift common subexpressions from OR terms' from Jussi Saurio
```sql
-- This PR does effectively this transformation:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#22'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 8 and l_quantity <= 8 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 10 and l_quantity <= 10 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 24 and l_quantity <= 24 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

-- Same query with common conjuncts (ANDs) extracted:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	p_partkey = l_partkey
	and l_shipmode in ('AIR', 'AIR REG')
	and l_shipinstruct = 'DELIVER IN PERSON'
	and (
		(
			p_brand = 'Brand#22'
			and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
			and l_quantity >= 8 and l_quantity <= 8 + 10
			and p_size between 1 and 5
		)
		or
		(
			p_brand = 'Brand#23'
			and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
			and l_quantity >= 10 and l_quantity <= 10 + 10
			and p_size between 1 and 10
		)
		or
		(
			p_brand = 'Brand#12'
			and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
			and l_quantity >= 24 and l_quantity <= 24 + 10
			and p_size between 1 and 15
		)
	);
```
This allows Limbo's optimizer to 1. recognize `p_partkey=l_partkey` as
an index constraint on `part`, and 2. filter out `lineitem` rows before
joining. With this optimization, Limbo completes TPC-H `19.sql` nearly
as fast as SQLite on my machine. Without it, Limbo takes forever.
This branch: `939ms`
Main: `uh, i started running it a few minutes ago and it hasnt finished,
and i dont feel like waiting i guess`
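The rewrite above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not Limbo's optimizer code: the `Expr` type and the `conjuncts` and `lift_common_conjuncts` functions are hypothetical, and predicates are modeled as opaque strings compared by equality.

```rust
/// Hypothetical, minimal expression tree. Atoms are opaque predicates
/// such as "p_partkey = l_partkey".
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Atom(String),
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

/// Flatten an expression into its list of top-level conjuncts.
fn conjuncts(e: &Expr) -> Vec<Expr> {
    match e {
        Expr::And(terms) => terms.iter().flat_map(conjuncts).collect(),
        other => vec![other.clone()],
    }
}

/// Given the branches of an OR, return the conjuncts shared by every
/// branch plus the OR that remains once they are removed.
/// `None` means a branch became empty, so the shared conjuncts alone
/// are equivalent to the original OR.
fn lift_common_conjuncts(branches: &[Expr]) -> (Vec<Expr>, Option<Expr>) {
    let per_branch: Vec<Vec<Expr>> = branches.iter().map(conjuncts).collect();
    let Some((first, rest)) = per_branch.split_first() else {
        return (Vec::new(), None);
    };

    // A conjunct is common if every other branch also contains it.
    let mut common: Vec<Expr> = Vec::new();
    for c in first {
        if !common.contains(c) && rest.iter().all(|b| b.contains(c)) {
            common.push(c.clone());
        }
    }

    // Strip the common conjuncts from each branch.
    let mut remaining = Vec::new();
    for b in &per_branch {
        let left: Vec<Expr> = b.iter().filter(|c| !common.contains(c)).cloned().collect();
        if left.is_empty() {
            return (common, None);
        }
        remaining.push(if left.len() == 1 {
            left.into_iter().next().unwrap()
        } else {
            Expr::And(left)
        });
    }
    (common, Some(Expr::Or(remaining)))
}
```

Applied to the query above, the shared conjuncts would be the join condition and the two `l_ship*` predicates, which the planner can then treat as ordinary top-level WHERE terms.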

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1520
2025-05-20 14:33:49 +03:00
Jussi Saurio
14058357ad Merge 'refactor: replace Operation::Subquery with Table::FromClauseSubquery' from Jussi Saurio
Previously the Operation enum consisted of:
- Operation::Scan
- Operation::Search
- Operation::Subquery
Which was always a dumb hack because what we really are doing is an
Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.
Hence, refactor the relevant data structures so that the Table enum now
contains a new variant:
Table::FromClauseSubquery
And the Operation enum only consists of Scan and Search.
```sql
SELECT * FROM (SELECT ...) sub;

-- the subquery here was previously interpreted as Operation::Subquery on a Table::Pseudo,
-- with a lot of special handling for Operation::Subquery in different code paths
-- now it's an Operation::Scan on a Table::FromClauseSubquery
```
No functional changes (intended, at least!)
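A rough sketch of the shape of the types after this refactor. The field names, the `TableReference` struct and the `describe` function are assumptions made for illustration; only the variant names `Scan`, `Search` and `FromClauseSubquery` come from the description above.

```rust
/// Hypothetical, trimmed-down table kinds.
#[derive(Debug, PartialEq)]
enum Table {
    BTree { name: String },
    /// Rows come from a subquery in the FROM clause.
    FromClauseSubquery { alias: String },
}

/// After the refactor there is no `Subquery` operation anymore.
#[derive(Debug, PartialEq)]
enum Operation {
    Scan,
    Search { index: String },
}

/// Hypothetical pairing of a table with the way it is accessed.
struct TableReference {
    table: Table,
    op: Operation,
}

/// A subquery is now just another table that gets scanned, so code
/// paths match on the table kind instead of a special operation.
fn describe(r: &TableReference) -> String {
    match (&r.op, &r.table) {
        (Operation::Scan, Table::BTree { name }) => format!("scan table {name}"),
        (Operation::Scan, Table::FromClauseSubquery { alias }) => {
            format!("scan subquery result {alias}")
        }
        (Operation::Search { index }, Table::BTree { name }) => {
            format!("search {name} via {index}")
        }
        (Operation::Search { .. }, Table::FromClauseSubquery { alias }) => {
            format!("unsupported: index search on subquery {alias}")
        }
    }
}
```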

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1529
2025-05-20 14:31:42 +03:00
Jussi Saurio
63457bda14 Adjust logic not to delete WhereTerms, since 'consumed' property was introduced 2025-05-20 14:28:05 +03:00
Jussi Saurio
6790b7479c Optimization: lift common subexpressions from OR terms
2025-05-20 14:25:15 +03:00
Jussi Saurio
9d3aca6e8f Fix compile error after merge 2025-05-20 14:19:32 +03:00
Jussi Saurio
57d8f20135 Merge 'Add collation column to Index struct' from Jussi Saurio
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1532
2025-05-20 14:18:17 +03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found
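The idea can be sketched as follows, with a `HashSet` standing in for the ephemeral index. This is an assumption-laden illustration, not the bytecode Limbo emits; `sum_distinct` and `count_distinct` are hypothetical names.

```rust
use std::collections::HashSet;

/// sum(DISTINCT x): the set plays the role of the ephemeral index.
fn sum_distinct(values: &[i64]) -> i64 {
    let mut seen = HashSet::new();
    let mut sum = 0;
    for &v in values {
        // `insert` returns false when the value is already present,
        // in which case the accumulation step is skipped.
        if !seen.insert(v) {
            continue;
        }
        sum += v;
    }
    sum
}

/// count(DISTINCT x): same pattern with its own set, mirroring
/// "one ephemeral index per distinct aggregate".
fn count_distinct(values: &[i64]) -> usize {
    let mut seen = HashSet::new();
    let mut count = 0;
    for &v in values {
        if seen.insert(v) {
            count += 1;
        }
    }
    count
}
```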

Closes #1507
2025-05-20 13:58:57 +03:00
Jussi Saurio
3121c6cdd3 Replace Operation::Subquery with Table::FromClauseSubquery
Previously the Operation enum consisted of:

- Operation::Scan
- Operation::Search
- Operation::Subquery

Which was always a dumb hack because what we really are doing is
an Operation::Scan on a "virtual"/"pseudo" table (overloaded names...)
derived from a subquery appearing in the FROM clause.

Hence, refactor the relevant data structures so that the Table enum
now contains a new variant:

Table::FromClauseSubquery

And the Operation enum only consists of Scan and Search.

No functional changes (intended, at least!)
2025-05-20 12:56:30 +03:00
Jussi Saurio
9c710b5292 Add collation column to Index struct 2025-05-20 12:52:54 +03:00
Jussi Saurio
32aac8e9ef Merge 'Feature: Collate' from Pedro Muniz
I was implementing `ALTER TABLE .. RENAME TO`, and I noticed that
`COLLATE` was necessary for it to work.
This is a relatively big PR: to implement `COLLATE` properly, I needed
to add a field to a couple of instructions that are emitted frequently,
and such a change requires a lot of boilerplate.
My main source of reference was this site from SQLite:
https://sqlite.org/datatype3.html#collation. It gives a good description
of the precedence of collation in certain expressions.
I wrote a couple of tests that I thought covered the edge cases of
`COLLATE`, but honestly, I may have missed a few. I would appreciate
some help later to write more tests.
`Collate` basically just compares two `TEXT` values according to some
comparison function. If the values are not both `TEXT`, it just falls
back to the normal comparison we are already doing. `Collate` happens
in four main places:
- `Collate` Expression modifier
- `Binary` Expression
- `Column` Expression
- `Order By` and `Group By`
In `Binary`, `Order By`, and `Group By` expressions, the collation
sequence for the comparisons can be set explicitly with the `COLLATE`
keyword, or implicitly by a `COLLATE` definition in `CREATE TABLE`. If
neither is present, it defaults to `Binary` collation.
For the `Column` expression, it tries to use the collation in the
`CREATE TABLE` column definition. If not present, it defaults to
`Binary` collation.
Lastly, there was some repetition in how the `Binary` expression was
being translated, so I removed that part. As mentioned in `COMPAT.md`,
I did not implement custom collation sequences yet, as that would have
kept me from implementing the built-in behavior properly. I have some
ideas of how I can extend my current implementation to support that
with FFI, but I think that is best saved for a different PR.
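To make the comparison and precedence rules above concrete, here is a minimal sketch. `CollationSeq` is named in this PR's commits, but the variants shown are the three built-in SQLite collations, and the `compare` and `resolve_collation` functions are hypothetical and not taken from the PR.

```rust
use std::cmp::Ordering;

/// The three built-in SQLite collations; `Binary` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum CollationSeq {
    #[default]
    Binary,
    NoCase,
    Rtrim,
}

impl CollationSeq {
    /// Compare two TEXT values under this collation.
    fn compare(self, lhs: &str, rhs: &str) -> Ordering {
        match self {
            // Plain byte-wise comparison.
            CollationSeq::Binary => lhs.as_bytes().cmp(rhs.as_bytes()),
            // ASCII letters are case-folded before comparing.
            CollationSeq::NoCase => lhs.to_ascii_lowercase().cmp(&rhs.to_ascii_lowercase()),
            // Trailing spaces are ignored.
            CollationSeq::Rtrim => lhs.trim_end_matches(' ').cmp(rhs.trim_end_matches(' ')),
        }
    }
}

/// An explicit `COLLATE` wins over the column definition from
/// `CREATE TABLE`; with neither present, fall back to `Binary`.
fn resolve_collation(
    explicit: Option<CollationSeq>,
    column_default: Option<CollationSeq>,
) -> CollationSeq {
    explicit.or(column_default).unwrap_or_default()
}
```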

Closes #1367
2025-05-20 10:52:11 +03:00
pedrocarlo
52533cab40 only pass collations for index in cursor + adhere to order of columns in index 2025-05-19 15:22:55 -03:00
pedrocarlo
22b6b88f68 fix rebase type errors 2025-05-19 15:22:55 -03:00
pedrocarlo
819fd0f496 use any error method instead, as limbo and sqlite error message differ slightly 2025-05-19 15:22:55 -03:00
pedrocarlo
5b15d6aa32 Get the table correctly from the connection instead of table_references + test to confirm unique constraint 2025-05-19 15:22:55 -03:00
pedrocarlo
4a3119786e refactor BtreeCursor and Sorter to accept Vec of collations 2025-05-19 15:22:55 -03:00
pedrocarlo
f28ce2b757 add collations to btree cursor 2025-05-19 15:22:55 -03:00
pedrocarlo
5bd47d7462 post rebase adjustments to accommodate new instructions that were created before the merge conflicts 2025-05-19 15:22:15 -03:00
pedrocarlo
cc86c789d6 Correct Rtrim 2025-05-19 15:22:15 -03:00
pedrocarlo
6d7a73fd60 More tests 2025-05-19 15:22:15 -03:00
pedrocarlo
bf1fe9e0b3 Actually fixed group by and order by collation 2025-05-19 15:22:15 -03:00
pedrocarlo
0df6c87f07 Fixed Group By collation 2025-05-19 15:22:14 -03:00
pedrocarlo
bba9689674 Fixed matching bug for defining collation context to use 2025-05-19 15:22:14 -03:00
pedrocarlo
a818b6924c Removed repeated binary expression translation. Adjusted the set_collation to capture additional context of whether it was set by a Collate expression or not. Added some tests to prove those modifications were necessary. 2025-05-19 15:22:14 -03:00
pedrocarlo
f8854f180a Added collation to create table columns 2025-05-19 15:22:14 -03:00
pedrocarlo
d0a63429a6 Naive implementation of collate for queries. Not implemented for column constraints 2025-05-19 15:22:14 -03:00
pedrocarlo
b5b1010e7c set binary collation as default 2025-05-19 15:22:14 -03:00
pedrocarlo
510c70e919 Create CollationSeq enum and functions. Move strum to workspace dependency to avoid version mismatch with Parser 2025-05-19 15:22:14 -03:00
Pekka Enberg
4cf9305947 Merge 'bindings/javascript: Add Statement.iterate() method' from Diego Reis
I still haven't found a good way to implement variadic functions. We
should have some sort of wrapper in the JS layer, but that hasn't worked
well for me so far. Once done, it will be easily transferable to any
function.
It should also probably be async, but AFAICT napi doesn't have a
straightforward way to implement async iterators.

Closes #1515
2025-05-19 20:44:40 +03:00
Pekka Enberg
95ea92faca Merge 'Improve debug build validation speed' from Pere Diaz Bou
Various changes to reduce the execution time of long fuzz tests:
* Remove unnecessary `debug_validate_cell` calls
* Add SortedVec for keys in fuzz tests
* Validate the btree's depth in the fuzz test only every 1K inserts, so
as not to overload the test with validations. The `VALIDATE_BTREE` env
variable enables validation on every insert in case it is needed.
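The throttling rule can be sketched like this. Only the `VALIDATE_BTREE` name and the 1K interval come from the description above; the function names and the way the variable is read (set to any value means enabled) are assumptions.

```rust
/// Hypothetical helper: true when the `VALIDATE_BTREE` environment
/// variable is set, regardless of its value.
fn validate_every_insert_from_env() -> bool {
    std::env::var_os("VALIDATE_BTREE").is_some()
}

/// Decide whether to run the expensive btree validation after the
/// insert with zero-based index `insert_index`.
fn should_validate(insert_index: usize, validate_every_insert: bool) -> bool {
    // Validate on every insert when forced, otherwise once per 1000 inserts.
    validate_every_insert || (insert_index + 1) % 1000 == 0
}
```

A fuzz loop would read the flag once up front and then call `should_validate(i, flag)` after each insert.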

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1521
2025-05-19 20:42:48 +03:00
Pekka Enberg
5bd85774cf Merge 'Update README.md' from Yusheng Guo
A syntax error.

Closes #1522
2025-05-19 20:42:25 +03:00
Yusheng Guo
810beeea93 Update README.md
A syntax error.
2025-05-19 18:29:57 +08:00
Jussi Saurio
d2b1be8af7 Merge 'optimizer: fix order by removal logic' from Jussi Saurio
1. `group_by_contains_all` was incorrect - it was not checking that all
order by columns are in group by; it was instead checking that all group
by columns are in order by, which is absolutely incorrect for the
intended purpose.
2. remove ORDER BY clause if GROUP BY clause can sort the rows in the
same way.
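The corrected direction of the check in point 1 can be sketched as below. The signature is an assumption (columns are modeled as plain names); only the function name comes from the description.

```rust
/// True when every ORDER BY column also appears in GROUP BY.
/// The buggy version iterated over `group_by` and looked each column
/// up in `order_by`, which answers a different question.
fn group_by_contains_all(group_by: &[&str], order_by: &[&str]) -> bool {
    order_by.iter().all(|col| group_by.contains(col))
}
```

For example, `GROUP BY a, b ORDER BY a` passes the check, while `GROUP BY a ORDER BY a, b` does not.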
Test failures are not related

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1511
2025-05-19 11:29:17 +03:00
Jussi Saurio
b7b4f6a390 Merge 'Mark WHERE terms as consumed instead of deleting them' from Jussi Saurio
We've run into trouble in multiple places due to the fact that we delete
terms from the where clause (e.g. when a constant condition is removed,
or the term becomes part of an index seek key).
A simpler solution is to add a flag indicating that the term is consumed
(used), so that it is not translated in the main loop anymore when WHERE
clause terms are evaluated.
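A minimal sketch of the flag-based approach, assuming a hypothetical `WhereTerm` with a string expression. Only the `consumed` property is taken from the commits; everything else is illustrative.

```rust
/// Hypothetical WHERE term: the expression text plus a consumed flag.
#[derive(Debug)]
struct WhereTerm {
    expr: String,
    consumed: bool,
}

/// Mark a term as used (for example as an index seek key) instead of
/// removing it, so indices into the term list stay stable.
fn mark_consumed(terms: &mut [WhereTerm], expr: &str) {
    for t in terms.iter_mut().filter(|t| t.expr == expr) {
        t.consumed = true;
    }
}

/// The main loop only translates terms that are not yet consumed.
fn terms_to_translate(terms: &[WhereTerm]) -> Vec<&str> {
    terms
        .iter()
        .filter(|t| !t.consumed)
        .map(|t| t.expr.as_str())
        .collect()
}
```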
note: CI failures are unrelated

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1477
2025-05-19 11:28:09 +03:00
Pere Diaz Bou
f2d0d61962 copilot nice suggestions :) 2025-05-19 09:59:28 +02:00
Pere Diaz Bou
5eab588115 improve debug build validation speed
Various changes:
* Remove unnecessary `debug_validate_cell` calls
* Add SortedVec for keys in fuzz tests
* Validate the btree's depth in the fuzz test only every 1K inserts, so
as not to overload the test with validations. The `VALIDATE_BTREE` env
variable enables validation on every insert in case it is needed.
2025-05-19 09:53:15 +02:00
Jussi Saurio
092462fa74 fix build 2025-05-19 07:29:02 +03:00
Jussi Saurio
7c6a4410d2 Merge '(btree): Implement support for handling offset-based payload access with overflow support' from Krishna Vishal
This PR adds a new function `read_write_payload_with_offset` to support
reading and writing payload data at specific offsets, handling both
local content and overflow pages. This is a port of SQLite's
`accessPayload` function in `btree.c` and will be essential for
supporting incremental blob I/O in the coming PRs.
- Added a state machine called `PayloadOverflowWithOffset` to make the
procedure reentrant.
- Correctly processes both local payload data and payload stored in
overflow pages
Testing:
- Reading and writing to a column with no overflow pages.
- Reading and writing at an offset with overflow pages (spanning 10
pages)
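The core idea of offset-based access can be sketched with a simplified in-memory model. This is not the ported `accessPayload` logic: the `Payload` struct and the `read_at` and `write_at` functions are hypothetical, overflow pages are plain byte vectors without the next-page pointer, and there is no reentrant state machine or page I/O.

```rust
/// Hypothetical model: local bytes followed by a chain of overflow pages.
struct Payload {
    local: Vec<u8>,
    overflow_pages: Vec<Vec<u8>>,
}

/// Read `len` bytes starting at `offset`, crossing from the local
/// area into overflow pages as needed. Returns `None` when the
/// requested range runs past the end of the payload.
fn read_at(p: &Payload, mut offset: usize, len: usize) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for chunk in std::iter::once(&p.local).chain(p.overflow_pages.iter()) {
        if out.len() == len {
            break;
        }
        if offset >= chunk.len() {
            // The range starts after this chunk: skip it entirely.
            offset -= chunk.len();
            continue;
        }
        let take = (len - out.len()).min(chunk.len() - offset);
        out.extend_from_slice(&chunk[offset..offset + take]);
        offset = 0;
    }
    (out.len() == len).then_some(out)
}

/// Overwrite bytes starting at `offset`. Returns false, leaving the
/// payload untouched, when the range does not fit.
fn write_at(p: &mut Payload, mut offset: usize, data: &[u8]) -> bool {
    let total = p.local.len() + p.overflow_pages.iter().map(Vec::len).sum::<usize>();
    if offset.checked_add(data.len()).map_or(true, |end| end > total) {
        return false;
    }
    let mut written = 0;
    for chunk in std::iter::once(&mut p.local).chain(p.overflow_pages.iter_mut()) {
        if written == data.len() {
            break;
        }
        if offset >= chunk.len() {
            offset -= chunk.len();
            continue;
        }
        let take = (data.len() - written).min(chunk.len() - offset);
        chunk[offset..offset + take].copy_from_slice(&data[written..written + take]);
        written += take;
        offset = 0;
    }
    true
}
```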

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1476
2025-05-18 22:58:10 +03:00
Jussi Saurio
3185aabd20 Merge 'Cli config 2' from Pedro Muniz
As there were many merge conflicts for the other PR, I rewrote the code
and condensed it here.
ORIGINAL PR TEXT: Provides the code to almost close
https://github.com/tursodatabase/limbo/issues/1251 . The JsonSchema is
derived, but I am still not sure how to automate the distribution to
SchemaStore for autocomplete. I added some docs for those who want to
see the config file description. I am still not sure how to automate
this documentation. Maybe some macro magic?

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1430
2025-05-18 22:56:22 +03:00
Jussi Saurio
372850756d Merge 'Fix updating single value' from Pedro Muniz
Closes #1482. I needed to change the `key_exists_in_index` function
because it zips the values from the records it is comparing, but if one
of the records is empty or not of the same length, the `all` function
could return true incorrectly.
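The pitfall is easy to reproduce in isolation. The sketch below uses plain integer slices as stand-ins for records, and the length check shown is one possible guard, not necessarily the change made in this PR; both function names are hypothetical.

```rust
/// Buggy comparison: `zip` stops at the shorter side, and `all` on an
/// empty iterator is vacuously true.
fn keys_equal_buggy(a: &[i64], b: &[i64]) -> bool {
    a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

/// Guarded comparison: records of different lengths never match.
fn keys_equal(a: &[i64], b: &[i64]) -> bool {
    a.len() == b.len() && a.iter().zip(b.iter()).all(|(x, y)| x == y)
}
```

With an empty record on one side, `keys_equal_buggy(&[], &[1])` reports a match, while `keys_equal` does not.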

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1514
2025-05-18 22:51:11 +03:00
pedrocarlo
fd51c0a970 invalidate records not necessary for fix 2025-05-18 16:43:25 -03:00
Jussi Saurio
071940f9a7 Merge 'Autoindex fix' from Pedro Muniz
Closes #1508. There were two small issues to fix:
1. We were not checking whether a column name declared in a composite
`unique(...)` declaration actually exists in the IndexMap of columns.
This fixes the first issue, triggered by the statement
`create table t4(a, unique(b));`.
2. We forgot to add the `column_name` to the HashSet of columns.
```rust
Some(PrimaryKeyDefinitionType::Simple { column, .. }) => {
    let mut columns = HashSet::new();
    columns.insert(std::mem::take(column));
    // Have to also insert the current column_name we are iterating over in primary_key_column_results
    columns.insert(column_name.clone()); // <-- Fix here
    primary_key_definition =
        Some(PrimaryKeyDefinitionType::Composite { columns });
}
```
The rest of the modifications are just some small simplifications for
readability and avoiding some clones

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1512
2025-05-18 22:41:02 +03:00
pedrocarlo
c8b768f1ea add tests 2025-05-18 12:43:11 -03:00
pedrocarlo
7f081c1ac9 remove transmute. Just iterate over columns. No need for unsafe 2025-05-18 12:32:49 -03:00
Jussi Saurio
06c4a5dea9 Merge 'use temporary db in sqlite3 wal tests to fix later tests failing' from Preston Thorpe
This prevents the new WAL checkpoint tests in `sqlite3/tests/compat`
from creating a `test` table in `testing/testing.db`. Later tests query
that database and fail when they find an extra table.
There is another issue with failing tests related to the new `count`
impl that I am in the process of fixing as well, but that will be a
separate PR.

Closes #1513
2025-05-18 09:48:23 +03:00
Diego Reis
bc88b7cb65 bind/js: Formatting 2025-05-18 00:51:49 -03:00
Diego Reis
9f6e242e42 bind/js: Partially implements iterate() method
The API still is sync and isn't variadic
2025-05-18 00:51:23 -03:00
pedrocarlo
af1f9492ef fix updating single value 2025-05-17 19:43:24 -03:00
PThorpe92
6d70e6d048 Add reset db to Makefile to create clean testing db between tests that perform writes 2025-05-17 16:23:17 -04:00