Commit Graph

1255 Commits

Author SHA1 Message Date
jussisaurio
dddf850111 Merge 'Update clap to 4.5' from Pekka Enberg
GitHub's Dependabot complains about anstream, which is pulled in through
`clap`:
https://github.com/tursodatabase/limbo/security/dependabot/8

Closes #444
2024-12-11 17:08:38 +02:00
jussisaurio
eb9374aebf Merge 'Add support for CASE expressions.' from Alex Miller
There are two forms of CASE:
  CASE (WHEN [bool expr] THEN [value])+ (ELSE [value])? END
which checks a series of boolean conditions, and:
  CASE expr (WHEN [expr] THEN [value])+ (ELSE [value])? END
which checks a series of equality conditions.
This implements support for both. Note that the ELSE is optional;
omitting it is equivalent to `ELSE null`.
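The two forms can be modeled outside of bytecode with a plain Rust sketch. The `Value` enum and both function names are hypothetical (not limbo's `OwnedValue` or its translator), and operands are assumed to be already evaluated, whereas real CASE evaluation only computes the branches it reaches.

```rust
/// Minimal value type for the sketch (hypothetical; not limbo's `OwnedValue`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
}

/// Simple form: `CASE base WHEN w1 THEN t1 ... [ELSE e] END`.
/// Each WHEN operand is compared for equality with `base`; the first match wins.
/// A missing ELSE behaves like `ELSE NULL`.
fn eval_simple_case(base: &Value, arms: &[(Value, Value)], else_value: Option<&Value>) -> Value {
    for (when, then) in arms {
        // In SQL, `NULL = x` is never true, so a NULL base or operand never matches.
        if *base != Value::Null && *when != Value::Null && base == when {
            return then.clone();
        }
    }
    else_value.cloned().unwrap_or(Value::Null)
}

/// Searched form: `CASE WHEN c1 THEN t1 ... [ELSE e] END`.
/// Conditions are assumed to be pre-evaluated, with NULL already mapped to false.
fn eval_searched_case(arms: &[(bool, Value)], else_value: Option<&Value>) -> Value {
    for (cond, then) in arms {
        if *cond {
            return then.clone();
        }
    }
    else_value.cloned().unwrap_or(Value::Null)
}
```

Each loop iteration corresponds roughly to one `Ne`/`Goto` pair in the EXPLAIN listings that follow, and the `else_value` fallback plays the role of the `Integer 0 1` instruction.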
sqlite3 gives the implementation as:
```
sqlite> explain select case a WHEN a THEN b WHEN c THEN d ELSE 0 END from casetest;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     OpenRead       0     3     0     4              0   root=3 iDb=0; casetest
2     Rewind         0     15    0                    0
3       Column         0     0     2                    0   r[2]= cursor 0 column 0
4       Column         0     0     3                    0   r[3]= cursor 0 column 0
5       Ne             3     8     2     BINARY-8       83  if r[2]!=r[3] goto 8
6       Column         0     1     1                    0   r[1]= cursor 0 column 1
7       Goto           0     13    0                    0
8       Column         0     2     3                    0   r[3]= cursor 0 column 2
9       Ne             3     12    2     BINARY-8       83  if r[2]!=r[3] goto 12
10      Column         0     3     1                    0   r[1]= cursor 0 column 3
11      Goto           0     13    0                    0
12      Integer        0     1     0                    0   r[1]=0
13      ResultRow      1     1     0                    0   output=r[1]
14    Next           0     3     0                    1
15    Halt           0     0     0                    0
16    Transaction    0     0     2     0              1   usesStmtJournal=0
17    Goto           0     1     0                    0
```
and after this patch, limbo gives:
```
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     19    0                    0   Start at 19
1     OpenReadAsync      0     4     0                    0   table=casetest, root=4
2     OpenReadAwait      0     0     0                    0
3     RewindAsync        0     0     0                    0
4     RewindAwait        0     18    0                    0   Rewind table casetest
5       Column           0     0     2                    0   r[2]=casetest.a
6       Column           0     0     3                    0   r[3]=casetest.a
7       Ne               2     3     10                   0   if r[2]!=r[3] goto 10
8       Column           0     1     1                    0   r[1]=casetest.b
9       Goto             0     15    0                    0
10      Column           0     2     3                    0   r[3]=casetest.c
11      Ne               2     3     14                   0   if r[2]!=r[3] goto 14
12      Column           0     3     1                    0   r[1]=casetest.d
13      Goto             0     15    0                    0
14      Integer          0     1     0                    0   r[1]=0
15      ResultRow        1     1     0                    0   output=r[1]
16    NextAsync          0     0     0                    0
17    NextAwait          0     5     0                    0
18    Halt               0     0     0                    0
19    Transaction        0     0     0                    0
20    Goto               0     1     0                    0
```
Since COMPAT.md had nowhere to record this new support, I added a
heading for SELECT expressions that lists what is and isn't
supported.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #425
2024-12-11 17:05:41 +02:00
Pekka Enberg
617f95c7b6 Update clap to 4.5
GitHub's Dependabot complains about anstream, which is pulled in through `clap`:

https://github.com/tursodatabase/limbo/security/dependabot/8
2024-12-11 14:39:27 +02:00
Pekka Enberg
c03839cc6a Merge 'Upgrade pprof to 0.14' from Pekka Enberg
GitHub's Dependabot reports an unsoundness issue in the current
version, so let's bump to a newer one:
https://github.com/tursodatabase/limbo/security/dependabot/10

Closes #440
2024-12-11 14:07:26 +02:00
Pekka Enberg
ab07c77036 Upgrade pprof to 0.14
GitHub's Dependabot reports an unsoundness issue in the current
version, so let's bump to a newer one:

https://github.com/tursodatabase/limbo/security/dependabot/10
2024-12-11 11:21:09 +02:00
Pekka Enberg
b8aca48a0f Update CHANGELOG 2024-12-11 10:45:02 +02:00
Pekka Enberg
04f196113a Merge 'Add last_insert_rowid() function' from Krishna Vishal
- Changed the `Cursor` trait to expose `root_page`
- SQLite only updates last_insert_rowid for non-schema inserts, so
before updating the rowid we check the `root_page` targeted by
`InsertAwait` to skip schema inserts
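A minimal sketch of the rule described above, assuming that a "schema insert" is one whose cursor is rooted at page 1, where the `sqlite_schema` table always lives. `Connection`, `on_insert` and `SCHEMA_ROOT_PAGE` are hypothetical names, not limbo's API.

```rust
/// Root page of the `sqlite_schema` table; in the SQLite file format it is always page 1.
const SCHEMA_ROOT_PAGE: usize = 1;

/// Hypothetical per-connection state. The value starts at 0,
/// which is what SQLite reports before any successful insert.
#[derive(Default)]
struct Connection {
    last_insert_rowid: i64,
}

impl Connection {
    /// Called after a row was inserted through a cursor rooted at `root_page`.
    /// Inserts into the schema table (e.g. caused by CREATE TABLE) leave the value untouched.
    fn on_insert(&mut self, root_page: usize, rowid: i64) {
        if root_page != SCHEMA_ROOT_PAGE {
            self.last_insert_rowid = rowid;
        }
    }

    /// What `SELECT last_insert_rowid()` would return.
    fn last_insert_rowid(&self) -> i64 {
        self.last_insert_rowid
    }
}
```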
In SQLite it looks like this:
```
sqlite> EXPLAIN SELECT last_insert_rowid();
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     4     0                    0
1     Function       0     0     1     last_insert_rowid(0) 0
2     ResultRow      1     1     0                    0
3     Halt           0     0     0                    0
4     Goto           0     1     0                    0
```
In limbo it will look like this:
```
limbo> EXPLAIN SELECT last_insert_rowid();
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     4     0                    0   Start at 4
1     Function           0     2     1     last_insert_rowid  0   r[1]=func()
2     ResultRow          1     1     0                    0   output=r[1]
3     Halt               0     0     0                    0
4     Transaction        0     0     0                    0
5     Goto               0     1     0                    0
```

Closes #427
2024-12-11 10:44:34 +02:00
krishvishal
1e89b17462 Ran cargo fmt 2024-12-11 14:08:33 +05:30
krishvishal
b23df24703 Added tests for last_insert_rowid() 2024-12-11 14:01:04 +05:30
Pekka Enberg
f7e9d3a25a Update CHANGELOG 2024-12-11 09:18:36 +02:00
Pekka Enberg
ca272ba937 Merge 'Support JOIN USING and NATURAL JOIN' from Jussi Saurio
Closes #360
Closes #361

Closes #422
2024-12-11 09:17:51 +02:00
Pekka Enberg
eda1f5396c Merge 'Add octet_length scalar function' from Kacper Kołodziej
Adds `octet_length` scalar function.
Part of the solution for #144.
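For reference, `octet_length(X)` returns the number of bytes in X's text encoding (or in a blob), measures numbers through their text rendering, and returns NULL for NULL. A hedged sketch over a hypothetical `Value` enum, assuming a UTF-8 database encoding and leaving out floats, whose text rendering would need SQLite's own formatting rules:

```rust
/// Hypothetical value type for the sketch (not limbo's `OwnedValue`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
    Blob(Vec<u8>),
}

/// Returns `None` to represent SQL NULL.
fn octet_length(value: &Value) -> Option<usize> {
    match value {
        Value::Null => None,
        // Numbers are measured via their text rendering, e.g. 12345 -> "12345" -> 5 bytes.
        Value::Integer(i) => Some(i.to_string().len()),
        // `str::len()` counts UTF-8 bytes, which is exactly what octet_length wants.
        Value::Text(s) => Some(s.len()),
        Value::Blob(b) => Some(b.len()),
    }
}
```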

Closes #430
2024-12-11 07:44:04 +02:00
Pekka Enberg
1024432b11 Merge 'Fix: length function characters counting' from Kacper Kołodziej
`String::len()` returns the number of bytes. Here we need to use
`chars().count()` to count actual characters, as specified in
https://www.sqlite.org/lang_corefunc.html#length
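The difference only shows up with non-ASCII input. A small sketch contrasting the two counts (function names are hypothetical):

```rust
/// Byte length: what `String::len()` / `str::len()` return.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Character length: the number of Unicode scalar values, via `chars().count()`.
fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

SQLite's `length()` also stops counting at the first NUL character, which this sketch does not model.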

Fixes #431

Closes #429
2024-12-11 07:42:47 +02:00
Pekka Enberg
9a004a1742 Merge 'Fixed typo' from Wincent Balin
Closes #435
2024-12-11 07:41:46 +02:00
Pekka Enberg
e3d9082feb scripts/merge-pr.py: Fix pull from a fork 2024-12-11 07:41:33 +02:00
Alex Miller
88c862ce4d Comments, resolve label better, make tests more fun 2024-12-10 19:59:54 -08:00
Wincent Balin
3f747feb5b Fixed typo 2024-12-11 03:42:21 +01:00
Kacper Kołodziej
e4d31cbe34 add tests for octet_length scalar function 2024-12-10 22:56:38 +01:00
Kacper Kołodziej
d4bff2c93e add octet_length scalar function 2024-12-10 22:56:38 +01:00
Kacper Kołodziej
660d3e8d07 fix: count characters in string in length function
`length` function should count characters, not bytes.

https://www.sqlite.org/lang_corefunc.html#length
2024-12-10 22:48:50 +01:00
Kacper Kołodziej
e68a86532a tests: length function with multibyte characters
Depending on the encoding, some characters take more than one byte. Add a
failing test to verify whether the current implementation of the scalar
function `length` takes that into account.
2024-12-10 22:47:22 +01:00
krishvishal
134b5576ad Ran cargo fmt 2024-12-09 22:55:54 +05:30
krishvishal
7e2928a5f1 Feature: last_insert_rowid()
- Changed the `Cursor` trait to expose `root_page`
- SQLite only updates last_insert_rowid for non-schema inserts, so before updating the rowid we check the `root_page`
  targeted by `InsertAwait` to skip schema inserts
2024-12-09 22:48:42 +05:30
jussisaurio
fe88d45e5e Add more comments to push_predicate/push_predicates 2024-12-09 17:50:29 +02:00
jussisaurio
840caed2f7 Fix bug with multiway joins that include the same table multiple times 2024-12-09 17:50:29 +02:00
jussisaurio
7924f9b64d consider all joined tables instead of just previous in natural/using 2024-12-09 17:50:29 +02:00
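NATURAL JOIN joins on every column name that the right-hand table shares with the tables already in the join, so a shared column may come from any earlier table, not only the previous one. A rough sketch of collecting those columns; `Table` and `natural_join_columns` are hypothetical, names are compared ASCII case-insensitively, and when several left tables share a name this sketch simply picks the first one.

```rust
/// Hypothetical table metadata: just the column names.
struct Table {
    columns: Vec<&'static str>,
}

/// For `... NATURAL JOIN right`, returns (left table index, column name) pairs for every
/// column of `right` that also appears in ANY table already in the join.
fn natural_join_columns(left_tables: &[Table], right: &Table) -> Vec<(usize, &'static str)> {
    let mut shared = Vec::new();
    for &col in &right.columns {
        // Search all joined tables, not just the last one.
        if let Some(idx) = left_tables
            .iter()
            .position(|t| t.columns.iter().any(|c| c.eq_ignore_ascii_case(col)))
        {
            shared.push((idx, col));
        }
    }
    shared
}
```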
jussisaurio
4f027035de tests for multiple joins 2024-12-09 17:50:29 +02:00
jussisaurio
81b6605453 support NATURAL JOIN 2024-12-09 17:50:29 +02:00
jussisaurio
bed932c186 Support join USING 2024-12-09 17:50:29 +02:00
Pekka Enberg
f9b300a608 Update CHANGELOG 2024-12-09 17:31:24 +02:00
Pekka Enberg
ba1f7cd16f Merge 'feat(core/translate): support HAVING' from Jussi Saurio
Support the HAVING clause.
Note that SQLite (and, I think, standard SQL?) supports HAVING even
without GROUP BY, but `sqlite3-parser` doesn't.
Also fixes some issues with the `PartialOrd` implementation of `OwnedValue`
and the implementations of `concat` and `round`, which I discovered when
my HAVING TCL tests failed.
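HAVING filters groups after aggregation, whereas WHERE filters input rows before it. A plain Rust model of `SELECT key, sum(value) FROM t GROUP BY key HAVING sum(value) > min_sum`, for illustration only: limbo compiles this to bytecode, and the function name and row shape here are hypothetical.

```rust
use std::collections::BTreeMap;

/// Groups (key, value) rows, sums each group, then applies the HAVING predicate per group.
/// Output is ordered by key because BTreeMap iterates in key order.
fn group_sum_having(rows: &[(&str, i64)], min_sum: i64) -> Vec<(String, i64)> {
    let mut sums: BTreeMap<String, i64> = BTreeMap::new();
    for &(key, value) in rows {
        *sums.entry(key.to_string()).or_insert(0) += value;
    }
    // HAVING sees the aggregated value, not the individual input rows.
    sums.into_iter().filter(|(_, sum)| *sum > min_sum).collect()
}
```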

Closes #420
2024-12-09 17:30:40 +02:00
Pekka Enberg
98a8dc58b1 Update CHANGELOG 2024-12-09 17:29:15 +02:00
Pekka Enberg
36f9565910 Merge 'feat(wasm): add get and iterate func' from Jean Arhancet
Add `get` and `iterate` functions to the wasm module

Closes #421
2024-12-09 17:28:43 +02:00
krishvishal
1e23af7d24 Added last_insert_rowid() function.
Its behavior still needs fixing; the problem is probably in the `Cursor` implementation.
2024-12-09 17:41:28 +05:30
Alex Miller
f7bb7f8dee Fix typo and improve comment 2024-12-08 14:20:23 -08:00
Alex Miller
c2e3957d73 I misunderstood what a constant instruction was 2024-12-08 14:12:45 -08:00
Alex Miller
eb00226cfe Add support for CASE expressions.
There are two forms of CASE:
  CASE (WHEN [bool expr] THEN [value])+ (ELSE [value])? END
which checks a series of boolean conditions, and:
  CASE expr (WHEN [expr] THEN [value])+ (ELSE [value])? END
which checks a series of equality conditions.

This implements support for both. Note that the ELSE is optional;
omitting it is equivalent to `ELSE null`.

sqlite3 gives the implementation as:

sqlite> explain select case a WHEN a THEN b WHEN c THEN d ELSE 0 END from casetest;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     OpenRead       0     3     0     4              0   root=3 iDb=0; casetest
2     Rewind         0     15    0                    0
3       Column         0     0     2                    0   r[2]= cursor 0 column 0
4       Column         0     0     3                    0   r[3]= cursor 0 column 0
5       Ne             3     8     2     BINARY-8       83  if r[2]!=r[3] goto 8
6       Column         0     1     1                    0   r[1]= cursor 0 column 1
7       Goto           0     13    0                    0
8       Column         0     2     3                    0   r[3]= cursor 0 column 2
9       Ne             3     12    2     BINARY-8       83  if r[2]!=r[3] goto 12
10      Column         0     3     1                    0   r[1]= cursor 0 column 3
11      Goto           0     13    0                    0
12      Integer        0     1     0                    0   r[1]=0
13      ResultRow      1     1     0                    0   output=r[1]
14    Next           0     3     0                    1
15    Halt           0     0     0                    0
16    Transaction    0     0     2     0              1   usesStmtJournal=0
17    Goto           0     1     0                    0

and after this patch, limbo gives:

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenReadAsync      0     4     0                    0   table=casetest, root=4
2     OpenReadAwait      0     0     0                    0
3     RewindAsync        0     0     0                    0
4     RewindAwait        0     17    0                    0   Rewind table casetest
5       Column           0     0     2                    0   r[2]=casetest.a
6       Column           0     0     3                    0   r[3]=casetest.a
7       Ne               2     3     10                   0   if r[2]!=r[3] goto 10
8       Column           0     1     1                    0   r[1]=casetest.b
9       Goto             0     14    0                    0
10      Column           0     2     3                    0   r[3]=casetest.c
11      Ne               2     3     14                   0   if r[2]!=r[3] goto 14
12      Column           0     3     1                    0   r[1]=casetest.d
13      Goto             0     14    0                    0
14      ResultRow        1     1     0                    0   output=r[1]
15    NextAsync          0     0     0                    0
16    NextAwait          0     5     0                    0
17    Halt               0     0     0                    0
18    Transaction        0     0     0                    0
19    Integer            0     1     0                    0   r[1]=0
20    Goto               0     1     0                    0

Since COMPAT.md had nowhere to record this new support, I added a
heading for SELECT expressions that lists what is and isn't
supported.
2024-12-08 14:09:03 -08:00
jussisaurio
9bc3ccc394 fmt 2024-12-03 19:11:08 +02:00
jussisaurio
885136a511 Merge 'fix(core/translate): fix bug with multiway joins and clean up left join implementation' from Jussi Saurio
There was a bug where this kind of query (a 3-way join with two seeks
and only one scan loop) would emit a wrong jump target for DecrJumpZero:
```
limbo> explain select u.first_name, u2.last_name, p.name from users u join users u2 on u.id=u2.id join products p on u2.id = p.id limit 3;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     2     0                    0   table=u2, root=2
4     OpenReadAwait      0     0     0                    0
5     OpenReadAsync      2     3     0                    0   table=p, root=3
6     OpenReadAwait      0     0     0                    0
7     RewindAsync        0     0     0                    0
8     RewindAwait        0     18    0                    0   Rewind table u
9       RowId            0     1     0                    0   r[1]=u.rowid
10      SeekRowid        1     1     18                   0   if (r[1]!=u2.rowid) goto 18
11      RowId            1     2     0                    0   r[2]=u2.rowid
12      SeekRowid        2     2     18                   0   if (r[2]!=p.rowid) goto 18
13      Column           0     1     3                    0   r[3]=u.first_name
14      Column           1     2     4                    0   r[4]=u2.last_name
15      Column           2     1     5                    0   r[5]=p.name
16      ResultRow        3     3     0                    0   output=r[3..5]
17      DecrJumpZero     6     18    0                    0   if (--r[6]==0) goto 18 <--- this should go to Halt!!!
18    NextAsync          0     0     0                    0
19    NextAwait          0     9     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0
22    Integer            3     6     0                    0   r[6]=3
23    Goto               0     1     0                    0
```
due to incorrect label bookkeeping.
Fixed the bookkeeping, and cleaned up unnecessary crap from the left
join bookkeeping at the same time.
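The general idea behind label bookkeeping: jumps are emitted against symbolic labels, and each label is bound to a concrete address only once the code at that address is emitted. A hypothetical mini builder (`Program`, `Label`, `Insn` and `emit_limit_tail` are illustrative names, not limbo's actual types) in which the LIMIT counter's label is bound at the `Halt` after the loops rather than at the outer loop's `Next`:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Label(usize);

#[derive(Debug, PartialEq)]
enum Insn {
    ResultRow,
    DecrJumpZero { reg: usize, target: Label },
    Next { loop_start: usize },
    Halt,
}

#[derive(Default)]
struct Program {
    insns: Vec<Insn>,
    label_addrs: Vec<Option<usize>>, // None until the label is resolved
}

impl Program {
    fn allocate_label(&mut self) -> Label {
        self.label_addrs.push(None);
        Label(self.label_addrs.len() - 1)
    }
    /// Binds `label` to the address of the next instruction to be emitted.
    fn resolve_label(&mut self, label: Label) {
        self.label_addrs[label.0] = Some(self.insns.len());
    }
    fn emit(&mut self, insn: Insn) {
        self.insns.push(insn);
    }
    fn address_of(&self, label: Label) -> Option<usize> {
        self.label_addrs[label.0]
    }
}

/// Simplified tail of a LIMIT query: once the counter hits zero, execution must leave
/// ALL loops, so the target label is resolved just before Halt, not before Next.
fn emit_limit_tail(p: &mut Program, limit_reg: usize, loop_start: usize) -> Label {
    let after_all_loops = p.allocate_label();
    p.emit(Insn::ResultRow);
    p.emit(Insn::DecrJumpZero { reg: limit_reg, target: after_all_loops });
    p.emit(Insn::Next { loop_start });
    p.resolve_label(after_all_loops);
    p.emit(Insn::Halt);
    after_all_loops
}
```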

Closes #423
2024-12-03 19:02:28 +02:00
jussisaurio
ca25a73c95 a little bit more explanation about left join handling 2024-11-30 20:54:22 +02:00
jussisaurio
83f8ea1b13 Fix bug with multiway joins and clean up left join implementation 2024-11-30 20:47:48 +02:00
jussisaurio
3e9883bfbd update COMPAT 2024-11-30 10:06:37 +02:00
jussisaurio
3f80e41e7a support HAVING 2024-11-30 10:05:13 +02:00
jussisaurio
fceb9ac62b Merge 'core: (another) refactor of read path query processing logic' from Jussi Saurio
# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs, subqueries, etc. can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
   - Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
   - Removed Projection operator (`plan.result_columns`)
   - Removed Limit operator (`plan.limit`)
   - Removed Aggregate operator (`plan.group_by` and `plan.aggregates`)
   - Removed Order operator (`plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution, i.e. there's no need for
`resolve_ident_table()` etc.
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER_BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions, which sucks (e.g. `sum(x) != SUM(x)`). I've marked the parts
where this is used with a TODO; we should have a custom expression
equality comparison function instead. This doesn't break correctness,
but still.
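A minimal sketch of such a comparison function, assuming a simplified hypothetical `Expr` enum (not the vendored parser's `ast::Expr`) in which columns are already resolved to table/column indices, as item 6 describes, and assuming function names are the only part that needs case-insensitive matching:

```rust
/// Hypothetical, simplified expression tree for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column already resolved to (table index, column index).
    Column { table: usize, column: usize },
    Literal(String),
    FunctionCall { name: String, args: Vec<Expr> },
}

/// Structural equality that treats SQL function names case-insensitively,
/// so `sum(x)` and `SUM(x)` are recognized as the same computation.
fn exprs_equivalent(a: &Expr, b: &Expr) -> bool {
    match (a, b) {
        (Expr::Column { table: t1, column: c1 }, Expr::Column { table: t2, column: c2 }) => {
            t1 == t2 && c1 == c2
        }
        // String literals stay case-sensitive: 'a' and 'A' are different values.
        (Expr::Literal(l1), Expr::Literal(l2)) => l1 == l2,
        (
            Expr::FunctionCall { name: n1, args: a1 },
            Expr::FunctionCall { name: n2, args: a2 },
        ) => {
            n1.eq_ignore_ascii_case(n2)
                && a1.len() == a2.len()
                && a1.iter().zip(a2).all(|(x, y)| exprs_equivalent(x, y))
        }
        _ => false,
    }
}
```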
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc. concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed
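A rough sketch of the resulting shape, with the SELECT components as plain `Plan` fields and only table access kept as a tree. Apart from the fields named in the description (`result_columns`, `limit`, `group_by`, `aggregates`, `order_by`), every type and field name here is hypothetical, expressions are stand-in strings, and `emission_steps` only lists the linear codegen steps from item 2 rather than emitting bytecode:

```rust
/// The only remaining operator tree: ways of reading from source tables (a join tree).
#[allow(dead_code)]
#[derive(Debug)]
enum SourceOperator {
    Scan { table: String },
    Join { left: Box<SourceOperator>, right: Box<SourceOperator>, outer: bool },
}

#[allow(dead_code)]
#[derive(Debug)]
struct Plan {
    source: SourceOperator,
    where_clause: Vec<String>, // predicates, pushed down to the table loops
    group_by: Option<Vec<String>>,
    aggregates: Vec<String>,
    order_by: Option<Vec<(String, bool)>>, // (expression, descending)
    limit: Option<usize>,
    result_columns: Vec<String>, // always evaluated to produce a ResultRow
}

/// Lists the linear steps codegen would take for `plan`, in order.
fn emission_steps(plan: &Plan) -> Vec<&'static str> {
    let mut steps = vec!["open cursors", "open loops"];
    // Each row goes either into a sorter or straight to the main output.
    if plan.group_by.is_some() {
        steps.push("insert into GROUP BY sorter");
    } else if plan.order_by.is_some() {
        steps.push("insert into ORDER BY sorter");
    } else {
        steps.push("emit ResultRow from result_columns");
    }
    steps.push("close loops");
    if plan.group_by.is_some() {
        steps.push("emit groups");
    }
    if plan.order_by.is_some() {
        steps.push("emit ORDER BY sorter rows");
    }
    steps
}
```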

Closes #416
2024-11-30 10:03:58 +02:00
JeanArhancet
5693cd1ae0 feat(wasm): add get and iterate func 2024-11-29 21:48:20 +01:00
jussisaurio
84742b81fa Obsolete comment 2024-11-27 22:43:36 +02:00
jussisaurio
da811dc403 add doc comments for members of Plan struct 2024-11-27 19:30:07 +02:00
jussisaurio
db462530f1 metadata instead of m 2024-11-27 19:27:36 +02:00
jussisaurio
7d569aee1f fix stupid comment 2024-11-26 18:37:06 +02:00
jussisaurio
1b34698872 add comments and rename some misleading label variables 2024-11-26 18:28:19 +02:00