Commit Graph

10455 Commits

Author SHA1 Message Date
Pekka Enberg
c03839cc6a Merge 'Upgrade pprof to 0.14' from Pekka Enberg
GitHub's Dependabot complains that the current version has an unsoundness
issue, so let's bump to a newer version:
https://github.com/tursodatabase/limbo/security/dependabot/10

Closes #440
2024-12-11 14:07:26 +02:00
Pekka Enberg
ab07c77036 Upgrade pprof to 0.14
GitHub's Dependabot complains that the current version has an unsoundness
issue, so let's bump to a newer version:

https://github.com/tursodatabase/limbo/security/dependabot/10
2024-12-11 11:21:09 +02:00
Pekka Enberg
b8aca48a0f Update CHANGELOG 2024-12-11 10:45:02 +02:00
Pekka Enberg
04f196113a Merge 'Add last_insert_rowid() function' from Krishna Vishal
- Changed `Cursor` trait to be able to get access to `root_page`
- SQLite only updates last_insert_rowid for non-schema inserts, so we
check that the `InsertAwait` is not for `root_page` before updating the
rowid
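
The rule in the second bullet can be sketched as follows. This is a minimal illustration, not limbo's code: the `Connection` struct, the `on_insert` method, and the constant `SCHEMA_ROOT_PAGE` are hypothetical names, and the sketch assumes that "schema insert" means an insert into the table rooted at page 1, as in the SQLite file format.

```rust
/// Assumption: the schema table is rooted at page 1, as in the SQLite file format.
const SCHEMA_ROOT_PAGE: usize = 1;

/// Hypothetical connection state; the names are illustrative, not limbo's API.
#[derive(Default)]
struct Connection {
    last_insert_rowid: i64,
}

impl Connection {
    /// Called once an insert has completed on the table rooted at `root_page`.
    fn on_insert(&mut self, root_page: usize, rowid: i64) {
        // Schema inserts must leave the stored rowid untouched.
        if root_page != SCHEMA_ROOT_PAGE {
            self.last_insert_rowid = rowid;
        }
    }
}
```

With this shape, an insert into an ordinary table updates the stored rowid, while an insert into the schema table leaves the previous value in place.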
In SQLite it looks like this:
```
sqlite> EXPLAIN SELECT last_insert_rowid();
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     4     0                    0
1     Function       0     0     1     last_insert_rowid(0) 0
2     ResultRow      1     1     0                    0
3     Halt           0     0     0                    0
4     Goto           0     1     0                    0
```
In limbo it will look like this:
```
limbo> EXPLAIN SELECT last_insert_rowid();
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     4     0                    0   Start at 4
1     Function           0     2     1     last_insert_rowid  0   r[1]=func()
2     ResultRow          1     1     0                    0   output=r[1]
3     Halt               0     0     0                    0
4     Transaction        0     0     0                    0
5     Goto               0     1     0                    0
```

Closes #427
2024-12-11 10:44:34 +02:00
krishvishal
1e89b17462 Ran cargo fmt 2024-12-11 14:08:33 +05:30
krishvishal
b23df24703 Added tests for last_insert_rowid() 2024-12-11 14:01:04 +05:30
Pekka Enberg
f7e9d3a25a Update CHANGELOG 2024-12-11 09:18:36 +02:00
Pekka Enberg
ca272ba937 Merge 'Support JOIN USING and NATURAL JOIN' from Jussi Saurio
Closes #360
Closes #361

Closes #422
2024-12-11 09:17:51 +02:00
Pekka Enberg
eda1f5396c Merge 'Add octet_length scalar function' from Kacper Kołodziej
Adds `octet_length` scalar function.
Part of solution for: #144
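
A minimal sketch of what such a function computes, assuming a simplified value type. The `Value` enum below is hypothetical and stands in for whatever value representation the engine uses; only NULL, text, and blob inputs are covered.

```rust
/// Hypothetical value type for illustration only.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
    Blob(Vec<u8>),
}

/// Sketch of `octet_length(X)`: the number of bytes in X, or NULL for NULL.
/// Numeric inputs are left out of this sketch.
fn octet_length(v: &Value) -> Option<Value> {
    match v {
        Value::Null => Some(Value::Null),
        // `str::len()` already counts bytes of the UTF-8 encoding.
        Value::Text(s) => Some(Value::Integer(s.len() as i64)),
        Value::Blob(b) => Some(Value::Integer(b.len() as i64)),
        Value::Integer(_) => None,
    }
}
```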

Closes #430
2024-12-11 07:44:04 +02:00
Pekka Enberg
1024432b11 Merge 'Fix: length function characters counting' from Kacper Kołodziej
`String::len()` returns the number of bytes. Here we need to use
`chars().count()` to count actual characters, as specified in
https://www.sqlite.org/lang_corefunc.html#length
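
The difference is easy to see on a string with a multibyte character. The helper names `byte_length` and `char_length` are made up for this sketch; only `len()` and `chars().count()` come from the standard library.

```rust
/// Byte length: what `str::len()` reports for the UTF-8 encoding.
fn byte_length(s: &str) -> usize {
    s.len()
}

/// Character length: counts Unicode scalar values instead of bytes.
fn char_length(s: &str) -> usize {
    s.chars().count()
}
```

For `"héllo"`, `byte_length` reports 6 because `é` takes two bytes in UTF-8, while `char_length` reports 5.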

Fixes #431

Closes #429
2024-12-11 07:42:47 +02:00
Pekka Enberg
9a004a1742 Merge 'Fixed typo' from Wincent Balin
Closes #435
2024-12-11 07:41:46 +02:00
Pekka Enberg
e3d9082feb scripts/merge-pr.py: Fix pull from a fork 2024-12-11 07:41:33 +02:00
Alex Miller
88c862ce4d Comments, resolve label better, make tests more fun 2024-12-10 19:59:54 -08:00
Alex Miller
e85df1c895 resolve labels to current offset. make test clearer. 2024-12-10 19:36:54 -08:00
Wincent Balin
3f747feb5b Fixed typo 2024-12-11 03:42:21 +01:00
Kacper Kołodziej
e4d31cbe34 add tests for octet_length scalar function 2024-12-10 22:56:38 +01:00
Kacper Kołodziej
d4bff2c93e add octet_length scalar function 2024-12-10 22:56:38 +01:00
Kacper Kołodziej
660d3e8d07 fix: count characters in string in length function
`length` function should count characters, not bytes.

https://www.sqlite.org/lang_corefunc.html#length
2024-12-10 22:48:50 +01:00
Kacper Kołodziej
e68a86532a tests: length function with multibyte characters
Depending on the encoding, some characters occupy more than one byte. Add
a failing test to verify whether the current implementation of the scalar
function `length` takes that into account.
2024-12-10 22:47:22 +01:00
krishvishal
134b5576ad Ran cargo fmt 2024-12-09 22:55:54 +05:30
krishvishal
7e2928a5f1 Feature: last_insert_rowid()
- Changed `Cursor` trait to be able to get access to `root_page`
- SQLite only updates last_insert_rowid for non-schema inserts, so we check that the `InsertAwait` is not for `root_page` before
  updating the rowid
2024-12-09 22:48:42 +05:30
jussisaurio
fe88d45e5e Add more comments to push_predicate/push_predicates 2024-12-09 17:50:29 +02:00
jussisaurio
840caed2f7 Fix bug with multiway joins that include the same table multiple times 2024-12-09 17:50:29 +02:00
jussisaurio
7924f9b64d consider all joined tables instead of just previous in natural/using 2024-12-09 17:50:29 +02:00
jussisaurio
4f027035de tests for multiple joins 2024-12-09 17:50:29 +02:00
jussisaurio
81b6605453 support NATURAL JOIN 2024-12-09 17:50:29 +02:00
jussisaurio
bed932c186 Support join USING 2024-12-09 17:50:29 +02:00
Pekka Enberg
f9b300a608 Update CHANGELOG 2024-12-09 17:31:24 +02:00
Pekka Enberg
ba1f7cd16f Merge 'feat(core/translate): support HAVING' from Jussi Saurio
support the HAVING clause.
note that sqlite (and i think standard sql?) supports HAVING even
without GROUP BY, but `sqlite3-parser` doesn't.
also fixes some issues with the PartialOrd implementation of OwnedValue
and the implementations of `concat` and `round` which i discovered due
to my HAVING tcl tests failing

Closes #420
2024-12-09 17:30:40 +02:00
Pekka Enberg
98a8dc58b1 Update CHANGELOG 2024-12-09 17:29:15 +02:00
Pekka Enberg
36f9565910 Merge 'feat(wasm): add get and iterate func' from Jean Arhancet
Add `get` and `iterate` functions to the wasm module

Closes #421
2024-12-09 17:28:43 +02:00
krishvishal
1e23af7d24 Added last_insert_rowid() function.
Need to fix its behavior. Problem is probably with `Cursor` implementation.
2024-12-09 17:41:28 +05:30
Alex Miller
f7bb7f8dee Fix typo and improve comment 2024-12-08 14:20:23 -08:00
Alex Miller
c2e3957d73 I misunderstood what a constant instruction was 2024-12-08 14:12:45 -08:00
Alex Miller
eb00226cfe Add support for CASE expressions.
There are two forms of CASE:
  CASE (WHEN [bool expr] THEN [value])+ (ELSE [value])? END
which checks a series of boolean conditions, and:
  CASE expr (WHEN [expr] THEN [value])+ (ELSE [value])? END
which checks a series of equality conditions.

This implements support for both. Note that the ELSE is optional, and
will be equivalent to `ELSE null` if not specified.
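
The semantics of the two forms can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Value` enum and both function names are hypothetical, and every arm is passed in already evaluated, whereas the generated bytecode evaluates arms lazily.

```rust
/// Hypothetical value type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

/// Searched form: CASE (WHEN cond THEN value)+ (ELSE value)? END
fn case_searched(arms: &[(bool, Value)], else_value: Option<Value>) -> Value {
    arms.iter()
        .find(|(cond, _)| *cond)
        .map(|(_, value)| value.clone())
        .or(else_value)
        // A missing ELSE behaves like `ELSE null`.
        .unwrap_or(Value::Null)
}

/// Simple form: CASE base (WHEN expr THEN value)+ (ELSE value)? END
fn case_simple(base: &Value, arms: &[(Value, Value)], else_value: Option<Value>) -> Value {
    arms.iter()
        // In SQL, NULL never compares equal to anything, including NULL.
        .find(|(when, _)| *base != Value::Null && when == base)
        .map(|(_, value)| value.clone())
        .or(else_value)
        .unwrap_or(Value::Null)
}
```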

sqlite3 gives the implementation as:

sqlite> explain select case a WHEN a THEN b WHEN c THEN d ELSE 0 END from casetest;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     OpenRead       0     3     0     4              0   root=3 iDb=0; casetest
2     Rewind         0     15    0                    0
3       Column         0     0     2                    0   r[2]= cursor 0 column 0
4       Column         0     0     3                    0   r[3]= cursor 0 column 0
5       Ne             3     8     2     BINARY-8       83  if r[2]!=r[3] goto 8
6       Column         0     1     1                    0   r[1]= cursor 0 column 1
7       Goto           0     13    0                    0
8       Column         0     2     3                    0   r[3]= cursor 0 column 2
9       Ne             3     12    2     BINARY-8       83  if r[2]!=r[3] goto 12
10      Column         0     3     1                    0   r[1]= cursor 0 column 3
11      Goto           0     13    0                    0
12      Integer        0     1     0                    0   r[1]=0
13      ResultRow      1     1     0                    0   output=r[1]
14    Next           0     3     0                    1
15    Halt           0     0     0                    0
16    Transaction    0     0     2     0              1   usesStmtJournal=0
17    Goto           0     1     0                    0

and after this patch, limbo gives:

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenReadAsync      0     4     0                    0   table=casetest, root=4
2     OpenReadAwait      0     0     0                    0
3     RewindAsync        0     0     0                    0
4     RewindAwait        0     17    0                    0   Rewind table casetest
5       Column           0     0     2                    0   r[2]=casetest.a
6       Column           0     0     3                    0   r[3]=casetest.a
7       Ne               2     3     10                   0   if r[2]!=r[3] goto 10
8       Column           0     1     1                    0   r[1]=casetest.b
9       Goto             0     14    0                    0
10      Column           0     2     3                    0   r[3]=casetest.c
11      Ne               2     3     14                   0   if r[2]!=r[3] goto 14
12      Column           0     3     1                    0   r[1]=casetest.d
13      Goto             0     14    0                    0
14      ResultRow        1     1     0                    0   output=r[1]
15    NextAsync          0     0     0                    0
16    NextAwait          0     5     0                    0
17    Halt               0     0     0                    0
18    Transaction        0     0     0                    0
19    Integer            0     1     0                    0   r[1]=0
20    Goto               0     1     0                    0

And then as there's nowhere to annotate this new support in COMPAT.md, I
added a corresponding heading for SELECT expressions and what is/isn't
supported.
2024-12-08 14:09:03 -08:00
Alex Miller
27ef95e0ab LLM recommended a better way to write args matching 2024-12-07 21:15:04 -08:00
Alex Miller
183ea8e362 Implement support for iif().
In sqlite, iif() looks like:

sqlite> create table iiftest(a int, b int, c int);
sqlite> explain select iif(a,b,c) from iiftest;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     11    0                    0   Start at 11
1     OpenRead       0     2     0     3              0   root=2 iDb=0; iiftest
2     Rewind         0     10    0                    0
3       Column         0     0     2                    0   r[2]= cursor 0 column 0
4       IfNot          2     7     1                    0
5       Column         0     1     1                    0   r[1]= cursor 0 column 1
6       Goto           0     8     0                    0
7       Column         0     2     1                    0   r[1]= cursor 0 column 2
8       ResultRow      1     1     0                    0   output=r[1]
9     Next           0     3     0                    1
10    Halt           0     0     0                    0
11    Transaction    0     0     1     0              1   usesStmtJournal=0
12    Goto           0     1     0                    0

And with this change, in limbo it looks like:

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     14    0                    0   Start at 14
1     OpenReadAsync      0     2     0                    0   table=iiftest, root=2
2     OpenReadAwait      0     0     0                    0
3     RewindAsync        0     0     0                    0
4     RewindAwait        0     13    0                    0   Rewind table iiftest
5       Column           0     0     2                    0   r[2]=iiftest.a
6       IfNot            2     9     1                    0   if !r[2] goto 9
7       Column           0     1     1                    0   r[1]=iiftest.b
8       Goto             0     10    0                    0
9       Column           0     2     1                    0   r[1]=iiftest.c
10      ResultRow        1     1     0                    0   output=r[1]
11    NextAsync          0     0     0                    0
12    NextAwait          0     5     0                    0
13    Halt               0     0     0                    0
14    Transaction        0     0     0                    0
15    Goto               0     1     0                    0
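
Both listings follow the same shape: test the condition with `IfNot`, then load either the second or the third argument into the result register. A minimal sketch of that behavior, using a hypothetical `Value` enum and assuming that NULL and 0 both count as false:

```rust
/// Hypothetical value type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

/// Assumption: NULL and 0 are both treated as false.
fn is_truthy(v: &Value) -> bool {
    matches!(v, Value::Integer(i) if *i != 0)
}

/// Sketch of `iif(cond, then_value, else_value)`.
fn iif(cond: &Value, then_value: Value, else_value: Value) -> Value {
    if is_truthy(cond) {
        then_value
    } else {
        else_value
    }
}
```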
2024-12-07 21:04:03 -08:00
jussisaurio
9bc3ccc394 fmt 2024-12-03 19:11:08 +02:00
jussisaurio
885136a511 Merge 'fix(core/translate): fix bug with multiway joins and clean up left join implementation' from Jussi Saurio
There was a bug where this kind of query (a 3-way join with two seeks
and only one scan loop) would emit a wrong jump target for DecrJumpZero:
```
limbo> explain select u.first_name, u2.last_name, p.name from users u join users u2 on u.id=u2.id join products p on u2.id = p.id limit 3;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     2     0                    0   table=u2, root=2
4     OpenReadAwait      0     0     0                    0
5     OpenReadAsync      2     3     0                    0   table=p, root=3
6     OpenReadAwait      0     0     0                    0
7     RewindAsync        0     0     0                    0
8     RewindAwait        0     18    0                    0   Rewind table u
9       RowId            0     1     0                    0   r[1]=u.rowid
10      SeekRowid        1     1     18                   0   if (r[1]!=u2.rowid) goto 18
11      RowId            1     2     0                    0   r[2]=u2.rowid
12      SeekRowid        2     2     18                   0   if (r[2]!=p.rowid) goto 18
13      Column           0     1     3                    0   r[3]=u.first_name
14      Column           1     2     4                    0   r[4]=u2.last_name
15      Column           2     1     5                    0   r[5]=p.name
16      ResultRow        3     3     0                    0   output=r[3..5]
17      DecrJumpZero     6     18    0                    0   if (--r[6]==0) goto 18 <--- this should go to Halt!!!
18    NextAsync          0     0     0                    0
19    NextAwait          0     9     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0
22    Integer            3     6     0                    0   r[6]=3
23    Goto               0     1     0                    0
```
due to incorrect label bookkeeping.
fixed the bookkeeping, plus cleaned up unnecessary crap from the left
join bookkeeping at the same time.
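
To illustrate what label bookkeeping means here, the sketch below shows one common scheme: jump targets are emitted as symbolic labels and patched to concrete offsets once the target position is known. All names (`ProgramBuilder`, `Target`, `Insn`, `allocate_label`, `resolve_label`, `finish`) are hypothetical and chosen for this example; they are not taken from limbo's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Label(usize),  // not yet bound to an instruction
    Offset(usize), // concrete instruction address
}

#[derive(Debug, Clone, PartialEq)]
enum Insn {
    DecrJumpZero { reg: usize, target: Target },
    Next { target: Target },
    Halt,
}

#[derive(Default)]
struct ProgramBuilder {
    insns: Vec<Insn>,
    labels: Vec<Option<usize>>, // label id -> resolved offset
}

impl ProgramBuilder {
    fn allocate_label(&mut self) -> Target {
        self.labels.push(None);
        Target::Label(self.labels.len() - 1)
    }

    fn emit(&mut self, insn: Insn) {
        self.insns.push(insn);
    }

    /// Bind `label` to the offset of the next instruction to be emitted.
    fn resolve_label(&mut self, label: Target) {
        if let Target::Label(id) = label {
            self.labels[id] = Some(self.insns.len());
        }
    }

    /// Replace every symbolic label with its resolved offset.
    fn finish(self) -> Vec<Insn> {
        let ProgramBuilder { mut insns, labels } = self;
        let fix = |t: &mut Target| {
            if let Target::Label(id) = *t {
                *t = Target::Offset(labels[id].expect("unresolved label"));
            }
        };
        for insn in insns.iter_mut() {
            match insn {
                Insn::DecrJumpZero { target, .. } | Insn::Next { target } => fix(target),
                Insn::Halt => {}
            }
        }
        insns
    }
}
```

In this scheme the LIMIT jump must carry the label that is resolved right before `Halt`, not the label resolved at the loop's `Next` instruction; mixing the two up yields the kind of wrong target shown in the listing.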

Closes #423
2024-12-03 19:02:28 +02:00
jussisaurio
ca25a73c95 a little bit more explanation about left join handling 2024-11-30 20:54:22 +02:00
jussisaurio
83f8ea1b13 Fix bug with multiway joins and clean up left join implementation 2024-11-30 20:47:48 +02:00
jussisaurio
3e9883bfbd update COMPAT 2024-11-30 10:06:37 +02:00
jussisaurio
3f80e41e7a support HAVING 2024-11-30 10:05:13 +02:00
jussisaurio
fceb9ac62b Merge 'core: (another) refactor of read path query processing logic' from Jussi Saurio
# (another) refactor of read path query processing logic
This PR rewrites our select query processing architecture by moving away
from the stateful operator-based execution model, back to a more direct
bytecode generation approach that, IMO, is easier to follow. A large
part of the bytecode emission itself (`program.emit_insn(...)`) is just
copy-pasted from the old implementation (after all, it did _work_), but
just structured differently.
## Main Changes
1. Removed the `step()` state machine from operators. Previously, each
operator had internal state tracking its execution progress, and parent
operators would call `.step()` on their children until they needed to do
something else. Reading the code and trying to follow the execution was
not very easy, and the abstraction was also too general: there was a lot
of unnecessary pattern matching and special casing to make query
execution fit the model, when honestly the evaluation of a SELECT
without any CTEs or subqueries etc can only go a few different ways.
2. Because of the above change, the main codegen function
`emit_program()` now contains a series of linear conditional steps
instead of kicking off the state machines with `root_operator.step()`.
These steps are just things like: "open the cursors", "open the loops",
"emit a record into either the main output or a sorter", etc.
3. The `Plan` struct now (again) contains most of the familiar SELECT
query components (WHERE clause, GROUP BY, ORDER BY, etc.) rather than
having all of them embedded in a tree of operators. The operator tree
now ONLY consists of operators that read from a source table in some way
-- so it could just be called a join tree, I guess.
4. There's now `plan.result_columns` which is _ALWAYS_ evaluated to get
the final results of a SELECT. Previously the operator state machine
thing had a hodgepodge of different ways of arriving at the result row.
5. Removed operators:
   - Removed Filter operator (even in the previous version the Filter
operator -- which is really the where clause -- had its predicates
pushed down to the table loops, and it didn't really ever exist in the
bytecode emission phase anymore)
   - Removed Projection operator (`plan.result_columns`)
   - Removed Limit operator (`plan.limit`)
   - Removed Aggregate operator (`plan.group_by` and `plan.aggregates`)
   - Removed Order operator (`plan.order_by`)
6. Added `ast::Expr::Column` to the vendored sqlite3 parser -- column
resolution is now done as early as possible. This eliminates repeated
string comparisons during execution. I.e. no need for
`resolve_ident_table()` etc
7. Simplified expression result caching by removing the complex, and
frankly weird, ExpressionResultCache apparatus. The refactored code
handles this by tracking which cursor to read columns from at a given
time, and copies values from existing registers if the expression is a
computation that has already been done in a previous step of the
execution. For example in:
```
limbo> select concat(u.first_name, '-LOL'), sum(u.age) from users u group by concat(u.first_name, '-LOL') order by sum(u.age) desc limit 10;
Michael-LOL|11204
David-LOL|8758
Robert-LOL|8109
Jennifer-LOL|7700
John-LOL|7299
Christopher-LOL|6397
James-LOL|5921
Joseph-LOL|5711
Brian-LOL|5059
William-LOL|5047
```
the query execution engine knows that `concat(u.first_name, '-LOL')` is
the second column of the `ORDER_BY` sorter without any complex caching.
**HACK:** For deduplicating expressions in ORDER BY and the SELECT body,
the code still relies on expression `==` equality to make those
decisions which sucks (e.g. `sum(x) != SUM(x)` -- I've marked the parts
where this is used with a TODO, we should have a custom expression
equality comparison function instead...). This is not a correctness-
breaking thing, but still.
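
A custom comparison along the lines suggested in the TODO could look like the sketch below. The `Expr` enum is a deliberately tiny stand-in, not the parser's AST, and the function name `exprs_equivalent` is hypothetical. The sketch assumes that function and column names should compare case-insensitively.

```rust
/// Tiny stand-in for an expression AST; illustration only.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    FunctionCall { name: String, args: Vec<Expr> },
}

/// Structural equality that ignores ASCII case in identifiers,
/// so that `sum(x)` and `SUM(x)` are treated as the same expression.
fn exprs_equivalent(a: &Expr, b: &Expr) -> bool {
    match (a, b) {
        (Expr::Column(x), Expr::Column(y)) => x.eq_ignore_ascii_case(y),
        (Expr::Literal(x), Expr::Literal(y)) => x == y,
        (
            Expr::FunctionCall { name: n1, args: a1 },
            Expr::FunctionCall { name: n2, args: a2 },
        ) => {
            n1.eq_ignore_ascii_case(n2)
                && a1.len() == a2.len()
                && a1.iter().zip(a2).all(|(x, y)| exprs_equivalent(x, y))
        }
        _ => false,
    }
}
```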
## In short
- No more state machines
- The operator tree is now only a "join tree", pretty much
- No weird general purpose `ExpressionResultCache`
- More direct mapping between SQL operations and generated bytecode --
there's really no harm in carrying the "group by" etc concepts in the
bytecode generation phase instead of burying them inside Operators
- When a ResultRow is emitted, it is _always_ done by evaluating
`plan.result_columns`, instead of the special-casing and hacks that
existed previously
- 600+ LOC removed
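
As a rough picture of the resulting shape, the sketch below models a plan that carries the SELECT components directly, plus a function that walks through linear, conditional steps. Only the field names `result_columns`, `group_by`, `order_by`, and `limit` come from the description above. The field types, the `tables` and `where_clause` fields, the function `emission_steps`, and the exact step order are assumptions made for this example.

```rust
/// Simplified stand-in for the plan; expressions are plain strings here.
#[derive(Default)]
struct Plan {
    tables: Vec<String>,         // stands in for the join tree
    where_clause: Vec<String>,   // predicates
    group_by: Vec<String>,
    order_by: Vec<String>,
    result_columns: Vec<String>, // always evaluated to produce a result row
    limit: Option<usize>,
}

/// Lists the codegen steps as a linear sequence of conditional decisions,
/// instead of driving a tree of stateful operators.
fn emission_steps(plan: &Plan) -> Vec<&'static str> {
    let mut steps = vec!["open the cursors", "open the loops"];
    if !plan.where_clause.is_empty() {
        steps.push("evaluate predicates inside the loops");
    }
    if !plan.group_by.is_empty() || !plan.order_by.is_empty() {
        steps.push("emit a record into a sorter");
    } else {
        steps.push("emit a record into the main output");
    }
    steps.push("close the loops");
    if plan.limit.is_some() {
        steps.push("stop after the limit is reached");
    }
    steps
}
```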

Closes #416
2024-11-30 10:03:58 +02:00
JeanArhancet
5693cd1ae0 feat(wasm): add get and iterate func 2024-11-29 21:48:20 +01:00
jussisaurio
84742b81fa Obsolete comment 2024-11-27 22:43:36 +02:00
jussisaurio
da811dc403 add doc comments for members of Plan struct 2024-11-27 19:30:07 +02:00
jussisaurio
db462530f1 metadata instead of m 2024-11-27 19:27:36 +02:00
jussisaurio
7d569aee1f fix stupid comment 2024-11-26 18:37:06 +02:00
jussisaurio
1b34698872 add comments and rename some misleading label variables 2024-11-26 18:28:19 +02:00