Commit Graph

4470 Commits

Author SHA1 Message Date
bit-aloo
ffcadd00ae evaluate limit or offset expr 2025-08-26 19:56:12 +05:30
bit-aloo
28439efd09 make offset and limit Expr 2025-08-26 19:56:11 +05:30
bit-aloo
ea3ab2a9c7 add ifNeg op_code 2025-08-26 19:55:42 +05:30
Jussi Saurio
66d00915d7 Merge 'Improve documentation of page pinning' from Jussi Saurio
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2797
2025-08-26 17:18:25 +03:00
Pekka Enberg
26ba09c45f Revert "Merge 'Remove double indirection in the Parser' from Pedro Muniz"
This reverts commit 71c1b357e4, reversing
changes made to 6bc568ff69 because it
actually makes things slower.
2025-08-26 14:58:21 +03:00
Jussi Saurio
bf58d179db Improve documentation of page pinning 2025-08-26 10:13:25 +03:00
Pekka Enberg
3176df64a2 Merge 'Fix: return NULL for rowid() when cursor's null flag is on' from Jussi Saurio
Fixes TPC-H query 13 from returning an incorrect result. In this
specific case, we were returning non-null `IdxRowid` values for the
right-hand side table even when there was no match with the left-hand
side table, meaning the join produced matches even in cases where there
shouldn't have been any.
Closes #2794

Closes #2795
2025-08-26 09:33:49 +03:00
Jussi Saurio
e52f807c7d Fix: return NULL for rowid() when cursor's null flag is on
Fixes TPC-H query 13 from returning an incorrect result. In this specific
case, we were returning non-null `IdxRowid` values for the right-hand side
table even when there was no match with the left-hand side table, meaning
the join produced matches even in cases where there shouldn't have been any.

Closes #2794
2025-08-26 09:08:48 +03:00
Pekka Enberg
114ece0375 Merge 'Make fill_cell_payload() safe for async IO and cache spilling' from Jussi Saurio
## Make fill_cell_payload() safe for async IO and cache spilling
### Problems:
1. fill_cell_payload() is not re-entrant because it can yield IO
   on allocating a new overflow page, resulting in losing some of the
   input data.
2. fill_cell_payload() in its current form is not safe for cache
spilling
   because the previous overflow page in the chain of allocated overflow
pages
   can be evicted by a spill caused by the next overflow page
allocation,
   invalidating the page pointer and causing corruption.
3. fill_cell_payload() uses raw pointers and `unsafe` as a workaround
from a previous time when we used to clone `WriteState`, resulting in
hard-to-read code.
### Solutions:
1. Introduce a new substate to the fill_cell_payload state machine to
handle
   re-entrancy wrt. allocating overflow pages.
2. Always pin the current overflow page so that it cannot be evicted
during the
   overflow chain construction. Also pin the regular page the overflow
chain is
   attached to, because it is immediately accessed after
fill_cell_payload is done.
3. Remove all explicit usages of `unsafe` from `fill_cell_payload`
(although our pager is ofc still extremely unsafe under the hood :] )
Note that solution 2 addresses a problem that arose in the development
of page cache
spilling, which is not yet implemented, but will be soon.
### Miscellania:
1. Renamed a bunch of variables to be clearer
2. Added more comments about what is happening in fill_cell_payload

Closes #2737
2025-08-26 08:36:46 +03:00
Pekka Enberg
6e78c23ce7 Merge 'Remove Windows IO in place of Generic IO' from Preston Thorpe
Generic IO and Windows IO were identical, since we don't do anything
windows specific (maybe when someone eventually wants to implement an IO
back-end for windows `IO_RING` API, we can bring it back :)
This does use the exact impl of `WindowsIO`, simply renaming it to
Generic IO.. because I don't think GenericIO had ever been used.

Reviewed-by: Pedro Muniz (@pedrocarlo)

Closes #2790
2025-08-26 08:33:04 +03:00
Pekka Enberg
8f11311473 Merge 'Improve encryption API' from Avinash Sajjanshetty
This patch brings a bunch of quality of life improvements to encryption:
1. Previously, we just let any string to be used as a key. I have
updated the `PRAGMA hexkey=''` to get the key in hex. I have also
renamed from `key`, because that will be used to get passphrase
2. Added `PRAGMA cipher` so that now users can select which cipher they
want to use (for now, either `aegis256` or `aes256gcm`)
3. We now set the encryption context when both cipher and key are set
I also updated tests to reflect this.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2779
2025-08-26 08:32:29 +03:00
pedrocarlo
d3240844ec refactor Core to remove the double indirection 2025-08-25 22:59:31 -03:00
PThorpe92
8c64b772e7 Use previous WindowsIO impl as generic IO 2025-08-25 19:04:14 -04:00
PThorpe92
177c717f25 Remove windows IO in place of Generic IO 2025-08-25 18:47:21 -04:00
PThorpe92
2d661e3304 Apply review suggestions, add logging 2025-08-25 16:56:43 -04:00
PThorpe92
748e339f68 Make clippy happy 2025-08-25 16:52:34 -04:00
PThorpe92
1b514e6d0f Only checkpoint final remaining DB connection, and use Truncate mode 2025-08-25 16:52:29 -04:00
Pekka Enberg
e57f59d744 Merge 'Fix several issues with integrity_check' from Jussi Saurio
Things that were just wrong:
1. No pages other than the root page were checked, because no looping
was done. Add a loop.
2. Rightmost child page was never added to page stack. Add it.
New integrity check features:
- Add overflow pages to stack as well
- Check that no page is referenced more than once in the tree

Closes #2781
2025-08-25 19:05:32 +03:00
Pekka Enberg
6baa4cd1c0 Merge 'DBSP projection' from Pekka Enberg
This PR implements the ProjectOperator for DBSP circuits.

Closes #2773
2025-08-25 19:05:20 +03:00
Pekka Enberg
e3ffc82a1d core/incremental: Fix expression compiler to use new parser 2025-08-25 17:48:20 +03:00
Glauber Costa
ffab4a89a2 addressed review comments from Jussi 2025-08-25 17:48:17 +03:00
Glauber Costa
097510216e implement the projector operator for DBSP
My goal with this patch is to be able to implement the ProjectOperator
for DBSP circuits using VDBE for expression evaluation.

*not* doing so is dangerous for the following reason: we will end up
with different, subtle, and incompatible behavior between SQLite
expressions if they are used in views versus outside of views.

In fact, even in our prototype had them: our projection tests, which
used to pass, were actually wrong =) (sqlite would return something
different if those functions were executed outside the view context)

For optimization reasons, we single out trivial expressions: they don't
have go through VDBE. Trivial expressions are expressions that only
involve Columns, Literals, and simple operators on elements of the same
type. Even type coercion takes this out of the realm of trivial.

Everything that is not trivial, is then translated with translate_expr -
in the same way SQLite will, and then compiled with VDBE.

We can, over time, make this process much better. There are essentially
infinite opportunities for optimization here. But for now, the main
warts are:
* VDBE execution needs a connection
* There is no good way in VDBE to pass parameters to a program.
* It is almost trivial to pollute the original connection. For example,
  we need to issue HALT for the program to stop, but seeing that halt
  will usually cause the program to try and halt the original program.

Subprograms, like the ones we use in triggers are a possible solution,
but they are much more expensive to execute, especially given that our
execution would essentially have to have a program with no other role
than to wrap the subprogram.

Therefore, what I am doing is:
* There is an in-memory database inside the projection operator (an
  obvious optimization is to share it with *all* projection operators).
* We obtain a connection to that database when the operator is created
* We use that connection to execute our VDBE, which offers a clean, safe
  and isolated way to execute the expression.
* We feed the values to the program manually by editing the registers
  directly.
2025-08-25 17:48:17 +03:00
Glauber Costa
38def26704 Add expr_compiler
To be used in DBSP-based projections. This will compile an expression
to VDBE bytecode and execute it.

To do that we need to add a new type of Expression, which we call a
Register.

This is a way for us to pass parameters to a DBSP program which will be
not columns or literals, but inputs from the DBSP deltas.
2025-08-25 17:48:17 +03:00
Glauber Costa
911b4c38a6 do not ignore silent failures from view creation
We have an issue at the moment that when a materialized view fails
to be created, we just swallow the error and leave the database in
a funny state.

We have can_create_view() to detect those issues early, but not all
errors can be detected that early.
2025-08-25 17:48:17 +03:00
Jussi Saurio
8cae10f744 Fix several issues with integrity_check
Things that were just wrong:

1. No pages other than the root page were checked, because no looping
was done. Add a loop.
2. Rightmost child page was never added to page stack. Add it.

New integrity check features:

- Add overflow pages to stack as well
- Check that no page is referenced more than once in the tree
2025-08-25 16:51:57 +03:00
PThorpe92
37a7ec7477 Update append_frames_vectored to use new encryption_ctx and apply review 2025-08-25 09:50:57 -04:00
PThorpe92
daea841b47 Minor adjustments/comments to wal append_frames_vectored method 2025-08-25 09:47:06 -04:00
PThorpe92
0239088718 Use new append_frames_vectored WAL method to flush pager cache and commit write tx 2025-08-25 09:47:06 -04:00
PThorpe92
46e288ac26 Add append_frames_vectored to WAL api
In addition to the existing `append_frame` which will write an individual frame
to the WAL, we add a method `append_frames_vectored` that takes N frames and the
db size which will need to be set for the last (commit) frame, and it
calculates the checksums and submits them as a single `pwritev` call,
reducing the number of syscalls needed for each write operation.
2025-08-25 09:47:01 -04:00
Avinash Sajjanshetty
40b7e3bf5a rename cipher to cipher_mode for consistency 2025-08-25 19:16:15 +05:30
Pekka Enberg
1a4a53e6ea Merge 'core/io: Fix build on Android and iOS' from Pekka Enberg
Commit ebe6aa0d28 ("adjust cfg for unix
and linux IO") adjusted the I/O conditional compilation, but forgot that
Android and iOS are also part of Unix target family.
Fixes #2500

Closes #2776
2025-08-25 15:34:52 +03:00
Pekka Enberg
3f5878243f Merge 'Remove unnecessary argument from Pager::end_tx()' from Nikita Sivukhin
No need to pass `disable` flag to the `end_tx` method as it has that
info from connection itself

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2777
2025-08-25 15:34:41 +03:00
Preston Thorpe
040ceba2d6 Merge 'WAL txn: fix reads from DB file' from Nikita Sivukhin
- Transaction which was started with max_frame = 0 and
max_frame_read_lock_index = 0 can write to the WAL and in this case it
needs to read data back from WAL and not the DB file.
- Without cache spilling its hard to reproduce this issue for the turso-
db now, but I found this issue with sync-engine which do weird stuff
with the WAL which "simulates" cache spilling behaviour to some extent.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2735
2025-08-25 08:34:17 -04:00
Jussi Saurio
16b1ae4a9f Handle unpinning btree page in case of allocate overflow page error 2025-08-25 15:12:37 +03:00
Jussi Saurio
c6553d82b8 Clarify expected behavior with assertion 2025-08-25 15:05:04 +03:00
Jussi Saurio
42c8a77bb7 use existing payload_overflows() utility in local space calculation 2025-08-25 15:03:10 +03:00
Avinash Sajjanshetty
b85ba09014 Fix clippy boss' complaints 2025-08-25 16:51:19 +05:30
Pekka Enberg
7401d33784 Merge 'core/translate: Add support' from Pekka Enberg
Fixes #2263

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2774
2025-08-25 11:53:36 +03:00
Jussi Saurio
f0a32c2312 Merge 'refactor/btree: rewrite the find_free_cell() function' from Jussi Saurio
Based on #2509, and in a similar spirit.

Closes #2510
2025-08-25 11:41:37 +03:00
Nikita Sivukhin
f7ad55b680 remove unnecessary argument 2025-08-25 12:24:39 +04:00
Pekka Enberg
5fe5e1548b core/io: Fix build on Android and iOS
Commit ebe6aa0d28 ("adjust cfg for unix
and linux IO") adjusted the I/O conditional compilation, but forgot that
Android and iOS are also part of Unix target family.

Fixes #2500
2025-08-25 11:21:46 +03:00
Pekka Enberg
5d3780f25d core/translate: Add CREATE INDEX IF NOT EXISTS support
Fixes #2263
2025-08-25 11:12:41 +03:00
Nikita Sivukhin
c62b87d9b6 read from database file only if max_frame_read_lock_index is 0 and max_frame > min_frame
- transaction which was started with max_frame = 0 and max_frame_read_lock_index = 0
  can write to the WAL and in this case it needs to read data back from WAL
- without cache spilling its hard to reproduce this issue for the turso-db now,
  but I stumbled into this issue with sync-engine which do weird stuff with the WAL
  which "simulates" cache spilling behaviour to some extent
2025-08-25 11:36:58 +04:00
Pekka Enberg
262ead8240 Merge 'ANALYZE creates sqlite_stat1 if it doesn't exist' from Alex Miller
This is change 2/N of #656
This change replaces a bail_parse_error!() when sqlite_stat1 doesn't
exist with the appropriate codegen to create the table, and handle both
cases of the table existing or not existing.
SQLite's codegen looks like:
```
sqlite> create table stat_test(a,b,c);
sqlite> explain analyze stat_test;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     40    0                    0   Start at 40
1     ReadCookie     0     3     2                    0
2     If             3     5     0                    0
3     SetCookie      0     2     4                    0
4     SetCookie      0     5     1                    0
5     CreateBtree    0     2     1                    0   r[2]=root iDb=0 flags=1
6     OpenWrite      0     1     0     5              0   root=1 iDb=0
7     NewRowid       0     1     0                    0   r[1]=rowid
8     Blob           6     3     0                   0   r[3]= (len=6)
9     Insert         0     3     1                    8   intkey=r[1] data=r[3]
10    Close          0     0     0                    0
11    Close          0     0     0                    0
12    Null           0     4     5                    0   r[4..5]=NULL
13    Noop           4     0     4                    0
14    OpenWrite      3     1     0     5              0   root=1 iDb=0; sqlite_master
15    SeekRowid      3     17    1                    0   intkey=r[1]
16    Rowid          3     5     0                    0   r[5]= rowid of 3
17    IsNull         5     26    0                    0   if r[5]==NULL goto 26
18    String8        0     6     0     table          0   r[6]='table'
19    String8        0     7     0     sqlite_stat1   0   r[7]='sqlite_stat1'
20    String8        0     8     0     sqlite_stat1   0   r[8]='sqlite_stat1'
21    Copy           2     9     0                    0   r[9]=r[2]
22    String8        0     10    0     CREATE TABLE sqlite_stat1(tbl,idx,stat) 0   r[10]='CREATE TABLE sqlite_stat1(tbl,idx,stat)'
23    MakeRecord     6     5     4     BBBDB          0   r[4]=mkrec(r[6..10])
24    Delete         3     68    5                    0
25    Insert         3     4     5                    0   intkey=r[5] data=r[4]
26    SetCookie      0     1     2                    0
27    ParseSchema    0     0     0     tbl_name='sqlite_stat1' AND type!='trigger' 0
28    OpenWrite      0     2     0     3              16  root=2 iDb=0; sqlite_stat1
29    OpenRead       5     2     0     3              0   root=2 iDb=0; stat_test
30    String8        0     18    0     stat_test      0   r[18]='stat_test'; stat_test
31    Count          5     20    0                    0   r[20]=count()
32    IfNot          20    37    0                    0
33    Null           0     19    0                    0   r[19]=NULL
34    MakeRecord     18    3     16    BBB            0   r[16]=mkrec(r[18..20])
35    NewRowid       0     12    0                    0   r[12]=rowid
36    Insert         0     16    12                   8   intkey=r[12] data=r[16]
37    LoadAnalysis   0     0     0                    0
38    Expire         0     0     0                    0
39    Halt           0     0     0                    0
40    Transaction    0     1     1     0              1   usesStmtJournal=1
41    Goto           0     1     0                    0
```
And now Turso's looks like:
```
turso> create table stat_test(a,b,c);
turso> explain analyze stat_test;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     23    0                    0   Start at 23
1     Null               0     1     0                    0   r[1]=NULL
2     CreateBtree        0     2     1                    0   r[2]=root iDb=0 flags=1
3     OpenWrite          0     1     0                    0   root=1; iDb=0
4     NewRowid           0     3     0                    0   r[3]=rowid
5     String8            0     4     0     table          0   r[4]='table'
6     String8            0     5     0     sqlite_stat1   0   r[5]='sqlite_stat1'
7     String8            0     6     0     sqlite_stat1   0   r[6]='sqlite_stat1'
8     Copy               2     7     0                    0   r[7]=r[2]
9     String8            0     8     0     CREATE TABLE sqlite_stat1(tbl,idx,stat)  0   r[8]='CREATE TABLE sqlite_stat1(tbl,idx,stat)'
10    MakeRecord         4     5     9                    0   r[9]=mkrec(r[4..8])
11    Insert             0     9     3     sqlite_stat1   0   intkey=r[3] data=r[9]
12    ParseSchema        0     0     0     tbl_name = 'sqlite_stat1' AND type != 'trigger'  0   tbl_name = 'sqlite_stat1' AND type != 'trigger'
13    OpenWrite          1     2     0                    0   root=2; iDb=0
14    OpenRead           2     2     0                    0   =stat_test, root=2, iDb=0
15    String8            0     12    0     stat_test      0   r[12]='stat_test'
16    Count              2     14    0                    0
17    IfNot              14    22    0                    0   if !r[14] goto 22
18    Null               0     13    0                    0   r[13]=NULL
19    MakeRecord         12    3     11                   0   r[11]=mkrec(r[12..14])
20    NewRowid           1     10    0                    0   r[10]=rowid
21    Insert             1     11    10    sqlite_stat1   0   intkey=r[10] data=r[11]
22    Halt               0     0     0                    0
23    Goto               0     1     0                    0
```
The notable difference in size is following the same codegen difference
in CREATE TABLE, where sqlite's odd dance of adding a placeholder entry
which is immediately replaced is instead done in tursodb as just
inserting the correct row in the first place. Aside from lines 6-13 of
sqlite's vdbe being missing, there's still the lack of LoadAnalysis,
Expire, and Cookie management.

Closes #2765
2025-08-25 10:24:46 +03:00
Pekka Enberg
4bca5edb9e Merge 'Don't clear transaction state in nested statement' from Jussi Saurio
Closes #2738
Closes #2739
Closes #2753
Closes #2755
Closes #2767
Closes #2768
Closes #2769
Closes #2770
## El problema
If a connection does e.g. CREATE TABLE, it will start a "child
statement" to reparse the schema. That statement does not start its own
transaction, and so should not try to end the existing one either.
We had a logic bug where these steps would happen:
- `CREATE TABLE` executed successfully
- pread fault happens inside `ParseSchema` child stmt
- `handle_program_error()` is called
- `pager.end_tx()` returns immediately because `is_nested_stmt` is true
and we correctly no-op it.
- however, crucially: `handle_program_error()` then sets tx state to
None
- parent statement now catches error from nested stmt and calls
`handle_program_error()`, which calls `pager.end_tx()` again, and since
txn state is None, when it calls `rollback()` we panic on the assertion
`"dirty pages should be empty for read txn"`
## La solucion
Do not do _any_ error processing in `handle_program_error()` inside a
nested stmt. This means that the parent write txn is still active when
it processes the error from the child and we avoid this panic.

Closes #2772
2025-08-25 10:12:09 +03:00
Jussi Saurio
dc6bcd4d41 refactor/btree: rewrite find_free_cell() 2025-08-25 10:08:39 +03:00
Jussi Saurio
4ea8cd0007 refactor/btree: rewrite the free_cell_range() function
i had a rough time reading this function earlier and trying to understand it,
so rewrote it in a way that, to me, is much more readable.
2025-08-25 09:41:44 +03:00
Jussi Saurio
f2598a2dea Merge 'Remove Result from signature' from Mikaël Francoeur
This PR removes `Result<()>` from `Jsonb::write_to_string()`, since it
wasn't required. The method now returns `()`.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2717
2025-08-25 09:17:37 +03:00
Jussi Saurio
54ff656c9d Do not clear txn state inside nested statement
If a connection does e.g. CREATE TABLE, it will start a "child statement"
to reparse the schema. That statement does not start its own transaction,
and so should not try to end the existing one either.

We had a logic bug where these steps would happen:

- `CREATE TABLE` executed successfully
- pread fault happens inside `ParseSchema` child stmt
- `handle_program_error()` is called
- `pager.end_tx()` returns immediately because `is_nested_stmt` is true
  and we correctly no-op it.
- however, crucially: `handle_program_error()` then sets tx state to None
- parent statement now catches error from nested stmt and calls
  `handle_program_error()`, which calls `pager.end_tx()` again, and since
  txn state is None, when it calls `rollback()` we panic on the assertion
 `"dirty pages should be empty for read txn"`

Solution:

Do not do _any_ error processing in `handle_program_error()` inside a nested
stmt. This means that the parent write txn is still active when it processes
the error from the child and we avoid this panic.
2025-08-25 08:49:22 +03:00
Pekka Enberg
2e11e1bb8c Merge 'Switch to F_FULLSYNC on Darwin' from
Closes #2743 .

Closes #2766
2025-08-25 08:09:29 +03:00