Commit Graph

1390 Commits

Author SHA1 Message Date
Pekka Enberg
0b91f8a715 Merge 'IO: handle errors properly in io_uring' from Preston Thorpe
Because `io_uring` may have many other I/O submission events queued
(that are relevant to the operation) when we experience an error,
marking our `Completion` objects as aborted is not sufficient, the
kernel will still execute queued I/O, which can mutate WAL or DB state
after we’ve declared failure and keep references (iovec arrays, buffers)
alive and stall reuse. We need to stop those in-flight SQEs at the
kernel and then drain the ring to a known-empty state before reusing any
resources.
The following methods were added to the `IO` trait:
`cancel`: which takes a slice of `Completion` objects and has a default
implementation that simply marks them as `aborted`.
`drain`: which has a default noop implementation, but the `io_uring`
backend implements this method to drain the ring.
CC @sivukhin

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2787
2025-09-10 14:24:43 +03:00
Jussi Saurio
5c8afc5caf pager: fix incorrect freelist page count bookkeeping 2025-09-10 14:02:17 +03:00
Jussi Saurio
11339fc941 Merge 'Fix clear_page_cache method and rollback' from Preston Thorpe
Previously we were iterating over every entry in the page cache,
clearing the dirty flag from each page.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #2988
2025-09-10 11:11:37 +03:00
PThorpe92
2f4f67efa8 Remove some unused attributes 2025-09-09 16:17:49 -04:00
PThorpe92
02bebf02a5 Remove read_entire_wal_dumb in favor of reading chunks 2025-09-09 16:06:27 -04:00
PThorpe92
cb12a1319d Fix page cache clear method to not re-initialize every slot 2025-09-09 15:55:59 -04:00
PThorpe92
8cc4e7f7a0 Fix rollback method to stop using highly inefficient cache::clear_dirty 2025-09-09 13:28:17 -04:00
PThorpe92
f7471a22c0 Fix clear_page_cache method and stop iterating over every entry 2025-09-09 13:25:33 -04:00
PThorpe92
37ec77eec2 Fix read_entire_wal_dumb to prefer streaming read if over 32mb wal file 2025-09-09 13:12:58 -04:00
PThorpe92
ccae3ab0f2 Change callsites to cancel any further IO when an error occurs and drain 2025-09-08 13:18:40 -04:00
Pekka Enberg
71a812ce55 Merge 'Fix infinite loop when IO failure happens on allocating first page' from Preston Thorpe
closes #2919

Reviewed-by: Pedro Muniz (@pedrocarlo)

Closes #2968
2025-09-08 18:59:34 +03:00
PThorpe92
237b9fefd7 Fix infinite loop when IO failure happens on allocating first page 2025-09-08 11:49:33 -04:00
Pekka Enberg
081a7b563b Merge 'Fix crash in Next opcode if cursor stack has no pages' from Jussi Saurio
Closes #2924
Unsure if this fix is that great, but it does fix the issue described in
#2924 -- added minimal regression test to illustrate the behavior
This crash requires a pretty specific set of circumstances:
- 3-way join with two innermost being left joins
- nullable seek key on the innermost table:
    * middle table gets nulled out because no matches with the outermost
table
    * hence when we seek the innermost table using middle table values,
the seek key is null, so `Insn::IsNull` entirely skips the innermost
table
Perhaps a bytecode plan illustrates this better:
```sql
turso> explain select a.x, b.x, c.x from a left join b on a.y=b.x left join c on b.y=c.x;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     34    0                    0   Start at 34
1     OpenRead           0     2     0                    0   table=a, root=2, iDb=0
2     OpenRead           1     4     0                    0   table=b, root=4, iDb=0
3     OpenRead           2     5     0                    0   index=sqlite_autoindex_b_1, root=5, iDb=0
4     OpenRead           3     7     0                    0   index=sqlite_autoindex_c_1, root=7, iDb=0
5     Rewind             0     33    0                    0   Rewind table a
6       Integer          0     4     0                    0   r[4]=0
7       Column           0     1     6                    0   r[6]=a.y
8       IsNull           6     28    0                    0   if (r[6]==NULL) goto 28
9       SeekGE           2     28    6                    0   key=[6..6]
10        IdxGT          2     28    6                    0   key=[6..6]
11        DeferredSeek   2     1     0                    0   
12        Integer        1     4     0                    0   r[4]=1
13        Integer        0     5     0                    0   r[5]=0
14        Column         1     1     7                    0   r[7]=b.y
-- if b.y is NULL, we skip the entire table loop between insns 16-23
-- except when we call NullRow and then Goto to re-enter that loop in order to
-- return NULL values for the table
15        IsNull         7     24    0                    0   if (r[7]==NULL) goto 24
16        SeekGE         3     24    7                    0   key=[7..7]
17          IdxGT        3     24    7                    0   key=[7..7]
18          Integer      1     5     0                    0   r[5]=1
19          Column       0     0     1                    0   r[1]=a.x
20          Column       1     0     2                    0   r[2]=b.x
21          Column       3     0     3                    0   r[3]=sqlite_autoindex_c_1.x
22          ResultRow    1     3     0                    0   output=r[1..3]
23        Next           3     17    0                    0   
24        IfPos          5     27    0                    0   r[5]>0 -> r[5]-=0, goto 27
25        NullRow        3     0     0                    0   Set cursor 3 to a (pseudo) NULL row
26        Goto           0     18    0                    0   
27      Next             2     10    0                    0   
28      IfPos            4     32    0                    0   r[4]>0 -> r[4]-=0, goto 32
29      NullRow          1     0     0                    0   Set cursor 1 to a (pseudo) NULL row
30      NullRow          2     0     0                    0   Set cursor 2 to a (pseudo) NULL row
31      Goto             0     12    0                    0   
32    Next               0     6     0                    0   
33    Halt               0     0     0                    0   
34    Transaction        0     0     3                    0   iDb=0 write=false
35    Goto               0     1     0                    0
```

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2967
2025-09-08 17:45:29 +03:00
Jussi Saurio
5820f691af fix: do not crash in Next if cursor stack has no pages 2025-09-08 16:54:35 +03:00
TcMits
3aa4650f06 make mr.clippy happy 2025-09-08 18:24:50 +07:00
TcMits
a6ff568530 reduce cloning 'Arc<Page>' 2025-09-08 18:00:18 +07:00
Jussi Saurio
c664639c09 Merge 'Add assertion: we read a page with the correct id' from Jussi Saurio
Part of debugging #2746 , but a good sanity check in any case.

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2802
2025-09-08 09:52:31 +03:00
Jussi Saurio
2c6e48903e Merge 'Prevent setting of encryption keys if already set' from Gaurav Sarma
Fixes https://github.com/tursodatabase/turso/issues/2883
<img width="867" height="128" alt="Screenshot 2025-09-05 at 10 44 18 PM"
src="https://github.com/user-attachments/assets/54a659ba-
cfe1-4622-939b-c7c31362ee5a" />

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2914
2025-09-08 09:49:55 +03:00
Nikita Sivukhin
cd627c2368 remove unnecessary changes 2025-09-07 19:56:06 +04:00
Nikita Sivukhin
5b9fe0cdf3 fix 2025-09-07 19:56:06 +04:00
Nikita Sivukhin
0b6a6e7713 remove comma 2025-09-07 19:56:06 +04:00
Nikita Sivukhin
9aed831f2f format 2025-09-07 19:56:05 +04:00
Nikita Sivukhin
db7c6b3370 try to speed up count(*) where 1 = 1 2025-09-07 19:55:42 +04:00
Nikita Sivukhin
c374cf0c93 remove Cell/RefCell from PageStack 2025-09-07 19:54:50 +04:00
Gaurav Sarma
b3242a18d9 Prevent setting of encryption keys if already set 2025-09-06 22:37:12 +08:00
PThorpe92
01d64977d7 Use more efficient circular list and rely on clock hand for pagecache 2025-09-05 22:40:27 -04:00
PThorpe92
644d0f270b Add evict slot method in page cache 2025-09-05 16:13:30 -04:00
PThorpe92
b89513f031 remove useless saturating sub 2025-09-05 16:13:30 -04:00
PThorpe92
39a47d67e6 Apply PR suggestions 2025-09-05 16:13:29 -04:00
PThorpe92
f45a7538fe Use true sieve/gclock algo instead of lru,dont link pages circilarly 2025-09-05 16:13:29 -04:00
PThorpe92
e418a902e5 Fix scoping issues now that refcells are gone to prevent extra destructors 2025-09-05 16:13:28 -04:00
PThorpe92
c85a61442f Remove type alias in page cache 2025-09-05 16:13:28 -04:00
PThorpe92
5ba273eea5 remove unused impl for refbit 2025-09-05 16:13:28 -04:00
PThorpe92
246b62d513 Remove unnecessary refcells, as PageCacheEntry has interior mutability 2025-09-05 16:13:27 -04:00
PThorpe92
582e25241e Implement GClock algorithm to distinguish between hot pages and scan touches 2025-09-05 16:13:27 -04:00
PThorpe92
254a0a9342 Apply fix and rename ignore_existing to upsert 2025-09-05 16:13:27 -04:00
PThorpe92
3a0b9b360a Fix clippy warnings 2025-09-05 16:13:26 -04:00
PThorpe92
03d5598cfb Use sieve algorithm in page cache in place of full LRU 2025-09-05 16:13:26 -04:00
Pere Diaz Bou
4ddf9c23de core/pager: assert-ready-page-sanity fmt for jussi 2025-09-05 16:52:33 +02:00
Pere Diaz Bou
382a1e14ca Merge 'core: handle edge cases for read_varint' from Sonny
Add handling malformed inputs to function `read_varint` and test cases.
```
# 9 byte truncated to 8
read_varint(&[0x81, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80]) 
    before -> panic index out of bounds: the len is 8 but the index is 8
    after -> LimboError
    
# bits set without end
read_varint(&[0x80; 9]) 
    before -> Ok((128, 9))
    after -> LimboError
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2904
2025-09-05 16:15:42 +02:00
Pekka Enberg
6d80d862ee Merge 'io_uring: prevent out of order operations that could interfere with durability' from Preston Thorpe
closes #1419
When submitting a `pwritev` for flushing dirty pages, in the case that
it's a commit frame, we use a new completion type which tells io_uring
to add a flag, which ensures the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with -ECANCELED
2. All operations in the chain complete in order
If there is an ongoing chain of `IO_LINK`, it ends at the `fsync`
barrier, and ensures everything submitted before it has completed.
for 99% of the cases, the syscall that immediately proceeds the
`pwritev` is going to be the fsync, but just in case, this
implementation links everything that comes between the final commit
`pwritev` and the next `fsync`
In the event that we get a partial write, if it was linked, then we
submit an additional fsync after the partial write completes, with an
`IO_DRAIN` flag after forcing a `submit`, which will mean durability is
maintained, as that fsync will flush/drain everything in the squeue
before submission.
The other option in the event of partial writes on commit frames/linked
writes is to error.. not sure which is the right move here. I guess it's
possible that since the fsync completion fired, than the commit could be
over without us being durable ondisk. So maybe it's an assertion
instead? Thoughts?

Closes #2909
2025-09-05 08:34:35 +03:00
Pekka Enberg
5950003eaf core: Simplify WalFileShared life cycle
Create one WalFileShared for a Database and update its state
accordingly. Also support case where the WAL is disabled.
2025-09-04 21:09:12 +03:00
PThorpe92
e3f366963d Compute the final db page or make the commit frame submit a linked pwritev completion 2025-09-03 16:01:16 -04:00
sonhmai
2b6cb39c7e core: handle edge cases for read_varint 2025-09-03 15:43:34 +07:00
Frank Denis
52d0a3bf4a Make set_encryption_{context,cipher,key} fallible 2025-09-03 01:14:49 +02:00
Frank Denis
e3835afee5 Encryption: add support for other AEGIS and AES-GCM cipher variants
Now supported:

- AEGIS variants: 256, 256X2, 256X4, 128L, 128X2, 128X4
- AES-GCM variants: AES-128-GCM, AES-256-GCM

With minor changes in order to make it easy to add new
ciphers later regardless of their key size.
2025-09-02 23:46:58 +02:00
Pekka Enberg
2addeb5a9f Merge 'introduce eq/contains/starts_with/ends_with_ignore_ascii_case macros' from Lâm Hoàng Phúc
depend on #2865
```sh
`ALTER TABLE _ RENAME TO _`/limbo_rename_table/
                        time:   [10.100 ms 10.191 ms 10.283 ms]
                        change: [-16.770% -15.559% -14.417%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

`ALTER TABLE _ RENAME COLUMN _ TO _`/limbo_rename_column/
                        time:   [7.4829 ms 7.5492 ms 7.6128 ms]
                        change: [-19.397% -18.093% -16.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild

`ALTER TABLE _ ADD COLUMN _`/limbo_add_column/
                        time:   [5.3255 ms 5.3713 ms 5.4183 ms]
                        change: [-24.002% -22.612% -21.195%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 39 outliers among 100 measurements (39.00%)
  17 (17.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  20 (20.00%) high severe

`ALTER TABLE _ DROP COLUMN _`/limbo_drop_column/
                        time:   [5.8858 ms 5.9183 ms 5.9510 ms]
                        change: [-16.233% -14.679% -13.083%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
  8 (8.00%) low severe
  11 (11.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

Prepare `SELECT 1`/limbo_parse_query/SELECT 1
                        time:   [590.28 ns 591.31 ns 592.35 ns]
                        change: [-3.7810% -3.5059% -3.2444%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  6 (6.00%) high mild

Prepare `SELECT * FROM users LIMIT 1`/limbo_parse_query/SELECT * FROM users LIMIT 1
                        time:   [1.2569 µs 1.2582 µs 1.2596 µs]
                        change: [-5.0125% -4.7516% -4.4933%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

Prepare `SELECT first_name, count(1) FROM users GROUP BY first_name HAVING count(1) > 1 ORDER BY cou...
                        time:   [3.7180 µs 3.7227 µs 3.7274 µs]
                        change: [-3.0557% -2.7642% -2.4761%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild

Execute `SELECT 1`/limbo_execute_select_1
                        time:   [27.455 ns 27.477 ns 27.499 ns]
                        change: [-2.9461% -2.7493% -2.5589%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1
                        time:   [410.53 ns 411.05 ns 411.54 ns]
                        change: [-15.364% -15.133% -14.912%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild

Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10
                        time:   [2.1100 µs 2.1122 µs 2.1145 µs]
                        change: [-11.517% -11.065% -10.662%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild

Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50
                        time:   [9.5156 µs 9.5268 µs 9.5383 µs]
                        change: [-10.284% -10.086% -9.8833%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild

Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
                        time:   [18.669 µs 18.698 µs 18.731 µs]
                        change: [-9.5949% -9.3407% -9.1140%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild

Execute `SELECT count() FROM users`/limbo_execute_select_count
                        time:   [7.1027 µs 7.1098 µs 7.1170 µs]
                        change: [-43.739% -43.596% -43.469%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

```

Closes #2866
2025-09-02 18:35:14 +03:00
PThorpe92
cfadc4f579 Fix memory leak in page cache during balancing 2025-09-02 10:35:04 -04:00
TcMits
bfff05faba merge main 2025-09-02 18:25:20 +07:00
Pekka Enberg
15d45e3f68 Merge 'Refactor encryption to manage authentication tag internally' from bit-aloo
This PR updates the internal encryption framework to handle
authentication tags explicitly rather than relying on the underlying
cipher libraries to append/verify them automatically.
closes: #2850

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2858
2025-09-02 09:44:22 +03:00