Pekka Enberg
0b91f8a715
Merge 'IO: handle errors properly in io_uring' from Preston Thorpe
...
Because `io_uring` may have many other I/O submission events queued
(that are relevant to the operation) when we experience an error,
marking our `Completion` objects as aborted is not sufficient. The
kernel will still execute queued I/O, which can mutate WAL or DB state
after we’ve declared failure, keep references (iovec arrays, buffers)
alive, and stall reuse. We need to stop those in-flight SQEs at the
kernel and then drain the ring to a known-empty state before reusing any
resources.
The following methods were added to the `IO` trait:
- `cancel`: takes a slice of `Completion` objects and has a default
  implementation that simply marks them as `aborted`.
- `drain`: has a default no-op implementation; the `io_uring`
  backend overrides it to drain the ring.
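A minimal sketch of how two such trait methods with default implementations could look. Everything here other than the names `IO`, `Completion`, `cancel`, and `drain` is an assumption made for illustration: the `aborted` flag representation, the `io::Result` return types, and the `MemoryIO` and `ToyRingIO` backends are hypothetical and are not taken from the actual code.

```rust
use std::cell::Cell;
use std::io;

/// Hypothetical stand-in for the real `Completion` type.
#[derive(Default)]
pub struct Completion {
    aborted: Cell<bool>,
}

impl Completion {
    pub fn abort(&self) {
        self.aborted.set(true);
    }
    pub fn is_aborted(&self) -> bool {
        self.aborted.get()
    }
}

pub trait IO {
    /// Default behavior: only mark the given completions as aborted.
    fn cancel(&self, completions: &[Completion]) -> io::Result<()> {
        for c in completions {
            c.abort();
        }
        Ok(())
    }

    /// Default behavior: nothing to drain.
    fn drain(&self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical backend that relies on both defaults.
pub struct MemoryIO;
impl IO for MemoryIO {}

/// Toy backend that models a ring by counting in-flight entries.
/// It does not talk to the kernel; a real backend would reap
/// completion-queue entries until the ring is empty.
pub struct ToyRingIO {
    in_flight: Cell<usize>,
}

impl IO for ToyRingIO {
    fn drain(&self) -> io::Result<()> {
        while self.in_flight.get() > 0 {
            self.in_flight.set(self.in_flight.get() - 1);
        }
        Ok(())
    }
}
```

With this shape, a call site that hits an error can call `cancel` on its outstanding completions and then `drain` regardless of backend; backends without a submission queue pay nothing for the extra call.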
CC @sivukhin
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2787
2025-09-10 14:24:43 +03:00
Jussi Saurio
5c8afc5caf
pager: fix incorrect freelist page count bookkeeping
2025-09-10 14:02:17 +03:00
Jussi Saurio
11339fc941
Merge 'Fix clear_page_cache method and rollback' from Preston Thorpe
...
Previously we were iterating over every entry in the page cache,
clearing the dirty flag from each page.
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Nikita Sivukhin (@sivukhin)
Closes #2988
2025-09-10 11:11:37 +03:00
PThorpe92
2f4f67efa8
Remove some unused attributes
2025-09-09 16:17:49 -04:00
PThorpe92
02bebf02a5
Remove read_entire_wal_dumb in favor of reading chunks
2025-09-09 16:06:27 -04:00
PThorpe92
cb12a1319d
Fix page cache clear method to not re-initialize every slot
2025-09-09 15:55:59 -04:00
PThorpe92
8cc4e7f7a0
Fix rollback method to stop using highly inefficient cache::clear_dirty
2025-09-09 13:28:17 -04:00
PThorpe92
f7471a22c0
Fix clear_page_cache method and stop iterating over every entry
2025-09-09 13:25:33 -04:00
PThorpe92
37ec77eec2
Fix read_entire_wal_dumb to prefer streaming read if over 32mb wal file
2025-09-09 13:12:58 -04:00
PThorpe92
ccae3ab0f2
Change callsites to cancel any further IO when an error occurs and drain
2025-09-08 13:18:40 -04:00
Pekka Enberg
71a812ce55
Merge 'Fix infinite loop when IO failure happens on allocating first page' from Preston Thorpe
...
closes #2919
Reviewed-by: Pedro Muniz (@pedrocarlo)
Closes #2968
2025-09-08 18:59:34 +03:00
PThorpe92
237b9fefd7
Fix infinite loop when IO failure happens on allocating first page
2025-09-08 11:49:33 -04:00
Pekka Enberg
081a7b563b
Merge 'Fix crash in Next opcode if cursor stack has no pages' from Jussi Saurio
...
Closes #2924
Unsure if this fix is that great, but it does fix the issue described in
#2924. Added a minimal regression test to illustrate the behavior.
This crash requires a pretty specific set of circumstances:
- 3-way join with the two innermost joins being left joins
- nullable seek key on the innermost table:
  * the middle table gets nulled out because there are no matches with
    the outermost table
  * hence, when we seek the innermost table using middle-table values,
    the seek key is null, so `Insn::IsNull` entirely skips the innermost
    table
Perhaps a bytecode plan illustrates this better:
```sql
turso> explain select a.x, b.x, c.x from a left join b on a.y=b.x left join c on b.y=c.x;
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 34 0 0 Start at 34
1 OpenRead 0 2 0 0 table=a, root=2, iDb=0
2 OpenRead 1 4 0 0 table=b, root=4, iDb=0
3 OpenRead 2 5 0 0 index=sqlite_autoindex_b_1, root=5, iDb=0
4 OpenRead 3 7 0 0 index=sqlite_autoindex_c_1, root=7, iDb=0
5 Rewind 0 33 0 0 Rewind table a
6 Integer 0 4 0 0 r[4]=0
7 Column 0 1 6 0 r[6]=a.y
8 IsNull 6 28 0 0 if (r[6]==NULL) goto 28
9 SeekGE 2 28 6 0 key=[6..6]
10 IdxGT 2 28 6 0 key=[6..6]
11 DeferredSeek 2 1 0 0
12 Integer 1 4 0 0 r[4]=1
13 Integer 0 5 0 0 r[5]=0
14 Column 1 1 7 0 r[7]=b.y
-- if b.y is NULL, we skip the entire table loop between insns 16-23
-- except when we call NullRow and then Goto to re-enter that loop in order to
-- return NULL values for the table
15 IsNull 7 24 0 0 if (r[7]==NULL) goto 24
16 SeekGE 3 24 7 0 key=[7..7]
17 IdxGT 3 24 7 0 key=[7..7]
18 Integer 1 5 0 0 r[5]=1
19 Column 0 0 1 0 r[1]=a.x
20 Column 1 0 2 0 r[2]=b.x
21 Column 3 0 3 0 r[3]=sqlite_autoindex_c_1.x
22 ResultRow 1 3 0 0 output=r[1..3]
23 Next 3 17 0 0
24 IfPos 5 27 0 0 r[5]>0 -> r[5]-=0, goto 27
25 NullRow 3 0 0 0 Set cursor 3 to a (pseudo) NULL row
26 Goto 0 18 0 0
27 Next 2 10 0 0
28 IfPos 4 32 0 0 r[4]>0 -> r[4]-=0, goto 32
29 NullRow 1 0 0 0 Set cursor 1 to a (pseudo) NULL row
30 NullRow 2 0 0 0 Set cursor 2 to a (pseudo) NULL row
31 Goto 0 12 0 0
32 Next 0 6 0 0
33 Halt 0 0 0 0
34 Transaction 0 0 3 0 iDb=0 write=false
35 Goto 0 1 0 0
```
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #2967
2025-09-08 17:45:29 +03:00
Jussi Saurio
5820f691af
fix: do not crash in Next if cursor stack has no pages
2025-09-08 16:54:35 +03:00
TcMits
3aa4650f06
make mr.clippy happy
2025-09-08 18:24:50 +07:00
TcMits
a6ff568530
reduce cloning 'Arc<Page>'
2025-09-08 18:00:18 +07:00
Jussi Saurio
c664639c09
Merge 'Add assertion: we read a page with the correct id' from Jussi Saurio
...
Part of debugging #2746, but a good sanity check in any case.
Reviewed-by: Avinash Sajjanshetty (@avinassh)
Closes #2802
2025-09-08 09:52:31 +03:00
Jussi Saurio
2c6e48903e
Merge 'Prevent setting of encryption keys if already set' from Gaurav Sarma
...
Fixes https://github.com/tursodatabase/turso/issues/2883
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Avinash Sajjanshetty (@avinassh)
Closes #2914
2025-09-08 09:49:55 +03:00
Nikita Sivukhin
cd627c2368
remove unnecessary changes
2025-09-07 19:56:06 +04:00
Nikita Sivukhin
5b9fe0cdf3
fix
2025-09-07 19:56:06 +04:00
Nikita Sivukhin
0b6a6e7713
remove comma
2025-09-07 19:56:06 +04:00
Nikita Sivukhin
9aed831f2f
format
2025-09-07 19:56:05 +04:00
Nikita Sivukhin
db7c6b3370
try to speed up count(*) where 1 = 1
2025-09-07 19:55:42 +04:00
Nikita Sivukhin
c374cf0c93
remove Cell/RefCell from PageStack
2025-09-07 19:54:50 +04:00
Gaurav Sarma
b3242a18d9
Prevent setting of encryption keys if already set
2025-09-06 22:37:12 +08:00
PThorpe92
01d64977d7
Use more efficient circular list and rely on clock hand for pagecache
2025-09-05 22:40:27 -04:00
PThorpe92
644d0f270b
Add evict slot method in page cache
2025-09-05 16:13:30 -04:00
PThorpe92
b89513f031
remove useless saturating sub
2025-09-05 16:13:30 -04:00
PThorpe92
39a47d67e6
Apply PR suggestions
2025-09-05 16:13:29 -04:00
PThorpe92
f45a7538fe
Use true sieve/gclock algo instead of lru, don't link pages circularly
2025-09-05 16:13:29 -04:00
PThorpe92
e418a902e5
Fix scoping issues now that refcells are gone to prevent extra destructors
2025-09-05 16:13:28 -04:00
PThorpe92
c85a61442f
Remove type alias in page cache
2025-09-05 16:13:28 -04:00
PThorpe92
5ba273eea5
remove unused impl for refbit
2025-09-05 16:13:28 -04:00
PThorpe92
246b62d513
Remove unnecessary refcells, as PageCacheEntry has interior mutability
2025-09-05 16:13:27 -04:00
PThorpe92
582e25241e
Implement GClock algorithm to distinguish between hot pages and scan touches
2025-09-05 16:13:27 -04:00
PThorpe92
254a0a9342
Apply fix and rename ignore_existing to upsert
2025-09-05 16:13:27 -04:00
PThorpe92
3a0b9b360a
Fix clippy warnings
2025-09-05 16:13:26 -04:00
PThorpe92
03d5598cfb
Use sieve algorithm in page cache in place of full LRU
2025-09-05 16:13:26 -04:00
Pere Diaz Bou
4ddf9c23de
core/pager: assert-ready-page-sanity fmt for jussi
2025-09-05 16:52:33 +02:00
Pere Diaz Bou
382a1e14ca
Merge 'core: handle edge cases for read_varint' from Sonny
...
Add handling of malformed inputs to the `read_varint` function, along with test cases.
```
# 9 byte truncated to 8
read_varint(&[0x81, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80])
before -> panic index out of bounds: the len is 8 but the index is 8
after -> LimboError
# bits set without end
read_varint(&[0x80; 9])
before -> Ok((128, 9))
after -> LimboError
```
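A sketch of a bounds-checked varint reader that reproduces the two "after" outcomes listed above, assuming the SQLite-style encoding (up to eight bytes carrying 7 bits each, then a ninth byte carrying 8 bits). The `InvalidVarint` error type is a hypothetical placeholder for `LimboError`, and the rule used to reject `[0x80; 9]` (refusing a 9-byte encoding whose value would fit in fewer bytes) is an assumption about why that input is treated as malformed, not a statement of what the merged code does.

```rust
/// Hypothetical error type; the commit message only names `LimboError`.
#[derive(Debug, PartialEq)]
pub struct InvalidVarint;

/// Reads one varint from the start of `buf`.
/// Returns the decoded value and the number of bytes consumed.
pub fn read_varint(buf: &[u8]) -> Result<(u64, usize), InvalidVarint> {
    let mut v: u64 = 0;
    // Bytes 1..=8 carry 7 payload bits each; a clear high bit ends the varint.
    for i in 0..8 {
        // `get` returns None on truncated input instead of panicking.
        let b = *buf.get(i).ok_or(InvalidVarint)?;
        v = (v << 7) | u64::from(b & 0x7f);
        if b & 0x80 == 0 {
            return Ok((v, i + 1));
        }
    }
    // The ninth byte carries all 8 bits.
    let last = *buf.get(8).ok_or(InvalidVarint)?;
    // Assumption: a 9-byte encoding is only accepted when the value needs
    // more than 56 bits, i.e. the top 8 of the 56 bits read so far are set.
    if v >> 48 == 0 {
        return Err(InvalidVarint);
    }
    Ok(((v << 8) | u64::from(last), 9))
}
```

Under these assumptions the truncated 8-byte input fails at the ninth-byte lookup rather than indexing out of bounds, and `[0x80; 9]` fails the overlong check because all 56 bits read from the first eight bytes are zero.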
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #2904
2025-09-05 16:15:42 +02:00
Pekka Enberg
6d80d862ee
Merge 'io_uring: prevent out of order operations that could interfere with durability' from Preston Thorpe
...
closes #1419
When submitting a `pwritev` for flushing dirty pages, in the case that
it's a commit frame, we use a new completion type which tells io_uring
to add a flag, which ensures the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with -ECANCELED
2. All operations in the chain complete in order
If there is an ongoing chain of `IO_LINK`, it ends at the `fsync`
barrier, which ensures everything submitted before it has completed.
In 99% of cases, the syscall that immediately follows the
`pwritev` is going to be the fsync, but just in case, this
implementation links everything that comes between the final commit
`pwritev` and the next `fsync`.
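The two chain guarantees listed above can be illustrated with a small model. This is a pure-Rust simulation of the ordering and cancellation semantics only; it does not use the kernel or any io_uring bindings, and the `Op` struct and `run_chain` function are hypothetical names invented for the sketch.

```rust
/// Linux errno value for ECANCELED.
pub const ECANCELED: i32 = 125;

/// One queued operation in the model.
pub struct Op {
    /// Result the operation would produce if it actually ran:
    /// >= 0 on success, a negative errno on failure.
    pub result: i32,
    /// Whether this operation is linked to the one that follows it.
    pub linked: bool,
}

/// Runs the queue in submission order and returns one result per operation.
/// Within a chain, any operation after a failure reports -ECANCELED.
pub fn run_chain(ops: &[Op]) -> Vec<i32> {
    let mut results = Vec::with_capacity(ops.len());
    let mut chain_failed = false; // a failure was seen earlier in this chain
    let mut prev_linked = false; // the previous op linked to the current one
    for op in ops {
        if !prev_linked {
            // The previous op did not link forward, so a new chain starts.
            chain_failed = false;
        }
        let res = if chain_failed { -ECANCELED } else { op.result };
        if res < 0 {
            chain_failed = true;
        }
        results.push(res);
        prev_linked = op.linked;
    }
    results
}
```

For example, a queue of a linked successful write, a linked write failing with `-5`, an unlinked fsync, and an unrelated read yields the first write's result, `-5`, `-ECANCELED` for the fsync, and the read's own result: the fsync never reports success after a failed write in its chain.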
In the event that we get a partial write, if it was linked, then we
submit an additional fsync after the partial write completes, with an
`IO_DRAIN` flag after forcing a `submit`, which means durability is
maintained, as that fsync will flush/drain everything in the squeue
before submission.
The other option in the event of partial writes on commit frames/linked
writes is to error. Not sure which is the right move here. I guess it's
possible that since the fsync completion fired, then the commit could be
over without us being durable on disk. So maybe it's an assertion
instead? Thoughts?
Closes #2909
2025-09-05 08:34:35 +03:00
Pekka Enberg
5950003eaf
core: Simplify WalFileShared life cycle
...
Create one WalFileShared for a Database and update its state
accordingly. Also support the case where the WAL is disabled.
2025-09-04 21:09:12 +03:00
PThorpe92
e3f366963d
Compute the final db page or make the commit frame submit a linked pwritev completion
2025-09-03 16:01:16 -04:00
sonhmai
2b6cb39c7e
core: handle edge cases for read_varint
2025-09-03 15:43:34 +07:00
Frank Denis
52d0a3bf4a
Make set_encryption_{context,cipher,key} fallible
2025-09-03 01:14:49 +02:00
Frank Denis
e3835afee5
Encryption: add support for other AEGIS and AES-GCM cipher variants
...
Now supported:
- AEGIS variants: 256, 256X2, 256X4, 128L, 128X2, 128X4
- AES-GCM variants: AES-128-GCM, AES-256-GCM
With minor changes in order to make it easy to add new
ciphers later regardless of their key size.
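One way to keep cipher additions independent of key size is to let each variant report its own key length. The sketch below is an illustration under that assumption; the `CipherMode` enum, its variant names, and the `key_size` and `check_key` methods are hypothetical and not taken from the commit. The key lengths themselves follow from the cipher definitions (16 bytes for the 128-bit variants, 32 bytes for the 256-bit variants).

```rust
/// Hypothetical enumeration of the cipher variants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CipherMode {
    Aegis256,
    Aegis256X2,
    Aegis256X4,
    Aegis128L,
    Aegis128X2,
    Aegis128X4,
    Aes128Gcm,
    Aes256Gcm,
}

impl CipherMode {
    /// Key length in bytes required by this cipher.
    pub const fn key_size(self) -> usize {
        match self {
            CipherMode::Aegis256
            | CipherMode::Aegis256X2
            | CipherMode::Aegis256X4
            | CipherMode::Aes256Gcm => 32,
            CipherMode::Aegis128L
            | CipherMode::Aegis128X2
            | CipherMode::Aegis128X4
            | CipherMode::Aes128Gcm => 16,
        }
    }

    /// Validates a key without hard-coding any single key size.
    pub fn check_key(self, key: &[u8]) -> Result<(), String> {
        if key.len() == self.key_size() {
            Ok(())
        } else {
            Err(format!(
                "{:?} expects a {}-byte key, got {} bytes",
                self,
                self.key_size(),
                key.len()
            ))
        }
    }
}
```

Adding a cipher in this design means adding one variant and one `match` arm; callers that validate keys through `check_key` need no changes.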
2025-09-02 23:46:58 +02:00
Pekka Enberg
2addeb5a9f
Merge 'introduce eq/contains/starts_with/ends_with_ignore_ascii_case macros' from Lâm Hoàng Phúc
...
Depends on #2865
```sh
`ALTER TABLE _ RENAME TO _`/limbo_rename_table/
time: [10.100 ms 10.191 ms 10.283 ms]
change: [-16.770% -15.559% -14.417%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
`ALTER TABLE _ RENAME COLUMN _ TO _`/limbo_rename_column/
time: [7.4829 ms 7.5492 ms 7.6128 ms]
change: [-19.397% -18.093% -16.789%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low mild
1 (1.00%) high mild
`ALTER TABLE _ ADD COLUMN _`/limbo_add_column/
time: [5.3255 ms 5.3713 ms 5.4183 ms]
change: [-24.002% -22.612% -21.195%] (p = 0.00 < 0.05)
Performance has improved.
Found 39 outliers among 100 measurements (39.00%)
17 (17.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
20 (20.00%) high severe
`ALTER TABLE _ DROP COLUMN _`/limbo_drop_column/
time: [5.8858 ms 5.9183 ms 5.9510 ms]
change: [-16.233% -14.679% -13.083%] (p = 0.00 < 0.05)
Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
8 (8.00%) low severe
11 (11.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
Prepare `SELECT 1`/limbo_parse_query/SELECT 1
time: [590.28 ns 591.31 ns 592.35 ns]
change: [-3.7810% -3.5059% -3.2444%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
6 (6.00%) high mild
Prepare `SELECT * FROM users LIMIT 1`/limbo_parse_query/SELECT * FROM users LIMIT 1
time: [1.2569 µs 1.2582 µs 1.2596 µs]
change: [-5.0125% -4.7516% -4.4933%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
Prepare `SELECT first_name, count(1) FROM users GROUP BY first_name HAVING count(1) > 1 ORDER BY cou...
time: [3.7180 µs 3.7227 µs 3.7274 µs]
change: [-3.0557% -2.7642% -2.4761%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
4 (4.00%) high mild
Execute `SELECT 1`/limbo_execute_select_1
time: [27.455 ns 27.477 ns 27.499 ns]
change: [-2.9461% -2.7493% -2.5589%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/1
time: [410.53 ns 411.05 ns 411.54 ns]
change: [-15.364% -15.133% -14.912%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high mild
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/10
time: [2.1100 µs 2.1122 µs 2.1145 µs]
change: [-11.517% -11.065% -10.662%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low severe
2 (2.00%) low mild
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/50
time: [9.5156 µs 9.5268 µs 9.5383 µs]
change: [-10.284% -10.086% -9.8833%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low severe
2 (2.00%) low mild
Execute `SELECT * FROM users LIMIT ?`/limbo_execute_select_rows/100
time: [18.669 µs 18.698 µs 18.731 µs]
change: [-9.5949% -9.3407% -9.1140%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low severe
1 (1.00%) high mild
Execute `SELECT count() FROM users`/limbo_execute_select_count
time: [7.1027 µs 7.1098 µs 7.1170 µs]
change: [-43.739% -43.596% -43.469%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
```
Closes #2866
2025-09-02 18:35:14 +03:00
PThorpe92
cfadc4f579
Fix memory leak in page cache during balancing
2025-09-02 10:35:04 -04:00
TcMits
bfff05faba
merge main
2025-09-02 18:25:20 +07:00
Pekka Enberg
15d45e3f68
Merge 'Refactor encryption to manage authentication tag internally' from bit-aloo
...
This PR updates the internal encryption framework to handle
authentication tags explicitly rather than relying on the underlying
cipher libraries to append/verify them automatically.
closes: #2850
Reviewed-by: Avinash Sajjanshetty (@avinassh)
Closes #2858
2025-09-02 09:44:22 +03:00