Commit Graph

7064 Commits

Author SHA1 Message Date
Jussi Saurio
1e59165ea6 Merge 'More State Machines in preparation for tracking IO Completions' from Pedro Muniz
More changes. I want to avoid big PRs, so doing these changes in small
increments. I think in like 2 PRs after this one, I will be able make
the change effectively.

Closes #2400
2025-08-05 00:00:09 +03:00
Jussi Saurio
4cf02dfd14 Merge 'coalesce any adjacent buffers from writev calls into fewer iovecs' from Preston Thorpe
In `io_uring` and `unix` IO backends, we can check if our buffers are
sequential in memory and reduce the number of iovecs per call. Although
this is highly unlikely to actually happen at the moment due to our
buffer pool implementation.
Later on, when #2419 is merged, we will be able to specifically request
runs of contiguous buffers, so that our `writev` calls will (in the
ideal case) be coalesced into a single `pwrite` or preferrably
`WriteFixed` operation on the io_uring backend.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2436
2025-08-04 23:54:57 +03:00
Jussi Saurio
13219dbf87 Merge 'extend raw WAL API with few more methods' from Nikita Sivukhin
This PR extends raw WAL API with few methods which will be helpful for
offline-sync:
1. `try_wal_watermark_read_page` - try to read page from the DB with
given WAL watermark value\
    * Usually, WAL max_frame is set automatically to the latest value
(`shared.max_frame`) when transaction is started and then this
"watermark" is preserved throughout whole transaction
    * New method allows to simulate "read from the past" by controlling
frame watermark explicitly
    * There is an alternative to implement some API like
`start_read_session(frame_watermark: u64)` - but I decided to expose
just single method to simplify the logic and reduce "surface" of actions
which can be executed in this "controllable" manner
    * Also, for simplicity, now `try_wal_watermark_read_page` always
read data from disk and bypass any cached values (and also do not
populate the cache)
2. `wal_changed_pages_after` - return set of unique pages changed after
watermark WAL position in the current WAL session
With these 2 methods we can implement `REVERT frame_watermark` logic
which will just fetch all changed pages first, and then revert them to
the previous value by using `try_wal_watermark_read_page` and
`wal_insert_frame` methods (see `test_wal_api_revert_pages` test).
Note, that if there were schema changes - than `REVERT` logic described
above can bring connection to the inconsistent state, as it will
preserve schema information in memory and will still think that table
exist (while it can be reverted). This should be considered by any
consumer of this new methods.

Closes #2433
2025-08-04 23:53:46 +03:00
PThorpe92
2a3fa0955f Attempt to coalesce contiguous iovecs during pwritev operation for unix IO 2025-08-04 16:18:19 -04:00
PThorpe92
b76ef20f4c Attempt to coalesce contiguous iovecs during pwritev operation for io_uring 2025-08-04 16:18:05 -04:00
Preston Thorpe
3835ad76c3 Merge 'Add bitmap to track pages in arenas' from Preston Thorpe
Really want to get this through independently of #2419 because it's
being added in that PR and don't want to distract from that otherwise
and make that more difficult to review. Added a bunch of tests
The use-case is too lightweight for bringing in something like Roaring

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2420
2025-08-04 15:26:51 -04:00
PThorpe92
6cbc8ff868 Replace values with constants 2025-08-04 15:14:06 -04:00
PThorpe92
73d1fdef14 Fix and change bitmap, apply suggestions and add some optimizations 2025-08-04 14:58:58 -04:00
PThorpe92
f4197f1eb5 change debug assertions to turso asserts 2025-08-04 14:55:48 -04:00
PThorpe92
54696d2f0d Add additional test for edge cases 2025-08-04 14:55:48 -04:00
PThorpe92
5378195ad6 Add page bitmap to storage mod.rs 2025-08-04 14:55:48 -04:00
PThorpe92
3e30335ea5 Add tests for PageBitmap 2025-08-04 14:55:48 -04:00
PThorpe92
7b1f908c00 Add PageBitmap for use with arena page allocator 2025-08-04 14:55:48 -04:00
pedrocarlo
f2d84a534c adjust clear_overflow_pages 2025-08-04 15:28:06 -03:00
pedrocarlo
718ad5e7fd btree_destroy retunrn IO 2025-08-04 14:12:51 -03:00
pedrocarlo
e0978844e6 adjust integrity_check 2025-08-04 14:12:50 -03:00
Nikita Sivukhin
443c177d13 fix test for windows 2025-08-04 21:00:29 +04:00
Jussi Saurio
7045d44fdc Merge 'fix/wal: remove start_pages_in_frames_hack to prevent checkpoint data loss' from Jussi Saurio
Closes #2421
## Background
We have some kind of transaction-local hack (`start_pages_in_frames`)
for bookkeeping how many pages are currently in the in-memory WAL frame
cache, I assume for performance reasons or whatever.
`wal.rollback()` clears all the frames from `shared.frame_cache` that
the rollbacking tx is allowed to clear, and then truncates
`shared.pages_in_frames` to however much its local
`start_pages_in_frames` value was.
## Problem
In `complete_append_frame`, we check if `frame_cache` has that key
(page) already, and if not, we add it to `pages_in_frames`.
However, `wal.rollback()` never _removes_ the key (page) if its value is
empty, so we can end up in a scenario where the `frame_cache` key for
`page P` exists but has no frames, and so `page P` does not get added to
`pages_in_frames` in `complete_append_frame`.
This leads to a checkpoint data loss scenario:
- transaction rolls back, has start_pages_in_frames=0, so truncates
shared pages_in_frames to an empty vec. let's say `page P` key in
`frame_cache` still remains but it has no frames.
- The next time someone commits a frame for `page P`, it does NOT get
added to `pages_in_frames` because `frame_cache` has that key (although
the value vector is empty)
- At some point, a checkpoint checkpoints `n` frames, but since
`pages_in_frames` does not have `page P`, it doesn't actually checkpoint
it and all the "checkpointed" frames are simply thrown away
- very similar to the scenario in #2366
## Fix
Remove the `start_pages_in_frames` hack entirely and just make
`pages_in_frames` effectively the same as `frame_cache.keys`. I think we
could also just get rid of `pages_in_frames` and just use
`frame_cache.contains_key(p)` but maybe Pere can chime in here

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2422
2025-08-04 19:49:55 +03:00
pedrocarlo
aa05616845 fix tests 2025-08-04 13:08:30 -03:00
pedrocarlo
5f52d9b6b4 state machine for count 2025-08-04 13:00:43 -03:00
pedrocarlo
1585d5cbee state machine for 'next' and prev 2025-08-04 13:00:43 -03:00
pedrocarlo
f1df9a909e state machine for 'rewind' 2025-08-04 12:59:52 -03:00
Jussi Saurio
506bb5f67f Merge 'Direct schema mutation – add instruction' from Levy A.
Resolves #2378.
```
`ALTER TABLE _ RENAME TO _`/limbo_rename_table/
                        time:   [15.645 ms 15.741 ms 15.850 ms]
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe
`ALTER TABLE _ RENAME TO _`/sqlite_rename_table/
                        time:   [34.728 ms 35.260 ms 35.955 ms]
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) high mild
  7 (7.00%) high severe
  ```
<img width="1000" height="199" alt="image" src="https://github.com/user-
attachments/assets/ad943355-b57d-43d9-8a84-850461b8af41" />

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2399
2025-08-04 16:55:38 +03:00
Jussi Saurio
1813171b91 Merge 'Use pwrite for single buffer pwritev call in unix IO' from Preston Thorpe
Closes #2416
2025-08-04 16:52:14 +03:00
Jussi Saurio
5a06411ce6 Merge 'fix/core/translate: ALTER TABLE DROP COLUMN: ensure schema cookie is updated even when target table is empty' from Jussi Saurio
Closes #2431
Discovered while fuzzing #2086
## What
We update `schema_version` whenever the schema changes
## Problem
Probably unintentionally, we were calling `SetCookie` in a loop for each
row in the target table, instead of only once at the end. This means 2
things:
- For large `n`, this is a lot of unnecessary instructions
- For `n==0`, `SetCookie` doesn't get called at all -> the schema won't
get marked as having been updated -> conns can operate on a stale schema
## Fix
Lift `SetCookie` out of the loop

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2432
2025-08-04 16:51:24 +03:00
Nikita Sivukhin
33b814054b fix tests for windows 2025-08-04 17:37:22 +04:00
Nikita Sivukhin
9694366645 add one more assert 2025-08-04 17:23:34 +04:00
Nikita Sivukhin
76bdf0c1ab small fixes 2025-08-04 17:02:53 +04:00
Nikita Sivukhin
2e23230e79 extend raw WAL API with few more methods
- try_wal_watermark_read_page - try to read page from the DB with given WAL watermark value
- wal_changed_pages_after - return set of unique pages changed after watermark WAL position
2025-08-04 16:55:50 +04:00
Jussi Saurio
8a1723b3c8 fix/core/translate: ALTER TABLE DROP COLUMN: ensure schema cookie is updated even when target table is empty 2025-08-04 15:05:00 +03:00
Pekka Enberg
e4accdc29d Merge 'hide dangerous methods behind conn_raw_api feature' from Nikita Sivukhin
WAL API shouldn't be exposed by default because this is relatively
dangerous API which we use internally and ordinary users shouldn't not
be interested in it.

Reviewed-by: Pekka Enberg <penberg@iki.fi>

Closes #2424
2025-08-04 14:52:40 +03:00
Pekka Enberg
1572285ee6 Merge 'preserve files in IO memory backend' from Nikita Sivukhin
Simple PR to preserve and reuse files in memory IO

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2428
2025-08-04 14:52:24 +03:00
Pekka Enberg
5b5bc441b7 Merge ' core/mvcc: fix new rowid on restart' from Pere Diaz Bou
Next rowid was being tracked globally for all tables and restarted to 0
every time database was opened

Closes #2425
2025-08-04 14:37:13 +03:00
Nikita Sivukhin
129895f0b2 preserve files in IO memory backend 2025-08-04 15:22:04 +04:00
Pere Diaz Bou
56240ddac9 core/mvcc: add restart tests 2025-08-04 12:31:17 +02:00
Pere Diaz Bou
f26e442597 core/mvcc: fix new rowid
next rowid was being tracked globally for all tables and restarted to 0
every time database was opened
2025-08-04 12:31:17 +02:00
Pere Diaz Bou
83a658d3d6 core/mvcc: add option to test with a random file and restart it 2025-08-04 12:31:17 +02:00
Nikita Sivukhin
83b1e99a61 fix compilation 2025-08-04 12:53:07 +04:00
Nikita Sivukhin
0adb40534c hind dangerous methods behind conn_raw_api feature 2025-08-04 12:40:28 +04:00
Jussi Saurio
4f3f66d55e fix/wal: remove start_pages_in_frames_hack to prevent checkpoint data loss
We have some kind of transaction-local hack (`start_pages_in_frames`) for bookkeeping
how many pages are currently in the in-memory WAL frame cache,
I assume for performance reasons or whatever.

`wal.rollback()` clears all the frames from `shared.frame_cache` that the rollbacking tx is
allowed to clear, and then truncates `shared.pages_in_frames` to however much its local
`start_pages_in_frames` value was.

In `complete_append_frame`, we check if `frame_cache` has that key (page) already, and if not,
we add it to `pages_in_frames`.

However, `wal.rollback()` never _removes_ the key (page) if its value is empty, so we can end
up in a scenario where the `frame_cache` key for `page P` exists but has no frames, and so `page P`
does not get added to `pages_in_frames` in `complete_append_frame`.

This leads to a checkpoint data loss scenario:

- transaction rolls back, has start_pages_in_frames=0, so truncates
  shared pages_in_frames to an empty vec. let's say `page P` key in `frame_cache` still remains
  but it has no frames.
- The next time someone commits a frame for `page P`, it does NOT get added to `pages_in_frames`
  because `frame_cache` has that key
- At some point, a PASSIVE checkpoint checkpoints `n` frames, but since `pages_in_frames` does not have
  `page P`, it doesn't actually checkpoint it and all the "checkpointed" frames are simply thrown away
- very similar to the scenario in #2366

Remove the `start_pages_in_frames` hack entirely and just make `pages_in_frames` effectively
the same as `frame_cache.keys`. I think we could also just get rid of `pages_in_frames` and just use
`frame_cache.contains_key(p)` but maybe Pere can chime in here
2025-08-04 10:35:12 +03:00
Pekka Enberg
deec70e541 Merge 'Improve SQLite3 TCL test suite' from Pekka Enberg
Add more stubs in tester.tcl so that the test suite does not bail out
early.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2405
2025-08-04 08:43:14 +03:00
Pekka Enberg
ca14799da5 Merge 'Make completions idempotent' from Preston Thorpe
Closes #2417
2025-08-04 08:42:42 +03:00
Pekka Enberg
3691b51039 Merge 'perf/btree: skip seek in move_to_rightmost() if we are already on rightmost page' from Jussi Saurio
## Background
When we get a new rowid using `op_new_rowid()`, we move to the end of
the btree to look at what the maximum rowid currently is, and then
increment it by one.
This requires a btree seek.
## Problem
If we were already on the rightmost page, this is a lot of unnecessary
work, including potentially a few page reads from disk (although to be
fair the ancestor pages are very likely to be in cache at this point.)
## Fix
Cache the rightmost page id whenever we enter it in
`move_to_rightmost()`, and invalidate it whenever we do a balancing
operation.
## Local benchmark results
```sql
Insert rows in batches/limbo_insert_1_rows
                        time:   [23.333 µs 27.718 µs 35.801 µs]
                        change: [-7.7924% +0.8805% +12.841%] (p = 0.91 > 0.05)
                        No change in performance detected.
Insert rows in batches/limbo_insert_10_rows
                        time:   [38.204 µs 38.381 µs 38.568 µs]
                        change: [-8.7188% -7.4786% -6.1955%] (p = 0.00 < 0.05)
                        Performance has improved.
Insert rows in batches/limbo_insert_100_rows
                        time:   [158.39 µs 165.06 µs 178.37 µs]
                        change: [-21.000% -18.789% -15.666%] (p = 0.00 < 0.05)
                        Performance has improved.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2409
2025-08-04 08:41:51 +03:00
PThorpe92
79629daff4 Make completions idempotent 2025-08-02 21:48:39 -04:00
Levy A.
b9a3a93ef0 fix: clippy 2025-08-02 20:06:05 -03:00
PThorpe92
b5117ac5c7 Use pwrite for single buffer in unix IO 2025-08-02 18:34:16 -04:00
Levy A.
b14a11a2fd fix: change name for schema btree + fix benchmark 2025-08-02 17:17:36 -03:00
Jussi Saurio
130e1f80ea fix/vdbe: call seek_to_last() only once in op_new_rowid 2025-08-02 14:18:58 +03:00
Jussi Saurio
63a5ef596b perf/btree: skip seek in move_to_rightmost() if we are already on rightmost page 2025-08-02 13:56:59 +03:00
Pekka Enberg
a4dcbeb392 Update PERF.md 2025-08-02 13:37:42 +03:00