Commit Graph

131 Commits

Author SHA1 Message Date
Preston Thorpe
497808a40c Merge 'eliminate the need for another Once in Completion' from Pedro Muniz
I added the `Once` before so fix a bug, but it was a bit hackery. We can
`get_or_init` to achieve the same purpose, and the code becomes much
cleaner. `get_or_init` guarantees the init will happen only once as
well.

Reviewed-by: Preston Thorpe <preston@turso.tech>
Reviewed-by: bit-aloo (@Shourya742)

Closes #3578
2025-10-06 19:52:41 -04:00
pedrocarlo
2ce0e9db57 eliminate the need for another Once in Completion 2025-10-06 11:10:41 -03:00
pedrocarlo
5a7390735d rename Completion functions 2025-10-06 11:07:06 -03:00
Pekka Enberg
c27b167c6d core/io: Add completion group API for managing multiple I/O operations
Introduces a completion group abstraction that allows grouping multiple
I/O completions together for coordinated tracking and error handling.
This enables:

- Tracking completion status of multiple I/O operations as a group
- Detecting when all operations in a group have finished
- Aborting all operations in a group atomically
- Retrieving errors from any completion in the group

The implementation uses intrusive linked lists for efficient membership
tracking and atomic counters for outstanding operation counts. Each
completion can be linked to a group using the new .link() method.

This lays the groundwork for batch I/O operations and coordinated
transaction handling in the storage layer.
2025-10-06 07:33:31 +03:00
pedrocarlo
ffeb26b24a only ever call callbacks once 2025-09-21 14:36:18 -03:00
Pekka Enberg
8337e86794 core: Use sequential consistency for atomics by default
We use relaxed ordering in a lot of places where we really need to
ensure all CPUs see the write. Switch to sequential consistency, unless
acquire/release is explicitly used. If there are places that can be
optimized, we can switch to relaxed case-by-case, but have a comment
explaning *why* it is safe.
2025-09-18 13:38:13 +03:00
Pekka Enberg
2131a04b7d core: Rename IO::run_once() to IO::step()
The `run_once()` name is just a historical accident. Furthermore, it now
started to appear elsewhere as well, so let's just call it IO::step() as we
should have from the beginning.
2025-09-10 14:36:02 +03:00
PThorpe92
02df372811 Add cancel and drain methods to IO trait 2025-09-08 13:18:03 -04:00
Pekka Enberg
6d80d862ee Merge 'io_uring: prevent out of order operations that could interfere with durability' from Preston Thorpe
closes #1419
When submitting a `pwritev` for flushing dirty pages, in the case that
it's a commit frame, we use a new completion type which tells io_uring
to add a flag, which ensures the following:
1. If any operation in the chain fails, subsequent operations get
cancelled with -ECANCELED
2. All operations in the chain complete in order
If there is an ongoing chain of `IO_LINK`, it ends at the `fsync`
barrier, and ensures everything submitted before it has completed.
for 99% of the cases, the syscall that immediately proceeds the
`pwritev` is going to be the fsync, but just in case, this
implementation links everything that comes between the final commit
`pwritev` and the next `fsync`
In the event that we get a partial write, if it was linked, then we
submit an additional fsync after the partial write completes, with an
`IO_DRAIN` flag after forcing a `submit`, which will mean durability is
maintained, as that fsync will flush/drain everything in the squeue
before submission.
The other option in the event of partial writes on commit frames/linked
writes is to error.. not sure which is the right move here. I guess it's
possible that since the fsync completion fired, than the commit could be
over without us being durable ondisk. So maybe it's an assertion
instead? Thoughts?

Closes #2909
2025-09-05 08:34:35 +03:00
Nikita Sivukhin
4a3d3b3b8c mark completion as done only after callback will be executed
- otherwise, in multi-threading environment, other thread can think that completion is finished
  and start execution
- this can lead to violated assertions (for example, page must be loaded, but as callback is not executed yet
  assert will be fired)
2025-09-04 23:48:08 +04:00
PThorpe92
3831218f6c Add linked to completion types in io/mod 2025-09-03 16:01:16 -04:00
Pekka Enberg
87d3f74e6e Merge 'Evict page from cache if page is unlocked and unloaded' from Pedro Muniz
Because we can abort a read_page completion, this means a page can be in
the cache but be unloaded and unlocked. However, if we do not evict that
page from the page cache, we will return an unloaded page later which
will trigger assertions later on. This is worsened by the fact that page
cache is not per `Statement`, so you can abort a completion in one
Statement, and trigger some error in the next one if we don't evict the
page in these circumstances.
Also, to propagate IO errors we need to return the Error from
IOCompletions on step.

Closes #2785
2025-09-02 09:08:12 +03:00
pedrocarlo
53cfae1db4 return Error from step if IO failed 2025-09-01 11:10:39 -03:00
PThorpe92
0a56d23402 Use u64 for file offsets in IO and calculate such offsets in u64 2025-08-28 09:44:00 -04:00
PThorpe92
177c717f25 Remove windows IO in place of Generic IO 2025-08-25 18:47:21 -04:00
Pekka Enberg
5fe5e1548b core/io: Fix build on Android and iOS
Commit ebe6aa0d28 ("adjust cfg for unix
and linux IO") adjusted the I/O conditional compilation, but forgot that
Android and iOS are also part of Unix target family.

Fixes #2500
2025-08-25 11:21:46 +03:00
Nikita Sivukhin
c771487933 add remove_file method to the IO 2025-08-21 14:51:02 +04:00
Avinash Sajjanshetty
bd9b4bbfd2 encrypt/decrypt when writing/reading from DB 2025-08-20 11:47:23 +05:30
Jussi Saurio
b5439dd068 Remove assertions from Completion::complete() and Completion::error()
The completion callback can be invoked only once via `OnceLock`, let's not
crash if we e.g. call `Completion::abort()` on an already finished completion.

Closes #2673
2025-08-19 22:02:02 +03:00
pedrocarlo
66171527b4 thread safely store the result of completion 2025-08-19 10:48:21 -03:00
pedrocarlo
de1811dea7 abort completions on error 2025-08-19 10:48:21 -03:00
pedrocarlo
ab3b68e360 change completion callbacks to take a Result param + create separate functions to declare a completion errored 2025-08-19 10:48:21 -03:00
pedrocarlo
2d6fad5ea3 nit: adjust order of struct completions 2025-08-19 10:48:21 -03:00
pedrocarlo
fadf78fe67 use a dedicated Error enum for Completion Error 2025-08-19 10:48:21 -03:00
pedrocarlo
7bc0545442 default impl for get_memory_io 2025-08-19 10:48:21 -03:00
pedrocarlo
d5a59c6bee default impl for generate_random_number 2025-08-19 10:48:21 -03:00
pedrocarlo
f72bcbc5da default impl for wait_for_completion + check for errors in completion there 2025-08-19 10:48:21 -03:00
pedrocarlo
002390b5a5 store error inside Completion 2025-08-19 10:48:21 -03:00
PThorpe92
cc2fed3297 Remove copy_to API from file IO trait 2025-08-14 21:31:13 -04:00
PThorpe92
55f09a01c4 Update copy_to method in file trait to separate source and destination IO 2025-08-14 21:31:13 -04:00
Jussi Saurio
359cba0474 Use BufferPool owned by Database instead of a static global
Problem

There are several problems with our current statically allocated
`BufferPool`.

1. You cannot open two databases in the same process with different
page sizes, because the `BufferPool`'s `Arena`s will be locked forever
into the page size of the first database. This is the case regardless
of whether the two `Database`s are open at the same time, or if the first
is closed before the second is opened.

2. It is impossible to even write Rust tests for different page sizes because
of this, assuming the test uses a single process.

Solution

Make `Database` own `BufferPool` instead of it being statically allocated, so this
problem goes away.

Note that I didn't touch the still statically-allocated `TEMP_BUFFER_CACHE`, because
it should continue to work regardless of this change. It should only be a problem if
the user has two or more databases with different page sizes open simultaneously, because
`TEMP_BUFFER_CACHE` will only support one pool of a given page size at a time, so the rest
of the allocations will go through the global allocator instead.

Notes

I extracted this change out from #2569, because I didn't want it to be smuggled in without
being reviewed as an individual piece.
2025-08-14 10:29:52 +03:00
pedrocarlo
2e68296107 create IOCompletions 2025-08-13 10:24:55 +03:00
PThorpe92
213d589dd1 Apply review suggestions, remove FreeEntry 2025-08-08 11:07:29 -04:00
PThorpe92
faf248df03 Add more docs and comments for TempBufferCache 2025-08-08 10:55:28 -04:00
PThorpe92
34d90d5acb Remove Clone impl for Buffer and PageContent to make any copying of page data explicit 2025-08-08 10:55:28 -04:00
PThorpe92
d7e4ba21f8 Add explanation for using 3mb limit 2025-08-08 10:55:28 -04:00
PThorpe92
0ffba81216 Make register buffer io trait return the buf index 2025-08-08 10:55:27 -04:00
PThorpe92
cc75bc448e Move TLC buffer cache to io/mod 2025-08-08 10:55:27 -04:00
PThorpe92
a02f527c06 Add fast path for pwritev on other IO backends 2025-08-08 10:55:25 -04:00
PThorpe92
39bccc2357 Update Buffer type in io module to adjust to new pool 2025-08-08 10:55:25 -04:00
PThorpe92
bcadcb2014 Remove RefCell from copy_to method in io trait 2025-08-07 17:07:53 -04:00
PThorpe92
039fe22405 Add copy_to to io::File trait to support copying DB files 2025-08-07 16:27:02 -04:00
PThorpe92
53a0524050 Fix clippy warning 2025-08-05 16:24:50 -04:00
PThorpe92
f6a68cffc2 Remove RefCell from IO and Page apis 2025-08-05 16:24:49 -04:00
PThorpe92
914c10e095 Remove Clone impl for Buffer and PageContent 2025-08-05 14:26:53 -04:00
pedrocarlo
ebe6aa0d28 adjust cfg for unix and linux IO 2025-08-04 15:49:52 -03:00
PThorpe92
79629daff4 Make completions idempotent 2025-08-02 21:48:39 -04:00
PThorpe92
3048e4fa97 Add optional register_fixed_buffer method to IO trait 2025-08-01 14:54:26 -04:00
PThorpe92
ef69df7258 Apply review suggestions 2025-07-30 19:42:53 -04:00
PThorpe92
28283e4d1c Fix bench_vfs python script to use fresh db for each run 2025-07-30 19:42:52 -04:00