Commit Graph

3783 Commits

Author SHA1 Message Date
Glauber Costa
9e8ba5263b Implement the AddImm opcode
It is a simple opcode. The hard part was finding a sqlite statement
that uses it =)
2025-07-31 08:08:07 -05:00
Jussi Saurio
62e804480e fix/wal: make db_changed check detect cases where max frame happens to be the same 2025-07-31 14:37:33 +03:00
Jussi Saurio
39dec647a7 fix/wal: reset page cache when another connection checkpointed in between 2025-07-31 12:44:22 +03:00
Jussi Saurio
7d082ab614 small fix after header accessor refactor 2025-07-31 10:05:52 +03:00
Jussi Saurio
f619556344 Merge 'Direct DatabaseHeader reads and writes – with_header and with_header_mut' from Levy A.
This PR introduces two methods to the pager, very much inspired by
`with_schema` and `with_schema_mut`. `Pager::with_header` and
`Pager::with_header_mut` pass the closure a shared and a unique
reference, respectively; both are transmuted references into the
`PageRef` buffer.
This PR also adds type-safe wrappers for `Version`, `PageSize`,
`CacheSize` and `TextEncoding`, as they have special in-memory
representations.
Writing the `DatabaseHeader` is just a single `memcpy` now.
```rs
pub fn write_database_header(&self, header: &DatabaseHeader) {
    let buf = self.as_ptr();
    buf[0..DatabaseHeader::SIZE].copy_from_slice(bytemuck::bytes_of(header));
}
```
`HeaderRef` and `HeaderRefMut` are used in the `with_header*` methods,
but can also be used on their own when there are multiple reads and
writes to the header, where putting everything in a closure would add
too much nesting.
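A minimal sketch of the closure-based accessor pattern described above, using only the standard library. `MiniHeader` and `MiniPager` are hypothetical stand-ins, not the project's `DatabaseHeader` or `Pager`, and a raw pointer cast replaces `bytemuck`. The struct covers only the first 20 bytes of the SQLite header and uses byte-only fields so that the cast has no alignment requirement.

```rust
use std::cell::RefCell;

/// Hypothetical, simplified view of the first 20 bytes of a SQLite header.
/// Every field is a byte or byte array: alignment 1, no padding.
#[repr(C)]
pub struct MiniHeader {
    pub magic: [u8; 16],
    pub page_size: [u8; 2], // stored big-endian on disk
    pub write_version: u8,
    pub read_version: u8,
}

impl MiniHeader {
    pub const SIZE: usize = std::mem::size_of::<MiniHeader>();

    pub fn page_size(&self) -> u16 {
        u16::from_be_bytes(self.page_size)
    }

    pub fn set_page_size(&mut self, value: u16) {
        self.page_size = value.to_be_bytes();
    }
}

/// Hypothetical pager that owns the bytes of page 1.
pub struct MiniPager {
    page1: RefCell<Vec<u8>>,
}

impl MiniPager {
    pub fn new(page1: Vec<u8>) -> Self {
        assert!(page1.len() >= MiniHeader::SIZE, "page too small for header");
        Self { page1: RefCell::new(page1) }
    }

    /// Gives the closure a shared reference into the page buffer.
    pub fn with_header<T>(&self, f: impl FnOnce(&MiniHeader) -> T) -> T {
        let buf = self.page1.borrow();
        // SAFETY: the buffer holds at least SIZE bytes (checked in `new`),
        // and MiniHeader is repr(C), align 1, valid for any bit pattern.
        let header = unsafe { &*(buf.as_ptr() as *const MiniHeader) };
        f(header)
    }

    /// Gives the closure a unique reference into the page buffer.
    pub fn with_header_mut<T>(&self, f: impl FnOnce(&mut MiniHeader) -> T) -> T {
        let mut buf = self.page1.borrow_mut();
        // SAFETY: same invariants as above; `borrow_mut` guarantees uniqueness.
        let header = unsafe { &mut *(buf.as_mut_ptr() as *mut MiniHeader) };
        f(header)
    }
}
```

Reads then look like `pager.with_header(|h| h.page_size())`, and writes go straight into the page buffer with no intermediate copy.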

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2234
2025-07-31 10:02:47 +03:00
Jussi Saurio
62d79e8c16 Merge 'refactor/btree: simplify get_next_record()/get_prev_record()' from Jussi Saurio
When traversing, we are only interested in the following things:
- Is the page a leaf or not
- Is the page an index or table page
- If not a leaf, what is the left child page
This means we don't have to read the entire cell, just the left child
page.
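A rough sketch of the idea, not the project's actual code: in the SQLite file format the page type is the first byte of the b-tree page header, and an interior cell starts with a 4-byte big-endian left child page number. The names `PageType` and `left_child_page` are illustrative only.

```rust
/// B-tree page types as encoded in the first byte of the page header.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageType {
    InteriorIndex, // 2
    InteriorTable, // 5
    LeafIndex,     // 10
    LeafTable,     // 13
}

impl PageType {
    pub fn from_byte(byte: u8) -> Option<Self> {
        match byte {
            2 => Some(PageType::InteriorIndex),
            5 => Some(PageType::InteriorTable),
            10 => Some(PageType::LeafIndex),
            13 => Some(PageType::LeafTable),
            _ => None,
        }
    }

    pub fn is_leaf(self) -> bool {
        matches!(self, PageType::LeafIndex | PageType::LeafTable)
    }

    pub fn is_index(self) -> bool {
        matches!(self, PageType::InteriorIndex | PageType::LeafIndex)
    }
}

/// Reads only the left child pointer of an interior cell,
/// without decoding the rest of the cell.
pub fn left_child_page(cell: &[u8]) -> Option<u32> {
    let b = cell.get(..4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}
```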

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2317
2025-07-31 10:02:08 +03:00
Jussi Saurio
99e20e46bb Merge 'Accumulate/batch vectored writes when backfilling during checkpoint' from Preston Thorpe
After significant digging into why the io_uring back-end was so much
slower (particularly for writes), it turned out that checkpointing in
particular was incredibly slow, for several reasons. One is that we
essentially end up calling `submit_and_wait` for every page.
This PR (which, of course, heavily conflicts with my other open PR)
attempts to remedy this by adding `pwritev` to the File trait for IO
back-ends that want to support it, and by aggregating contiguous writes
into a series of `pwritev` calls instead of writing each page
individually.
### Performance:
`make bench-vfs SQL="insert into products (name,price) values
(randomblob(4096), randomblob(2048));" N=1000`
# Update:
**main**
<img width="505" height="194" alt="image" src="https://github.com/user-attachments/assets/8e4a27af-0bb6-4e01-8725-00bc9f8a82d6" />
**this branch**
<img width="555" height="197" alt="image" src="https://github.com/user-attachments/assets/fad1f685-3cb0-4e06-aa9d-f797a0db8c63" />
The same test (any test with writes) on this updated branch is now
roughly as fast as the syscall IO back-end, and runs are often faster.
Illustrating a checkpoint. Every `count=N` where N > 1 is M syscalls
saved, where M = N - 1.
(roughly ~850 syscalls saved)
<img width="590" height="534" alt="image" src="https://github.com/user-attachments/assets/a6171ac9-1192-4d3e-a6bf-eeda3f43af07" />
(if you are wondering about why it didn't add 12000-399 and 12400-417,
it's because there is a `512` page batch limit that was hit to prevent
hitting `IOV_MAX`, in the rare case that it's lower than 1024 and the
entire checkpoint is a single run)
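The batching arithmetic above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `contiguous_runs` and `syscalls_saved` are hypothetical helpers, and the input is assumed to be a sorted list of page numbers. Each run would correspond to one `pwritev` call, and a run is cut once it reaches the batch limit.

```rust
/// Groups sorted page numbers into runs of consecutive pages,
/// each returned as (first_page, count) with count <= max_batch.
pub fn contiguous_runs(pages: &[u32], max_batch: usize) -> Vec<(u32, usize)> {
    assert!(max_batch > 0, "batch limit must be positive");
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &page in pages {
        if let Some((start, count)) = runs.last_mut() {
            // Extend the current run only if this page directly follows it
            // and the batch limit has not been reached.
            let next = u64::from(*start) + *count as u64;
            if *count < max_batch && next == u64::from(page) {
                *count += 1;
                continue;
            }
        }
        runs.push((page, 1));
    }
    runs
}

/// A run of N pages needs one vectored write instead of N single writes,
/// so it saves N - 1 syscalls.
pub fn syscalls_saved(runs: &[(u32, usize)]) -> usize {
    runs.iter().map(|&(_, count)| count - 1).sum()
}
```

With a limit of 512, a single contiguous run longer than that is split into several batches, which matches the split visible in the listing above.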

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2278
2025-07-31 07:30:57 +03:00
PThorpe92
07137c7aaf Merge 'Implement the Cast opcode' from Glauber Costa
Our compat matrix mentions a couple of opcodes: ToInt, ToBlob, etc.
Those opcodes do not exist.
Instead, there is a single Cast opcode, that takes the affinity as a
parameter.
Currently we just call a function when we need to cast. This PR fixes
the compat file, implements the cast opcode, and in at least one
instance, when explicitly using the CAST keyword, uses that opcode
instead of a function in the generated bytecode.
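As a rough illustration of a single cast operation parameterized by affinity, here is a deliberately reduced sketch. `Value`, `Affinity` and `cast` are hypothetical names, only integer and text are covered, and the text-to-integer rule (longest integer prefix, else 0) is a simplification of SQLite's CAST behavior; overflow simply falls back to 0 here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
    Text(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Affinity {
    Integer,
    Text,
}

/// Parses the longest integer prefix after leading whitespace; 0 if none.
fn integer_prefix(s: &str) -> i64 {
    let s = s.trim_start();
    let mut end = 0;
    for (i, c) in s.char_indices() {
        let is_sign = i == 0 && (c == '-' || c == '+');
        if is_sign || c.is_ascii_digit() {
            end = i + 1;
        } else {
            break;
        }
    }
    // A lone sign, an empty prefix, or an overflow all yield 0 in this sketch.
    s[..end].parse().unwrap_or(0)
}

/// One entry point for all casts: the target type is a parameter,
/// rather than one opcode per target type.
pub fn cast(value: &Value, affinity: Affinity) -> Value {
    match (value, affinity) {
        (Value::Null, _) => Value::Null,
        (Value::Integer(i), Affinity::Integer) => Value::Integer(*i),
        (Value::Integer(i), Affinity::Text) => Value::Text(i.to_string()),
        (Value::Text(s), Affinity::Integer) => Value::Integer(integer_prefix(s)),
        (Value::Text(s), Affinity::Text) => Value::Text(s.clone()),
    }
}
```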

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #2352
2025-07-30 22:32:09 -04:00
Glauber Costa
4bd1582e7d Implement the Cast opcode
Our compat matrix mentions a couple of opcodes: ToInt, ToBlob, etc.
Those opcodes do not exist.

Instead, there is a single Cast opcode, that takes the affinity as a
parameter.

Currently we just call a function when we need to cast. This PR fixes
the compat file, implements the cast opcode, and in at least one
instance, when explicitly using the CAST keyword, uses that opcode
instead of a function in the generated bytecode.
2025-07-30 20:44:54 -05:00
PThorpe92
2e741641e6 Add test to assert we are backfilling all the rows properly with vectored writes 2025-07-30 19:42:54 -04:00
PThorpe92
ade1c182de Add is_full method to checkpoint batch 2025-07-30 19:42:54 -04:00
PThorpe92
693b71449e Clean up writev batching and apply suggestions 2025-07-30 19:42:53 -04:00
PThorpe92
ef69df7258 Apply review suggestions 2025-07-30 19:42:53 -04:00
PThorpe92
73882b97d6 Remove unnecessary collecting CQEs into an array in run_once, comments 2025-07-30 19:42:53 -04:00
PThorpe92
28283e4d1c Fix bench_vfs python script to use fresh db for each run 2025-07-30 19:42:52 -04:00
PThorpe92
efcffd380d Clean up io_uring writev implementation, add iovec and cqe cache 2025-07-30 19:42:52 -04:00
PThorpe92
689007cb74 Remove unrelated io_uring changes 2025-07-30 19:42:52 -04:00
PThorpe92
b8e6cd5ae2 Fix taking page content from cached pages in checkpoint loop 2025-07-30 19:42:51 -04:00
PThorpe92
b04128b585 Fix write_pages_vectored to properly track completion 2025-07-30 19:42:50 -04:00
PThorpe92
0f94cdef03 Fix io_uring pwritev to properly handle partial writes 2025-07-30 19:42:50 -04:00
PThorpe92
88445328a5 Handle partial writes for pwritev calls in io_uring and fix JS bindings 2025-07-30 19:42:50 -04:00
PThorpe92
5f01eaae35 Fix default io::File::pwritev impl 2025-07-30 19:42:49 -04:00
PThorpe92
62f004c898 Fix write counter for writev batching in checkpoint 2025-07-30 19:42:49 -04:00
PThorpe92
7b2163208b batch backfilling pages when checkpointing 2025-07-30 19:42:48 -04:00
Levy A.
2bde1dbd42 fix: PageSize bounds check 2025-07-30 17:33:59 -03:00
Levy A.
fe66c61ff5 add usable_space to DatabaseHeader
we already have the `DatabaseHeader`, we don't need the cached result
2025-07-30 17:33:59 -03:00
Levy A.
e35fdb8263 feat: zero-copy DatabaseHeader 2025-07-30 17:33:59 -03:00
Jussi Saurio
e128bd477e Merge 'Support VALUES clauses for compound select' from meteorgan
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2293
2025-07-30 21:34:40 +03:00
Jussi Saurio
bc148a3906 Merge 'skip invalid inputs in cosine distance prop test' from bit-aloo
closes: #1790
Updates test_vector_distance to treat invalid inputs as non-errors,
skipping them instead. These cases aren't considered real errors, so no
explicit error handling is needed in the test.
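A small sketch of the "skip instead of fail" approach, under assumptions: which inputs count as invalid here (empty vectors, mismatched dimensions, zero norm, non-finite results) is a guess, and `cosine_distance` and `check_cases` are hypothetical names rather than the functions touched by this PR.

```rust
/// Returns None for inputs this sketch treats as invalid.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    let distance = 1.0 - dot / (norm_a * norm_b);
    if distance.is_finite() { Some(distance) } else { None }
}

/// Checks the range property on valid cases and skips invalid ones.
/// Returns (checked, skipped).
pub fn check_cases(cases: &[(Vec<f32>, Vec<f32>)]) -> (usize, usize) {
    let mut checked = 0;
    let mut skipped = 0;
    for (a, b) in cases {
        match cosine_distance(a, b) {
            Some(d) => {
                // Cosine distance lies in [0, 2], up to rounding error.
                assert!((-1e-5..=2.0 + 1e-5).contains(&d));
                checked += 1;
            }
            None => skipped += 1,
        }
    }
    (checked, skipped)
}
```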

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2341
2025-07-30 21:33:15 +03:00
Pekka Enberg
895f2acbfb Merge 'Fix concat_ws to match sqlite behavior' from bit-aloo
closes: #2101
Refactors exec_concat_ws to skip null and blob arguments instead of
inserting separators for them. Also adds a fuzz test.
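The described behavior can be sketched like this. `Arg` and `concat_ws` are simplified, hypothetical stand-ins for the real `exec_concat_ws`, which works on the engine's own value type and also has to handle a NULL separator.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Null,
    Text(String),
    Blob(Vec<u8>),
}

/// Joins the text arguments with `sep`. Null and blob arguments are
/// dropped entirely, so they contribute neither content nor a separator.
pub fn concat_ws(sep: &str, args: &[Arg]) -> String {
    args.iter()
        .filter_map(|arg| match arg {
            Arg::Text(s) => Some(s.as_str()),
            Arg::Null | Arg::Blob(_) => None,
        })
        .collect::<Vec<_>>()
        .join(sep)
}
```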

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2338
2025-07-30 21:31:58 +03:00
Pekka Enberg
147f105136 Merge 'core/mvcc: Switch to parking_lot RwLock' from Pekka Enberg
Closes #2344
2025-07-30 21:31:31 +03:00
Jussi Saurio
ac8a123e38 refactor/btree: simplify get_next_record()/get_prev_record()
When traversing, we are only interested in the following things:

- Is the page a leaf or not
- Is the page an index or table page
- If not a leaf, what is the left child page

This means we don't have to read the entire cell, just the left child
page.
2025-07-30 21:29:14 +03:00
Pekka Enberg
b7cb4a3ed4 core/mvcc: Switch to parking_lot RwLock 2025-07-30 20:25:45 +03:00
bit-aloo
1be93c8c18 skip invalid inputs in cosine distance prop test 2025-07-30 21:27:55 +05:30
Jussi Saurio
7caef278a5 Merge 'Rewrite the WAL' from Preston Thorpe
closes #1893
Adds some fairly extensive tests but I'll continue to add some python
tests on top of the unit tests.
## Restart:
tested 
- open new DB
- create table and do a bunch of inserts
- `pragma wal_checkpoint(RESTART);`
- close db file
- re-open and verify we can read the wal/repopulate the frame cache
- verify min|max frame
tested 
- open same DB
- add more inserts
- `pragma wal_checkpoint(RESTART);`
- do _more_ inserts
- close
- re-open
- verify checksums/max_frame are valid
- verify row count
## Truncate
tested 
- open new db
- create table and add inserts
- `pragma wal_checkpoint(truncate);`
- close file
- verify WAL file is empty (32 bytes, header only)
- re-open file
- verify content/row count
tested 
- open db
- create table and insert many rows
- `pragma wal_checkpoint(truncate);`
- insert _more_ rows
- close db file
- verify WAL file is valid
- re-open file
- verify we can read entire file/repopulate the frame cache
<img width="541" height="315" alt="image" src="https://github.com/user-attachments/assets/0470c795-5116-4866-b913-78c07b06b68c" />
```
# header
magic=0x377f0682
version=3007000
page_size=4096
seq=2
salt=ec475ff2-7ea94342
checksum=c9464aff-c571cc22
```
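For reference, the fields dumped above map onto the 32-byte WAL header of the SQLite file format, which consists of eight big-endian 32-bit words. The parser below is a minimal sketch; `WalHeader` and `parse_wal_header` are illustrative names, not the project's types.

```rust
pub const WAL_HEADER_SIZE: usize = 32;

#[derive(Debug, PartialEq)]
pub struct WalHeader {
    pub magic: u32,
    pub version: u32,
    pub page_size: u32,
    pub checkpoint_seq: u32,
    pub salt: (u32, u32),
    pub checksum: (u32, u32),
}

/// Parses the WAL header; returns None if the buffer is too short
/// or the magic number is not one of the two valid values.
pub fn parse_wal_header(buf: &[u8]) -> Option<WalHeader> {
    if buf.len() < WAL_HEADER_SIZE {
        return None;
    }
    // Read the i-th big-endian 32-bit word of the header.
    let word = |i: usize| {
        u32::from_be_bytes([buf[4 * i], buf[4 * i + 1], buf[4 * i + 2], buf[4 * i + 3]])
    };
    let magic = word(0);
    if magic != 0x377f_0682 && magic != 0x377f_0683 {
        return None;
    }
    Some(WalHeader {
        magic,
        version: word(1),
        page_size: word(2),
        checkpoint_seq: word(3),
        salt: (word(4), word(5)),
        checksum: (word(6), word(7)),
    })
}
```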

Closes #2179
2025-07-30 18:50:49 +03:00
Jussi Saurio
ff1c1b6b8c wal_insert_end: call pager.rollback() after tx ends so that lock index is preserved when ending tx 2025-07-30 18:22:40 +03:00
Jussi Saurio
7240d7903c fmt 2025-07-30 18:22:17 +03:00
Jussi Saurio
7bc11fe2f9 wal_insert_end: revert unintentional changes 2025-07-30 18:16:23 +03:00
Jussi Saurio
2813a7a5de clippy 2025-07-30 17:25:30 +03:00
Jussi Saurio
c00d1fcfc0 fmt 2025-07-30 17:21:29 +03:00
Jussi Saurio
66c4b44c55 pager: call rollback() after ending txn so that read lock info is not lost when ending txn 2025-07-30 17:21:19 +03:00
Jussi Saurio
7b1f04dc5e pager: only ROLLBACK your own transaction, not if someone else is writing 2025-07-30 17:00:38 +03:00
Pekka Enberg
951bebfac3 Merge 'Add vector_concat and vector_slice support' from bit-aloo
Closes: #2323
This PR adds support for two new vector functions:
* vector_concat(x, y) – Concatenates two vectors of the same type.
* vector_slice(x, start_index, end_index) – Extracts a subvector from
the input vector.
Notes:
* Negative start_index or end_index is not supported
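A hedged sketch of the two operations on plain `f32` slices. The real functions operate on the engine's vector values; treating `end_index` as exclusive and rejecting out-of-range indices with an error are assumptions made for this example.

```rust
/// Concatenates two vectors of the same element type.
pub fn vector_concat(x: &[f32], y: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(x.len() + y.len());
    out.extend_from_slice(x);
    out.extend_from_slice(y);
    out
}

/// Extracts x[start..end]. Negative indices are rejected, as in the notes
/// above; the exclusive end bound is an assumption of this sketch.
pub fn vector_slice(x: &[f32], start: i64, end: i64) -> Result<Vec<f32>, String> {
    if start < 0 || end < 0 {
        return Err("negative indices are not supported".to_string());
    }
    let (start, end) = (start as usize, end as usize);
    if start > end || end > x.len() {
        return Err("slice range out of bounds".to_string());
    }
    Ok(x[start..end].to_vec())
}
```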

Reviewed-by: Nikita Sivukhin (@sivukhin)

Closes #2336
2025-07-30 16:58:38 +03:00
PThorpe92
e7eda25802 Make sure to end read tx on error of wal insert begin API 2025-07-30 09:44:29 -04:00
Jussi Saurio
b1aa13375d call pager.end_tx() everywhere instead of pager.rollback() 2025-07-30 16:39:38 +03:00
Jussi Saurio
975b7b5434 wal: fix test incorrect expectation 2025-07-30 15:53:13 +03:00
Jussi Saurio
af660326d8 finish_append_frames_commit: revert bumping readmark incorrectly 2025-07-30 15:53:01 +03:00
Jussi Saurio
43d1321033 ignore completion result of self.read_frame 2025-07-30 14:58:03 +03:00
Jussi Saurio
338cab3f28 End read transaction when Schema::make_from_btree fails 2025-07-30 14:58:03 +03:00
Jussi Saurio
fd5e73f038 op_transaction: read tx must be ended in all cases if begin_write_tx fails 2025-07-30 14:58:03 +03:00