Commit Graph

10455 Commits

Author SHA1 Message Date
Pekka Enberg
6a2f0d6061 Merge 'Add per page checksums' from Avinash Sajjanshetty
This patch adds checksums to Turso DB. You may check the design here in
the [RFC](https://github.com/tursodatabase/turso/issues/2178).
1. We use reserved bytes (8 bytes) to store the checksums. On every IO
read, we verify that the checksum matches.
2. We use twox hash for checksums.
3. Checksums currently work only on 4K pages. Enabling them for other
page sizes is a small change; I will send another PR.
4. Right now, it's not possible to switch to a different algorithm or turn
checksums off altogether. That will be added in future PRs.
5. Checksums can be enabled only for new DBs. For existing DBs, we will
disable them.
6. To add checksums to existing DBs, we need a vacuum, since it would
require rewriting the whole DB.
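
The scheme in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the constants `PAGE_SIZE` and `RESERVED`, the helper `write_checksum`, and the signature of `verify_checksum` are assumptions, and FNV-1a stands in for the twox hash so the snippet needs no external crate.

```rust
use std::convert::TryInto;

// Assumed layout: a 4K page whose last 8 reserved bytes hold the checksum.
const PAGE_SIZE: usize = 4096;
const RESERVED: usize = 8;

/// FNV-1a 64-bit. This is only a stand-in for the twox hash named in the
/// commit message, chosen to keep the sketch dependency-free.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical write path: hash the payload and store the result in the
/// reserved tail of the page.
fn write_checksum(page: &mut [u8; PAGE_SIZE]) {
    let (payload, tail) = page.split_at_mut(PAGE_SIZE - RESERVED);
    tail.copy_from_slice(&fnv1a64(payload).to_le_bytes());
}

/// Hypothetical read path: recompute the hash and compare it with the
/// stored value. Returns false if the page was corrupted.
fn verify_checksum(page: &[u8; PAGE_SIZE]) -> bool {
    let (payload, tail) = page.split_at(PAGE_SIZE - RESERVED);
    let stored = u64::from_le_bytes(tail.try_into().unwrap());
    stored == fnv1a64(payload)
}
```

Keeping the checksum in the reserved bytes means the usable payload shrinks by 8 bytes per page, which is why existing databases would need a full rewrite to adopt it.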

Closes #2840
2025-09-13 18:46:53 +03:00
Pekka Enberg
7f5038f7c9 Merge 'perf/throughput/turso: Async transactions with concurrent mode' from Pekka Enberg
With `BEGIN CONCURRENT`, we should also take advantage of async
transaction processing to maximize concurrency.

Closes #3082
2025-09-13 15:07:29 +03:00
TcMits
e18b6b0b56 inline 2025-09-13 18:07:45 +07:00
Pekka Enberg
7d3ce68695 Merge 'core/throughput: Add per transaction think time support' from Pekka Enberg
Closes #3080
2025-09-13 14:07:30 +03:00
Pavan-Nambi
0effb981e6 autoincrement functionality now works as well as sqlite; handled all edge cases that we are aware of
- The code now prevents dropping or indexing `sqlite_sequence`
- Makes sure that AUTOINCREMENT only works on a single `INTEGER PRIMARY KEY`
- Handles `i64::MAX` gracefully by returning `SQLITE_FULL`
- AUTOINCREMENT now works in both column and table constraints.
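
The `i64::MAX` case from the list above can be illustrated with a small sketch. The names `SeqError` and `next_autoincrement_rowid` are hypothetical and are not taken from the commit; the point is only that the next rowid is computed with a checked add, so overflow becomes an error instead of wrapping.

```rust
/// Hypothetical error type; `Full` plays the role of `SQLITE_FULL`.
#[derive(Debug, PartialEq)]
enum SeqError {
    Full,
}

/// Given the largest rowid ever handed out (the value tracked in
/// `sqlite_sequence`), return the next one, or `Full` if none is left.
fn next_autoincrement_rowid(last_seq: i64) -> Result<i64, SeqError> {
    last_seq.checked_add(1).ok_or(SeqError::Full)
}
```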

fmt
2025-09-13 16:35:36 +05:30
Pekka Enberg
898f32f7f7 Fix Antithesis Dockerfile to include whopper 2025-09-13 13:33:11 +03:00
Pekka Enberg
3733d3856a Merge 'core: Panic on fsync() error by default' from Pekka Enberg
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicking on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #3077
2025-09-13 13:32:07 +03:00
Pekka Enberg
0fad30a30d perf/throughput/turso: Async transactions with concurrent mode
With `BEGIN CONCURRENT`, we should also take advantage of async
transaction processing to maximize concurrency.
2025-09-13 13:25:16 +03:00
Pekka Enberg
8dc2e738a4 core/throughput: Add per transaction think time support 2025-09-13 13:02:43 +03:00
TcMits
17ee979583 fix macro 2025-09-13 16:35:58 +07:00
TcMits
01da48fde9 introduce instruction virtual table 2025-09-13 16:35:17 +07:00
Piotr Rzysko
1a95131c3c Include windows in ToTokens for SelectPlan 2025-09-13 11:12:44 +02:00
Piotr Rzysko
9ff2133ff2 Rewrite window function expressions in the optimizer
Currently, this is effectively a no-op because, at the optimization
stage, window function expressions are in the form
win_func(subquery_column1, subquery_column2, ...).

Nevertheless, expressions are rewritten to maintain consistency with
aggregates, which also hold cloned expressions from sources like result
columns. This ensures future changes in the optimizer won’t break window
function handling.
2025-09-13 11:12:44 +02:00
Piotr Rzysko
f5efcbe745 Add support for window functions
Adds initial support for window functions. For now, only existing
aggregate functions can be used as window functions—no specialized
window-specific functions are supported yet.

Currently, only the default frame definition is implemented:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS.
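
To make the default frame concrete, here is a minimal sketch of a running `sum()` over one partition under that frame. The function name and the `(order_key, value)` row shape are assumptions for illustration, not the engine's implementation. With `RANGE ... CURRENT ROW`, the frame ends at the last peer of the current row, so rows with equal `ORDER BY` keys receive the same result.

```rust
/// Running sum under RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
/// `rows` holds (order_key, value) pairs already sorted by order_key.
fn running_sum_default_frame(rows: &[(i64, i64)]) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    let mut total = 0i64;
    let mut i = 0;
    while i < rows.len() {
        // Accumulate the whole peer group (rows sharing the same key).
        let mut j = i;
        while j < rows.len() && rows[j].0 == rows[i].0 {
            total += rows[j].1;
            j += 1;
        }
        // Every peer sees the total including all of its peers.
        for _ in i..j {
            out.push(total);
        }
        i = j;
    }
    out
}
```

For rows `(1, 10), (2, 20), (2, 30), (3, 40)` this yields `10, 60, 60, 100`: the two rows with key 2 are peers and share one value.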
2025-09-13 11:12:44 +02:00
Piotr Rzysko
c81cd16230 Extract QueryDestination::placeholder_for_subquery 2025-09-13 10:49:14 +02:00
Piotr Rzysko
1826023c32 Decouple AggArgumentSource::Expression from Aggregate
This allows it to be reused for window function processing without
relying on the Aggregate struct.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6c3c44e204 Expose fewer details from AggArgumentSource
Hides unnecessary internals to decouple the API from the Aggregate struct.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
5f2a3e1242 Handle dummy argument for count() and count(*) in translation
Two main reasons for this change:
* Improve readability by moving the logic for this special case closer
  to the code that relies on it.
* Decouple AggFunc from the Aggregate struct. In the future, window
  function processing will use AggFunc directly, without necessarily
  depending on Aggregate.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6d84cbedc2 Fix delimiter handling in group_concat and string_agg
Non-literal delimiters must be translated by AggArgumentSource.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
110ffba2a1 Fix accumulator reset when arguments outnumber aggregates
Previously, while resetting accumulator registers, we would also
reset subsequent registers. This happened because the number of registers
to reset was computed as the sum of arguments rather than the number of
aggregate functions.
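
The miscount described above can be shown with a toy register file. Everything here (`Agg`, the two count helpers, `reset_accumulators`) is a hypothetical simplification, assuming one accumulator register per aggregate function.

```rust
/// Hypothetical aggregate descriptor: only the argument count matters here.
struct Agg {
    num_args: usize,
}

/// Old behaviour as described: count registers by summing arguments.
fn reset_count_buggy(aggs: &[Agg]) -> usize {
    aggs.iter().map(|a| a.num_args).sum()
}

/// Fixed behaviour: one accumulator register per aggregate function.
fn reset_count_fixed(aggs: &[Agg]) -> usize {
    aggs.len()
}

/// Clear `count` registers starting at `start`.
fn reset_accumulators(regs: &mut [Option<i64>], start: usize, count: usize) {
    for r in &mut regs[start..start + count] {
        *r = None;
    }
}
```

With a single two-argument aggregate, the buggy count is 2, so the register after the accumulator is cleared as well; the fixed count is 1 and leaves it alone.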
2025-09-13 10:49:14 +02:00
Piotr Rzysko
6224cdbbd3 Support WalkControl in walk_expr_mut
Now walk_expr_mut can use WalkControl to skip parts of the expression
tree. This makes it consistent with walk_expr.
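
A minimal sketch of the idea, using a toy expression type. `WalkControl` and `walk_expr_mut` are named in the commit, but the `Expr` enum, the variant names `Continue` and `SkipChildren`, and the signature below are assumptions made for illustration.

```rust
/// Toy expression tree; the real AST is much richer.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, Box<Expr>),
    Subquery(Box<Expr>),
}

/// Assumed variants: keep descending, or skip this node's children.
enum WalkControl {
    Continue,
    SkipChildren,
}

/// Pre-order mutable walk. The visitor decides per node whether the
/// children are visited.
fn walk_expr_mut<F>(expr: &mut Expr, f: &mut F)
where
    F: FnMut(&mut Expr) -> WalkControl,
{
    if let WalkControl::SkipChildren = f(expr) {
        return;
    }
    match expr {
        Expr::Binary(lhs, rhs) => {
            walk_expr_mut(lhs, f);
            walk_expr_mut(rhs, f);
        }
        Expr::Subquery(inner) => walk_expr_mut(inner, f),
        Expr::Column(_) | Expr::Literal(_) => {}
    }
}
```

A visitor that returns `SkipChildren` for `Expr::Subquery` can rewrite the outer expression while leaving everything inside the subquery untouched.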
2025-09-13 10:49:14 +02:00
Piotr Rzysko
b911e80607 Add AggValue instruction
Adds the AggValue instruction, which computes the current aggregate
result and writes it to a dedicated destination register.

Unlike AggFinal, it does not overwrite or clear the accumulator
register. This makes it possible to retrieve aggregate results multiple
times—needed when processing window functions—while preserving the
accumulator state.
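
The difference between the two instructions can be sketched with a toy register file and an `avg()` accumulator. The `Register` enum and the function signatures are assumptions for illustration and do not mirror the VDBE code.

```rust
/// Toy register contents; `AvgAcc` is the in-progress accumulator.
#[derive(Debug, Clone, PartialEq)]
enum Register {
    Null,
    Real(f64),
    AvgAcc { sum: f64, count: u64 },
}

/// Compute the current result without touching the accumulator.
fn current_result(acc: &Register) -> Register {
    match acc {
        Register::AvgAcc { sum, count } if *count > 0 => {
            Register::Real(*sum / *count as f64)
        }
        _ => Register::Null,
    }
}

/// Fold one more input value into the accumulator register.
fn agg_step(regs: &mut [Register], acc: usize, x: f64) {
    let (sum, count) = match &regs[acc] {
        Register::AvgAcc { sum, count } => (*sum, *count),
        _ => (0.0, 0),
    };
    regs[acc] = Register::AvgAcc { sum: sum + x, count: count + 1 };
}

/// AggValue-like: write the result to `dest`, keep the accumulator intact.
fn agg_value(regs: &mut [Register], acc: usize, dest: usize) {
    let result = current_result(&regs[acc]);
    regs[dest] = result;
}

/// AggFinal-like: overwrite the accumulator register with the result.
fn agg_final(regs: &mut [Register], acc: usize) {
    let result = current_result(&regs[acc]);
    regs[acc] = result;
}
```

Because `agg_value` leaves the accumulator in place, it can be called after every step, which is the access pattern a running window aggregate needs.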
2025-09-13 10:49:14 +02:00
Piotr Rzysko
c5a12f52c2 Don't mutate state in op_agg_final
Previously, only the External and Avg aggregates mutated state during
AggFinal. This is unnecessary because AggFinal runs only once per group,
so caching the result provides no performance benefit.

By avoiding state mutation, we can also reuse op_agg_final for the
AggValue instruction that will be added soon.
2025-09-13 10:49:14 +02:00
Piotr Rzysko
458172220e Remove unused method from AggContext 2025-09-13 10:49:14 +02:00
Piotr Rzysko
867bef55d8 Add ResetSorter instruction
This instruction isn't used yet, but it will be needed for window
functions, since they heavily rely on ephemeral tables.
2025-09-13 10:44:56 +02:00
Piotr Rzysko
ea9599681e Add OpenDup instruction
The instruction isn’t used yet, but it’ll be needed for window functions,
since they heavily rely on ephemeral tables.
2025-09-13 10:35:33 +02:00
Pekka Enberg
d8f07fe3da core: Panic on fsync() error by default
Retrying fsync() on error was historically not safe ("fsyncgate") and
Postgres still defaults to panicking on fsync(). Therefore, add a
"data_sync_retry" pragma (disabled by default) and use it to determine
whether to panic on fsync() error or not.
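
The policy can be sketched as a small decision function. `handle_sync_result` and its boolean parameter are hypothetical names; the sketch only assumes that the pragma value is available as a flag at the point where the fsync() result is inspected.

```rust
use std::io;

/// Decide what to do with the outcome of an fsync() call.
/// With `data_sync_retry` disabled (the default), an error is fatal,
/// because the kernel may already have dropped the dirty pages and a
/// retry could report success without the data being durable.
fn handle_sync_result(res: io::Result<()>, data_sync_retry: bool) -> io::Result<()> {
    match res {
        Ok(()) => Ok(()),
        // Opt-in: hand the error back so the caller may retry.
        Err(e) if data_sync_retry => Err(e),
        // Default: stop immediately rather than risk silent data loss.
        Err(e) => panic!("fsync failed: {}", e),
    }
}
```

A caller would pass the result of something like `file.sync_all()` together with the current pragma setting.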
2025-09-13 10:21:12 +03:00
Pekka Enberg
a7e34f1551 Merge 'Handle partial writes in unix IO for pwrite and pwritev' from Preston Thorpe
Currently, `io_uring` is set up to handle partial writes for `pwritev`
(will add `pwrite` in a subsequent PR), but the unix and other IO back-ends
were not correctly set up for this.
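
Handling a partial write usually means looping until the whole buffer has been written. The sketch below shows that loop in isolation; `write_all_at` and the closure-based `write_at` parameter are assumptions for illustration (the closure stands in for a positional write such as `pwrite`), not the code from the PR.

```rust
use std::io;

/// Keep calling `write_at` until all of `buf` has been written,
/// advancing both the buffer and the file offset after each short write.
fn write_all_at<W>(mut write_at: W, mut buf: &[u8], mut offset: u64) -> io::Result<()>
where
    W: FnMut(&[u8], u64) -> io::Result<usize>,
{
    while !buf.is_empty() {
        match write_at(buf, offset) {
            // No progress at all: report it instead of spinning forever.
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ))
            }
            // Short or full write: skip past what was written.
            Ok(n) => {
                buf = &buf[n..];
                offset += n as u64;
            }
            // Interrupted by a signal: retry the same range.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

On unix, the closure could wrap `std::os::unix::fs::FileExt::write_at`; the vectored case follows the same idea but must also advance across buffer boundaries.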

Closes #3073
2025-09-13 09:08:43 +03:00
Pekka Enberg
f547a302cc Merge 'remove Stmt clone' from Lâm Hoàng Phúc
Closes #3076
2025-09-13 09:06:15 +03:00
Avinash Sajjanshetty
5256f29a9c Add checksums behind a feature flag 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
06a824ec68 Add checksum tests 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
1536f65f07 move test helper run_query to common module 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
11030056c7 rename method to verify_checksum 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
e010c46552 use checksums when reading/writing from db file 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
4b59cf19e5 use checksums when reading/writing from wal 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
14a1307720 Set reserved space as required when allocating page1 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
3b410e4f79 set required reserved bytes while initialising the pager 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
2e6943bfdf Add helper to read reserved bytes value from disk 2025-09-13 11:00:39 +05:30
Avinash Sajjanshetty
c2c1ec2dba Pass usable_space() instead of hardcoding the value 2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty
15266105f7 Update IOContext to carry checksum ctx 2025-09-13 11:00:38 +05:30
Avinash Sajjanshetty
3f72de3623 Add checksum module 2025-09-13 11:00:37 +05:30
TcMits
48522c1cc0 remove Stmt clone 2025-09-13 12:08:29 +07:00
Pavan-Nambi
0afae0db20 update tests after merging 2025-09-13 07:33:43 +05:30
Pavan-Nambi
fdb4f98e11 Merge remote-tracking branch 'upstream/main' into cdc_fail_autoincrement 2025-09-13 07:17:18 +05:30
Pavan-Nambi
f5c52065ed update sync-engine tests 2025-09-13 07:10:18 +05:30
PThorpe92
6098bca211 Handle partial writes in unix IO for pwrite and pwritev 2025-09-12 18:13:02 -04:00
Preston Thorpe
b1420904bb Merge 'fix(btree): advance cursor after interior node replacement in delete' from Jussi Saurio
## Problem
When a delete replaces an index interior cell, the replacement key is less
than the deleted key. Currently on the main branch, after the deletion
happens, the following call to BTreeCursor::next() stops at the replaced
interior cell.
This is incorrect - imagine the following sequence:
- We are executing a query that deletes all keys WHERE key > 5
- We delete <key=6> from an interior node, and take a replacement
<key=5> from the left subtree of that interior page
- next() is called, and we land on the interior node again, which now
has <key=5>, and we incorrectly delete it even though our WHERE
condition is key > 5.
## Solution
This PR:
- Tracks `interior_node_was_replaced` in CheckNeedsBalancing
- If no balancing is needed and a replacement occurred, advances once so
the next invocation of next() will skip the replaced cell properly
i.e. we prevent next() from landing on the replaced content and ensure
iteration continues with the next logical record.
## Details
This problem only became apparent once we started using indexes as valid
iteration cursors for DELETE operations in #2981
Closes #3045

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #3049
2025-09-12 17:37:01 -04:00
Pekka Enberg
1803d0bb5d test: Enable some MVCC test cases
Suggested by Jussi
2025-09-12 23:11:45 +03:00
Pekka Enberg
ad6157028e Merge 'core/vdbe: Fix BEGIN CONCURRENT transactions' from Pekka Enberg
The transaction upgrade logic in Transaction opcode is total nonsense
for concurrent transactions so just drop it.
Fixes #3061

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #3070
2025-09-12 23:11:12 +03:00
Pekka Enberg
a0921c4221 Merge 'core/storage: Remove unused import warning' from Pekka Enberg
Closes #3069
2025-09-12 23:11:05 +03:00