# Implement Saga Pattern for Swap Operations with Recovery Mechanism
## Overview
This PR refactors the swap operation implementation to use the saga pattern - a distributed transaction pattern that provides reliable transaction management through explicit state tracking and compensation-based error handling. The implementation includes a robust recovery mechanism that automatically handles swap operations interrupted by crashes, power loss, or network failures.
## What Changed
**Saga Pattern Implementation:**
- Introduced a strict linear state machine for swaps: `Initial` → `SetupComplete` → `Signed` → `Completed`
- New modular `swap_saga` module with state validation, compensation logic, and saga orchestration
- Automatic rollback of database changes on failure, ensuring atomic swap operations
- Replaced previous swap implementation (`swap.rs`, `blinded_message_writer.rs`) with saga-based approach
**Recovery Mechanism:**
- Added `operation_id` and `operation_kind` columns to database schema for tracking which operation proofs belong to
- New `recover_from_bad_swaps()` method that runs on mint startup to handle incomplete swaps
- For proofs left in `PENDING` state from swap operations:
- If blind signatures exist: marks proofs as `SPENT` (swap completed but not finalized)
- If no blind signatures exist: removes proofs from database (swap failed partway through)
- Database migrations included for both PostgreSQL and SQLite
These functions are designed as a single funnel to talk to the database,
whether it is synchronous or asynchronous.
This single funnel will log SQL queries and slow operations, providing a clear
and unified debug message for the problematic query, so it can be optimized
accordingly (for instance, missing indexes or unbound SQL requests).
* Working on a better database abstraction
After [this question in the chat](https://matrix.to/#/!oJFtttFHGfnTGrIjvD:matrix.cashu.space/$oJFtttFHGfnTGrIjvD:matrix.cashu.space/$I5ZtjJtBM0ctltThDYpoCwClZFlM6PHzf8q2Rjqmso8)
regarding a database transaction within the same function, I realized a few
design flaws in our SQL database abstraction, particularly regarding
transactions.
1. Our upper abstraction got it right, where a transaction is bound with `&mut
self`, so Rust knows how to handle its lifetime with' async/await'.
2. The raw database does not; instead, it returns &self, and beginning a
transaction takes &self as well, which is problematic for Rust, but that's not
all. It is fundamentally wrong. A transaction should take &mut self when
beginning a transaction, as that connection is bound to a transaction and
should not be returned to the pool. Currently, that responsibility lies with
the implementor. If a mistake is made, a transaction could be executed in two
or more connections.
3. The way a database is bound to our store layer is through a single struct,
which may or may not internally utilize our connection pool. This is also
another design flow, in this PR, a connection pool is owned, and to use a
connection, it should be requested, and that connection is reference with
mutable when beginning a transaction
* Improve the abstraction with fewer generics
As suggested by @thesimplekid
* Add BEGIN IMMEDIATE for SQLite
The primary purpose of this new crate is to have a common and shared codebase
for all SQL storage systems. It would force us to write standard SQL using best
practices for all databases.
This crate has been extracted from #878
do not always return UnknownQuoteTTL
return UnknownMintInfo when appropriate
add a new UnknownConfigKey for unknown key values
unit tests to cover this functionality
* Split the database trait into read and transactions.
The transaction traits will encapsulate all database changes and also expect
READ-and-lock operations to read and lock records from the database for
exclusive access, thereby avoiding race conditions.
The Transaction trait expects a `rollback` operation on Drop unless the
transaction has been committed.
* fix: melt quote duplicate error
This change stops a second melt quote from being created
if there is an existing valid melt quote for an invoice already.
If the first melt quote has expired then we allow for a new melt quote to be created.
---------
Co-authored-by: thesimplekid <tsk@thesimplekid.com>
* Fix SQLite race condition
Bug: https://github.com/crodas/cdk/actions/runs/15732950296/job/44339804072#step:5:1853
Reason: When melting in parallel, many update the melt status and attempt to
add proofs and they fail when adding the proof and the rollback code kicks in.
The loser process removes all the proofs, and the winner process has no proof
later on.
Fix: Modify `update_melt_quote_state` requirements and implementation to allow
only one winner.
This will be solved by design with a transaction writer trait
* Remove `melt_request`
Fixes#809
* Remove `get_melt_request` from db trait
Bug: https://github.com/crodas/cdk/actions/runs/15732950296/job/44339804072#step:5:1853
Reason: When melting in parallel, many update the melt status and attempt to
add proofs and they fail when adding the proof and the rollback code kicks in.
The loser process removes all the proofs, and the winner process has no proof
later on.
Fix: Modify `update_melt_quote_state` requirements and implementation to allow
only one winner.
This will be solved by design with a transaction writer trait
Bug: https://github.com/cashubtc/cdk/actions/runs/15683152414/job/44190084378?pr=822#step:5:19212
Reason: a race condition between removing proofs while melting and the quote states being updated.
Solution:
1. Error on duplicate proofs
2. Read quote when updating to avoid race conditions and rollbacks
Real solution: A transaction trait in the storage layer. That is coming next
* Migrate from `sqlx` to rusqlite
1. Add rusqlite with rusqlite with a working thread
2. Add wallet without a thread (synchronous)
3. Add custom migration
Co-authored-by: thesimplekid <tsk@thesimplekid.com>
* WIP: Introduce a SignatoryManager service.
The SignatoryManager manager provides an API to interact with keysets, private
keys, and all key-related operations, offering segregation between the mint and
the most sensible part of the mind: the private keys.
Although the default signatory runs in memory, it is completely isolated from
the rest of the system and can only be communicated through the interface
offered by the signatory manager. Only messages can be sent from the mintd to
the Signatory trait through the Signatory Manager.
This pull request sets the foundation for eventually being able to run the
Signatory and all the key-related operations in a separate service, possibly in
a foreign service, to offload risks, as described in #476.
The Signatory manager is concurrent and deferred any mechanism needed to handle
concurrency to the Signatory trait.
* Fixed missing default feature for signatory
* Do not read keys from the DB
* Removed KeysDatabase Trait from MintDatabase
All Keys operations should be done through the signatory
* Make sure signatory has all the keys in memory
Drop also foreign constraints on sqlite
* Fix race condition
* Adding debug info to failing test
* Add `sleep` in test
* Fixed issue with active auth keyset
* Fixed dependency
* Move all keys and keysets to an ArcSwap.
Since the keys and keysets exist in RAM, most wrapping functions are infallible
and synchronous, improving performance and adding breaking API changes.
The signatory will provide this information on the boot and update when the
`rotate_keyset` is executed.
Todo: Implement a subscription key to reload the keys when the GRPC server
changes the keys. For the embedded mode, that makes no sense since there is a
single way to rotate keys, and that bit is already covered.
* Implementing https://github.com/cashubtc/nuts/pull/250
* Add CLI for cdk-signatory to spawn an external signatory
Add to the pipeline the external signatory
* Update tests
* Apply suggestions from code review
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Co-authored-by: thesimplekid <tsk@thesimplekid.com>
* Minor change
* Update proto buf to use the newest format
* Rename binary
* Add instrumentations
* Add more comments
* Use a single database for the signatory
Store all keys, even auth keys, in a single database. Leave the MintAuthDatabse
trait implementation for the CDK but not the signagtory
This commit also moves the cli mod to its own file
* Update dep
* Add `test_mint_keyset_gen` test
---------
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Co-authored-by: thesimplekid <tsk@thesimplekid.com>
Add a strict set of updates to prevent incorrect state changes and correct
usage. Supporting the transaction at the trait level prevented some cases, but
having a strict set of state change flows is better.
This bug was found while developing the signatory. The keys are read from
memory, triggering race conditions at the database, and some `Pending` states
are selected (instead of just selecting `Unspent`).
This PR also introduces a set of generic database tests to be executed for all
database implementations, this test suite will make sure writing and
maintaining new database drivers
* feat: Add created_time and paid_time fields to MintQuote struct
* feat: Add serde default of 0 for created_time in MintQuote
* feat: Add created_time and paid_time to MintQuote and MeltQuote structs
* feat: Add paid_time update when setting melt quote state to Paid
* fix: Update melt quote state with current Unix timestamp
* feat: Add paid_time update for mint quote when state is set to Paid
* feat: Add issued_time field to MintQuote conversion from SQLite row
* feat: Add issued_time tracking for MintQuoteState::Issued state
* feat: Add migration script for mint time of quotes
* feat: Add timestamp columns to mint_quote and melt_quote tables
* feat: Add timestamp columns to `add_mint_quote` method
* refactor: Improve code formatting and readability in mint quote state update logic
* feat: Add created_time and paid_time columns to melt_quote query
* feat: time on mint and melt quotes
* feat: Add migration script for mint created time signature
feat: Add created_time column to blind_signature table
feat: Add created_time to blind_signature insertion
feat: Add created_time column to proof table and update insert query
feat: time on mint and melt quotes
* feat: Add new table to track blind signature creation time
* feat: Add timestamp tracking for proofs in ReDB database
* feat: redb proof time
* chore: fmt
The bug comes with the SQLx-sqlite pool bug, where several connections are
created by default, but the `new` function takes care of that, fixing that bug
by making a single instance of the database.
If constructed directly, the pool would create several connections to the
database, which in most instances is fine, but with SQLite :memory: each
connection is entirely independent.
Also follow documentation to make sure that failed `acquire` will not end up
dropping connections by setting test_before_acquire to false
However, if your workload is sensitive to dropped connections such as using an in-memory
SQLite database with a pool size of 1, you can pretty easily ensure that a cancelled
`acquire()` call will never drop connections by tweaking your [`PoolOptions`]:
* Set [`test_before_acquire(false)`][PoolOptions::test_before_acquire]
* Never set [`before_acquire`][PoolOptions::before_acquire] or
[`after_connect`][PoolOptions::after_connect].
- MintKeysDatabase
- MintQuotesDatabase
- MintProofsDatabase
- MintSignaturesDatabase
This commit splits the MintDatabase trait with 30+ methods into a series
of smaller traits, each dedicate to a specific subsystem of the mint
service.