cdk/crates/cdk-prometheus/README.md
asmo 75a3e6d2c7 Prometheus crate (#883)
* feat: introduce `cdk-prometheus` crate with Prometheus server and CDK-specific metrics support
2025-09-09 13:26:03 +01:00


# CDK Prometheus
A small, focused crate that provides Prometheus metrics for CDK-based services. It bundles a ready-to-use metrics registry, a background HTTP server to expose metrics, helper functions for common CDK domains (HTTP, auth, Lightning, DB, mint operations), and an ergonomic macro for conditional metrics recording.
- Out-of-the-box metrics for HTTP, auth, Lightning payments, database, and mint operations
- Global, lazily-initialized metrics instance you can use anywhere
- Optional background server to expose metrics on /metrics
- Re-exports the prometheus crate for custom instrumentation
- Optional system metrics (feature-gated)
## Installation
Add the crate to your Cargo.toml (replace the version as needed):
```toml
[dependencies]
cdk-prometheus = { version = "0.1", features = ["system-metrics"] }
```
Feature flags:
- `system-metrics`: include basic process/system metrics, collected periodically.
Note for downstream crates: the `record_metrics!` macro is gated at the call site by a feature named `prometheus`. If you use the macro, declare a `prometheus` feature in your own crate and enable it so the calls compile into real metrics recording; without it, they compile to no-ops.
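The reason the feature must live in *your* crate: `macro_rules!` bodies expand at the call site, so a `#[cfg(feature = "...")]` inside the expansion is checked against the invoking crate's features. A minimal std-only sketch of that mechanism, assuming a hypothetical `record_if_enabled!` macro and a plain atomic counter standing in for a metric (this is not the real `record_metrics!` expansion):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical stand-in for a real metric.
static ERRORS: AtomicU64 = AtomicU64::new(0);

// Hypothetical macro: the `#[cfg]` is evaluated in whichever crate invokes
// the macro, so the body only compiles in when *that* crate enables `prometheus`.
macro_rules! record_if_enabled {
    ($($body:tt)*) => {
        #[cfg(feature = "prometheus")]
        {
            $($body)*
        }
    };
}

fn handle_error() {
    record_if_enabled! {
        ERRORS.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    handle_error();
    // 1 when built with the `prometheus` feature, 0 otherwise.
    println!("errors recorded: {}", ERRORS.load(Ordering::Relaxed));
}
```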
## Quick start
### Docker
Start Prometheus and Grafana with docker-compose:
```
docker compose up -d prometheus grafana
```
Start your mintd:
```
./mintd -w ~/.cdk-mintd
```
Then check Prometheus and Grafana:
* `curl localhost:9000/metrics` to check the CDK metrics exposed by mintd
* `http://localhost:9090/targets?search=` to check the Prometheus scrape targets (you should see `http://host.docker.internal:9000/metrics`)
* `http://localhost:3011/d/cdk-mint-dashboard/cdk-mint-dashboard` for the Grafana dashboard (default login: admin/admin)
### Rust
Expose a Prometheus endpoint with a default registry and CDK metrics:
```rust
use cdk_prometheus::start_default_server_with_metrics;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Starts an HTTP server (default bind address and path) and registers
    // the CDK metrics into its registry.
    start_default_server_with_metrics().await?;
    Ok(())
}
```
Or start it in the background (e.g., from your application bootstrap):
```rust
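use cdk_prometheus::start_background_server_with_metrics;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Spawns the metrics server as a background task; call this from inside
    // a Tokio runtime, since spawning a task requires one.
    let _handle = start_background_server_with_metrics()?;
    // Continue bootstrapping your application...
    Ok(())
}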
```
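What the background server provides is an HTTP listener that answers scrapes with the current metrics as text. As a rough std-only illustration of that idea (not the crate's implementation, which is async and encodes the real registry), here is a thread that answers every connection with a fixed payload; `spawn_metrics_server` and `scrape` are hypothetical names:

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread::{self, JoinHandle};

// Hypothetical helper: answers every connection with `body` in the Prometheus
// text format. It ignores the request path and method.
fn spawn_metrics_server(listener: TcpListener, body: String) -> JoinHandle<()> {
    thread::spawn(move || {
        for stream in listener.incoming() {
            let Ok(mut stream) = stream else { continue };
            // Read until the end of the request headers (or EOF).
            let mut request: Vec<u8> = Vec::new();
            let mut buf = [0u8; 512];
            while !request.windows(4).any(|w| w == b"\r\n\r\n") {
                match stream.read(&mut buf) {
                    Ok(0) | Err(_) => break,
                    Ok(n) => request.extend_from_slice(&buf[..n]),
                }
            }
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: text/plain; version=0.0.4\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
            let _ = stream.write_all(response.as_bytes());
            // `stream` is dropped here, closing the connection.
        }
    })
}

// Minimal client standing in for a Prometheus scrape.
fn scrape(addr: SocketAddr) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(b"GET /metrics HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let _server = spawn_metrics_server(listener, "# TYPE errors_total counter\nerrors_total 0\n".to_string());
    println!("{}", scrape(addr)?);
    Ok(())
}
```

Connections are handled one at a time on a single thread, which is enough for one scraper but not for a general-purpose HTTP server.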
## Recording metrics
You can record metrics using:
- The global helpers (simple functions in `global`)
- The global singleton `METRICS` (direct methods)
- The `record_metrics!` macro (conditional recording with an optional instance)
### Global helpers
```rust
use cdk_prometheus::global;
fn handle_request() {
    // HTTP
    global::record_http_request("/health", "200");
    global::record_http_request_duration(0.003, "/health");

    // Auth
    global::record_auth_attempt();
    global::record_auth_success();

    // Lightning and DB
    global::record_lightning_payment(1500.0, 2.0); // amount, fee (both in the base units you track)
    global::record_db_operation(0.015, "select_user");
    global::set_db_connections_active(8);

    // Mint operations
    global::inc_in_flight_requests("get_payment_quote");
    // ... do work ...
    global::record_mint_operation("get_payment_quote", true);
    global::record_mint_operation_histogram("get_payment_quote", true, 0.021);
    global::dec_in_flight_requests("get_payment_quote");

    // Errors
    global::record_error();
}
```
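The example above pairs `inc_in_flight_requests` with `dec_in_flight_requests` by hand, so an early return or `?` between them would leave the gauge stuck high. One way to make the pairing automatic is a drop guard. A std-only sketch, in which `InFlightGuard` and the atomic `IN_FLIGHT` gauge are hypothetical stand-ins (in real code, `new` and `drop` would call the two global helpers):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Hypothetical stand-in for an in-flight gauge.
static IN_FLIGHT: AtomicI64 = AtomicI64::new(0);

/// Increments the gauge on creation and decrements it when dropped,
/// so every exit path (including `?` and unwinding panics) rebalances it.
struct InFlightGuard;

impl InFlightGuard {
    fn new() -> Self {
        IN_FLIGHT.fetch_add(1, Ordering::SeqCst);
        InFlightGuard
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
    }
}

fn get_payment_quote(fail: bool) -> Result<u64, String> {
    let _guard = InFlightGuard::new();
    if fail {
        return Err("backend unavailable".into()); // the guard still decrements here
    }
    Ok(42)
}

fn main() {
    let _ = get_payment_quote(false);
    let _ = get_payment_quote(true);
    println!("in flight after both calls: {}", IN_FLIGHT.load(Ordering::SeqCst));
}
```

Bind the guard to a named variable such as `_guard`, not `_`: `let _ = InFlightGuard::new();` drops it immediately and decrements the gauge right away.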
### Using the global METRICS instance directly
```rust
use cdk_prometheus::METRICS;
fn do_db_work() {
    // Record an "update_user" database operation and its duration on the global instance.
    METRICS.record_db_operation(0.005, "update_user");
}
```
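`METRICS` is a global, lazily initialized instance. A common std-only way to build that pattern is `std::sync::OnceLock`; the sketch below illustrates the pattern only (the crate may use a different mechanism), and `Metrics` and `metrics()` are hypothetical names:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical, simplified metrics type: per-operation call count and total seconds.
#[derive(Default)]
struct Metrics {
    db_ops: Mutex<HashMap<&'static str, (u64, f64)>>,
}

impl Metrics {
    fn record_db_operation(&self, seconds: f64, op: &'static str) {
        let mut ops = self.db_ops.lock().unwrap();
        let entry = ops.entry(op).or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += seconds;
    }
}

// Initialized on first access, then shared for the life of the process.
fn metrics() -> &'static Metrics {
    static METRICS: OnceLock<Metrics> = OnceLock::new();
    METRICS.get_or_init(Metrics::default)
}

fn main() {
    metrics().record_db_operation(0.005, "update_user");
    metrics().record_db_operation(0.007, "update_user");
    let ops = metrics().db_ops.lock().unwrap();
    println!("{:?}", ops.get("update_user"));
}
```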
### Using the record_metrics! macro
The macro lets you group several calls concisely and optionally pass a metrics instance; if no instance is present, it falls back to the global helpers. Its expansion is gated on a `prometheus` feature evaluated in the calling crate, so declare that feature in your crate; builds without it compile the calls away.
```rust
use cdk_prometheus::record_metrics;

fn run_operation(metrics_opt: Option<cdk_prometheus::CdkMetrics>) {
    // Use the instance if present, otherwise fall back to the global helpers
    record_metrics!(metrics_opt => {
        inc_in_flight_requests("make_payment");
        record_mint_operation("make_payment", true);
        record_mint_operation_histogram("make_payment", true, 0.123);
        dec_in_flight_requests("make_payment");
    });

    // Or call the global helpers directly
    record_metrics!({
        record_error();
    });
}
```
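The `metrics_opt => { ... }` form dispatches each call to the instance when one is present and to the global helpers otherwise. The following std-only sketch reproduces that dispatch with a hypothetical `record_with!` macro, `Metrics` type, and `GLOBAL` instance; the real macro's expansion may differ:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical metrics type with a single counter.
struct Metrics {
    errors: AtomicU64,
}

impl Metrics {
    const fn new() -> Self {
        Metrics { errors: AtomicU64::new(0) }
    }

    fn record_error(&self) {
        self.errors.fetch_add(1, Ordering::Relaxed);
    }
}

// Hypothetical global fallback instance.
static GLOBAL: Metrics = Metrics::new();

// Calls each listed method on the instance if `Some`, otherwise on GLOBAL.
macro_rules! record_with {
    ($opt:expr => { $($method:ident($($arg:expr),*);)* }) => {{
        let target: &Metrics = match &$opt {
            Some(m) => m,
            None => &GLOBAL,
        };
        $( target.$method($($arg),*); )*
    }};
}

fn run_operation(local: Option<Metrics>) -> Option<Metrics> {
    record_with!(local => { record_error(); });
    local
}

fn main() {
    let local = run_operation(Some(Metrics::new())).unwrap();
    run_operation(None);
    println!(
        "local: {}, global: {}",
        local.errors.load(Ordering::Relaxed),
        GLOBAL.errors.load(Ordering::Relaxed)
    );
}
```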
## Exposing the /metrics endpoint
If you just need sane defaults, use the convenience starters shown above. If you want finer control (bind address, path, system metrics), build the server explicitly:
```rust
use cdk_prometheus::PrometheusBuilder;

fn build_and_run() -> anyhow::Result<tokio::task::JoinHandle<anyhow::Result<()>>> {
    // Build a server wired up with the default CDK metrics
    let server = PrometheusBuilder::new().build_with_cdk_metrics()?;
    // Spawn it as a background task (requires a running Tokio runtime)
    let handle = server.start_background();
    Ok(handle)
}
```
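Notes:
- The bind address and metrics path come from the server configuration; the defaults are commonly `127.0.0.1:9090` and `/metrics`.
- Prometheus itself listens on port 9090 by default (as in the Docker setup above), so bind the metrics server to a different port when both run on the same host; the Docker quick start scrapes mintd on port 9000.
- With `system-metrics` enabled, the server periodically updates process/system gauges.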
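The builder collects settings such as bind address, path, and system metrics before building. A self-contained sketch of that shape, assuming the `127.0.0.1:9090` and `/metrics` defaults noted above; `MetricsConfig`, `MetricsConfigBuilder`, the setter names, and the `system_metrics: false` default are hypothetical, not `PrometheusBuilder`'s actual API:

```rust
use std::net::SocketAddr;

// Hypothetical config mirroring the knobs mentioned above; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
struct MetricsConfig {
    bind_address: SocketAddr,
    metrics_path: String,
    system_metrics: bool,
}

impl Default for MetricsConfig {
    fn default() -> Self {
        MetricsConfig {
            bind_address: SocketAddr::from(([127, 0, 0, 1], 9090)),
            metrics_path: "/metrics".to_string(),
            system_metrics: false, // assumed default for this sketch
        }
    }
}

// Hypothetical builder: each setter consumes and returns `self` so calls chain.
#[derive(Default)]
struct MetricsConfigBuilder {
    config: MetricsConfig,
}

impl MetricsConfigBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn bind_address(mut self, addr: SocketAddr) -> Self {
        self.config.bind_address = addr;
        self
    }

    fn metrics_path(mut self, path: impl Into<String>) -> Self {
        self.config.metrics_path = path.into();
        self
    }

    fn system_metrics(mut self, enabled: bool) -> Self {
        self.config.system_metrics = enabled;
        self
    }

    // Validate once at build time instead of in every setter.
    fn build(self) -> Result<MetricsConfig, String> {
        if !self.config.metrics_path.starts_with('/') {
            return Err(format!("metrics path must start with '/': {:?}", self.config.metrics_path));
        }
        Ok(self.config)
    }
}

fn main() {
    let config = MetricsConfigBuilder::new()
        .bind_address(SocketAddr::from(([127, 0, 0, 1], 9000)))
        .system_metrics(true)
        .build()
        .expect("valid config");
    println!("{config:?}");
}
```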
## What's included
The default CDK metrics instance (CdkMetrics) registers and maintains counters, histograms, and gauges for common areas:
- HTTP: request totals, durations
- Auth: attempts and successes
- Lightning: payment totals, amounts, fees
- Database: operation totals, latencies, active connections
- Mint: operation totals, in-flight gauges, per-operation latencies
- Errors: a general counter
You can use these immediately through the global helpers or the METRICS instance.
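Latency histograms follow Prometheus semantics: each bucket counts observations less than or equal to its upper bound (`le`), buckets are cumulative, and a `_sum` and `_count` are kept alongside. A small std-only sketch of that bookkeeping; the `Histogram` type and the bucket bounds are illustrative, not the crate's:

```rust
// Illustrative Prometheus-style histogram: cumulative `le` buckets plus sum and count.
struct Histogram {
    bounds: Vec<f64>, // ascending upper bounds; the +Inf bucket is implicit
    counts: Vec<u64>, // counts[i] = number of observations <= bounds[i]
    sum: f64,
    count: u64,
}

impl Histogram {
    fn new(bounds: Vec<f64>) -> Self {
        let n = bounds.len();
        Histogram { bounds, counts: vec![0; n], sum: 0.0, count: 0 }
    }

    fn observe(&mut self, value: f64) {
        // Cumulative: a value increments every bucket whose bound it does not exceed.
        for (bound, c) in self.bounds.iter().zip(self.counts.iter_mut()) {
            if value <= *bound {
                *c += 1;
            }
        }
        self.sum += value;
        self.count += 1; // the implicit +Inf bucket always equals `count`
    }
}

fn main() {
    let mut h = Histogram::new(vec![0.01, 0.1, 1.0]);
    for latency in [0.003, 0.021, 0.123, 2.5] {
        h.observe(latency);
    }
    for (bound, c) in h.bounds.iter().zip(&h.counts) {
        println!("le={bound} -> {c}");
    }
    println!("le=+Inf -> {}, sum={}", h.count, h.sum);
}
```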
## Adding custom metrics
This crate re-exports the prometheus crate and exposes the underlying Registry so you can define and register your own metrics:
```rust
use cdk_prometheus::{global, prometheus};

fn register_custom_metric() -> Result<(), prometheus::Error> {
    let my_counter = prometheus::IntCounter::new("my_counter", "A custom counter")?;
    let registry = global::registry(); // Arc<Registry>
    registry.register(Box::new(my_counter.clone()))?;

    my_counter.inc();
    Ok(())
}
```
If you prefer instance-level control:
```rust
use std::sync::Arc;

use cdk_prometheus::{create_cdk_metrics, prometheus};

fn with_instance() -> anyhow::Result<()> {
    let metrics = create_cdk_metrics()?;
    let registry: Arc<prometheus::Registry> = metrics.registry();

    let hist = prometheus::Histogram::with_opts(
        prometheus::HistogramOpts::new("my_latency_seconds", "My op latency"),
    )?;
    // Register a clone so `hist` stays usable for observations.
    registry.register(Box::new(hist.clone()))?;
    hist.observe(0.042);
    Ok(())
}
```
## Scraping with Prometheus
Example scrape_config:
```yaml
scrape_configs:
  - job_name: 'cdk'
    scrape_interval: 15s
    static_configs:
      - targets: ['127.0.0.1:9090']
```
If you change the bind address or path, update `targets` (address) or `metrics_path` (path) in your Prometheus configuration to match.
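On each scrape, Prometheus parses the plain-text exposition format: `# HELP` and `# TYPE` lines followed by `name{labels} value` samples. The crate produces this for you; the sketch below hand-renders one labelled counter only to show what the scraper receives. `render_counter` and the `http_requests_total` name are hypothetical, and the helper does not escape label values (real encoders escape `\`, `"`, and newlines):

```rust
use std::fmt::Write;

// Hypothetical helper: renders one counter family in the Prometheus text format.
// Label values are not escaped here; real encoders escape `\`, `"` and newlines.
fn render_counter(name: &str, help: &str, samples: &[(Vec<(&str, &str)>, u64)]) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so unwrap is safe here.
    writeln!(out, "# HELP {name} {help}").unwrap();
    writeln!(out, "# TYPE {name} counter").unwrap();
    for (labels, value) in samples {
        if labels.is_empty() {
            writeln!(out, "{name} {value}").unwrap();
        } else {
            let pairs: Vec<String> = labels.iter().map(|(k, v)| format!("{k}=\"{v}\"")).collect();
            writeln!(out, "{name}{{{}}} {value}", pairs.join(",")).unwrap();
        }
    }
    out
}

fn main() {
    let text = render_counter(
        "http_requests_total",
        "Total HTTP requests",
        &[(vec![("route", "/health"), ("status", "200")], 3)],
    );
    print!("{text}");
}
```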
## System metrics (optional)
Enable the system-metrics feature to export basic process/system metrics. The server updates these at a configurable interval.
```toml
cdk-prometheus = { version = "0.1", features = ["system-metrics"] }
```
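System metrics are refreshed on a timer. A std-only sketch of such an update loop, using a thread, a stop flag, and `thread::sleep`; the `spawn_periodic` helper and the fake sampler are hypothetical, and the crate's own updater may be implemented differently:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical helper: calls `update` immediately, then once per `interval`,
// until `stop` is set.
fn spawn_periodic<F>(interval: Duration, stop: Arc<AtomicBool>, mut update: F) -> JoinHandle<()>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || loop {
        update();
        if stop.load(Ordering::Relaxed) {
            break;
        }
        thread::sleep(interval);
    })
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let refreshes = Arc::new(AtomicU64::new(0));
    let refreshes_in_thread = Arc::clone(&refreshes);
    // Fake sampler: a real one would read process memory, CPU time, etc.
    let handle = spawn_periodic(Duration::from_millis(10), Arc::clone(&stop), move || {
        refreshes_in_thread.fetch_add(1, Ordering::Relaxed);
    });
    thread::sleep(Duration::from_millis(55));
    stop.store(true, Ordering::Relaxed);
    handle.join().unwrap();
    println!("gauges refreshed {} times", refreshes.load(Ordering::Relaxed));
}
```

Because the loop sleeps between checks, stopping can take up to one interval.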
## Error handling
Common error types surfaced by this crate include:
- Server bind failures
- Metrics collection/registry errors
- System metrics collection errors (when enabled)
Handle these at startup and monitor logs during runtime.
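One way to keep those failure classes distinct at startup is a small error enum that the bootstrap code can match on. This `MetricsError` type and the `bind_metrics` helper are hypothetical illustrations, not the crate's own error type:

```rust
use std::fmt;
use std::net::{SocketAddr, TcpListener};

// Hypothetical error type grouping the failure classes listed above.
#[derive(Debug)]
enum MetricsError {
    Bind { addr: SocketAddr, source: std::io::Error },
    Registry(String),
    SystemMetrics(String),
}

impl fmt::Display for MetricsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MetricsError::Bind { addr, source } => {
                write!(f, "failed to bind metrics server to {addr}: {source}")
            }
            MetricsError::Registry(msg) => write!(f, "metrics registry error: {msg}"),
            MetricsError::SystemMetrics(msg) => write!(f, "system metrics collection failed: {msg}"),
        }
    }
}

impl std::error::Error for MetricsError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            MetricsError::Bind { source, .. } => Some(source),
            _ => None,
        }
    }
}

fn bind_metrics(addr: SocketAddr) -> Result<TcpListener, MetricsError> {
    TcpListener::bind(addr).map_err(|source| MetricsError::Bind { addr, source })
}

fn main() {
    // Decide at startup whether a bind failure is fatal or just disables metrics.
    match bind_metrics("127.0.0.1:0".parse().unwrap()) {
        Ok(listener) => println!("metrics listening on {}", listener.local_addr().unwrap()),
        Err(e) => eprintln!("metrics disabled: {e}"),
    }
}
```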
## Best practices
- Run the metrics server on localhost or a private interface and use a Prometheus agent/sidecar if needed.
- Register application-specific metrics early in your bootstrap so they are visible from the first scrape.
- Use histograms for latencies and size distributions; use counters for event totals; use gauges for in-flight or current-state values.
- Keep label cardinality bounded: label by route template or operation name, not by raw URLs, user IDs, or quote IDs.
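For HTTP metrics, the usual cardinality trap is using the raw request path as a label. A sketch of collapsing variable path segments into a placeholder before passing the route to a helper such as `global::record_http_request`; `route_label` and its heuristic (all-digit segments, or hex/UUID-like segments of 16+ characters, become `:id`) are assumptions, and a router's matched route template is a better source when one is available:

```rust
// Hypothetical heuristic: replace ID-like path segments so the label set stays bounded.
fn route_label(path: &str) -> String {
    // Drop any query string first; it would otherwise add unbounded label values.
    let path = path.split('?').next().unwrap_or("");
    let segments: Vec<&str> = path
        .split('/')
        .map(|seg| {
            let is_numeric = !seg.is_empty() && seg.chars().all(|c| c.is_ascii_digit());
            let is_long_hex =
                seg.len() >= 16 && seg.chars().all(|c| c.is_ascii_hexdigit() || c == '-');
            if is_numeric || is_long_hex { ":id" } else { seg }
        })
        .collect();
    segments.join("/")
}

fn main() {
    for raw in [
        "/v1/mint/quote/bolt11/9b2c1d4e-5f60-4a7b-8c9d-0e1f2a3b4c5d",
        "/v1/keys/42?x=1",
        "/health",
    ] {
        println!("{raw} -> {}", route_label(raw));
    }
}
```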
## License
MIT