# Transcription API Service

A high-performance, standalone transcription service with REST API, gRPC, and WebSocket support, optimized for real-time speech-to-text applications. Well suited to desktop applications, web services, and IoT devices.
## Features
- Multiple API Interfaces: REST API, gRPC, and WebSocket
- High Performance: Optimized with TF32, cuDNN, and efficient batching
- Whisper Models: Support for all Whisper models (tiny to large-v3)
- Real-time Streaming: Bidirectional streaming for live transcription
- Voice Activity Detection: Smart VAD to filter silence and noise
- Anti-hallucination: Advanced filtering to reduce Whisper hallucinations
- Docker Ready: Easy deployment with GPU support
- Interactive Docs: Auto-generated API documentation (Swagger/OpenAPI)
## Quick Start

### Using Docker Compose (Recommended)
```bash
# Enter the repository
cd transcription-api

# Start the service (uses the 'base' model by default)
docker compose up -d

# Check logs
docker compose logs -f

# Stop the service
docker compose down
```
## Configuration

Edit `.env` or `docker-compose.yml` to configure:
```bash
# Model Configuration
MODEL_PATH=base            # tiny, base, small, medium, large, large-v3

# Service Ports
GRPC_PORT=50051            # gRPC service port
WEBSOCKET_PORT=8765        # WebSocket service port
REST_PORT=8000             # REST API port

# Feature Flags
ENABLE_WEBSOCKET=true      # Enable WebSocket support
ENABLE_REST=true           # Enable REST API

# GPU Configuration
CUDA_VISIBLE_DEVICES=0     # GPU device ID (if available)
```
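For scripted deployments, the variables above can be read back with their documented defaults and types. A minimal sketch (the service applies these settings internally; `load_config` here is a hypothetical helper for client-side tooling, not part of the service):

```python
import os

# Defaults mirror the .env example above.
DEFAULTS = {
    "MODEL_PATH": "base",
    "GRPC_PORT": "50051",
    "WEBSOCKET_PORT": "8765",
    "REST_PORT": "8000",
    "ENABLE_WEBSOCKET": "true",
    "ENABLE_REST": "true",
}


def load_config(env=None):
    """Merge an environment mapping over the documented defaults."""
    env = os.environ if env is None else env
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Ports are numeric, feature flags are booleans.
    for key in ("GRPC_PORT", "WEBSOCKET_PORT", "REST_PORT"):
        cfg[key] = int(cfg[key])
    for key in ("ENABLE_WEBSOCKET", "ENABLE_REST"):
        cfg[key] = cfg[key].lower() == "true"
    return cfg
```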
## API Endpoints
The service provides three ways to access transcription:
### 1. REST API (Port 8000)

The REST API is well suited to simple HTTP-based integrations.
#### Base URLs
- API Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health: http://localhost:8000/health
#### Key Endpoints

**Transcribe File**
```bash
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "task=transcribe" \
  -F "vad_enabled=true"
```
**Health Check**

```bash
curl http://localhost:8000/health
```

**Get Capabilities**

```bash
curl http://localhost:8000/capabilities
```
**WebSocket Streaming (via REST API)**

```
# Connect to WebSocket
ws://localhost:8000/ws/transcribe
```
For detailed API documentation, visit http://localhost:8000/docs after starting the service.
### 2. gRPC (Port 50051)

For high-performance, low-latency applications. See the protobuf definitions in `proto/transcription.proto`.
### 3. WebSocket (Port 8765)

Legacy WebSocket endpoint, kept for backward compatibility.
## Usage Examples

### REST API (Python)
```python
import requests

# Transcribe a file
with open('audio.wav', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/transcribe',
        files={'file': f},
        data={
            'language': 'en',
            'task': 'transcribe',
            'vad_enabled': True,
        },
    )

result = response.json()
print(result['full_text'])
```
### REST API (cURL)
```bash
# Transcribe an audio file
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.wav" \
  -F "language=en"

# Health check
curl http://localhost:8000/health

# Get service capabilities
curl http://localhost:8000/capabilities
```
### WebSocket (JavaScript)
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/transcribe');

ws.onopen = () => {
  console.log('Connected');

  // Send audio data (base64-encoded PCM16)
  ws.send(JSON.stringify({
    type: 'audio',
    data: base64AudioData,
    language: 'en',
    vad_enabled: true
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcription') {
    console.log('Transcription:', data.text);
  }
};

// Stop transcription
ws.send(JSON.stringify({ type: 'stop' }));
```
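Clients in other languages can build the same message envelope. A Python sketch, assuming the JSON schema (`type`, `data`, `language`, `vad_enabled`) shown in the JavaScript example above:

```python
import base64
import json


def build_audio_message(pcm16_bytes, language="en", vad_enabled=True):
    """Wrap raw PCM16 audio in the JSON envelope used by the
    /ws/transcribe endpoint (schema assumed from the JS example)."""
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(pcm16_bytes).decode("ascii"),
        "language": language,
        "vad_enabled": vad_enabled,
    })


def build_stop_message():
    """Signal the server to finish the current transcription."""
    return json.dumps({"type": "stop"})
```

These strings can then be sent over any WebSocket client library (e.g. `websockets` in Python).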
## Rust Client Usage

### Build and Run Examples
```bash
cd examples/rust-client

# Build
cargo build --release

# Run live transcription from the microphone
cargo run --bin live-transcribe

# Transcribe a file
cargo run --bin file-transcribe -- audio.wav

# Stream a WAV file
cargo run --bin stream-transcribe -- audio.wav --realtime
```
## Performance Optimizations
This service includes several performance optimizations:
- Shared Model Instance: Single model loaded in memory, shared across all connections
- TF32 & cuDNN: Enabled for Ampere GPUs for faster inference
- No Gradient Computation: `torch.no_grad()` context for inference
- Optimized Threading: Dynamic thread pool sizing based on CPU cores
- Efficient VAD: Fast voice activity detection to skip silent audio
- Batch Processing: Processes audio in optimal chunk sizes
- gRPC Optimizations: Keepalive and HTTP/2 settings tuned for performance
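The chunked processing above can be illustrated by splitting raw audio into fixed-duration PCM16 windows. The 30-second window below matches Whisper's native window; the chunk size the service actually uses internally is not documented here, so treat this as a sketch:

```python
def chunk_pcm16(audio: bytes, seconds: float = 30.0, sample_rate: int = 16000) -> list:
    """Split raw PCM16 mono audio into fixed-duration chunks.

    Each sample is 2 bytes, so chunk boundaries always stay
    sample-aligned. The final chunk may be shorter.
    """
    bytes_per_chunk = int(seconds * sample_rate) * 2  # 2 bytes per sample
    return [audio[i:i + bytes_per_chunk]
            for i in range(0, len(audio), bytes_per_chunk)]
```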
## Supported Formats
- Audio: WAV, MP3, WebM, OGG, FLAC, M4A, raw PCM16
- Sample Rate: 16kHz (automatically resampled)
- Languages: Auto-detect or specify (en, es, fr, de, it, pt, ru, zh, ja, ko, etc.)
- Tasks: Transcribe or Translate to English
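Because the service resamples everything to 16 kHz, clients may also downsample before upload to save bandwidth. A pure-Python linear-interpolation sketch to illustrate the idea (a real client should use a proper DSP routine such as `scipy.signal.resample_poly`):

```python
def resample_pcm(samples, src_rate, dst_rate=16000):
    """Linearly interpolate a mono sample sequence to the target rate.

    Illustration only: no anti-aliasing filter is applied, so quality
    will be poor when downsampling real audio.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    ratio = src_rate / dst_rate
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * ratio
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(int(round(samples[left] * (1 - frac) + samples[right] * frac)))
    return out
```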
## API Documentation
Full interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## Health Monitoring
```bash
# Check service health
curl http://localhost:8000/health
```

Example response:

```json
{
  "healthy": true,
  "status": "running",
  "model_loaded": "large-v3",
  "uptime_seconds": 3600,
  "active_sessions": 2
}
```
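A monitoring script can reduce that payload to a single status line. A small sketch over the documented fields (`healthy`, `model_loaded`, `uptime_seconds`, `active_sessions`); the formatting here is an arbitrary choice, not part of the API:

```python
def summarize_health(payload: dict) -> str:
    """Render the /health JSON into a one-line status string."""
    state = "OK" if payload.get("healthy") else "DOWN"
    return (f"{state}: model={payload.get('model_loaded', '?')} "
            f"sessions={payload.get('active_sessions', 0)} "
            f"uptime={payload.get('uptime_seconds', 0)}s")
```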