mirror of
https://github.com/aljazceru/transcription-api.git
synced 2025-12-16 23:14:18 +01:00
Rust Transcription Client Examples
This directory contains Rust client examples for the Transcription API service.
Available Clients
1. file-transcribe - File Transcription
Transcribe audio files either by sending the entire file at once or by streaming it in chunks for real-time results.
# Send entire file at once (fast, but no real-time feedback)
cargo run --bin file-transcribe -- audio.wav
# Stream file in chunks for real-time transcription (like YouTube)
cargo run --bin file-transcribe -- audio.wav --stream
# With VAD (Voice Activity Detection) to filter silence
cargo run --bin file-transcribe -- audio.wav --stream --vad
# Specify model and language
cargo run --bin file-transcribe -- audio.wav --stream --model large-v3 --language en
2. realtime-playback - Play Audio with Live Transcription
Plays audio through your speakers while showing real-time transcriptions, similar to YouTube's live captions.
# Basic usage - plays audio and shows transcriptions
cargo run --bin realtime-playback -- audio.wav
# With timestamps for each transcription
cargo run --bin realtime-playback -- audio.wav --timestamps
# With VAD to reduce noise transcriptions
cargo run --bin realtime-playback -- audio.wav --vad
# Using a specific model
cargo run --bin realtime-playback -- audio.wav --model large-v3
3. stream-transcribe - Stream WAV Files
Streams WAV files chunk by chunk for transcription.
# Stream without delays (fast processing)
cargo run --bin stream-transcribe -- audio.wav
# Simulate real-time streaming with delays
cargo run --bin stream-transcribe -- audio.wav --realtime
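The difference between the two modes above can be sketched in a few lines. This is a minimal illustration, not the code of stream-transcribe itself: the names chunk_duration and stream_chunks are hypothetical, the audio is assumed to be mono 16-bit PCM, and the `send` callback stands in for whatever actually sends a chunk to the server.

```rust
use std::thread;
use std::time::Duration;

/// Playback duration of `num_samples` mono samples at `sample_rate` Hz.
fn chunk_duration(num_samples: usize, sample_rate: u32) -> Duration {
    Duration::from_secs_f64(num_samples as f64 / sample_rate as f64)
}

/// Hand each chunk to `send`. When `realtime` is true, sleep for the
/// chunk's playback duration afterwards, so chunks leave at the speed
/// the audio would be heard. When false, chunks are sent back to back.
fn stream_chunks<F: FnMut(&[i16])>(
    samples: &[i16],
    sample_rate: u32,
    chunk_len: usize,
    realtime: bool,
    mut send: F,
) {
    // `chunks` panics on a length of 0, so clamp to at least one sample.
    for chunk in samples.chunks(chunk_len.max(1)) {
        send(chunk);
        if realtime {
            thread::sleep(chunk_duration(chunk.len(), sample_rate));
        }
    }
}
```

With `realtime` set to false the loop finishes as fast as `send` allows, which corresponds to the "fast processing" mode.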
4. live-transcribe - Live Microphone Transcription
Captures audio from your microphone and transcribes it in real time.
# Use default microphone
cargo run --bin live-transcribe
# Specify server and language
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
Building
# Build all binaries
cargo build --release
# Build specific binary
cargo build --release --bin realtime-playback
Common Options
All clients support these common options:
- --server <URL> - gRPC server address (default: http://localhost:50051)
- --language <code> - Language code: en, es, fr, de, etc., or "auto" (default: auto)
- --model <name> - Model to use: tiny, base, small, medium, large-v3 (default: base)
- --vad - Enable Voice Activity Detection to filter silence
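To make the defaults concrete, here is a std-only sketch of how these options and their default values could be represented. The Options struct and parse_options function are hypothetical names for illustration; the actual clients may well use an argument-parsing crate instead.

```rust
/// The common options, with the documented defaults.
#[derive(Debug, PartialEq)]
struct Options {
    server: String,
    language: String,
    model: String,
    vad: bool,
}

impl Default for Options {
    fn default() -> Self {
        Options {
            server: "http://localhost:50051".to_string(),
            language: "auto".to_string(),
            model: "base".to_string(),
            vad: false,
        }
    }
}

/// Parse the common flags from an argument list (without the program name).
/// Positional arguments and unknown flags are ignored in this sketch.
fn parse_options<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = Options::default();
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--server" => opts.server = iter.next().ok_or("--server needs a value")?,
            "--language" => opts.language = iter.next().ok_or("--language needs a value")?,
            "--model" => opts.model = iter.next().ok_or("--model needs a value")?,
            "--vad" => opts.vad = true,
            _ => {}
        }
    }
    Ok(opts)
}
```

Passing `std::env::args().skip(1)` to parse_options would apply it to the real command line.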
Features
Real-time Streaming
The --stream flag in file-transcribe and the realtime-playback binary both support real-time streaming, which means:
- Audio is sent in small chunks (0.5-second intervals)
- Transcriptions appear as the audio is being processed
- Similar experience to YouTube's live captions
- Lower latency than sending the entire file at once
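The chunking step can be sketched as follows. This assumes mono 16-bit PCM samples already decoded into memory; chunk_samples is a hypothetical helper, not a function from the clients.

```rust
/// Split mono 16-bit PCM samples into chunks of `chunk_ms` milliseconds.
/// The last chunk may be shorter than the others.
fn chunk_samples(samples: &[i16], sample_rate: u32, chunk_ms: u32) -> Vec<&[i16]> {
    // Number of samples that cover `chunk_ms` milliseconds, at least one.
    let per_chunk = ((sample_rate as u64 * chunk_ms as u64) / 1000).max(1) as usize;
    samples.chunks(per_chunk).collect()
}
```

At 16 kHz, a 0.5-second chunk holds 8000 samples, so 1.25 seconds of audio yields two full chunks and one 4000-sample remainder.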
Voice Activity Detection (VAD)
When --vad is enabled, the service will:
- Filter out silence and background noise
- Reduce false transcriptions (like repeated "Thank you")
- Improve transcription quality for speech-only content
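For intuition about what filtering silence means, the sketch below shows a simple energy-based detector. This is an illustrative stand-in only: the service's actual VAD is not described here and is likely more sophisticated. The names rms and is_speech and the threshold value are assumptions.

```rust
/// Root-mean-square energy of a chunk, with samples scaled to [-1.0, 1.0].
fn rms(chunk: &[i16]) -> f64 {
    if chunk.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = chunk
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / chunk.len() as f64).sqrt()
}

/// Treat a chunk as speech when its energy reaches `threshold`.
/// The threshold is an arbitrary tuning value, not one used by the service.
fn is_speech(chunk: &[i16], threshold: f64) -> bool {
    rms(chunk) >= threshold
}
```

Chunks that fail the check would simply not be sent for transcription, which is why silence no longer produces spurious text.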
Audio Playback
The realtime-playback binary uses the rodio library to:
- Play audio through your system's default audio output
- Synchronize playback with transcription display
- Support multiple audio formats (WAV, MP3, FLAC, etc.)
Requirements
- Rust 1.70 or later
- The Transcription API server running (usually on localhost:50051)
- For live transcription: A working microphone
- For playback: Audio output device (speakers/headphones)
Troubleshooting
"Connection refused" error
Make sure the Transcription API server is running:
cd ../../
docker compose up
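If it is unclear whether the server is listening at all, a plain TCP connection attempt can rule that out before debugging the client. The sketch below uses only the standard library; server_reachable is a hypothetical helper and expects a host:port string such as localhost:50051, not the full http:// URL passed to --server.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Return true if a TCP connection to `host_port` succeeds within `timeout`.
/// This only checks that something accepts connections on the port,
/// not that it is the Transcription API.
fn server_reachable(host_port: &str, timeout: Duration) -> bool {
    match host_port.to_socket_addrs() {
        // A host name may resolve to several addresses; any success counts.
        Ok(mut addrs) => addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        // The string could not be parsed or resolved.
        Err(_) => false,
    }
}
```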
No audio playback
- Check your system's default audio output device
- Ensure the audio file format is supported (WAV, MP3, FLAC)
- Try with a different audio file
Poor transcription quality
- Use a larger model (e.g., --model large-v3)
- Enable VAD to filter noise (--vad)
- Ensure audio quality is good (16 kHz or higher recommended)
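To check a WAV file against the 16 kHz recommendation, the sample rate can be read from its header. The sketch below assumes the canonical layout in which the "fmt " chunk immediately follows the RIFF header, so the sample rate sits at byte offset 24. Files with extra chunks before "fmt " are not handled. Both function names are hypothetical.

```rust
/// Read the sample rate from the first bytes of a canonical WAV file.
/// Returns None if the bytes do not look like such a header.
fn wav_sample_rate(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 28
        || !bytes.starts_with(b"RIFF")
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    // The sample rate is a little-endian u32 at offset 24.
    Some(u32::from_le_bytes([bytes[24], bytes[25], bytes[26], bytes[27]]))
}

/// True if the rate meets the 16 kHz recommendation.
fn meets_recommended_rate(sample_rate: u32) -> bool {
    sample_rate >= 16_000
}
```

Reading the first 44 bytes of the file with `std::fs::read` or a `File` and passing them to wav_sample_rate is enough for this check.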