# Rust Transcription Client Examples

This directory contains Rust client examples for the Transcription API service.

## Available Clients

### 1. `file-transcribe` - File Transcription

Transcribe audio files either by sending the entire file or streaming in real-time chunks.

```bash
# Send entire file at once (fast, but no real-time feedback)
cargo run --bin file-transcribe -- audio.wav

# Stream file in chunks for real-time transcription (like YouTube)
cargo run --bin file-transcribe -- audio.wav --stream

# With VAD (Voice Activity Detection) to filter silence
cargo run --bin file-transcribe -- audio.wav --stream --vad

# Specify model and language
cargo run --bin file-transcribe -- audio.wav --stream --model large-v3 --language en
```

### 2. `realtime-playback` - Play Audio with Live Transcription

Plays audio through your speakers while showing real-time transcriptions, similar to YouTube's live captions.

```bash
# Basic usage - plays audio and shows transcriptions
cargo run --bin realtime-playback -- audio.wav

# With timestamps for each transcription
cargo run --bin realtime-playback -- audio.wav --timestamps

# With VAD to reduce noise transcriptions
cargo run --bin realtime-playback -- audio.wav --vad

# Using a specific model
cargo run --bin realtime-playback -- audio.wav --model large-v3
```

### 3. `stream-transcribe` - Stream WAV Files

Streams WAV files chunk by chunk for transcription.

```bash
# Stream without delays (fast processing)
cargo run --bin stream-transcribe -- audio.wav

# Simulate real-time streaming with delays
cargo run --bin stream-transcribe -- audio.wav --realtime
```

### 4. `live-transcribe` - Live Microphone Transcription

Captures audio from your microphone and transcribes in real-time.

```bash
# Use default microphone
cargo run --bin live-transcribe

# Specify server and language
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
```

## Building

```bash
# Build all binaries
cargo build --release

# Build specific binary
cargo build --release --bin realtime-playback
```

## Common Options

All clients support these common options:

- `--server` - gRPC server address (default: http://localhost:50051)
- `--language` - Language code: en, es, fr, de, etc., or "auto" (default: auto)
- `--model` - Model to use: tiny, base, small, medium, large-v3 (default: base)
- `--vad` - Enable Voice Activity Detection to filter silence

## Features

### Real-time Streaming

The `--stream` flag in `file-transcribe` and the `realtime-playback` binary both support real-time streaming, which means:

- Audio is sent in small chunks (0.5-second intervals)
- Transcriptions appear as the audio is being processed
- The experience is similar to YouTube's live captions
- Latency is lower than when sending the entire file at once

### Voice Activity Detection (VAD)

When `--vad` is enabled, the service will:

- Filter out silence and background noise
- Reduce false transcriptions (like repeated "Thank you")
- Improve transcription quality for speech-only content

### Audio Playback

The `realtime-playback` binary uses the `rodio` library to:

- Play audio through your system's default audio output
- Synchronize playback with transcription display
- Support multiple audio formats (WAV, MP3, FLAC, etc.)

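To make the 0.5-second chunking described under Real-time Streaming more concrete, here is a minimal sketch of how interleaved 16-bit PCM samples could be split into fixed-duration chunks before being sent. The helper names `samples_per_chunk` and `split_into_chunks` are hypothetical and are not taken from these clients, which may chunk audio differently.

```rust
use std::time::Duration;

/// Hypothetical helper: number of interleaved samples covering `chunk` of audio.
fn samples_per_chunk(sample_rate: u32, channels: u16, chunk: Duration) -> usize {
    // Frames per chunk, computed in milliseconds to avoid floating point.
    let frames = (sample_rate as u128 * chunk.as_millis() / 1000) as usize;
    frames * channels as usize
}

/// Hypothetical helper: split interleaved PCM samples into chunk-sized slices.
/// The last slice may be shorter than the others.
fn split_into_chunks(
    samples: &[i16],
    sample_rate: u32,
    channels: u16,
    chunk: Duration,
) -> Vec<&[i16]> {
    // Guard against a zero chunk size, which `chunks` does not accept.
    let n = samples_per_chunk(sample_rate, channels, chunk).max(1);
    samples.chunks(n).collect()
}

fn main() {
    // Two seconds of silent mono audio at 16 kHz (assumed input format).
    let samples = vec![0i16; 32_000];
    let chunks = split_into_chunks(&samples, 16_000, 1, Duration::from_millis(500));
    println!("{} chunks of {} samples", chunks.len(), chunks[0].len());
}
```

In a streaming client, each slice would be sent as one request message, optionally with a sleep of the chunk duration between sends to simulate real-time pacing.
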
## Requirements

- Rust 1.70 or later
- The Transcription API server running (usually on localhost:50051)
- For live transcription: A working microphone
- For playback: Audio output device (speakers/headphones)

## Troubleshooting

### "Connection refused" error

Make sure the Transcription API server is running:

```bash
cd ../../
docker compose up
```

### No audio playback

- Check your system's default audio output device
- Ensure the audio file format is supported (WAV, MP3, FLAC)
- Try with a different audio file

### Poor transcription quality

- Use a larger model (e.g., `--model large-v3`)
- Enable VAD to filter noise (`--vad`)
- Ensure audio quality is good (16 kHz or higher recommended)
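
To check the sample-rate recommendation above before sending a file, one option is to read the rate straight from the WAV header. The sketch below uses only the standard library and assumes a canonical WAV layout in which the `fmt ` chunk immediately follows the RIFF header. Files with extra chunks before `fmt ` are reported as unrecognized. The function `wav_sample_rate` is a hypothetical helper, not part of these clients.

```rust
use std::env;
use std::fs::File;
use std::io::Read;

/// Hypothetical helper: read the sample rate from a canonical WAV header.
/// Returns None if the bytes do not start with RIFF....WAVEfmt .
fn wav_sample_rate(header: &[u8]) -> Option<u32> {
    if header.len() < 28
        || &header[0..4] != b"RIFF"
        || &header[8..12] != b"WAVE"
        || &header[12..16] != b"fmt "
    {
        return None;
    }
    // Bytes 24..28 hold the sample rate as a little-endian u32.
    Some(u32::from_le_bytes([header[24], header[25], header[26], header[27]]))
}

fn main() -> std::io::Result<()> {
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: pass the path to a WAV file");
            return Ok(());
        }
    };
    // Only the first 28 bytes are needed for this check.
    let mut header = [0u8; 28];
    File::open(&path)?.read_exact(&mut header)?;
    match wav_sample_rate(&header) {
        Some(rate) if rate >= 16_000 => println!("{path}: {rate} Hz"),
        Some(rate) => println!("{path}: {rate} Hz (below the recommended 16 kHz)"),
        None => println!("{path}: not a canonical WAV header"),
    }
    Ok(())
}
```
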