Rust Transcription Client Examples

This directory contains Rust client examples for the Transcription API service.

Available Clients

1. file-transcribe - File Transcription

Transcribes an audio file, either by sending the whole file in one request or by streaming it in chunks for real-time results.

# Send entire file at once (fast, but no real-time feedback)
cargo run --bin file-transcribe -- audio.wav

# Stream the file in chunks for real-time transcription (similar to live captions)
cargo run --bin file-transcribe -- audio.wav --stream

# With VAD (Voice Activity Detection) to filter silence
cargo run --bin file-transcribe -- audio.wav --stream --vad

# Specify model and language
cargo run --bin file-transcribe -- audio.wav --stream --model large-v3 --language en

2. realtime-playback - Play Audio with Live Transcription

Plays audio through your speakers while showing real-time transcriptions, similar to YouTube's live captions.

# Basic usage - plays audio and shows transcriptions
cargo run --bin realtime-playback -- audio.wav

# With timestamps for each transcription
cargo run --bin realtime-playback -- audio.wav --timestamps

# With VAD to reduce noise transcriptions
cargo run --bin realtime-playback -- audio.wav --vad

# Using a specific model
cargo run --bin realtime-playback -- audio.wav --model large-v3

3. stream-transcribe - Stream WAV Files

Streams WAV files chunk by chunk for transcription.

# Stream without delays (fast processing)
cargo run --bin stream-transcribe -- audio.wav

# Simulate real-time streaming with delays
cargo run --bin stream-transcribe -- audio.wav --realtime
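
The difference between the two modes can be sketched as follows. This is a minimal illustration rather than the actual stream-transcribe source: the function names, the `send` callback, and the assumption of mono 16-bit samples with a non-zero sample rate are all hypothetical. It shows one plausible way to pace chunks, by sleeping for each chunk's playback time after handing it off.

```rust
use std::thread;
use std::time::Duration;

/// Playback time of `num_samples` mono samples at `sample_rate` Hz.
/// Assumes `sample_rate` is non-zero.
fn chunk_duration(num_samples: usize, sample_rate: u32) -> Duration {
    Duration::from_secs_f64(num_samples as f64 / sample_rate as f64)
}

/// Hands each chunk to `send` (a stand-in for the real network call).
/// With `realtime` set, sleeps for the chunk's playback time after each send,
/// so the data leaves at roughly the speed a live source would produce it.
fn stream_chunks<F: FnMut(&[i16])>(
    chunks: &[&[i16]],
    sample_rate: u32,
    realtime: bool,
    mut send: F,
) {
    for &chunk in chunks {
        send(chunk);
        if realtime {
            thread::sleep(chunk_duration(chunk.len(), sample_rate));
        }
    }
}
```

With `realtime` set to `false` the loop runs as fast as `send` allows, which corresponds to the first command above.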

4. live-transcribe - Live Microphone Transcription

Captures audio from your microphone and transcribes it in real time.

# Use default microphone
cargo run --bin live-transcribe

# Specify server and language
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en

Building

# Build all binaries
cargo build --release

# Build specific binary
cargo build --release --bin realtime-playback

Common Options

All clients support these common options:

  • --server <URL> - gRPC server address (default: http://localhost:50051)
  • --language <code> - Language code: en, es, fr, de, etc., or "auto" (default: auto)
  • --model <name> - Model to use: tiny, base, small, medium, large-v3 (default: base)
  • --vad - Enable Voice Activity Detection to filter silence
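
To make the defaults concrete, here is a hedged sketch of how these four options could be collected into a struct. The `Options` type and `parse_options` function are hypothetical and use only the standard library; the real clients may well rely on an argument-parsing crate instead. The sketch handles only the shared flags, not positional arguments such as the audio file.

```rust
/// Hypothetical container for the shared options listed above.
#[derive(Debug, PartialEq)]
struct Options {
    server: String,
    language: String,
    model: String,
    vad: bool,
}

impl Default for Options {
    fn default() -> Self {
        // Defaults taken from the option list in this README.
        Options {
            server: "http://localhost:50051".to_string(),
            language: "auto".to_string(),
            model: "base".to_string(),
            vad: false,
        }
    }
}

/// Parses the shared flags from an argument list (program name excluded).
fn parse_options<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    let mut opts = Options::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--server" => opts.server = args.next().ok_or("--server needs a value")?,
            "--language" => opts.language = args.next().ok_or("--language needs a value")?,
            "--model" => opts.model = args.next().ok_or("--model needs a value")?,
            "--vad" => opts.vad = true,
            other => return Err(format!("unknown option: {other}")),
        }
    }
    Ok(opts)
}
```

Any flag that is not passed keeps its default, so `--vad --model large-v3` still targets http://localhost:50051 with automatic language detection.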

Features

Real-time Streaming

Both the --stream flag of file-transcribe and the realtime-playback binary stream audio in real time, which means:

  • Audio is sent in small chunks (0.5-second intervals)
  • Transcriptions appear while the audio is still being processed
  • The experience is similar to YouTube's live captions
  • Latency is lower than when sending the entire file at once
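
As an illustration of the chunking step, the sketch below splits a buffer of samples into 0.5-second pieces. It assumes mono 16-bit PCM, and `chunk_samples` is a hypothetical helper, not a function taken from the clients.

```rust
use std::time::Duration;

/// Splits mono PCM samples into chunks that each cover `chunk` of audio.
/// The last chunk may be shorter than the others.
fn chunk_samples(samples: &[i16], sample_rate: u32, chunk: Duration) -> Vec<&[i16]> {
    // Samples per chunk; clamped to at least 1 because `chunks(0)` panics.
    let per_chunk = ((sample_rate as u128 * chunk.as_millis()) / 1000).max(1) as usize;
    samples.chunks(per_chunk).collect()
}
```

For 20,000 samples at 16 kHz and a 500 ms chunk length, this yields three chunks of 8,000, 8,000 and 4,000 samples.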

Voice Activity Detection (VAD)

When --vad is enabled, the service will:

  • Filter out silence and background noise
  • Reduce spurious transcriptions of silence (such as a repeated "Thank you")
  • Improve transcription quality for speech-only content
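
This README does not say how the service detects speech. Purely to illustrate the idea, the sketch below uses a simple energy threshold, which is a stand-in technique and almost certainly cruder than what the server does. The function names and the threshold value are assumptions.

```rust
/// Root-mean-square level of a chunk of 16-bit samples, scaled to 0.0..=1.0.
fn rms(chunk: &[i16]) -> f64 {
    if chunk.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = chunk
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0; // normalise to -1.0..1.0
            x * x
        })
        .sum();
    (sum_sq / chunk.len() as f64).sqrt()
}

/// Treats a chunk as speech when its RMS level reaches `threshold`.
/// A value around 0.01 is an illustrative guess, not a tuned setting.
fn is_speech(chunk: &[i16], threshold: f64) -> bool {
    rms(chunk) >= threshold
}
```

Chunks that fall below the threshold would simply be skipped instead of being transcribed, which is what prevents silence from producing text.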

Audio Playback

The realtime-playback binary uses the rodio library to:

  • Play audio through your system's default audio output
  • Synchronize playback with transcription display
  • Support multiple audio formats (WAV, MP3, FLAC, etc.)

Requirements

  • Rust 1.70 or later
  • A running Transcription API server (by default at http://localhost:50051)
  • For live transcription: A working microphone
  • For playback: Audio output device (speakers/headphones)

Troubleshooting

"Connection refused" error

Make sure the Transcription API server is running:

cd ../../
docker compose up

No audio playback

  • Check your system's default audio output device
  • Ensure the audio file format is supported (WAV, MP3, FLAC)
  • Try with a different audio file

Poor transcription quality

  • Use a larger model (e.g., --model large-v3)
  • Enable VAD to filter noise (--vad)
  • Ensure the audio quality is good (a sample rate of 16 kHz or higher is recommended)
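
To check the sample rate of a WAV file before sending it, the header can be inspected directly. The following is a minimal sketch using only the standard library; `wav_sample_rate` and `read_u32_le` are hypothetical helpers, and the code assumes a standard RIFF/WAVE layout.

```rust
/// Reads a little-endian u32 at byte offset `at`, if the slice is long enough.
fn read_u32_le(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the sample rate declared in a WAV file's `fmt ` chunk, if present.
fn wav_sample_rate(bytes: &[u8]) -> Option<u32> {
    // A RIFF/WAVE file starts with "RIFF", a 4-byte size, then "WAVE".
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    // Walk the chunk list: 4-byte id, 4-byte little-endian size, then the payload.
    while pos + 8 <= bytes.len() {
        let size = read_u32_le(bytes, pos + 4)? as usize;
        let body = pos + 8;
        if &bytes[pos..pos + 4] == b"fmt " {
            // fmt payload: format tag (2 bytes), channels (2 bytes), sample rate (4 bytes).
            return read_u32_le(bytes, body + 4);
        }
        // Chunk payloads are padded to an even number of bytes.
        pos = body + size + (size & 1);
    }
    None
}
```

Passing the result of `std::fs::read("audio.wav")` to `wav_sample_rate` returns `Some(rate)` for a well-formed file and `None` when the data is not a WAV file or has no `fmt ` chunk.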