aljaz/transcription-api

mirror of https://github.com/aljazceru/transcription-api.git synced 2025-12-16 23:14:18 +01:00

Aljaz Ceru ab17a8ac21 initial commit

2025-09-11 09:59:16 +02:00

3.9 KiB

Rust Transcription Client Examples

This directory contains Rust client examples for the Transcription API service.

Available Clients

1. `file-transcribe` - File Transcription

Transcribes audio files, either by sending the entire file in one request or by streaming it in chunks in real time.

# Send entire file at once (fast, but no real-time feedback)
cargo run --bin file-transcribe -- audio.wav

# Stream file in chunks for real-time transcription (similar to YouTube's live captions)
cargo run --bin file-transcribe -- audio.wav --stream

# With VAD (Voice Activity Detection) to filter silence
cargo run --bin file-transcribe -- audio.wav --stream --vad

# Specify model and language
cargo run --bin file-transcribe -- audio.wav --stream --model large-v3 --language en

2. `realtime-playback` - Play Audio with Live Transcription

Plays audio through your speakers while showing real-time transcriptions, similar to YouTube's live captions.

# Basic usage - plays audio and shows transcriptions
cargo run --bin realtime-playback -- audio.wav

# With timestamps for each transcription
cargo run --bin realtime-playback -- audio.wav --timestamps

# With VAD to reduce noise transcriptions
cargo run --bin realtime-playback -- audio.wav --vad

# Using a specific model
cargo run --bin realtime-playback -- audio.wav --model large-v3

3. `stream-transcribe` - Stream WAV Files

Streams WAV files chunk by chunk for transcription.

# Stream without delays (fast processing)
cargo run --bin stream-transcribe -- audio.wav

# Simulate real-time streaming with delays
cargo run --bin stream-transcribe -- audio.wav --realtime
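
As a rough illustration of what "simulating real-time streaming with delays" can mean, the sketch below sleeps for the playback duration of each chunk after handing it off. The names `chunk_duration` and `stream_paced` are hypothetical and not taken from the `stream-transcribe` source, and the `send` callback stands in for whatever the client actually does with a chunk. The sample rate is assumed to be non-zero.

```rust
use std::thread::sleep;
use std::time::Duration;

/// How long `num_samples` mono samples take to play at `sample_rate_hz`.
/// Assumes `sample_rate_hz` is non-zero.
fn chunk_duration(num_samples: usize, sample_rate_hz: u32) -> Duration {
    Duration::from_secs_f64(num_samples as f64 / sample_rate_hz as f64)
}

/// Hand chunks of `samples` to `send`, optionally pausing after each chunk
/// for as long as that chunk would take to play back.
fn stream_paced<F: FnMut(&[i16])>(
    samples: &[i16],
    chunk_len: usize,
    sample_rate_hz: u32,
    realtime: bool,
    mut send: F,
) {
    // `chunks` panics on a zero chunk length, so clamp it to at least 1.
    for chunk in samples.chunks(chunk_len.max(1)) {
        send(chunk);
        if realtime {
            sleep(chunk_duration(chunk.len(), sample_rate_hz));
        }
    }
}
```

With `realtime` set to `false` the loop runs as fast as `send` allows, which corresponds to the "fast processing" mode; with `true` the total run time is roughly the length of the audio.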

4. `live-transcribe` - Live Microphone Transcription

Captures audio from your microphone and transcribes it in real time.

# Use default microphone
cargo run --bin live-transcribe

# Specify server and language
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en

Building

# Build all binaries
cargo build --release

# Build specific binary
cargo build --release --bin realtime-playback

Common Options

All clients support these common options:

--server <URL> - gRPC server address (default: http://localhost:50051)
--language <code> - Language code: en, es, fr, de, etc., or "auto" (default: auto)
--model <name> - Model to use: tiny, base, small, medium, large-v3 (default: base)
--vad - Enable Voice Activity Detection to filter silence
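
The defaults listed above can be modelled as a small options struct. This is only a sketch using the standard library: the struct `ClientOptions` and the function `parse_options` are hypothetical names, and the real clients may well use an argument-parsing crate instead. Only the flag names and default values come from the list above.

```rust
/// Options shared by the example clients, with the documented defaults.
#[derive(Debug, Clone, PartialEq)]
struct ClientOptions {
    server: String,
    language: String,
    model: String,
    vad: bool,
}

impl Default for ClientOptions {
    fn default() -> Self {
        Self {
            server: "http://localhost:50051".to_string(),
            language: "auto".to_string(),
            model: "base".to_string(),
            vad: false,
        }
    }
}

/// Parse the common flags from an argument list (program name excluded).
/// Anything unrecognised, such as the audio file path, is skipped.
fn parse_options<I>(args: I) -> Result<ClientOptions, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = ClientOptions::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--server" => opts.server = args.next().ok_or("--server needs a value")?,
            "--language" => opts.language = args.next().ok_or("--language needs a value")?,
            "--model" => opts.model = args.next().ok_or("--model needs a value")?,
            "--vad" => opts.vad = true,
            _ => {} // positional arguments and client-specific flags
        }
    }
    Ok(opts)
}
```

In a binary this would typically be called as `parse_options(std::env::args().skip(1))`.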

Features

Real-time Streaming

The --stream flag in file-transcribe and the realtime-playback binary both support real-time streaming, which means:

Audio is sent in small chunks (0.5-second intervals)
Transcriptions appear while the audio is still being processed
The experience is similar to YouTube's live captions
Latency is lower than when sending the entire file at once
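
To make the 0.5-second chunking concrete, here is a minimal sketch that splits a buffer of PCM samples into chunks of a given duration. It assumes 16-bit mono samples, and the function names are illustrative rather than taken from the clients in this directory.

```rust
/// Number of mono samples that fit in `chunk_ms` milliseconds of audio.
fn samples_per_chunk(sample_rate_hz: u32, chunk_ms: u32) -> usize {
    (sample_rate_hz as u64 * chunk_ms as u64 / 1000) as usize
}

/// Split mono PCM samples into consecutive chunks of `chunk_ms` milliseconds.
/// The final chunk may be shorter than the others.
fn split_into_chunks(samples: &[i16], sample_rate_hz: u32, chunk_ms: u32) -> Vec<&[i16]> {
    // Guard against a zero chunk size, which `chunks` does not accept.
    let size = samples_per_chunk(sample_rate_hz, chunk_ms).max(1);
    samples.chunks(size).collect()
}
```

At 16 kHz, a 500 ms chunk holds 8,000 samples, so 1.25 seconds of audio (20,000 samples) yields two full chunks and one 4,000-sample remainder.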

Voice Activity Detection (VAD)

When --vad is enabled, the service will:

Filter out silence and background noise
Reduce false transcriptions (such as a repeated "Thank you" produced from silence)
Improve transcription quality for speech-only content
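
This README does not say how the service detects voice activity. For intuition only, the sketch below shows the simplest possible approach, an energy threshold on each chunk. It is not the service's actual VAD, and the names `rms` and `is_speech` as well as the threshold value are assumptions chosen for illustration.

```rust
/// Root-mean-square amplitude of a chunk of 16-bit samples,
/// normalised to the range 0.0..=1.0.
fn rms(samples: &[i16]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt()
}

/// Naive energy-based check: treat a chunk as speech when its RMS
/// amplitude exceeds `threshold` (e.g. 0.01, an arbitrary example value).
fn is_speech(samples: &[i16], threshold: f64) -> bool {
    rms(samples) > threshold
}
```

A client-side filter like this would simply skip sending chunks for which `is_speech` returns `false`. Real VAD implementations are considerably more robust to background noise than a fixed threshold.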

Audio Playback

The realtime-playback binary uses the rodio library to:

Play audio through your system's default audio output
Synchronize playback with transcription display
Support multiple audio formats (WAV, MP3, FLAC, etc.)

Requirements

Rust 1.70 or later
A running Transcription API server (usually at localhost:50051)
For live transcription: A working microphone
For playback: Audio output device (speakers/headphones)

Troubleshooting

"Connection refused" error

Make sure the Transcription API server is running:

cd ../../
docker compose up
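
To tell a server that is not running apart from a wrong address, it can help to test plain TCP reachability before starting a client. The helper below is a sketch using only the standard library. `is_reachable` is a hypothetical name, and it checks only that something accepts connections on the port, not that it speaks gRPC.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Return true if a TCP connection to `host_port` (e.g. "localhost:50051")
/// succeeds within `timeout` on any of the addresses it resolves to.
fn is_reachable(host_port: &str, timeout: Duration) -> bool {
    match host_port.to_socket_addrs() {
        Ok(mut addrs) => addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        Err(_) => false, // unparsable address or failed name resolution
    }
}
```

Note that the argument is `host:port` without the `http://` scheme used by the `--server` option.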

No audio playback

Check your system's default audio output device
Ensure the audio file format is supported (WAV, MP3, FLAC)
Try with a different audio file

Poor transcription quality

Use a larger model (e.g., --model large-v3)
Enable VAD to filter noise (--vad)
Ensure the audio is of good quality (a sample rate of 16 kHz or higher is recommended)
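
One quick way to check the sample rate of a WAV file is to read it from the file's `fmt ` chunk. The following is a minimal sketch, assuming a standard RIFF/WAVE layout. The function names are hypothetical and it does not validate the rest of the file.

```rust
/// Read a little-endian u32 at `offset`, if the slice is long enough.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Return the sample rate stored in the "fmt " chunk of a RIFF/WAVE file,
/// or None if the data does not look like a WAV file.
fn wav_sample_rate(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = read_u32_le(bytes, pos + 4)? as usize;
        let body = pos + 8;
        if id == b"fmt " {
            // Layout: format tag (2 bytes), channels (2 bytes), sample rate (4 bytes).
            return read_u32_le(bytes, body + 4);
        }
        // Skip to the next chunk; chunk bodies are padded to an even length.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}
```

Typical usage would be `wav_sample_rate(&std::fs::read("audio.wav")?)`, followed by a comparison against 16,000.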
