Files
transcription-api/examples/rust-client/README.md
2025-09-11 17:50:21 +02:00

190 lines
5.8 KiB
Markdown

# Rust Transcription Client Examples
This directory contains Rust client examples for the Transcription API service.
## Available Clients
### 1. `file-transcribe` - File Transcription
Transcribe audio files either by sending the entire file or streaming in real-time chunks.
```bash
# Send entire file at once (fast, but no real-time feedback)
cargo run --bin file-transcribe -- audio.wav
# Stream file in chunks for real-time transcription (like YouTube)
cargo run --bin file-transcribe -- audio.wav --stream
# With VAD (Voice Activity Detection) to filter silence
cargo run --bin file-transcribe -- audio.wav --stream --vad
# Specify model and language
cargo run --bin file-transcribe -- audio.wav --stream --model large-v3 --language en
```
### 2. `realtime-playback` - Play Audio with Live Transcription
Plays audio through your speakers while showing real-time transcriptions, similar to YouTube's live captions.
```bash
# Basic usage - plays audio and shows transcriptions
cargo run --bin realtime-playback -- audio.wav
# With timestamps for each transcription
cargo run --bin realtime-playback -- audio.wav --timestamps
# With VAD to reduce noise transcriptions
cargo run --bin realtime-playback -- audio.wav --vad
# Using a specific model
cargo run --bin realtime-playback -- audio.wav --model large-v3
```
### 3. `stream-transcribe` - Stream WAV Files
Streams WAV files chunk by chunk for transcription.
```bash
# Stream without delays (fast processing)
cargo run --bin stream-transcribe -- audio.wav
# Simulate real-time streaming with delays
cargo run --bin stream-transcribe -- audio.wav --realtime
```
### 4. `live-transcribe` - Live Microphone Transcription
Captures audio from your microphone and transcribes in real-time.
```bash
# Use default microphone
cargo run --bin live-transcribe
# Specify server and language
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
```
### 5. `stdin-transcribe` - Transcribe Audio from stdin
Accepts audio data from stdin, perfect for piping from other tools.
```bash
# Pipe audio from parec (PulseAudio/PipeWire)
parec --format=s16le --rate=16000 --channels=1 | cargo run --bin stdin-transcribe
# With options
parec --format=s16le --rate=16000 --channels=1 | \
cargo run --bin stdin-transcribe -- --language en --no-vad --chunk-seconds 2.5
```
### 6. `system-audio` - System Audio Capture
Attempts to capture system audio using available audio devices.
```bash
# List available audio devices
cargo run --bin system-audio -- --list-devices
# Capture from specific device
cargo run --bin system-audio -- --device pulse
```
## Video Call & System Audio Transcription
### Transcribe Video Calls (Zoom, Teams, Meet, etc.)
Use the provided script to transcribe any video call or system audio:
```bash
# Transcribe system audio (video calls, YouTube, etc.)
./transcribe_video_call.sh
# List available audio sources
./transcribe_video_call.sh --list
# Use microphone instead of system audio
./transcribe_video_call.sh --microphone
```
### Quick YouTube/System Audio Test
```bash
# Test with any playing audio (YouTube, music, etc.)
./test_youtube.sh
```
**Note**: System audio capture requires `pulseaudio-utils` package:
```bash
sudo apt-get install pulseaudio-utils
```
## Building
```bash
# Build all binaries
cargo build --release
# Build specific binary
cargo build --release --bin realtime-playback
```
## Common Options
All clients support these common options:
- `--server <URL>` - gRPC server address (default: http://localhost:50051)
- `--language <code>` - Language code: en, es, fr, de, etc., or "auto" (default: auto)
- `--model <name>` - Model to use: tiny, base, small, medium, large-v3 (default: base)
- `--vad` - Enable Voice Activity Detection to filter silence
## Features
### Real-time Streaming
The `--stream` flag in `file-transcribe` and the `realtime-playback` binary both support real-time streaming, which means:
- Audio is sent in small chunks (0.5 second intervals)
- Transcriptions appear as the audio is being processed
- Similar experience to YouTube's live captions
- Lower latency compared to sending entire file
### Voice Activity Detection (VAD)
When `--vad` is enabled, the service will:
- Filter out silence and background noise
- Reduce false transcriptions (like repeated "Thank you")
- Improve transcription quality for speech-only content
### Audio Playback
The `realtime-playback` binary uses the `rodio` library to:
- Play audio through your system's default audio output
- Synchronize playback with transcription display
- Support multiple audio formats (WAV, MP3, FLAC, etc.)
## Requirements
- Rust 1.70 or later
- The Transcription API server running (usually on localhost:50051)
- For live transcription: A working microphone
- For playback: Audio output device (speakers/headphones)
## System Requirements
### For Video Call Transcription (Ubuntu/Linux)
- PulseAudio utilities: `sudo apt-get install pulseaudio-utils`
- PipeWire or PulseAudio audio server
- The monitor audio source must be available
## Troubleshooting
### "Connection refused" error
Make sure the Transcription API server is running:
```bash
cd ../../
docker compose up
```
### No audio playback
- Check your system's default audio output device
- Ensure the audio file format is supported (WAV, MP3, FLAC)
- Try with a different audio file
### Poor transcription quality
- Use a larger model (e.g., `--model large-v3`)
- For system audio: use `--no-vad` flag to disable voice activity detection
- Ensure audio quality is good (16kHz or higher recommended)
- Use 2.5-3 second chunks for optimal accuracy
### System audio not working
- Install pulseaudio-utils: `sudo apt-get install pulseaudio-utils`
- Check monitor source exists: `./transcribe_video_call.sh --list`
- Make sure audio is playing when you start transcription
- Use headphones to avoid echo/feedback