mirror of
https://github.com/aljazceru/transcription-api.git
synced 2025-12-18 07:44:21 +01:00
# Rust Transcription Client Examples

This directory contains Rust client examples for the Transcription API service.

## Available Clients

### 1. `file-transcribe` - File Transcription
Transcribe audio files either by sending the entire file or streaming in real-time chunks.

```bash
# Send entire file at once (fast, but no real-time feedback)
cargo run --bin file-transcribe -- audio.wav

# Stream file in chunks for real-time transcription (like YouTube)
cargo run --bin file-transcribe -- audio.wav --stream

# With VAD (Voice Activity Detection) to filter silence
cargo run --bin file-transcribe -- audio.wav --stream --vad

# Specify model and language
cargo run --bin file-transcribe -- audio.wav --stream --model large-v3 --language en
```

### 2. `realtime-playback` - Play Audio with Live Transcription
Plays audio through your speakers while showing real-time transcriptions, similar to YouTube's live captions.

```bash
# Basic usage - plays audio and shows transcriptions
cargo run --bin realtime-playback -- audio.wav

# With timestamps for each transcription
cargo run --bin realtime-playback -- audio.wav --timestamps

# With VAD to reduce noise transcriptions
cargo run --bin realtime-playback -- audio.wav --vad

# Using a specific model
cargo run --bin realtime-playback -- audio.wav --model large-v3
```

### 3. `stream-transcribe` - Stream WAV Files
Streams WAV files chunk by chunk for transcription.

```bash
# Stream without delays (fast processing)
cargo run --bin stream-transcribe -- audio.wav

# Simulate real-time streaming with delays
cargo run --bin stream-transcribe -- audio.wav --realtime
```

### 4. `live-transcribe` - Live Microphone Transcription
Captures audio from your microphone and transcribes it in real time.

```bash
# Use default microphone
cargo run --bin live-transcribe

# Specify server and language
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
```

### 5. `stdin-transcribe` - Transcribe Audio from stdin
Accepts audio data from stdin, perfect for piping from other tools.

```bash
# Pipe audio from parec (PulseAudio/PipeWire)
parec --format=s16le --rate=16000 --channels=1 | cargo run --bin stdin-transcribe

# With options
parec --format=s16le --rate=16000 --channels=1 | \
  cargo run --bin stdin-transcribe -- --language en --no-vad --chunk-seconds 2.5
```

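The `parec` commands above emit raw signed 16-bit little-endian mono PCM at 16 kHz. As a rough sketch of what any consumer of that stream has to do (not necessarily how `stdin-transcribe` does it), the hypothetical helpers `decode_s16le` and `chunk_bytes` below decode such bytes into samples and compute how many bytes make up one chunk of a given duration:

```rust
/// Decode raw s16le bytes into samples. A trailing odd byte is ignored.
/// (Hypothetical helper, not taken from the client source.)
fn decode_s16le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// Bytes in one chunk of `seconds` of mono s16le audio at `sample_rate` Hz.
/// (Hypothetical helper, not taken from the client source.)
fn chunk_bytes(sample_rate: u32, seconds: f32) -> usize {
    let samples = (sample_rate as f32 * seconds).round() as usize;
    samples * 2 // two bytes per 16-bit sample
}
```

With `--rate=16000` and `--chunk-seconds 2.5`, `chunk_bytes(16000, 2.5)` works out to 80000 bytes per chunk.
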
### 6. `system-audio` - System Audio Capture
Attempts to capture system audio using available audio devices.

```bash
# List available audio devices
cargo run --bin system-audio -- --list-devices

# Capture from specific device
cargo run --bin system-audio -- --device pulse
```

## Video Call & System Audio Transcription

### Transcribe Video Calls (Zoom, Teams, Meet, etc.)
Use the provided script to transcribe any video call or system audio:

```bash
# Transcribe system audio (video calls, YouTube, etc.)
./transcribe_video_call.sh

# List available audio sources
./transcribe_video_call.sh --list

# Use microphone instead of system audio
./transcribe_video_call.sh --microphone
```

### Quick YouTube/System Audio Test
```bash
# Test with any playing audio (YouTube, music, etc.)
./test_youtube.sh
```

**Note**: System audio capture requires the `pulseaudio-utils` package:
```bash
sudo apt-get install pulseaudio-utils
```

## Building

```bash
# Build all binaries
cargo build --release

# Build specific binary
cargo build --release --bin realtime-playback
```

## Common Options

All clients support these common options:
- `--server <URL>` - gRPC server address (default: http://localhost:50051)
- `--language <code>` - Language code: en, es, fr, de, etc., or "auto" (default: auto)
- `--model <name>` - Model to use: tiny, base, small, medium, large-v3 (default: base)
- `--vad` - Enable Voice Activity Detection to filter silence

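To make the defaults above concrete, here is a minimal, standard-library-only sketch of how these four flags could be represented and parsed. The `CommonOptions` struct and `parse_common_options` function are hypothetical names for illustration; the actual clients may use a different argument parser. Treating VAD as off by default is an assumption drawn from `--vad` being described as an opt-in flag.

```rust
/// Hypothetical container for the common options listed above.
#[derive(Debug, Clone, PartialEq)]
struct CommonOptions {
    server: String,
    language: String,
    model: String,
    vad: bool,
}

impl Default for CommonOptions {
    fn default() -> Self {
        Self {
            server: "http://localhost:50051".to_string(),
            language: "auto".to_string(),
            model: "base".to_string(),
            vad: false, // assumption: VAD stays off unless --vad is given
        }
    }
}

/// Parse only the common flags; anything else is reported as an error.
/// A real client would also accept positional arguments such as the audio file.
fn parse_common_options<I>(args: I) -> Result<CommonOptions, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = CommonOptions::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--vad" => opts.vad = true,
            "--server" | "--language" | "--model" => {
                // These flags each consume the following argument as their value.
                let value = args
                    .next()
                    .ok_or_else(|| format!("missing value for {arg}"))?;
                match arg.as_str() {
                    "--server" => opts.server = value,
                    "--language" => opts.language = value,
                    _ => opts.model = value,
                }
            }
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    Ok(opts)
}
```

Passing `std::env::args().skip(1)` to `parse_common_options` would apply it to the process arguments.
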
## Features

### Real-time Streaming
The `--stream` flag in `file-transcribe` and the `realtime-playback` binary both support real-time streaming, which means:
- Audio is sent in small chunks (0.5 second intervals)
- Transcriptions appear as the audio is being processed
- Similar experience to YouTube's live captions
- Lower latency than sending the entire file

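The chunking described above can be sketched as follows. This is an illustrative, standard-library-only example rather than the clients' actual code: `samples_per_chunk` and `split_into_chunks` are hypothetical helpers, and mono 16-bit samples are assumed.

```rust
use std::time::Duration;

/// Number of mono samples covering `interval` at `sample_rate` Hz.
/// (Hypothetical helper for illustration.)
fn samples_per_chunk(sample_rate: u32, interval: Duration) -> usize {
    (sample_rate as f64 * interval.as_secs_f64()).round() as usize
}

/// Split mono samples into consecutive chunks of `interval` length.
/// The final chunk may be shorter than the others.
fn split_into_chunks(samples: &[i16], sample_rate: u32, interval: Duration) -> Vec<&[i16]> {
    // Guard against a zero chunk size, which `chunks` would reject.
    let size = samples_per_chunk(sample_rate, interval).max(1);
    samples.chunks(size).collect()
}
```

At 16 kHz, a 0.5 second interval corresponds to 8000 samples per chunk, so 1.25 seconds of audio would be sent as two full chunks and one half-length chunk.
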
### Voice Activity Detection (VAD)
When `--vad` is enabled, the service will:
- Filter out silence and background noise
- Reduce false transcriptions (like repeated "Thank you")
- Improve transcription quality for speech-only content

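The README does not say which VAD algorithm the service uses. To illustrate the general idea of filtering silence, the sketch below uses a simple energy threshold, which is the most basic form of VAD and almost certainly cruder than what the service does. The names `rms_level` and `is_probably_speech` are hypothetical, and the threshold has to be tuned by hand.

```rust
/// Root-mean-square level of 16-bit samples, normalized to the range 0.0..=1.0.
/// (Hypothetical helper; a stand-in for a real VAD model.)
fn rms_level(samples: &[i16]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt() as f32
}

/// Treat a chunk as speech when its RMS level reaches `threshold`.
/// Chunks below the threshold would be skipped instead of transcribed.
fn is_probably_speech(samples: &[i16], threshold: f32) -> bool {
    rms_level(samples) >= threshold
}
```

A chunk of digital silence has an RMS level of 0.0 and is dropped at any positive threshold, which is how this kind of filter avoids sending empty audio to the model.
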
### Audio Playback
The `realtime-playback` binary uses the `rodio` library to:
- Play audio through your system's default audio output
- Synchronize playback with transcription display
- Support multiple audio formats (WAV, MP3, FLAC, etc.)

## Requirements

- Rust 1.70 or later
- The Transcription API server running (usually on localhost:50051)
- For live transcription: A working microphone
- For playback: Audio output device (speakers/headphones)

## System Requirements

### For Video Call Transcription (Ubuntu/Linux)
- PulseAudio utilities: `sudo apt-get install pulseaudio-utils`
- PipeWire or PulseAudio audio server
- The monitor audio source must be available

## Troubleshooting

### "Connection refused" error
Make sure the Transcription API server is running:
```bash
cd ../../
docker compose up
```

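Before digging further, it can help to confirm that something is listening on the server port at all. The following is a minimal sketch using only the standard library; `server_reachable` is a hypothetical helper and not part of the clients. It checks TCP reachability only, not whether the gRPC service itself is healthy.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Return true if a TCP connection to `host_port` (for example
/// "localhost:50051") succeeds within `timeout`.
fn server_reachable(host_port: &str, timeout: Duration) -> bool {
    match host_port.to_socket_addrs() {
        // A hostname may resolve to several addresses; any success counts.
        Ok(addrs) => addrs
            .into_iter()
            .any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        // Unresolvable or malformed address.
        Err(_) => false,
    }
}
```

If `server_reachable("localhost:50051", Duration::from_secs(2))` returns `false`, the server is most likely not running or is bound to a different port.
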
### No audio playback
- Check your system's default audio output device
- Ensure the audio file format is supported (WAV, MP3, FLAC)
- Try with a different audio file

### Poor transcription quality
- Use a larger model (e.g., `--model large-v3`)
- For system audio: use the `--no-vad` flag to disable voice activity detection
- Ensure audio quality is good (16 kHz or higher recommended)
- Use 2.5 to 3 second chunks for optimal accuracy

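To check whether a WAV file meets the 16 kHz recommendation, the sample rate can be read straight from its header. The sketch below is a hypothetical helper (`wav_sample_rate`) that assumes the canonical layout, in which the `fmt ` chunk immediately follows the 12-byte RIFF/WAVE header. Files with extra chunks before `fmt ` would need a proper WAV parser.

```rust
/// Read the sample rate from the first bytes of a WAV file.
/// Assumes the "fmt " chunk directly follows the RIFF/WAVE header,
/// which places the little-endian sample rate at byte offsets 24..28.
fn wav_sample_rate(header: &[u8]) -> Option<u32> {
    if header.len() < 28
        || &header[0..4] != b"RIFF"
        || &header[8..12] != b"WAVE"
        || &header[12..16] != b"fmt "
    {
        return None;
    }
    Some(u32::from_le_bytes([
        header[24], header[25], header[26], header[27],
    ]))
}
```

Reading the first 28 bytes of the file and passing them to `wav_sample_rate` is enough; a result below 16000 suggests the source audio may be limiting accuracy.
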
### System audio not working
- Install pulseaudio-utils: `sudo apt-get install pulseaudio-utils`
- Check monitor source exists: `./transcribe_video_call.sh --list`
- Make sure audio is playing when you start transcription
- Use headphones to avoid echo/feedback