mirror of
https://github.com/aljazceru/transcription-api.git
synced 2025-12-17 07:14:24 +01:00
readme updates
58
README.md
@@ -343,59 +343,15 @@ grpcurl -plaintext localhost:50051 transcription.TranscriptionService/HealthChec
python test_client.py
```

## R&D Project Notice

This is a research and development project for exploring real-time transcription capabilities. It is not production-ready and should be used for experimentation and development purposes only.

## Production Deployment

### Docker Swarm

```bash
docker stack deploy -c docker-compose.yml transcription
```

### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transcription-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: transcription-api
  template:
    metadata:
      labels:
        app: transcription-api
    spec:
      containers:
      - name: transcription-api
        image: transcription-api:latest
        ports:
        - containerPort: 50051
          name: grpc
        - containerPort: 8765
          name: websocket
        env:
        - name: MODEL_PATH
          value: "base"
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
```

### Security

For production:

1. Enable TLS for gRPC
2. Use WSS for WebSocket
3. Add authentication
4. Apply rate limiting
5. Validate input

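As one illustration of the rate-limiting item above, here is a minimal token-bucket sketch. It is not part of this project: the `TokenBucket` class, its parameters, and the idea of keeping one bucket per client are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume `cost` tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A server could keep one such bucket per client identifier and reject a request when `allow()` returns `False`. How clients are identified is left open here, since the service currently has no authentication.
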
### Known Limitations
- Memory usage scales with model size (1.5-6 GB for large models)
- Single model instance shared across connections
- No authentication or rate limiting
- Not optimized for high-concurrency production use

## License

@@ -60,6 +60,56 @@ cargo run --bin live-transcribe
cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
```

### 5. `stdin-transcribe` - Transcribe Audio from stdin
Accepts audio data on stdin, which makes it suitable for piping from other tools.

```bash
# Pipe audio from parec (PulseAudio/PipeWire)
parec --format=s16le --rate=16000 --channels=1 | cargo run --bin stdin-transcribe

# With options
parec --format=s16le --rate=16000 --channels=1 | \
  cargo run --bin stdin-transcribe -- --language en --no-vad --chunk-seconds 2.5
```

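To make the `--chunk-seconds` arithmetic concrete: with the `parec` settings above (signed 16-bit little-endian, 16 kHz, mono), a 2.5-second chunk is 16000 × 2.5 × 2 = 80000 bytes. The sketch below shows one way to cut a byte stream into chunks of that size. It is an illustration only; the helper names are hypothetical, and how `stdin-transcribe` chunks audio internally is not described here.

```python
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16_000     # matches --rate=16000
BYTES_PER_SAMPLE = 2     # s16le: 16-bit samples
CHANNELS = 1             # matches --channels=1


def chunk_size_bytes(seconds: float) -> int:
    """Number of bytes covering `seconds` of audio in the format above."""
    frames = int(seconds * SAMPLE_RATE)
    return frames * BYTES_PER_SAMPLE * CHANNELS


def read_chunks(stream: BinaryIO, seconds: float) -> Iterator[bytes]:
    """Yield fixed-duration chunks from a raw PCM stream.

    The last chunk may be shorter if the stream ends mid-chunk.
    """
    size = chunk_size_bytes(seconds)
    if size <= 0:
        raise ValueError("seconds must be positive")
    while True:
        buf = bytearray()
        # A pipe may return short reads, so keep reading until the chunk is full.
        while len(buf) < size:
            piece = stream.read(size - len(buf))
            if not piece:
                break
            buf.extend(piece)
        if not buf:
            return
        yield bytes(buf)
        if len(buf) < size:
            return
```

In a pipeline, `stream` would typically be `sys.stdin.buffer`, for example `for chunk in read_chunks(sys.stdin.buffer, 2.5): ...`.
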
### 6. `system-audio` - System Audio Capture
Attempts to capture system audio using available audio devices.

```bash
# List available audio devices
cargo run --bin system-audio -- --list-devices

# Capture from specific device
cargo run --bin system-audio -- --device pulse
```

## Video Call & System Audio Transcription

### Transcribe Video Calls (Zoom, Teams, Meet, etc.)
Use the provided script to transcribe any video call or system audio:

```bash
# Transcribe system audio (video calls, YouTube, etc.)
./transcribe_video_call.sh

# List available audio sources
./transcribe_video_call.sh --list

# Use microphone instead of system audio
./transcribe_video_call.sh --microphone
```

### Quick YouTube/System Audio Test
```bash
# Test with any playing audio (YouTube, music, etc.)
./test_youtube.sh
```

**Note**: System audio capture requires the `pulseaudio-utils` package:
```bash
sudo apt-get install pulseaudio-utils
```

## Building

```bash
@@ -106,6 +156,13 @@ The `realtime-playback` binary uses the `rodio` library to:
- For live transcription: A working microphone
- For playback: Audio output device (speakers/headphones)

## System Requirements

### For Video Call Transcription (Ubuntu/Linux)
- PulseAudio utilities: `sudo apt-get install pulseaudio-utils`
- PipeWire or PulseAudio audio server
- The monitor audio source must be available

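One way to check the last requirement is to look for sources whose name ends in `.monitor` in the output of `pactl list short sources` (`pactl` ships with `pulseaudio-utils`). The sketch below assumes the usual tab-separated output of that command and the conventional `.monitor` suffix. The function names are hypothetical, and this is not necessarily how `transcribe_video_call.sh --list` works.

```python
import subprocess
from typing import List


def monitor_sources(pactl_output: str) -> List[str]:
    """Extract monitor source names from `pactl list short sources` output.

    Assumes tab-separated columns with the source name in the second column.
    """
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def list_monitor_sources() -> List[str]:
    """Run pactl and return monitor source names (requires pactl on PATH)."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True, text=True, check=True,
    )
    return monitor_sources(result.stdout)
```

An empty list from `list_monitor_sources()` would suggest that no monitor source is available to capture from.
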
## Troubleshooting

### "Connection refused" error
@@ -122,5 +179,12 @@ docker compose up

### Poor transcription quality
- Use a larger model (e.g., `--model large-v3`)
- Enable VAD to filter noise (`--vad`)
- For system audio: pass the `--no-vad` flag to disable voice activity detection
- Ensure audio quality is good (16 kHz or higher recommended)
- Use 2.5-3 second chunks for optimal accuracy

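For audio stored as a WAV file, the sample-rate recommendation above can be checked with Python's standard `wave` module. This is a small sketch; the helper names and the hard 16 kHz threshold are assumptions taken from the bullet above, not behavior of the tools in this repository.

```python
import wave
from typing import BinaryIO, Union

MIN_SAMPLE_RATE = 16_000  # the "16 kHz or higher" recommendation


def wav_sample_rate(source: Union[str, BinaryIO]) -> int:
    """Return the sample rate in Hz of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def meets_recommendation(source: Union[str, BinaryIO]) -> bool:
    """True if the file's sample rate is at least MIN_SAMPLE_RATE."""
    return wav_sample_rate(source) >= MIN_SAMPLE_RATE
```

For example, `meets_recommendation("recording.wav")` (a hypothetical file name) would return `False` for an 8 kHz telephone-quality recording.
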
### System audio not working
- Install `pulseaudio-utils`: `sudo apt-get install pulseaudio-utils`
- Check monitor source exists: `./transcribe_video_call.sh --list`
- Make sure audio is playing when you start transcription
- Use headphones to avoid echo/feedback