From ffdda3d730cccf724670c471d40d27a265a1993a Mon Sep 17 00:00:00 2001
From: Aljaz Ceru
Date: Thu, 11 Sep 2025 17:50:21 +0200
Subject: [PATCH] readme updates

---
 README.md                      | 58 ++++-------------------------
 examples/rust-client/README.md | 68 +++++++++++++++++++++++++++++++++-
 2 files changed, 73 insertions(+), 53 deletions(-)

diff --git a/README.md b/README.md
index 2f2612c..e153ff4 100644
--- a/README.md
+++ b/README.md
@@ -343,59 +343,15 @@ grpcurl -plaintext localhost:50051 transcription.TranscriptionService/HealthChec
 python test_client.py
 ```
 
-## Production Deployment
+## R&D Project Notice
 
-### Docker Swarm
+This is a research and development project for exploring real-time transcription capabilities. It is not production-ready and should be used for experimentation and development purposes only.
 
-```bash
-docker stack deploy -c docker-compose.yml transcription
-```
-
-### Kubernetes
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: transcription-api
-spec:
-  replicas: 3
-  selector:
-    matchLabels:
-      app: transcription-api
-  template:
-    metadata:
-      labels:
-        app: transcription-api
-    spec:
-      containers:
-      - name: transcription-api
-        image: transcription-api:latest
-        ports:
-        - containerPort: 50051
-          name: grpc
-        - containerPort: 8765
-          name: websocket
-        env:
-        - name: MODEL_PATH
-          value: "base"
-        resources:
-          requests:
-            memory: "4Gi"
-            cpu: "2"
-          limits:
-            memory: "8Gi"
-            cpu: "4"
-```
-
-### Security
-
-For production:
-1. Enable TLS for gRPC
-2. Use WSS for WebSocket
-3. Add authentication
-4. Rate limiting
-5. Input validation
+### Known Limitations
+- Memory usage scales with model size (1.5-6GB for large models)
+- Single model instance shared across connections
+- No authentication or rate limiting
+- Not optimized for high-concurrency production use
 
 ## License
 
diff --git a/examples/rust-client/README.md b/examples/rust-client/README.md
index 0679650..3eb8392 100644
--- a/examples/rust-client/README.md
+++ b/examples/rust-client/README.md
@@ -60,6 +60,56 @@ cargo run --bin live-transcribe
 cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
 ```
 
+### 5. `stdin-transcribe` - Transcribe Audio from stdin
+Accepts audio data from stdin, perfect for piping from other tools.
+
+```bash
+# Pipe audio from parec (PulseAudio/PipeWire)
+parec --format=s16le --rate=16000 --channels=1 | cargo run --bin stdin-transcribe
+
+# With options
+parec --format=s16le --rate=16000 --channels=1 | \
+  cargo run --bin stdin-transcribe -- --language en --no-vad --chunk-seconds 2.5
+```
+
+### 6. `system-audio` - System Audio Capture
+Attempts to capture system audio using available audio devices.
+
+```bash
+# List available audio devices
+cargo run --bin system-audio -- --list-devices
+
+# Capture from specific device
+cargo run --bin system-audio -- --device pulse
+```
+
+## Video Call & System Audio Transcription
+
+### Transcribe Video Calls (Zoom, Teams, Meet, etc.)
+Use the provided script to transcribe any video call or system audio:
+
+```bash
+# Transcribe system audio (video calls, YouTube, etc.)
+./transcribe_video_call.sh
+
+# List available audio sources
+./transcribe_video_call.sh --list
+
+# Use microphone instead of system audio
+./transcribe_video_call.sh --microphone
+```
+
+### Quick YouTube/System Audio Test
+```bash
+# Test with any playing audio (YouTube, music, etc.)
+./test_youtube.sh
+```
+
+**Note**: System audio capture requires `pulseaudio-utils` package:
+```bash
+sudo apt-get install pulseaudio-utils
+```
+
 ## Building
 
 ```bash
@@ -106,6 +156,13 @@ The `realtime-playback` binary uses the `rodio` library to:
 - For live transcription: A working microphone
 - For playback: Audio output device (speakers/headphones)
 
+## System Requirements
+
+### For Video Call Transcription (Ubuntu/Linux)
+- PulseAudio utilities: `sudo apt-get install pulseaudio-utils`
+- PipeWire or PulseAudio audio server
+- The monitor audio source must be available
+
 ## Troubleshooting
 
 ### "Connection refused" error
@@ -122,5 +179,12 @@ docker compose up
 
 ### Poor transcription quality
 - Use a larger model (e.g., `--model large-v3`)
-- Enable VAD to filter noise (`--vad`)
-- Ensure audio quality is good (16kHz or higher recommended)
\ No newline at end of file
+- For system audio: use `--no-vad` flag to disable voice activity detection
+- Ensure audio quality is good (16kHz or higher recommended)
+- Use 2.5-3 second chunks for optimal accuracy
+
+### System audio not working
+- Install pulseaudio-utils: `sudo apt-get install pulseaudio-utils`
+- Check monitor source exists: `./transcribe_video_call.sh --list`
+- Make sure audio is playing when you start transcription
+- Use headphones to avoid echo/feedback
\ No newline at end of file