readme updates

2025-12-17 07:14:24 +01:00 · 2025-09-11 17:50:21 +02:00
parent 1707bf917d
commit ffdda3d730
2 changed files with 73 additions and 53 deletions
--- a/README.md
+++ b/README.md
@@ -343,59 +343,15 @@ grpcurl -plaintext localhost:50051 transcription.TranscriptionService/HealthChec
 python test_client.py
 ```

-## Production Deployment
+## R&D Project Notice

-### Docker Swarm
+This is a research and development project for exploring real-time transcription capabilities. It is not production-ready and should be used for experimentation and development purposes only.

-```bash
-docker stack deploy -c docker-compose.yml transcription
-```
-
-### Kubernetes
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: transcription-api
-spec:
-  replicas: 3
-  selector:
-    matchLabels:
-      app: transcription-api
-  template:
-    metadata:
-      labels:
-        app: transcription-api
-    spec:
-      containers:
-      - name: transcription-api
-        image: transcription-api:latest
-        ports:
-        - containerPort: 50051
-          name: grpc
-        - containerPort: 8765
-          name: websocket
-        env:
-        - name: MODEL_PATH
-          value: "base"
-        resources:
-          requests:
-            memory: "4Gi"
-            cpu: "2"
-          limits:
-            memory: "8Gi"
-            cpu: "4"
-```
-
-### Security
-
-For production:
-1. Enable TLS for gRPC
-2. Use WSS for WebSocket
-3. Add authentication
-4. Rate limiting
-5. Input validation
+### Known Limitations
+- Memory usage scales with model size (roughly 1.5-6 GB for large models)
+- Single model instance shared across connections
+- No authentication or rate limiting
+- Not optimized for high-concurrency production use

 ## License

--- a/examples/rust-client/README.md
+++ b/examples/rust-client/README.md
@@ -60,6 +60,56 @@ cargo run --bin live-transcribe
 cargo run --bin live-transcribe -- --server http://localhost:50051 --language en
 ```

+### 5. `stdin-transcribe` - Transcribe Audio from stdin
+Reads audio data from stdin, so it can be fed through a pipe from other tools.
+
+```bash
+# Pipe audio from parec (PulseAudio/PipeWire)
+parec --format=s16le --rate=16000 --channels=1 | cargo run --bin stdin-transcribe
+
+# With options
+parec --format=s16le --rate=16000 --channels=1 | \
+  cargo run --bin stdin-transcribe -- --language en --no-vad --chunk-seconds 2.5
+```
+
+### 6. `system-audio` - System Audio Capture
+Attempts to capture system audio using available audio devices.
+
+```bash
+# List available audio devices
+cargo run --bin system-audio -- --list-devices
+
+# Capture from specific device
+cargo run --bin system-audio -- --device pulse
+```
+
+## Video Call & System Audio Transcription
+
+### Transcribe Video Calls (Zoom, Teams, Meet, etc.)
+Use the provided script to transcribe any video call or system audio:
+
+```bash
+# Transcribe system audio (video calls, YouTube, etc.)
+./transcribe_video_call.sh
+
+# List available audio sources
+./transcribe_video_call.sh --list
+
+# Use microphone instead of system audio
+./transcribe_video_call.sh --microphone
+```
+
+### Quick YouTube/System Audio Test
+```bash
+# Test with any playing audio (YouTube, music, etc.)
+./test_youtube.sh
+```
+
+**Note**: System audio capture requires the `pulseaudio-utils` package:
+```bash
+sudo apt-get install pulseaudio-utils
+```
+
 ## Building

 ```bash
@@ -106,6 +156,13 @@ The `realtime-playback` binary uses the `rodio` library to:
 - For live transcription: A working microphone
 - For playback: Audio output device (speakers/headphones)

+## System Requirements
+
+### For Video Call Transcription (Ubuntu/Linux)
+- PulseAudio utilities: `sudo apt-get install pulseaudio-utils`
+- PipeWire or PulseAudio audio server
+- The monitor audio source must be available
+
 ## Troubleshooting

 ### "Connection refused" error
@@ -122,5 +179,12 @@ docker compose up

 ### Poor transcription quality
 - Use a larger model (e.g., `--model large-v3`)
-- Enable VAD to filter noise (`--vad`)
-- Ensure audio quality is good (16kHz or higher recommended)
+- For system audio: use the `--no-vad` flag to disable voice activity detection
+- Ensure audio quality is good (16kHz or higher recommended)
+- Use 2.5-3 second chunks (`--chunk-seconds`) for optimal accuracy
+
+### System audio not working
+- Install pulseaudio-utils: `sudo apt-get install pulseaudio-utils`
+- Check that a monitor source exists: `./transcribe_video_call.sh --list`
+- Make sure audio is playing when you start transcription
+- Use headphones to avoid echo/feedback
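
Note on the `stdin-transcribe` example in the diff above: the `parec` pipe delivers headerless signed 16-bit little-endian mono PCM at 16 kHz, so a 2.5-second chunk works out to 16000 × 2.5 × 2 = 80000 bytes. The Rust sketch below shows one way a client could split such a stream into fixed-length chunks. It is not taken from the repository: `chunk_bytes`, `decode_s16le`, `read_chunk`, and `for_each_chunk` are hypothetical names, and the actual binary may buffer its input differently.

```rust
use std::io::{self, Read};

/// Number of bytes in `seconds` of signed 16-bit PCM audio.
fn chunk_bytes(sample_rate: u32, channels: u16, seconds: f32) -> usize {
    let frames = (sample_rate as f32 * seconds).round() as usize;
    frames * channels as usize * 2 // 2 bytes per 16-bit sample
}

/// Decode s16le bytes into samples; a trailing odd byte is ignored.
fn decode_s16le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// Read up to `size` bytes; fewer are returned only at end of input.
fn read_chunk<R: Read>(reader: &mut R, size: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(size);
    reader.by_ref().take(size as u64).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Call `on_chunk` with each decoded chunk until the input is exhausted.
fn for_each_chunk<R: Read, F: FnMut(&[i16])>(
    mut reader: R,
    size: usize,
    mut on_chunk: F,
) -> io::Result<()> {
    loop {
        let bytes = read_chunk(&mut reader, size)?;
        if bytes.is_empty() {
            return Ok(());
        }
        on_chunk(&decode_s16le(&bytes));
    }
}
```

A caller could pass `std::io::stdin().lock()` as the reader and `chunk_bytes(16_000, 1, 2.5)` as the size, then forward each chunk to the server from inside the closure. The final chunk may be shorter than the others when the stream ends partway through.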