readme update

2025-12-17 07:14:24 +01:00 · 2025-09-12 07:01:22 +02:00
parent ffdda3d730
commit fd1709e386
1 changed files with 0 additions and 315 deletions
--- a/README.md
+++ b/README.md
@@ -2,17 +2,6 @@
 A high-performance, standalone transcription service with gRPC and WebSocket support, optimized for real-time speech-to-text applications. Perfect for desktop applications, web services, and IoT devices.
 ## Features
 - **Dual Protocol Support**: Both gRPC (recommended) and WebSocket
 - **Real-Time Streaming**: Bidirectional audio streaming with immediate transcription
 - **Multiple Models**: Support for all Whisper models (tiny to large-v3)
 - **Language Support**: 50+ languages with automatic detection
 - **Docker Ready**: Simple deployment with Docker Compose
 - **Operational Basics**: Health checks, monitoring, and graceful shutdown (see the R&D Project Notice for current limitations)
 - **Rust Client Examples**: Ready-to-use Rust client for desktop applications
 ## Quick Start
 ### Using Docker Compose (Recommended)
@@ -42,92 +31,9 @@ ENABLE_WEBSOCKET=true   # Enable WebSocket support
 CUDA_VISIBLE_DEVICES=0  # GPU device ID (if available)
 ```
 ## API Protocols
 ### gRPC (Recommended for Desktop Apps)
 **Why gRPC?**
 - Strongly typed with Protocol Buffers
 - Excellent performance with HTTP/2
 - Built-in streaming support
 - Auto-generated client code
 - Better error handling
 **Proto Definition**: See `proto/transcription.proto`
 **Service Methods**:
 - `StreamTranscribe`: Bidirectional streaming for real-time transcription
 - `TranscribeFile`: Single file transcription
 - `GetCapabilities`: Query available models and languages
 - `HealthCheck`: Service health status
 ### WebSocket (Alternative)
 **Protocol**:
 ```javascript
 // Connect to ws://localhost:8765
 // Send audio
 {
  "type": "audio",
  "data": "base64_encoded_pcm16_audio"
 }
 // Receive transcription
 {
  "type": "transcription",
  "text": "Hello world",
  "start_time": 0.0,
  "end_time": 1.5,
  "is_final": true,
  "timestamp": 1234567890
 }
 // Stop
 {
  "type": "stop"
 }
 ```
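The message shapes above can be built and parsed with the Python standard library alone. This is a minimal sketch, assuming `data` carries standard base64 of raw PCM16 bytes; the helper names are illustrative and not part of the service.

```python
import base64
import json


def build_audio_message(pcm16_bytes: bytes) -> str:
    """Wrap raw PCM16 bytes in the JSON "audio" message."""
    return json.dumps({
        "type": "audio",
        # Assumption: standard (not URL-safe) base64 of the raw sample bytes.
        "data": base64.b64encode(pcm16_bytes).decode("ascii"),
    })


def build_stop_message() -> str:
    """Build the JSON message that ends the stream."""
    return json.dumps({"type": "stop"})


def parse_server_message(raw: str):
    """Return (text, is_final) for a transcription message, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "transcription":
        return None
    return msg["text"], bool(msg.get("is_final", False))
```

The strings returned by the two builders can be passed to whichever WebSocket client library you use. Note that base64 inflates the audio payload by roughly one third.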
 ## Rust Client Usage
 ### Installation
 ```toml
 # Add to your Cargo.toml
 [dependencies]
 tonic = "0.10"
 tokio = { version = "1.35", features = ["full"] }
 # ... see examples/rust-client/Cargo.toml for full list
 ```
 ### Live Microphone Transcription
 ```rust
 use futures::StreamExt; // provides `.next()`; assumes the client returns a `Stream`
 use transcription_client::TranscriptionClient;
 #[tokio::main]
 async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to service
    let mut client = TranscriptionClient::connect("http://localhost:50051").await?;
    // Start streaming from microphone
    let mut stream = client.stream_from_microphone(
        "auto",       // language
        "transcribe", // task
        "base",       // model
    ).await?;
    // Process transcriptions as they arrive
    while let Some(transcription) = stream.next().await {
        println!("{}", transcription.text);
    }
    Ok(())
 }
 ```
 ### Build and Run Examples
 ```bash
@@ -145,224 +51,3 @@ cargo run --bin file-transcribe -- audio.wav
 # Stream a WAV file
 cargo run --bin stream-transcribe -- audio.wav --realtime
 ```
 ## Audio Requirements
 - **Format**: PCM16 (16-bit signed integer)
 - **Sample Rate**: 16kHz
 - **Channels**: Mono
 - **Chunk Size**: Minimum ~500 bytes (about 16 ms of audio in this format; flexible for real-time)
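Before sending a WAV file, it can help to confirm that it matches the format listed above. The following is a small sketch using Python's standard `wave` module; `check_wav_format` is a hypothetical helper, not something the service ships.

```python
import io
import wave


def check_wav_format(wav_file) -> list:
    """Return a list of mismatches against the requirements (empty means OK).

    `wav_file` may be a path or a binary file-like object.
    """
    problems = []
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {wav.getsampwidth() * 8}-bit, expected 16-bit")
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

An empty list means the file can be streamed as-is; otherwise resample or downmix it first with a tool of your choice.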
 ## Performance Optimization
 ### For Real-Time Applications
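 1. **Use gRPC**: Typically lower latency than WebSocket, which carries base64-encoded audio inside JSON
 2. **Small Chunks**: Send audio in 0.5-1 second chunks (16,000-32,000 bytes of 16 kHz mono PCM16)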
 3. **Model Selection**:
   - `tiny`: Fastest, lowest accuracy (real-time on CPU)
   - `base`: Good balance (near real-time on CPU)
   - `small`: Better accuracy (may lag on CPU)
   - `large-v3`: Best accuracy (requires GPU for real-time)
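The chunking advice above comes down to simple arithmetic: 16 kHz mono PCM16 is 16,000 samples × 2 bytes = 32,000 bytes per second. A minimal sketch of that calculation, plus a splitter for an in-memory buffer (function names are illustrative):

```python
SAMPLE_RATE = 16000      # samples per second
BYTES_PER_SAMPLE = 2     # PCM16, mono


def chunk_size_bytes(seconds: float) -> int:
    """Number of bytes covering `seconds` of 16 kHz mono PCM16 audio."""
    return int(SAMPLE_RATE * seconds) * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, seconds: float = 0.5):
    """Yield consecutive chunks of `pcm`; the last one may be shorter."""
    size = chunk_size_bytes(seconds)
    if size <= 0:
        raise ValueError("chunk duration too small")
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

With the default of 0.5 seconds, each full chunk is 16,000 bytes, and chunk boundaries always fall on whole samples.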
 ### GPU Acceleration
 ```yaml
 # docker-compose.yml
 environment:
  - CUDA_VISIBLE_DEVICES=0
 deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
 ```
 ## Architecture
 ```
 ┌───────────────┐
 │   Rust App    │
 │   (Desktop)   │
 └───────┬───────┘
         │ gRPC/HTTP2
         ▼
 ┌───────────────┐
 │ Transcription │
 │    Service    │
 │  ┌─────────┐  │
 │  │ Whisper │  │
 │  │  Model  │  │
 │  └─────────┘  │
 └───────────────┘
 ```
 ### Components
 1. **gRPC Server**: Handles streaming audio and returns transcriptions
 2. **WebSocket Server**: Alternative protocol for web clients
 3. **Transcription Engine**: Whisper/SimulStreaming for speech-to-text
 4. **Session Manager**: Handles multiple concurrent streams
 5. **Model Cache**: Prevents re-downloading models
 ## Advanced Configuration
 ### Using SimulStreaming
 For even lower latency, mount a local SimulStreaming checkout into the container:
 ```yaml
 volumes:
  - ./SimulStreaming:/app/SimulStreaming
 environment:
  - SIMULSTREAMING_PATH=/app/SimulStreaming
 ```
 ### Custom Models
 Mount your own Whisper models:
 ```yaml
 volumes:
  - ./models:/app/models
 environment:
  - MODEL_PATH=/app/models/custom-model.pt
 ```
 ### Monitoring
 The service exposes metrics on `/metrics` (when enabled):
 ```bash
 curl http://localhost:9090/metrics
 ```
 ## API Reference
 ### gRPC Methods
 #### StreamTranscribe
 ```protobuf
 rpc StreamTranscribe(stream AudioChunk) returns (stream TranscriptionResult);
 ```
 Bidirectional streaming for real-time transcription. Send audio chunks, receive transcriptions.
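In gRPC Python, a bidirectional-streaming call takes an iterator of request messages and returns an iterator of responses. The sketch below shows only the request side, under stated assumptions: `audio_requests` and `make_request` are hypothetical names, and the `AudioChunk` field name in the comment is a guess, so check `proto/transcription.proto` for the real one.

```python
import time
from typing import Callable, Iterator


def audio_requests(
    pcm: bytes,
    make_request: Callable[[bytes], object],
    chunk_bytes: int = 16000,        # 0.5 s of 16 kHz mono PCM16
    realtime: bool = False,
    bytes_per_second: int = 32000,
) -> Iterator[object]:
    """Yield one request per audio chunk, optionally paced like live audio."""
    for offset in range(0, len(pcm), chunk_bytes):
        chunk = pcm[offset:offset + chunk_bytes]
        yield make_request(chunk)
        if realtime:
            # Sleep for the duration of the chunk just sent.
            time.sleep(len(chunk) / bytes_per_second)


# Hypothetical usage with stubs generated from proto/transcription.proto
# (the field name `audio_data` is an assumption, not confirmed by the proto):
#
#   stub = transcription_pb2_grpc.TranscriptionServiceStub(channel)
#   requests = audio_requests(
#       pcm, lambda b: transcription_pb2.AudioChunk(audio_data=b))
#   for result in stub.StreamTranscribe(requests):
#       print(result)
```

Passing the message constructor in as a callable keeps the generator independent of the generated code, so it can be exercised without a running server.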
 #### TranscribeFile
 ```protobuf
 rpc TranscribeFile(AudioFile) returns (TranscriptionResponse);
 ```
 Transcribe a complete audio file in one request.
 #### GetCapabilities
 ```protobuf
 rpc GetCapabilities(Empty) returns (Capabilities);
 ```
 Query available models, languages, and features.
 #### HealthCheck
 ```protobuf
 rpc HealthCheck(Empty) returns (HealthStatus);
 ```
 Check service health and status.
 ## Language Support
 Supports 50+ languages including:
 - English (en)
 - Spanish (es)
 - French (fr)
 - German (de)
 - Italian (it)
 - Portuguese (pt)
 - Russian (ru)
 - Chinese (zh)
 - Japanese (ja)
 - Korean (ko)
 - And many more...
 Use `"auto"` for automatic language detection.
 ## Troubleshooting
 ### Service won't start
 - Check if ports 50051 and 8765 are available
 - Ensure Docker has enough memory (minimum 4GB)
 - Check logs: `docker compose logs transcription-api`
 ### Slow transcription
 - Use a smaller model (tiny or base)
 - Enable GPU if available
 - Resample audio to 16kHz mono PCM16 before sending
 - Send smaller chunks more frequently
 ### Connection refused
 - Check firewall settings
 - Ensure service is running: `docker compose ps`
 - Verify correct ports in client configuration
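A quick way to tell whether anything is listening on the service ports is to attempt a plain TCP connection. This is a minimal sketch using only the standard library; it checks reachability, not that the listener is actually this service.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("localhost", 50051)` probes the gRPC port and `can_connect("localhost", 8765)` the WebSocket port. If both return `False` while `docker compose ps` shows the container running, the port mappings are the next thing to check.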
 ### High memory usage
 - Models are cached in memory for performance
 - Use smaller models for limited memory systems
 - Set memory limits in docker-compose.yml
 ## Development
 ### Building from Source
 ```bash
 # Install dependencies
 pip install -r requirements.txt
 # Generate gRPC code
 python -m grpc_tools.protoc \
    -I./proto \
    --python_out=./src \
    --grpc_python_out=./src \
    ./proto/transcription.proto
 # Run the service
 python src/transcription_server.py
 ```
 ### Running Tests
 ```bash
 # Test gRPC connection
 grpcurl -plaintext localhost:50051 list
 # Test health check
 grpcurl -plaintext localhost:50051 transcription.TranscriptionService/HealthCheck
 # Test with example audio
 python test_client.py
 ```
 ## R&D Project Notice
 This is a research and development project for exploring real-time transcription capabilities. It is not production-ready and should be used for experimentation and development purposes only.
 ### Known Limitations
 - Memory usage scales with model size (1.5-6GB for large models)
 - Single model instance shared across connections
 - No authentication or rate limiting
 - Not optimized for high-concurrency production use
 ## License
 MIT License - See LICENSE file for details
 ## Contributing
 Contributions welcome! Please read CONTRIBUTING.md for guidelines.
 ## Support
 - GitHub Issues: Report bugs or request features
 - Documentation: Full API documentation
 - Examples: See `examples/` directory