---
title: Speaker Diarization
emoji: 🔥
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Real-Time Speaker Diarization

This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It automatically transcribes speech and identifies different speakers in real time.

## Architecture

The system is split into two components:

1. **Model Server (Hugging Face Space)**: runs the speech recognition and speaker diarization models
2. **Signaling Server (Render)**: handles WebRTC signaling so the browser can stream audio directly

## Deployment Instructions

### Deploy the Model Server on Hugging Face Spaces

1. Create a new Space on Hugging Face (Docker SDK)
2. Upload all files from the `Speaker-Diarization` directory
3. In the Space settings:
   - Set Hardware to CPU (or GPU if available)
   - Set visibility to public
   - Environment: make sure the Docker SDK is selected

### Deploy the Signaling Server on Render

1. Create a new Render Web Service
2. Connect it to the GitHub repo containing the `render-signal` directory
3. Configure the service:
   - Build Command: `cd render-signal && pip install -r requirements.txt`
   - Start Command: `cd render-signal && python backend.py`
   - Environment: Python 3
   - Environment Variables:
     - `HF_SPACE_URL`: your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`)

### Update Configuration

After both services are deployed:

1. Update `ui.py` on your Hugging Face Space:
   - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
   - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
2. Update `backend.py` on your Render service:
   - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`)

## Usage

1. Open your Hugging Face Space URL in a web browser
2. Click "Start Listening" to begin
3. Speak into your microphone
4. The system will transcribe your speech and identify different speakers in real time

## Technology Stack

- **Frontend**: Gradio UI with WebRTC for audio streaming
- **Signaling**: FastRTC on Render for WebRTC signaling
- **Backend**: FastAPI + WebSockets
- **Models**:
  - SpeechBrain ECAPA-TDNN for speaker embeddings
  - An automatic speech recognition model for transcription

## License

MIT
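The Update Configuration step above boils down to two URL patterns. As a sketch, here are hypothetical helpers (these functions are not part of the actual `ui.py` or `backend.py`) that derive both values from the service names shown in the examples:

```python
def render_signaling_url(render_app: str) -> str:
    """wss:// URL of the Render signaling endpoint (the RENDER_SIGNALING_URL value in ui.py)."""
    return f"wss://{render_app}.onrender.com/stream"


def hf_inference_url(space_host: str) -> str:
    """wss:// URL of the Space's inference WebSocket (the API_WS value in backend.py)."""
    return f"wss://{space_host}/ws_inference"


print(render_signaling_url("your-app"))  # → wss://your-app.onrender.com/stream
```

Keeping both values derived from a single hostname avoids the most common deployment mistake here: the two services pointing at stale copies of each other's URLs.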
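Audio captured in the browser travels the WebRTC/WebSocket path as raw bytes before it reaches the models. As an illustration only (the exact sample format `backend.py` receives is an assumption), 16-bit little-endian PCM can be converted to the float32 range speech models typically expect:

```python
import numpy as np


def pcm16_to_float32(chunk: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1, 1].

    A common normalization step before feeding streamed browser audio
    to ASR or speaker-embedding models.
    """
    samples = np.frombuffer(chunk, dtype="<i2").astype(np.float32)
    return samples / 32768.0
```

For example, the three samples `0`, `16384`, and `-32768` map to `0.0`, `0.5`, and `-1.0` respectively.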
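On the model side, diarization pairs the ECAPA-TDNN embeddings with some form of speaker clustering. The Space's real clustering logic is not shown in this README; the following is a toy sketch of one common approach, assigning each utterance embedding to the nearest running speaker centroid by cosine similarity and creating a new speaker when no centroid is similar enough:

```python
import numpy as np


def assign_speaker(embedding, centroids, threshold=0.7):
    """Return (speaker_id, centroids) for a new utterance embedding.

    `centroids` is a list of running per-speaker vectors. The embedding
    is compared to each centroid by cosine similarity; if the best match
    falls below `threshold`, a new speaker is created. A stand-in for
    real online clustering, not the Space's actual implementation.
    """
    emb = embedding / np.linalg.norm(embedding)
    if centroids:
        sims = [float(emb @ (c / np.linalg.norm(c))) for c in centroids]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            centroids[best] = centroids[best] + emb  # accumulate into the matched speaker
            return best, centroids
    centroids.append(emb)
    return len(centroids) - 1, centroids
```

Usage: feed embeddings in arrival order and keep the returned `centroids` between calls; two near-identical embeddings get the same speaker ID, while a dissimilar one spawns a new speaker.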