---
title: Speaker Diarization
emoji: 🔥
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Real-Time Speaker Diarization

This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It automatically transcribes speech and identifies different speakers in real time.

## Architecture

The system is split into two components:

1. **Model Server (Hugging Face Space)**: runs the speech recognition and speaker diarization models
2. **Signaling Server (Render)**: handles WebRTC signaling so the browser can stream audio directly

## Deployment Instructions

### Deploy the Model Server on Hugging Face Spaces

1. Create a new Space on Hugging Face (Docker SDK)
2. Upload all files from the `Speaker-Diarization` directory
3. In the Space settings:
   - Set Hardware to CPU (or GPU if available)
   - Set visibility to public
   - Environment: make sure the Docker SDK is selected

### Deploy the Signaling Server on Render

1. Create a new Render Web Service
2. Connect it to the GitHub repo containing the `render-signal` directory
3. Configure the service:
   - Build Command: `cd render-signal && pip install -r requirements.txt`
   - Start Command: `cd render-signal && python backend.py`
   - Environment: Python 3
   - Environment Variables:
     - `HF_SPACE_URL`: your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`)

### Update Configuration

After both services are deployed:

1. Update `ui.py` on your Hugging Face Space:
   - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
   - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
2. Update `backend.py` on your Render service:
   - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`)

## Usage

1. Open your Hugging Face Space URL in a web browser
2. Click "Start Listening" to begin
3. Speak into your microphone
4. The system will transcribe your speech and identify different speakers in real time

## Technology Stack

- **Frontend**: Gradio UI with WebRTC for audio streaming
- **Signaling**: FastRTC on Render for WebRTC signaling
- **Backend**: FastAPI + WebSockets
- **Models**:
  - SpeechBrain ECAPA-TDNN for speaker embeddings
  - An automatic speech recognition model for transcription

## License

MIT
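The Update Configuration step above boils down to two URL patterns. As a sketch, here are hypothetical helpers (these functions are not part of the actual `ui.py` or `backend.py`) that derive both values from the service names shown in the examples:

```python
def render_signaling_url(render_app: str) -> str:
    """wss:// URL of the Render signaling endpoint (the RENDER_SIGNALING_URL value in ui.py)."""
    return f"wss://{render_app}.onrender.com/stream"


def hf_inference_url(space_host: str) -> str:
    """wss:// URL of the Space's inference WebSocket (the API_WS value in backend.py)."""
    return f"wss://{space_host}/ws_inference"


print(render_signaling_url("your-app"))  # → wss://your-app.onrender.com/stream
```

Keeping both values derived from a single hostname avoids the most common deployment mistake here: the two services pointing at stale copies of each other's URLs.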
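Audio captured in the browser travels the WebRTC/WebSocket path as raw bytes before it reaches the models. As an illustration only (the exact sample format `backend.py` receives is an assumption), 16-bit little-endian PCM can be converted to the float32 range speech models typically expect:

```python
import numpy as np


def pcm16_to_float32(chunk: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1, 1].

    A common normalization step before feeding streamed browser audio
    to ASR or speaker-embedding models.
    """
    samples = np.frombuffer(chunk, dtype="<i2").astype(np.float32)
    return samples / 32768.0
```

For example, the three samples `0`, `16384`, and `-32768` map to `0.0`, `0.5`, and `-1.0` respectively.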
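On the model side, diarization pairs the ECAPA-TDNN embeddings with some form of speaker clustering. The Space's real clustering logic is not shown in this README; the following is a toy sketch of one common approach, assigning each utterance embedding to the nearest running speaker centroid by cosine similarity and creating a new speaker when no centroid is similar enough:

```python
import numpy as np


def assign_speaker(embedding, centroids, threshold=0.7):
    """Return (speaker_id, centroids) for a new utterance embedding.

    `centroids` is a list of running per-speaker vectors. The embedding
    is compared to each centroid by cosine similarity; if the best match
    falls below `threshold`, a new speaker is created. A stand-in for
    real online clustering, not the Space's actual implementation.
    """
    emb = embedding / np.linalg.norm(embedding)
    if centroids:
        sims = [float(emb @ (c / np.linalg.norm(c))) for c in centroids]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            centroids[best] = centroids[best] + emb  # accumulate into the matched speaker
            return best, centroids
    centroids.append(emb)
    return len(centroids) - 1, centroids
```

Usage: feed embeddings in arrival order and keep the returned `centroids` between calls; two near-identical embeddings get the same speaker ID, while a dissimilar one spawns a new speaker.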