---
title: Modal Transcriber MCP
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
tag: mcp-server-track
---
# Modal Transcriber MCP
An audio transcription system that combines a Gradio UI, FastMCP tools, and Modal cloud computing, with intelligent speaker identification.
## Key Features
- **Multi-platform Audio Download**: Downloads audio from Apple Podcasts, XiaoYuZhou, and other podcast platforms
- **High-performance Transcription**: Based on OpenAI Whisper, with multiple models supported (turbo, large-v3, etc.)
- **Intelligent Speaker Identification**: Uses pyannote.audio for speaker separation and embedding clustering
- **Distributed Processing**: Splits large files into chunks and processes them concurrently to speed up transcription
- **FastMCP Tools**: Complete MCP (Model Context Protocol) tool integration
- **Modal Deployment**: Supports both local and cloud deployment modes
## Core Advantages
### Intelligent Audio Segmentation
- **Silence Detection Segmentation**: Automatically finds silent segments in the audio and uses them as chunk boundaries
- **Fallback Mechanism**: Long audio automatically falls back to time-based segmentation to keep processing efficient
- **Concurrent Processing**: Multiple chunks are processed simultaneously to improve transcription speed
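
The segmentation strategy described above (cut at silence, fall back to fixed-length windows when a chunk is still too long) can be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation: the function name `split_on_silence`, the threshold values, and the plain list-of-floats input are all assumptions made for the example.

```python
from typing import List, Tuple


def split_on_silence(
    samples: List[float],
    sample_rate: int,
    silence_threshold: float = 0.01,  # hypothetical amplitude threshold
    min_silence_sec: float = 0.5,     # hypothetical minimum silence length
    max_chunk_sec: float = 60.0,      # hypothetical cap that triggers the fallback
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs that cover the whole signal."""
    min_silence = max(1, int(min_silence_sec * sample_rate))
    max_chunk = max(1, int(max_chunk_sec * sample_rate))

    # 1. Place a cut in the middle of every sufficiently long silent run.
    cuts: List[int] = []
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) < silence_threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence:
                cuts.append((run_start + i) // 2)
            run_start = None

    # 2. Turn the cut points into chunks.
    bounds = [0] + cuts + [len(samples)]
    chunks: List[Tuple[int, int]] = []
    for start, end in zip(bounds, bounds[1:]):
        # 3. Fallback: slice oversized chunks into fixed-length windows.
        while end - start > max_chunk:
            chunks.append((start, start + max_chunk))
            start += max_chunk
        if end > start:
            chunks.append((start, end))
    return chunks
```

Each returned pair can then be handed to a separate worker. Cutting in the middle of a silent run, rather than at its edge, leaves a little padding on both sides of the boundary.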
### Advanced Speaker Identification
- **Embedding Clustering**: Uses deep learning embeddings to identify the same speaker consistently
- **Cross-chunk Unification**: Resolves inconsistent speaker labels across chunks in distributed processing
- **Quality Filtering**: Automatically filters out low-quality segments to improve output accuracy
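
To illustrate the cross-chunk unification idea: when chunks are diarized independently, "speaker A" in one chunk may be "speaker B" in the next, so local labels have to be mapped onto shared global ones by comparing embeddings. The sketch below uses greedy nearest-match assignment with cosine similarity, which is a simplification of real clustering. The function names, the `SPEAKER_NN` label format, and the 0.8 threshold are assumptions for the example, not the project's code.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unify_speakers(
    chunk_speakers: List[Dict[str, Sequence[float]]],
    threshold: float = 0.8,  # hypothetical similarity cut-off
) -> List[Dict[str, str]]:
    """Map each chunk-local speaker label to a global label.

    chunk_speakers[i] maps a local label in chunk i to that speaker's embedding.
    """
    known: List[List[float]] = []  # one reference embedding per global speaker
    mappings: List[Dict[str, str]] = []
    for speakers in chunk_speakers:
        mapping: Dict[str, str] = {}
        for local_label, embedding in speakers.items():
            # Find the most similar known speaker at or above the threshold.
            best_idx, best_sim = -1, threshold
            for idx, reference in enumerate(known):
                sim = cosine_similarity(embedding, reference)
                if sim >= best_sim:
                    best_idx, best_sim = idx, sim
            if best_idx < 0:
                # No match: register a new global speaker.
                known.append(list(embedding))
                best_idx = len(known) - 1
            mapping[local_label] = f"SPEAKER_{best_idx:02d}"
        mappings.append(mapping)
    return mappings
```

A production version would likely cluster all embeddings jointly and prevent two speakers from the same chunk from collapsing into one global label; this sketch does neither.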
### Developer Friendly
- **MCP Protocol Support**: Complete tool invocation interface
- **REST API**: Standardized interface
- **Gradio UI**: Intuitive web interface
- **Test Coverage**: 29 unit and integration tests
## Quick Start
### Local Setup
1. **Clone Repository**
```bash
git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/ModalTranscriberMCP
cd ModalTranscriberMCP
```
2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
3. **Configure Hugging Face Token** (Optional, for speaker identification)
```bash
# Create .env file
echo "HF_TOKEN=your_huggingface_token_here" > .env
```
4. **Start Application**
```bash
python app.py
```
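
For reference, the `.env` file from step 3 is a plain `KEY=value` file. How `app.py` actually reads it is not documented here; the following is only a minimal standard-library sketch of loading such a file, with `load_env_file` being a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=value lines and export them without overriding existing variables."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # a missing file is fine: the token is optional
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, `os.environ.get("HF_TOKEN")` returns the token if one was configured and `None` otherwise, so speaker identification can be skipped when it is absent.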
### Usage Instructions
1. **Upload an audio file** or **enter a podcast URL**
2. **Select transcription options**:
   - Model size: turbo (recommended) / large-v3
   - Output format: SRT / TXT
   - Enable speaker identification
3. **Start transcription**; the system processes the audio and generates the results automatically
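
To show what the SRT output option produces, here is a small sketch that renders transcript segments in SubRip form: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, then the text. The segment tuple layout and the `SPEAKER: text` prefix are assumptions for illustration; the project's real output may be laid out differently.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, speaker_label, text); speaker may be "".
Segment = Tuple[float, float, str, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        line = f"{speaker}: {text}" if speaker else text
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{line}\n"
        )
    return "\n".join(blocks)
```

The TXT option would simply drop the index and time lines and keep the text.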
## Technical Architecture
- **Frontend**: Gradio 4.44.0
- **Backend**: FastAPI + FastMCP
- **Transcription Engine**: OpenAI Whisper
- **Speaker Identification**: pyannote.audio
- **Cloud Computing**: Modal.com
- **Audio Processing**: FFmpeg
## Performance Metrics
- **Processing Speed**: Up to 30x real-time transcription speed
- **Concurrency**: Up to 10 chunks processed simultaneously
- **Accuracy**: Over 95% on Chinese audio
- **Supported Formats**: MP3, WAV, M4A, FLAC, etc.
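
The concurrency figure above can be pictured as a bounded worker pool: at most 10 chunks are in flight at once, and results come back in chunk order so the transcript can be stitched together. The sketch below uses a local thread pool as a stand-in; in the cloud deployment the per-chunk work would presumably run on Modal instead. `transcribe_chunks` and the `transcribe_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Chunk = TypeVar("Chunk")
Result = TypeVar("Result")


def transcribe_chunks(
    chunks: Sequence[Chunk],
    transcribe_one: Callable[[Chunk], Result],  # hypothetical per-chunk worker
    max_workers: int = 10,                      # matches the stated concurrency limit
) -> List[Result]:
    """Run transcribe_one over all chunks with bounded concurrency.

    Executor.map yields results in input order, regardless of which
    chunk finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_one, chunks))
```

If any chunk raises an exception, it surfaces when its result is collected, so a real pipeline would want per-chunk error handling or retries around `transcribe_one`.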
## Contributing
Issues and Pull Requests are welcome!
## License
MIT License
## Related Links
- **Project Documentation**: See `docs/` directory in the repository
- **Test Coverage**: 29 test cases ensuring functional stability
- **Modal Deployment**: Support for cloud high-performance processing
---
*Last updated: 2025-06-11*