---
title: Modal Transcriber MCP
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
tag: mcp-server-track
---
# 🎙️ Modal Transcriber MCP
A powerful audio transcription system integrating Gradio UI, FastMCP Tools, and Modal cloud computing with intelligent speaker identification.
## Key Features
- Multi-platform Audio Download: Support for Apple Podcasts, XiaoYuZhou, and other podcast platforms
- High-performance Transcription: Based on OpenAI Whisper, with support for multiple models (turbo, large-v3, etc.)
- Intelligent Speaker Identification: Uses pyannote.audio for speaker separation and embedding clustering
- Distributed Processing: Large files are split into chunks and processed concurrently, significantly speeding up transcription
- FastMCP Tools: Complete MCP (Model Context Protocol) tool integration
- Modal Deployment: Supports both local and cloud deployment modes (see the sketch after this list)
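To give a rough picture of the cloud deployment mode, the sketch below shows how a chunk-transcription function might be declared to run on Modal GPUs. It is illustrative only: the app name, container image contents, GPU type, and function names are assumptions, not this project's actual code.

```python
import modal

app = modal.App("modal-transcriber-sketch")  # hypothetical app name

# Container image with ffmpeg and Whisper; package choices are illustrative.
image = modal.Image.debian_slim().apt_install("ffmpeg").pip_install("openai-whisper")

@app.function(image=image, gpu="A10G", timeout=600)
def transcribe_chunk(audio_bytes: bytes, model_name: str = "turbo") -> dict:
    """Transcribe one audio chunk on a cloud GPU (illustrative helper)."""
    import tempfile
    import whisper

    model = whisper.load_model(model_name)
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(audio_bytes)
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"], "segments": result["segments"]}

@app.local_entrypoint()
def main(path: str = "episode.wav"):
    # `modal run <file>.py` executes main locally and runs transcribe_chunk in the cloud.
    with open(path, "rb") as f:
        print(transcribe_chunk.remote(f.read())["text"])
```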
## Core Advantages

### Intelligent Audio Segmentation
- Silence-based Segmentation: Silent passages are detected automatically and used as intelligent chunk boundaries
- Fallback Mechanism: Long spans without usable silence automatically fall back to time-based segmentation, keeping processing efficient
- Concurrent Processing: Multiple chunks are transcribed simultaneously, dramatically improving throughput (a sketch of this strategy follows the list)
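One plausible way to implement this chunking strategy is sketched below with pydub and a thread pool. The thresholds, chunk cap, and helper names are assumptions for illustration, not the project's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

from pydub import AudioSegment
from pydub.silence import detect_nonsilent

MAX_CHUNK_MS = 60_000  # illustrative cap; longer spans fall back to fixed windows

def split_audio(path: str) -> list[AudioSegment]:
    """Split at silent passages, falling back to time-based windows."""
    audio = AudioSegment.from_file(path)
    spans = detect_nonsilent(audio, min_silence_len=500,
                             silence_thresh=audio.dBFS - 16)  # illustrative thresholds
    chunks = []
    for start, end in spans or [(0, len(audio))]:
        # Fallback: spans longer than MAX_CHUNK_MS are cut into fixed windows.
        for pos in range(start, end, MAX_CHUNK_MS):
            chunks.append(audio[pos:min(pos + MAX_CHUNK_MS, end)])
    return chunks

def transcribe_chunk(chunk: AudioSegment) -> str:
    # Placeholder: in the real pipeline this would hand the chunk to Whisper,
    # e.g. via a cloud function like the Modal sketch earlier in this README.
    return f"[{len(chunk) / 1000:.1f}s chunk]"

def transcribe_all(path: str, workers: int = 10) -> list[str]:
    """Transcribe chunks concurrently while preserving their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcribe_chunk, split_audio(path)))
```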
### Advanced Speaker Identification
- Embedding Clustering: Deep-learning speaker embeddings are clustered to keep speaker identities consistent
- Cross-chunk Unification: Resolves the inconsistent speaker labels that distributed chunk processing would otherwise produce
- Quality Filtering: Low-quality segments are filtered out automatically to improve output accuracy (see the clustering sketch below)
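A minimal sketch of how cross-chunk speaker unification could work, assuming pyannote.audio embeddings clustered with scikit-learn. The model name, distance threshold, and function names are illustrative, not necessarily what this project uses.

```python
import os

import numpy as np
from pyannote.audio import Inference, Model
from sklearn.cluster import AgglomerativeClustering

# pyannote/embedding is a gated model, hence the HF_TOKEN requirement below.
model = Model.from_pretrained("pyannote/embedding", use_auth_token=os.environ["HF_TOKEN"])
embedder = Inference(model, window="whole")  # one embedding vector per file

def unify_speakers(chunk_paths: list[str]) -> dict[str, int]:
    """Map each chunk file to a global speaker id via agglomerative clustering."""
    embeddings = np.stack([embedder(p) for p in chunk_paths])
    labels = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=1.0,  # illustrative threshold, not a tuned project value
        metric="cosine",
        linkage="average",
    ).fit_predict(embeddings)
    return dict(zip(chunk_paths, labels.tolist()))
```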
### Developer Friendly
- MCP Protocol Support: Complete tool invocation interface (see the FastMCP sketch below)
- REST API: Standardized API interface
- Gradio UI: Intuitive web interface
- Test Coverage: 29 unit and integration tests
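To illustrate what the MCP integration looks like at the code level, here is a minimal FastMCP server exposing a single transcription tool. The tool name and parameters are hypothetical and do not necessarily match this project's actual tool signatures.

```python
from fastmcp import FastMCP

mcp = FastMCP("Modal Transcriber MCP")

@mcp.tool()
def transcribe_audio(url: str, model_size: str = "turbo",
                     enable_speaker_id: bool = False) -> dict:
    """Download a podcast episode and return its transcript (illustrative tool)."""
    # In the real project this would dispatch to the transcription backend;
    # here it only returns a placeholder result.
    return {"url": url, "model": model_size, "speakers": enable_speaker_id, "text": "..."}

if __name__ == "__main__":
    mcp.run()  # serves the tool over the MCP protocol (stdio by default)
```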
## Quick Start

### Local Setup
1. Clone the repository

   ```bash
   git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/ModalTranscriberMCP
   cd ModalTranscriberMCP
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Configure a Hugging Face token (optional, needed for speaker identification)

   ```bash
   # Create .env file
   echo "HF_TOKEN=your_huggingface_token_here" > .env
   ```

4. Start the application

   ```bash
   python app.py
   ```
### Usage Instructions
1. Upload an audio file or enter a podcast URL
2. Select transcription options:
   - Model size: turbo (recommended) or large-v3
   - Output format: SRT or TXT
   - Speaker identification: on or off
3. Start transcription; the system processes the audio and generates the results automatically (the Space can also be called programmatically, as sketched below)
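The same workflow can be driven from code. The sketch below uses gradio_client against this Space; the endpoint name and argument order are assumptions, so check the running app's "Use via API" page for the real signature.

```python
from gradio_client import Client

# Space id is real; the api_name and argument order below are illustrative guesses.
client = Client("Agents-MCP-Hackathon/ModalTranscriberMCP")
result = client.predict(
    "https://podcasts.apple.com/us/podcast/example-episode",  # audio URL or file
    "turbo",   # model size
    "srt",     # output format
    True,      # enable speaker identification
    api_name="/transcribe",
)
print(result)
```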
## Technical Architecture
- Frontend: Gradio 4.44.0
- Backend: FastAPI + FastMCP
- Transcription Engine: OpenAI Whisper
- Speaker Identification: pyannote.audio
- Cloud Computing: Modal.com
- Audio Processing: FFmpeg
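For reference, this is roughly what using the transcription engine listed above looks like when called directly, together with a minimal SRT renderer. It is a sketch of the underlying OpenAI Whisper API plus a hypothetical helper, not this project's own wrapper code.

```python
import whisper

def to_srt(segments) -> str:
    """Render Whisper segments as SRT (minimal sketch, no speaker labels)."""
    def ts(seconds: float) -> str:
        ms = int(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(lines)

model = whisper.load_model("large-v3")   # or "turbo" for faster runs
result = model.transcribe("episode.mp3")
print(to_srt(result["segments"]))
```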
## Performance Metrics
- Processing Speed: Up to 30x real-time transcription
- Concurrency: Up to 10 chunks processed simultaneously
- Accuracy: Over 95% for Chinese-language audio
- Supported Formats: MP3, WAV, M4A, FLAC, etc.
## Contributing
Issues and Pull Requests are welcome!
## License
MIT License
## Related Links
- Project Documentation: see the `docs/` directory in the repository
- Test Coverage: 29 test cases ensuring functional stability
- Modal Deployment: supports high-performance processing in the cloud
Last updated: 2025-06-11