# 🎙️ Modal Transcriber MCP

A powerful audio transcription system that integrates a Gradio UI, FastMCP tools, and Modal cloud computing, with intelligent speaker identification.

## ✨ Key Features

- **🎵 Multi-platform Audio Download**: Downloads audio from Apple Podcasts, XiaoYuZhou, and other podcast platforms
- **🚀 High-performance Transcription**: Built on OpenAI Whisper, with support for multiple models (turbo, large-v3, etc.)
- **🎤 Intelligent Speaker Identification**: Speaker separation and embedding clustering with pyannote.audio
- **⚡ Distributed Processing**: Large files are split into chunks and processed concurrently, significantly reducing processing time
- **🔧 FastMCP Tools**: Complete MCP (Model Context Protocol) tool integration
- **☁️ Modal Deployment**: Supports both local and cloud deployment modes

## 🎯 Core Advantages

### 🧠 Intelligent Audio Segmentation

- **Silence Detection Segmentation**: Automatically detects silent segments in the audio and uses them as chunk boundaries
- **Fallback Mechanism**: Long audio automatically falls back to time-based segmentation to keep processing efficient
- **Concurrent Processing**: Multiple chunks are processed simultaneously, dramatically improving transcription speed
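
The segmentation strategy can be illustrated with a minimal sketch. This is not the project's implementation. It assumes silence intervals have already been detected, for example with FFmpeg's `silencedetect` filter. The function names `plan_chunks` and `time_based_chunks` are hypothetical, and so are the 300 s maximum chunk length and the 3600 s "long audio" threshold.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def time_based_chunks(duration: float, window: float) -> List[Interval]:
    """Fallback: split [0, duration] into fixed-length windows."""
    chunks: List[Interval] = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        chunks.append((start, end))
        start = end
    return chunks


def plan_chunks(
    duration: float,
    silences: List[Interval],
    max_chunk: float = 300.0,            # hypothetical maximum chunk length
    long_audio_threshold: float = 3600.0,  # hypothetical "long audio" cutoff
) -> List[Interval]:
    """Cut at silence midpoints so that no chunk exceeds max_chunk seconds."""
    # Fallback: very long audio, or audio with no detected silence, uses fixed windows.
    if duration > long_audio_threshold or not silences:
        return time_based_chunks(duration, max_chunk)

    # Candidate cut points: the middle of each silence that lies inside the audio.
    cuts = sorted((s + e) / 2 for s, e in silences if 0 < (s + e) / 2 < duration)
    chunks: List[Interval] = []
    start = 0.0
    last_fit = None  # latest candidate that keeps the current chunk <= max_chunk
    for cut in cuts + [duration]:
        while cut - start > max_chunk:
            # Close the chunk at the last silence that fits; if none, hard-cut.
            if last_fit is not None and last_fit > start:
                end = last_fit
            else:
                end = start + max_chunk
            chunks.append((start, end))
            start = end
            last_fit = None
        last_fit = cut
    if start < duration:
        chunks.append((start, duration))
    return chunks


print(plan_chunks(700.0, [(100, 110), (290, 300), (480, 500), (650, 660)]))
# → [(0.0, 295.0), (295.0, 490.0), (490.0, 700.0)]
```

Cutting in the middle of a silence avoids splitting a word across two chunks. The hard cut at `max_chunk` bounds the worst case when speech runs on without a usable pause.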

### 🎤 Advanced Speaker Identification

- **Embedding Clustering**: Uses deep-learning speaker embeddings to identify speakers consistently
- **Cross-chunk Unification**: Resolves inconsistent speaker labels across chunks in distributed processing
- **Quality Filtering**: Automatically filters out low-quality segments to improve output accuracy
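
Cross-chunk unification can be sketched as follows. The sketch assumes that each chunk's diarization yields one embedding vector per local speaker label. pyannote.audio provides speaker-embedding models, but how this project extracts and clusters embeddings is not shown here. Greedy cosine-similarity matching against running-mean centroids stands in for the project's actual clustering. `unify_speakers` and the 0.75 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unify_speakers(
    chunks: List[Dict[str, List[float]]], threshold: float = 0.75
) -> List[Dict[str, str]]:
    """Map each chunk's local speaker labels to consistent global labels."""
    centroids: List[List[float]] = []  # running-mean embedding per global speaker
    counts: List[int] = []
    mappings: List[Dict[str, str]] = []
    for chunk in chunks:
        mapping: Dict[str, str] = {}
        used = set()  # two local speakers in one chunk must stay distinct
        for local_label, emb in chunk.items():
            best, best_sim = -1, threshold
            for idx, centroid in enumerate(centroids):
                if idx in used:
                    continue
                sim = cosine(emb, centroid)
                if sim >= best_sim:
                    best, best_sim = idx, sim
            if best == -1:
                # No sufficiently similar speaker seen yet: register a new one.
                centroids.append(list(emb))
                counts.append(1)
                best = len(centroids) - 1
            else:
                # Fold this embedding into the matched speaker's running mean.
                n = counts[best]
                centroids[best] = [
                    (c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)
                ]
                counts[best] = n + 1
            used.add(best)
            mapping[local_label] = f"SPEAKER_{best:02d}"
        mappings.append(mapping)
    return mappings


chunks = [
    {"SPEAKER_00": [1.0, 0.0], "SPEAKER_01": [0.0, 1.0]},
    {"SPEAKER_00": [0.1, 0.99], "SPEAKER_01": [0.98, 0.05]},  # local labels swapped
]
print(unify_speakers(chunks))
# → [{'SPEAKER_00': 'SPEAKER_00', 'SPEAKER_01': 'SPEAKER_01'}, {'SPEAKER_00': 'SPEAKER_01', 'SPEAKER_01': 'SPEAKER_00'}]
```

The running mean lets each global speaker's centroid adapt as more chunks arrive. The `used` set keeps two voices in the same chunk from collapsing into one global label.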

### 🔧 Developer Friendly

- **MCP Protocol Support**: Complete tool invocation interface
- **REST API**: Standardized API interface
- **Gradio UI**: Intuitive web interface
- **Test Coverage**: 29 unit and integration tests
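
MCP clients invoke tools with JSON-RPC 2.0 messages: `tools/list` discovers the available tools and `tools/call` runs one. The sketch below only builds those request payloads. The tool name `transcribe_audio` and its argument names are hypothetical placeholders; check the server's `tools/list` response for the real ones. The transport (stdio, SSE, or HTTP) and the `initialize` handshake that precedes these calls are omitted.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC request ids must be unique within a session


def mcp_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request in the shape MCP uses."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the server's tools, then invoke one.
list_tools = mcp_request("tools/list")
call_tool = mcp_request(
    "tools/call",
    {
        "name": "transcribe_audio",  # hypothetical tool name
        "arguments": {  # hypothetical argument names
            "audio_url": "https://example.com/episode.mp3",
            "model_size": "turbo",
            "output_format": "srt",
            "enable_speaker_diarization": True,
        },
    },
)
print(call_tool)
```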

## 🚀 Quick Start

### Local Setup

The Space is configured with Python 3.10.12.

1. **Clone Repository**
```bash
git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/ModalTranscriberMCP
cd ModalTranscriberMCP
```

2. **Install Dependencies**
```bash
pip install -r requirements.txt
```

3. **Configure Hugging Face Token** (optional, required only for speaker identification)
```bash
# Create a .env file containing your token
echo "HF_TOKEN=your_huggingface_token_here" > .env
```

4. **Start Application**
```bash
python app.py
```

### Usage Instructions

1. **Upload an audio file** or **enter a podcast URL**
2. **Select transcription options**:
   - Model size: turbo (recommended) / large-v3
   - Output format: SRT / TXT
   - Speaker identification: enable or disable
3. **Start transcription**; the system processes the audio automatically and generates the results
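
In the SRT output format, each cue is numbered and carries a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`. The sketch below renders transcript segments as SRT. It assumes each segment carries start and end times in seconds plus an optional speaker label. The `Segment` type and the `[SPEAKER_xx]` text prefix are hypothetical and not necessarily what this project emits.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str
    speaker: Optional[str] = None  # hypothetical speaker label


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> 01:01:01,500."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render cues: index, time range, text, separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        text = f"[{seg.speaker}] {seg.text}" if seg.speaker else seg.text
        blocks.append(f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{text}\n")
    return "\n".join(blocks)


print(to_srt([Segment(0.0, 2.5, "Hello", "SPEAKER_00"), Segment(2.5, 5.0, "Hi")]))
```

Converting to integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the millisecond field.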

## 🛠️ Technical Architecture

- **Frontend**: Gradio 4.44.0
- **Backend**: FastAPI + FastMCP
- **Transcription Engine**: OpenAI Whisper
- **Speaker Identification**: pyannote.audio
- **Cloud Computing**: Modal.com
- **Audio Processing**: FFmpeg

## 📊 Performance Metrics

- **Processing Speed**: Up to 30x real-time transcription speed
- **Concurrency**: Up to 10 chunks processed simultaneously
- **Accuracy**: Over 95% accuracy on Chinese audio
- **Supported Formats**: MP3, WAV, M4A, FLAC, etc.
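
A semaphore is one common way to enforce a concurrency cap like "up to 10 chunks". The asyncio sketch below assumes a hypothetical async `transcribe_chunk` callable; in this project the per-chunk work may instead run as Modal functions. The semaphore limits in-flight chunks to 10, and `asyncio.gather` returns results in chunk order so the transcript can be stitched back together.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Chunk = Tuple[float, float]  # (start, end) in seconds


async def transcribe_all(
    chunks: List[Chunk],
    transcribe_chunk: Callable[[Chunk], Awaitable[str]],  # hypothetical worker
    max_concurrency: int = 10,
) -> List[str]:
    """Transcribe all chunks, at most max_concurrency at a time, preserving order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(chunk: Chunk) -> str:
        async with semaphore:  # waits while max_concurrency chunks are in flight
            return await transcribe_chunk(chunk)

    # gather() returns results in input order, regardless of completion order.
    return list(await asyncio.gather(*(worker(c) for c in chunks)))


async def demo() -> Tuple[List[str], int]:
    active = 0
    peak = 0

    async def fake_transcribe(chunk: Chunk) -> str:
        # Stand-in for a real transcription call; tracks peak concurrency.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"text for {chunk[0]:.0f}-{chunk[1]:.0f}s"

    chunks = [(i * 60.0, (i + 1) * 60.0) for i in range(25)]
    return await transcribe_all(chunks, fake_transcribe), peak


results, peak = asyncio.run(demo())
print(len(results), peak)
```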

## 🤝 Contributing

Issues and pull requests are welcome!

## 📜 License

MIT License

## 🔗 Related Links

- **Project Documentation**: See the `docs/` directory in the repository
- **Test Coverage**: 29 test cases verifying functional stability
- **Modal Deployment**: Supports high-performance processing in the cloud

---

*Last updated: 2025-06-11*