Prathamesh Sarjerao Vaidya committed on
Commit 8c5e398 · 1 Parent(s): 5e6e4ea

made some changes

Files changed (2)
  1. DOCUMENTATION.md +75 -57
  2. README.md +25 -17
DOCUMENTATION.md CHANGED
@@ -193,63 +193,81 @@ These cached demo results ensure instant transcript, translation, and analytics

  ```mermaid
  graph TB
- subgraph "User Interface Layer"
- A[FastAPI Web Interface]
- B[Interactive Visualizations]
- C[Real-time Progress Tracking]
- D[Multi-format Downloads]
- end
-
- subgraph "Application Layer"
- E[AudioIntelligencePipeline]
- F[Model Preloader]
- G[Background Task Manager]
- H[API Endpoints]
- end
-
- subgraph "AI Processing Layer"
- I[Speaker Diarization]
- J[Speech Recognition]
- K[Neural Translation]
- L[Output Formatting]
- end
-
- subgraph "Data Layer"
- M[Model Cache]
- N[Audio Storage]
- O[Result Storage]
- P[Configuration]
- end
-
- subgraph "External Services"
- Q[HuggingFace Hub]
- R[pyannote.audio Models]
- S[Whisper Models]
- T[Translation Models]
- end
-
- A --> E
- B --> F
- C --> G
- D --> H
- E --> I
- E --> J
- E --> K
- E --> L
- I --> M
- J --> N
- K --> O
- L --> P
- F --> Q
- Q --> R
- Q --> S
- Q --> T
-
- E --> F
- F --> G
- G --> H
- M --> N
- N --> O
  ```

  **Key Architecture Features:**

  ```mermaid
  graph TB
+
+ %% Define classes for styling
+ classDef ui fill:#cce5ff,stroke:#004085,stroke-width:2px;
+ classDef app fill:#d4edda,stroke:#155724,stroke-width:2px;
+ classDef ai fill:#f8d7da,stroke:#721c24,stroke-width:2px;
+ classDef data fill:#fff3cd,stroke:#856404,stroke-width:2px;
+ classDef external fill:#e2e3e5,stroke:#383d41,stroke-width:2px;
+
+ %% UI Layer
+ subgraph "User Interface Layer"
+ A[FastAPI Web Interface]
+ B[Interactive Visualizations]
+ C[Real-time Progress Tracking]
+ D[Multi-format Downloads]
+ end
+ class A,B,C,D ui;
+
+ %% Application Layer
+ subgraph "Application Layer"
+ E[AudioIntelligencePipeline]
+ F[Model Preloader]
+ G[Background Task Manager]
+ H[API Endpoints]
+ end
+ class E,F,G,H app;
+
+ %% AI Processing Layer
+ subgraph "AI Processing Layer"
+ I[Speaker Diarization]
+ J[Speech Recognition]
+ K[Neural Translation]
+ L[Output Formatting]
+ end
+ class I,J,K,L ai;
+
+ %% Data Layer
+ subgraph "Data Layer"
+ M[Model Cache]
+ N[Audio Storage]
+ O[Result Storage]
+ P[Configuration]
+ end
+ class M,N,O,P data;
+
+ %% External Services
+ subgraph "External Services"
+ Q[HuggingFace Hub]
+ R[pyannote.audio Models]
+ S[Whisper Models]
+ T[Translation Models]
+ end
+ class Q,R,S,T external;
+
+ %% Connections
+ A --> E
+ B --> F
+ C --> G
+ D --> H
+ E --> I
+ E --> J
+ E --> K
+ E --> L
+ I --> M
+ J --> N
+ K --> O
+ L --> P
+ F --> Q
+ Q --> R
+ Q --> S
+ Q --> T
+ E --> F
+ F --> G
+ G --> H
+ M --> N
+ N --> O
  ```

  **Key Architecture Features:**
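
The connections in the diagram above (`E --> I`, `E --> J`, `E --> K`, `E --> L`) show `AudioIntelligencePipeline` feeding four processing stages, but the diagram does not state how they are invoked. The following is a minimal Python sketch of one plausible reading, assuming the stages run sequentially in the order diarization, recognition, translation, formatting. Every name here (`PipelineSketch`, `diarize`, `recognize`, `translate`, `format_output`) is hypothetical, and the stage bodies return fixed placeholder data rather than calling pyannote.audio, Whisper, or any translation model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signature: each stage takes the running result dict
# and returns an extended copy of it.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class PipelineSketch:
    """Illustrative stand-in for node E in the diagram (not the project's class)."""

    stages: List[Stage] = field(default_factory=list)

    def run(self, audio_path: str) -> Dict[str, object]:
        # Assumption: stages run one after another, each seeing earlier output.
        result: Dict[str, object] = {"audio_path": audio_path}
        for stage in self.stages:
            result = stage(result)
        return result


def diarize(result: Dict[str, object]) -> Dict[str, object]:
    # Placeholder for node I: one fixed speaker segment.
    segments = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0}]
    return {**result, "segments": segments}


def recognize(result: Dict[str, object]) -> Dict[str, object]:
    # Placeholder for node J: attach fixed text to every segment.
    segments = [{**s, "text": "hola"} for s in result["segments"]]
    return {**result, "segments": segments}


def translate(result: Dict[str, object]) -> Dict[str, object]:
    # Placeholder for node K: attach a fixed translation to every segment.
    segments = [{**s, "translation": "hello"} for s in result["segments"]]
    return {**result, "segments": segments}


def format_output(result: Dict[str, object]) -> Dict[str, object]:
    # Placeholder for node L: render one "speaker: translation" line per segment.
    lines = [f'{s["speaker"]}: {s["translation"]}' for s in result["segments"]]
    return {**result, "transcript": "\n".join(lines)}


pipeline = PipelineSketch([diarize, recognize, translate, format_output])
output = pipeline.run("example.wav")
print(output["transcript"])
```

Passing the stages in as a list keeps the orchestrator independent of any particular model library, so a stage can be swapped or stubbed out in tests without touching `run`.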
README.md CHANGED
@@ -40,6 +40,11 @@ The Multilingual Audio Intelligence System is an advanced AI-powered platform th

  ![Summary Output](/static/imgs/demo_res_summary.png)

  ## Installation and Quick Start

  1. **Clone the Repository:**
@@ -78,24 +83,27 @@ The Multilingual Audio Intelligence System is an advanced AI-powered platform th
  ## File Structure

  ```
- audio_challenge/
- ├── web_app.py # FastAPI application
- ├── run_fastapi.py # Startup script
- ├── requirements.txt # Dependencies
  ├── templates/
- │   └── index.html # Main interface
- ├── src/ # Core modules
- │   ├── main.py # Pipeline orchestrator
- │   ├── audio_processor.py # Audio preprocessing
- │   ├── speaker_diarizer.py # Speaker identification
- │   ├── speech_recognizer.py # ASR with language detection
- │   ├── translator.py # Neural machine translation
- │   ├── output_formatter.py # Output generation
- │   └── utils.py # Utility functions
- ├── static/ # Static assets
- ├── uploads/ # Uploaded files
- └── outputs/ # Generated outputs
- └── README.md
  ```

  ## Configuration
 

  ![Summary Output](/static/imgs/demo_res_summary.png)

+ ## Demo & Documentation
+
+ - 🎥 [Video Preview]()
+ - 📄 [Project Documentation](DOCUMENTATION.md)
+
  ## Installation and Quick Start

  1. **Clone the Repository:**
 
  ## File Structure

  ```
+ Multilingual-Audio-Intelligence-System/
+ ├── web_app.py # FastAPI application with RESTful endpoints
+ ├── model_preloader.py # Intelligent model loading with progress tracking
+ ├── run_fastapi.py # Application startup script with preloading
+ ├── src/
+ │   ├── main.py # AudioIntelligencePipeline orchestrator
+ │   ├── audio_processor.py # Advanced audio preprocessing and normalization
+ │   ├── speaker_diarizer.py # pyannote.audio integration for speaker identification
+ │   ├── speech_recognizer.py # faster-whisper ASR with language detection
+ │   ├── translator.py # Neural machine translation with multiple models
+ │   ├── output_formatter.py # Multi-format result generation and export
+ │   └── utils.py # Utility functions and performance monitoring
  ├── templates/
+ │   └── index.html # Responsive web interface with home page
+ ├── static/ # Static assets and client-side resources
+ ├── model_cache/ # Intelligent model caching directory
+ ├── uploads/ # User audio file storage
+ ├── outputs/ # Generated results and downloads
+ ├── requirements.txt # Comprehensive dependency specification
+ ├── Dockerfile # Production-ready containerization
+ └── config.example.env # Environment configuration template
  ```

  ## Configuration