Spaces: prathameshv07/Multilingual-Audio-Intelligence-System

Prathamesh Sarjerao Vaidya committed 16 days ago

Commit

8c5e398

1 Parent(s): 5e6e4ea

made some changes

Files changed (2)

DOCUMENTATION.md +75 -57
README.md +25 -17

DOCUMENTATION.md

@@ -193,63 +193,81 @@ These cached demo results ensure instant transcript, translation, and analytics
 ```mermaid
 graph TB
-    subgraph "User Interface Layer"
-        A[FastAPI Web Interface]
-        B[Interactive Visualizations]
-        C[Real-time Progress Tracking]
-        D[Multi-format Downloads]
-    end
-    subgraph "Application Layer"
-        E[AudioIntelligencePipeline]
-        F[Model Preloader]
-        G[Background Task Manager]
-        H[API Endpoints]
-    end
-    subgraph "AI Processing Layer"
-        I[Speaker Diarization]
-        J[Speech Recognition]
-        K[Neural Translation]
-        L[Output Formatting]
-    end
-    subgraph "Data Layer"
-        M[Model Cache]
-        N[Audio Storage]
-        O[Result Storage]
-        P[Configuration]
-    end
-    subgraph "External Services"
-        Q[HuggingFace Hub]
-        R[pyannote.audio Models]
-        S[Whisper Models]
-        T[Translation Models]
-    end
-    A --> E
-    B --> F
-    C --> G
-    D --> H
-    E --> I
-    E --> J
-    E --> K
-    E --> L
-    I --> M
-    J --> N
-    K --> O
-    L --> P
-    F --> Q
-    Q --> R
-    Q --> S
-    Q --> T
-    E --> F
-    F --> G
-    G --> H
-    M --> N
-    N --> O
 ```
 **Key Architecture Features:**

 ```mermaid
 graph TB
+%% Define classes for styling
+classDef ui fill:#cce5ff,stroke:#004085,stroke-width:2px;
+classDef app fill:#d4edda,stroke:#155724,stroke-width:2px;
+classDef ai fill:#f8d7da,stroke:#721c24,stroke-width:2px;
+classDef data fill:#fff3cd,stroke:#856404,stroke-width:2px;
+classDef external fill:#e2e3e5,stroke:#383d41,stroke-width:2px;
+%% UI Layer
+subgraph "User Interface Layer"
+    A[FastAPI Web Interface]
+    B[Interactive Visualizations]
+    C[Real-time Progress Tracking]
+    D[Multi-format Downloads]
+end
+class A,B,C,D ui;
+%% Application Layer
+subgraph "Application Layer"
+    E[AudioIntelligencePipeline]
+    F[Model Preloader]
+    G[Background Task Manager]
+    H[API Endpoints]
+end
+class E,F,G,H app;
+%% AI Processing Layer
+subgraph "AI Processing Layer"
+    I[Speaker Diarization]
+    J[Speech Recognition]
+    K[Neural Translation]
+    L[Output Formatting]
+end
+class I,J,K,L ai;
+%% Data Layer
+subgraph "Data Layer"
+    M[Model Cache]
+    N[Audio Storage]
+    O[Result Storage]
+    P[Configuration]
+end
+class M,N,O,P data;
+%% External Services
+subgraph "External Services"
+    Q[HuggingFace Hub]
+    R[pyannote.audio Models]
+    S[Whisper Models]
+    T[Translation Models]
+end
+class Q,R,S,T external;
+%% Connections
+A --> E
+B --> F
+C --> G
+D --> H
+E --> I
+E --> J
+E --> K
+E --> L
+I --> M
+J --> N
+K --> O
+L --> P
+F --> Q
+Q --> R
+Q --> S
+Q --> T
+E --> F
+F --> G
+G --> H
+M --> N
+N --> O
 ```
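The connections in the diagram can be read as a directed graph. As a rough sketch (the `EDGES` table is transcribed from the arrows above; the `reachable` helper is a hypothetical name used only for illustration and is not part of the project), the following Python walks the graph to list which components sit downstream of a given node:

```python
from collections import deque

# Edges transcribed from the Mermaid diagram (node letters as used there).
EDGES = {
    "A": ["E"], "B": ["F"], "C": ["G"], "D": ["H"],
    "E": ["I", "J", "K", "L", "F"],
    "F": ["Q", "G"],
    "G": ["H"],
    "I": ["M"], "J": ["N"], "K": ["O"], "L": ["P"],
    "M": ["N"], "N": ["O"],
    "Q": ["R", "S", "T"],
}


def reachable(start):
    """Return the set of nodes reachable from `start` via one or more edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, `reachable("D")` yields only `{"H"}`, since Multi-format Downloads connects solely to API Endpoints, while `reachable("A")` covers every node outside the User Interface Layer.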
 **Key Architecture Features:**

README.md

@@ -40,6 +40,11 @@ The Multilingual Audio Intelligence System is an advanced AI-powered platform th
 ![Summary Output](/static/imgs/demo_res_summary.png)
 ## Installation and Quick Start
 1. **Clone the Repository:**
@@ -78,24 +83,27 @@ The Multilingual Audio Intelligence System is an advanced AI-powered platform th
 ## File Structure
 ```
-audio_challenge/
-├── web_app.py               # FastAPI application
-├── run_fastapi.py           # Startup script
-├── requirements.txt         # Dependencies
 ├── templates/
-│   └── index.html           # Main interface
-├── src/                     # Core modules
-│   ├── main.py              # Pipeline orchestrator
-│   ├── audio_processor.py   # Audio preprocessing
-│   ├── speaker_diarizer.py  # Speaker identification
-│   ├── speech_recognizer.py # ASR with language detection
-│   ├── translator.py        # Neural machine translation
-│   ├── output_formatter.py  # Output generation
-│   └── utils.py             # Utility functions
-├── static/                  # Static assets
-├── uploads/                 # Uploaded files
-└── outputs/                 # Generated outputs
-└── README.md
 ```
 ## Configuration

 ![Summary Output](/static/imgs/demo_res_summary.png)
+## Demo & Documentation
+- 🎥 [Video Preview]()
+- 📄 [Project Documentation](DOCUMENTATION.md)
 ## Installation and Quick Start
 1. **Clone the Repository:**
 ## File Structure
 ```
+Multilingual-Audio-Intelligence-System/
+├── web_app.py                      # FastAPI application with RESTful endpoints
+├── model_preloader.py              # Intelligent model loading with progress tracking
+├── run_fastapi.py                  # Application startup script with preloading
+├── src/
+│   ├── main.py                     # AudioIntelligencePipeline orchestrator
+│   ├── audio_processor.py          # Advanced audio preprocessing and normalization
+│   ├── speaker_diarizer.py         # pyannote.audio integration for speaker identification
+│   ├── speech_recognizer.py        # faster-whisper ASR with language detection
+│   ├── translator.py               # Neural machine translation with multiple models
+│   ├── output_formatter.py         # Multi-format result generation and export
+│   └── utils.py                    # Utility functions and performance monitoring
 ├── templates/
+│   └── index.html                  # Responsive web interface with home page
+├── static/                         # Static assets and client-side resources
+├── model_cache/                    # Intelligent model caching directory
+├── uploads/                        # User audio file storage
+├── outputs/                        # Generated results and downloads
+├── requirements.txt                # Comprehensive dependency specification
+├── Dockerfile                      # Production-ready containerization
+└── config.example.env              # Environment configuration template
 ```
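One way to check a local checkout against the layout listed above is sketched below. The paths in `EXPECTED` are copied from the tree; the `missing_paths` helper is a hypothetical name, and it is an assumption that every listed entry exists in a fresh clone (directories such as `model_cache/`, `uploads/`, and `outputs/` may only be created at runtime).

```python
from pathlib import Path

# Paths copied from the file structure listed above.
EXPECTED = [
    "web_app.py",
    "model_preloader.py",
    "run_fastapi.py",
    "src/main.py",
    "src/audio_processor.py",
    "src/speaker_diarizer.py",
    "src/speech_recognizer.py",
    "src/translator.py",
    "src/output_formatter.py",
    "src/utils.py",
    "templates/index.html",
    "static",
    "model_cache",
    "uploads",
    "outputs",
    "requirements.txt",
    "Dockerfile",
    "config.example.env",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per missing entry.
    for path in missing_paths("."):
        print(f"missing: {path}")
```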
 ## Configuration