nightey3s committed on
Commit 80f71e5 · unverified · 1 Parent(s): 005a4bc

Add application file

Files changed (8)
  1. .dockerignore +13 -0
  2. Dockerfile +38 -0
  3. README.md +276 -12
  4. docker-compose.yml +45 -0
  5. environment.yml +15 -0
  6. profanity_detector.py +822 -0
  7. requirements.txt +9 -0
  8. test_text.md +50 -0
.dockerignore ADDED
@@ -0,0 +1,13 @@
+ .git
+ .gitignore
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ *.log
+ temp_*.wav
Dockerfile ADDED
@@ -0,0 +1,38 @@
+ # Use PyTorch as base image (comes with CUDA support in the GPU variant)
+ FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
+
+ # Set working directory
+ WORKDIR /app
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     KMP_DUPLICATE_LIB_OK=TRUE \
+     DEBIAN_FRONTEND=noninteractive \
+     TZ=UTC
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ffmpeg \
+     libsndfile1 \
+     build-essential \
+     && apt-get clean \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create directory for model caching
+ RUN mkdir -p /root/.cache/huggingface
+
+ # Copy requirements file
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY profanity_detector.py .
+
+ # Expose the Gradio port
+ EXPOSE 7860
+
+ # Command to run the application
+ CMD ["python", "profanity_detector.py"]
README.md CHANGED
@@ -1,12 +1,276 @@
1
- ---
2
- title: Profanity Detection
3
- emoji: 🐠
4
- colorFrom: pink
5
- colorTo: indigo
6
- sdk: docker
7
- pinned: false
8
- license: mit
9
- short_description: A multimodal AI system that detects and rephrases profanity.
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Profanity Detection in Speech and Text
2
+
3
+ A robust multimodal system for detecting and rephrasing profanity in both speech and text, leveraging advanced NLP models to ensure accurate filtering while preserving conversational context.
4
+
5
+ ![Profanity Detection System](https://img.shields.io/badge/AI-NLP%20System-blue)
6
+ ![Python](https://img.shields.io/badge/Python-3.10%2B-green)
7
+ ![Transformers](https://img.shields.io/badge/HuggingFace-Transformers-yellow)
8
+
9
+ ## 📋 Features
10
+
11
+ - **Multimodal Analysis**: Process both written text and spoken audio
12
+ - **Context-Aware Detection**: Goes beyond simple keyword matching
13
+ - **Automatic Content Refinement**: Intelligently rephrases content while preserving meaning
14
+ - **Audio Synthesis**: Converts rephrased content into high-quality spoken audio
15
+ - **Classification System**: Categorises content by toxicity levels
16
+ - **User-Friendly Interface**: Intuitive Gradio-based UI
17
+ - **Real-time Streaming**: Process audio in real-time as you speak
18
+ - **Adjustable Sensitivity**: Fine-tune profanity detection threshold
19
+ - **Visual Highlighting**: Instantly identify problematic words with visual highlighting
20
+ - **Toxicity Classification**: Automatically categorize content from "No Toxicity" to "Severe Toxicity"
21
+ - **Performance Optimization**: Half-precision support for improved GPU memory efficiency
22
+
23
+ ## 🧠 Models Used
24
+
25
+ The system leverages four powerful models:
26
+
27
+ 1. **Profanity Detection**: `parsawar/profanity_model_3.1` - A RoBERTa-based model trained for offensive language detection
28
+ 2. **Content Refinement**: `s-nlp/t5-paranmt-detox` - A T5-based model for rephrasing offensive language
29
+ 3. **Speech-to-Text**: OpenAI's `Whisper` (large) - For transcribing spoken audio
30
+ 4. **Text-to-Speech**: Microsoft's `SpeechT5` - For converting rephrased text back to audio
31
+
32
+ ## 🔧 Installation
33
+
34
+ ### Prerequisites
35
+
36
+ - Python 3.10+
37
+ - CUDA-compatible GPU recommended (but CPU mode works too)
38
+ - FFmpeg for audio processing
39
+
40
+ ### Option 1: Using Conda (Recommended for Local Development)
41
+
42
+ ```bash
43
+ # Clone the repository
44
+ git clone https://github.com/yourusername/profanity-detection.git
45
+ cd profanity-detection
46
+
47
+ # Method A: Create environment from environment.yml (recommended)
48
+ conda env create -f environment.yml
49
+ conda activate profanity-detection
50
+
51
+ # Method B: Create a new conda environment manually
52
+ conda create -n profanity-detection python=3.10
53
+ conda activate profanity-detection
54
+
55
+ # Install PyTorch with CUDA support (adjust CUDA version if needed)
56
+ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
57
+
58
+ # Install FFmpeg for audio processing
59
+ conda install -c conda-forge ffmpeg
60
+
61
+ # Install Pillow properly to avoid DLL errors
62
+ conda install -c conda-forge pillow
63
+
64
+ # Install additional dependencies
65
+ pip install -r requirements.txt
66
+
67
+ # Set environment variable to avoid OpenMP conflicts (recommended)
68
+ conda env config vars set KMP_DUPLICATE_LIB_OK=TRUE
69
+ conda activate profanity-detection # Re-activate to apply the variable
70
+ ```
71
+
72
+ ### Option 2: Using Docker
73
+
74
+ ```bash
75
+ # Clone the repository
76
+ git clone https://github.com/yourusername/profanity-detection.git
77
+ cd profanity-detection
78
+
79
+ # Build and run the Docker container
80
+ docker-compose build --no-cache
81
+
82
+ docker-compose up
83
+ ```
84
+
85
+ ## 🚀 Usage
86
+
87
+ ### Running the Application
88
+
89
+ ```bash
90
+ # Set environment variable to avoid OpenMP conflicts (if not set in conda config)
91
+ # For Windows:
92
+ set KMP_DUPLICATE_LIB_OK=TRUE
93
+
94
+ # For Linux/Mac:
95
+ export KMP_DUPLICATE_LIB_OK=TRUE
96
+
97
+ # Run the application
98
+ python profanity_detector.py
99
+ ```
100
+
101
+ The Gradio interface will be accessible at http://127.0.0.1:7860 in your browser.
102
+
103
+ ### Using the Interface
104
+
105
+ 1. **Initialise Models**
106
+ - Click the "Initialize Models" button when you first open the interface
107
+ - Wait for all models to load (this may take a few minutes on first run)
108
+
109
+ 2. **Text Analysis Tab**
110
+ - Enter text into the text box
111
+ - Adjust the "Profanity Detection Sensitivity" slider if needed
112
+ - Click "Analyze Text"
113
+ - View results including profanity score, toxicity classification, and rephrased content
114
+ - See highlighted profane words in the text
115
+ - Listen to the audio version of the rephrased content
116
+
117
+ 3. **Audio Analysis Tab**
118
+ - Upload an audio file or record directly using your microphone
119
+ - Click "Analyze Audio"
120
+ - View transcription, profanity analysis, and rephrased content
121
+ - Listen to the cleaned audio version of the rephrased content
122
+
123
+ 4. **Real-time Streaming Tab**
124
+ - Click "Start Real-time Processing"
125
+ - Speak into your microphone
126
+ - Watch as your speech is transcribed, analyzed, and rephrased in real-time
127
+ - Listen to the clean audio output
128
+ - Click "Stop Real-time Processing" when finished
129
+
130
+ ## 🔧 Deployment Options
131
+
132
+ ### Local Deployment with Conda
133
+
134
+ For the best development experience with fine-grained control:
135
+
136
+ ```bash
137
+ # Create and configure environment
138
+ conda env create -f environment.yml
139
+ conda activate profanity-detection
140
+
141
+ # Run with sharing enabled (accessible from other devices)
142
+ python profanity_detector.py
143
+ ```
144
+
145
+ ### Docker Deployment (Production)
146
+
147
+ For containerised deployment with predictable environment:
148
+
149
+ #### Basic CPU Deployment
150
+ ```bash
151
+ docker-compose up --build profanity-detector-cpu
152
+ ```
153
+
154
+ #### GPU-Accelerated Deployment
155
+ ```bash
156
+ # Automatic detection (recommended)
157
+ docker-compose up --build
158
+
159
+ # Or explicitly request GPU mode
160
+ docker-compose up --build profanity-detector
161
+ ```
162
+
163
+ No configuration changes are needed: the application uses CUDA when `torch.cuda.is_available()` reports a GPU and falls back to the CPU otherwise.
164
+
165
+ #### Custom Port Configuration
166
+ To change the default port (7860):
167
+ 1. Edit docker-compose.yml and change the port mapping (e.g., "8080:7860")
168
+ 2. Run `docker-compose up --build`
169
+
170
+ ## ⚠️ Troubleshooting
171
+
172
+ ### OpenMP Runtime Conflict
173
+
174
+ If you encounter this error:
175
+ ```
176
+ OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
177
+ ```
178
+
179
+ **Solutions:**
180
+
181
+ 1. **Temporary fix**: Set environment variable before running:
182
+ ```bash
183
+ set KMP_DUPLICATE_LIB_OK=TRUE # Windows
184
+ export KMP_DUPLICATE_LIB_OK=TRUE # Linux/Mac
185
+ ```
186
+
187
+ 2. **Code-based fix**: Add to the beginning of your script:
188
+ ```python
189
+ import os
190
+ os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
191
+ ```
192
+
193
+ 3. **Permanent fix for Conda environment**:
194
+ ```bash
195
+ conda env config vars set KMP_DUPLICATE_LIB_OK=TRUE -n profanity-detection
196
+ conda deactivate
197
+ conda activate profanity-detection
198
+ ```
199
+
200
+ ### GPU Memory Issues
201
+
202
+ If you encounter CUDA out of memory errors:
203
+
204
+ 1. Use smaller models:
205
+ ```python
206
+ # Change Whisper from "large" to "medium" or "small"
207
+ whisper_model = whisper.load_model("medium").to(device)
208
+
209
+ # Keep the TTS model on CPU to save GPU memory
210
+ tts_model = SpeechT5ForTextToSpeech.from_pretrained(TTS_MODEL) # CPU mode
211
+ ```
212
+
213
+ 2. Run some models on CPU instead of GPU:
214
+ ```python
215
+ # Remove .to(device) to keep model on CPU
216
+ t5_model = AutoModelForSeq2SeqLM.from_pretrained(T5_MODEL) # CPU mode
217
+ ```
218
+
219
+ 3. Use Docker with specific GPU memory limits:
220
+ ```yaml
221
+ # In docker-compose.yml
222
+ deploy:
223
+   resources:
224
+     reservations:
225
+       devices:
226
+         - driver: nvidia
227
+           count: 1
228
+           capabilities: [gpu]
229
+           options:
230
+             memory: 4G # Limit to 4GB of GPU memory
231
+ ```
232
+
233
+ ### Docker-Specific Issues
234
+
235
+ 1. **Permission issues with mounted volumes**:
236
+ ```bash
237
+ # Fix permissions (Linux/Mac)
238
+ sudo chown -R $USER:$USER .
239
+ ```
240
+
241
+ 2. **No GPU access in container**:
242
+ - Verify NVIDIA Container Toolkit installation
243
+ - Check GPU driver compatibility
244
+ - Run `nvidia-smi` on the host to confirm GPU availability
245
+
246
+ ### First-Time Slowness
247
+
248
+ When first run, the application downloads all models, which may take time. Subsequent runs will be faster as models are cached locally. The text-to-speech model requires additional download time on first use.
249
+
250
+ ## 📄 Project Structure
251
+
252
+ ```
253
+ profanity-detection/
254
+ ├── profanity_detector.py # Main application file
255
+ ├── Dockerfile # For containerised deployment
256
+ ├── docker-compose.yml # Container orchestration
257
+ ├── requirements.txt # Python dependencies
258
+ ├── environment.yml # Conda environment specification
259
+ └── README.md # This file
260
+ ```
261
+
262
+ ## 📚 References
263
+
264
+ - [HuggingFace Transformers](https://huggingface.co/docs/transformers/index)
265
+ - [OpenAI Whisper](https://github.com/openai/whisper)
266
+ - [Microsoft SpeechT5](https://huggingface.co/microsoft/speecht5_tts)
267
+ - [Gradio Documentation](https://gradio.app/docs/)
268
+
269
+ ## 📝 License
270
+
271
+ This project is licensed under the MIT License - see the LICENSE file for details.
272
+
273
+ ## 🙏 Acknowledgments
274
+
275
+ - This project utilises models from HuggingFace Hub, Microsoft, and OpenAI
276
+ - Inspired by research in content moderation and responsible AI
docker-compose.yml ADDED
@@ -0,0 +1,45 @@
+ version: '3.8'
+
+ services:
+   # Main service configuration - automatically uses GPU if available
+   profanity-detector:
+     build:
+       context: .
+       dockerfile: Dockerfile
+     ports:
+       - "7860:7860"
+     volumes:
+       - huggingface-cache:/root/.cache/huggingface
+       - ./:/app # Mount current directory for development
+     environment:
+       - KMP_DUPLICATE_LIB_OK=TRUE
+     command: python profanity_detector.py
+     deploy:
+       resources:
+         reservations:
+           devices:
+             - driver: nvidia
+               count: 1
+               capabilities: [gpu]
+     restart: unless-stopped
+
+   # Explicit CPU-only configuration for when GPU causes issues
+   profanity-detector-cpu:
+     build:
+       context: .
+       dockerfile: Dockerfile
+     ports:
+       - "7860:7860"
+     volumes:
+       - huggingface-cache:/root/.cache/huggingface
+       - ./:/app # Mount current directory for development
+     environment:
+       - KMP_DUPLICATE_LIB_OK=TRUE
+       - CUDA_VISIBLE_DEVICES=-1 # Disable CUDA
+     command: python profanity_detector.py
+     profiles:
+       - cpu-only
+     restart: unless-stopped
+
+ volumes:
+   huggingface-cache:
environment.yml ADDED
@@ -0,0 +1,15 @@
+ name: profanity-detection
+ channels:
+   - https://repo.anaconda.com/pkgs/main
+   - https://repo.anaconda.com/pkgs/r
+   - https://repo.anaconda.com/pkgs/msys2
+ dependencies:
+   - python=3.10
+   - pytorch
+   - pytorch-cuda=11.8
+   - torchaudio
+   - torchvision
+   - ffmpeg
+ variables:
+   KMP_DUPLICATE_LIB_OK: 'TRUE'
+ prefix: C:\Users\brian\anaconda3\envs\profanity-detection
profanity_detector.py ADDED
@@ -0,0 +1,822 @@
1
+ import torch
2
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModelForSeq2SeqLM
3
+ from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
4
+ import whisper
5
+ import gradio as gr
6
+ import re
7
+ import pandas as pd
8
+ import numpy as np
9
+ import os
10
+ import time
11
+ import logging
12
+ import threading
13
+ import queue
14
+ from scipy.io.wavfile import write as write_wav
15
+ from html import escape
16
+ import traceback
17
+
18
+ # Configure logging
19
+ logging.basicConfig(
20
+ level=logging.INFO,
21
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
22
+ handlers=[logging.StreamHandler()]
23
+ )
24
+ logger = logging.getLogger('profanity_detector')
25
+
26
+ # Define device at the top of the script (global scope)
27
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
28
+ logger.info(f"Using device: {device}")
29
+
30
+ # Global variables for models
31
+ profanity_model = None
32
+ profanity_tokenizer = None
33
+ t5_model = None
34
+ t5_tokenizer = None
35
+ whisper_model = None
36
+ tts_processor = None
37
+ tts_model = None
38
+ vocoder = None
39
+ models_loaded = False
40
+
41
+ # Default speaker embeddings for TTS
42
+ speaker_embeddings = None
43
+
44
+ # Queue for real-time audio processing
45
+ audio_queue = queue.Queue()
46
+ processing_active = False
47
+
48
+ # Model loading with optional half-precision (FP16) conversion on GPU
49
+ def load_models():
50
+ global profanity_model, profanity_tokenizer, t5_model, t5_tokenizer, whisper_model
51
+ global tts_processor, tts_model, vocoder, speaker_embeddings, models_loaded
52
+
53
+ try:
54
+ logger.info("Loading profanity detection model...")
55
+ PROFANITY_MODEL = "parsawar/profanity_model_3.1"
56
+ profanity_tokenizer = AutoTokenizer.from_pretrained(PROFANITY_MODEL)
57
+
58
+ # Load model with memory optimization using half-precision
59
+ profanity_model = AutoModelForSequenceClassification.from_pretrained(PROFANITY_MODEL)
60
+
61
+ # Move to GPU if available and optimize with half-precision where possible
62
+ if torch.cuda.is_available():
63
+ profanity_model = profanity_model.to(device)
64
+ # Convert to half precision to save memory (if possible)
65
+ try:
66
+ profanity_model = profanity_model.half() # Convert to FP16
67
+ logger.info("Successfully converted profanity model to half precision")
68
+ except Exception as e:
69
+ logger.warning(f"Could not convert to half precision: {str(e)}")
70
+
71
+ logger.info("Loading detoxification model...")
72
+ T5_MODEL = "s-nlp/t5-paranmt-detox"
73
+ t5_tokenizer = AutoTokenizer.from_pretrained(T5_MODEL)
74
+
75
+ # Load model with memory optimization
76
+ t5_model = AutoModelForSeq2SeqLM.from_pretrained(T5_MODEL)
77
+
78
+ # Move to GPU if available and optimize with half-precision where possible
79
+ if torch.cuda.is_available():
80
+ t5_model = t5_model.to(device)
81
+ # Convert to half precision to save memory (if possible)
82
+ try:
83
+ t5_model = t5_model.half() # Convert to FP16
84
+ logger.info("Successfully converted T5 model to half precision")
85
+ except Exception as e:
86
+ logger.warning(f"Could not convert to half precision: {str(e)}")
87
+
88
+ logger.info("Loading Whisper speech-to-text model...")
89
+ whisper_model = whisper.load_model("large")
90
+ if torch.cuda.is_available():
91
+ whisper_model = whisper_model.to(device)
92
+
93
+ logger.info("Loading Text-to-Speech model...")
94
+ TTS_MODEL = "microsoft/speecht5_tts"
95
+ tts_processor = SpeechT5Processor.from_pretrained(TTS_MODEL)
96
+ # Load TTS models without automatic device mapping
97
+ tts_model = SpeechT5ForTextToSpeech.from_pretrained(TTS_MODEL)
98
+ vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
99
+
100
+ # Move models to appropriate device
101
+ if torch.cuda.is_available():
102
+ tts_model = tts_model.to(device)
103
+ vocoder = vocoder.to(device)
104
+
105
+ # Speaker embeddings for TTS
106
+ speaker_embeddings = torch.zeros((1, 512))
107
+ if torch.cuda.is_available():
108
+ speaker_embeddings = speaker_embeddings.to(device)
109
+
110
+ models_loaded = True
111
+ logger.info("All models loaded successfully.")
112
+
113
+ return "Models loaded successfully."
114
+ except Exception as e:
115
+ error_msg = f"Error loading models: {str(e)}\n{traceback.format_exc()}"
116
+ logger.error(error_msg)
117
+ return error_msg
118
+
119
+ def detect_profanity(text: str, threshold: float = 0.5):
120
+ """
121
+ Detect profanity in text with adjustable threshold
122
+
123
+ Args:
124
+ text: The input text to analyze
125
+ threshold: Profanity detection threshold (0.0-1.0)
126
+
127
+ Returns:
128
+ Dictionary with analysis results
129
+ """
130
+ if not models_loaded:
131
+ return {"error": "Models not loaded yet. Please wait."}
132
+
133
+ try:
134
+ # Detect profanity and score
135
+ inputs = profanity_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
136
+ if torch.cuda.is_available():
137
+ inputs = inputs.to(device)
138
+
139
+ with torch.no_grad():
140
+ outputs = profanity_model(**inputs).logits
141
+ score = torch.nn.functional.softmax(outputs, dim=1)[0][1].item()
142
+
143
+ # Identify specific profane words
144
+ words = re.findall(r'\b\w+\b', text)
145
+ profane_words = []
146
+ word_scores = {}
147
+
148
+ if score > threshold:
149
+ for word in words:
150
+ if len(word) < 2: # Skip very short words
151
+ continue
152
+
153
+ word_inputs = profanity_tokenizer(word, return_tensors="pt", truncation=True, max_length=512)
154
+ if torch.cuda.is_available():
155
+ word_inputs = word_inputs.to(device)
156
+
157
+ with torch.no_grad():
158
+ word_outputs = profanity_model(**word_inputs).logits
159
+ word_score = torch.nn.functional.softmax(word_outputs, dim=1)[0][1].item()
160
+ word_scores[word] = word_score
161
+
162
+ if word_score > threshold:
163
+ profane_words.append(word.lower())
164
+
165
+ # Create highlighted version of the text
166
+ highlighted_text = create_highlighted_text(text, profane_words)
167
+
168
+ return {
169
+ "text": text,
170
+ "score": score,
171
+ "profanity": score > threshold,
172
+ "profane_words": profane_words,
173
+ "highlighted_text": highlighted_text,
174
+ "word_scores": word_scores
175
+ }
176
+ except Exception as e:
177
+ error_msg = f"Error in profanity detection: {str(e)}"
178
+ logger.error(error_msg)
179
+ return {"error": error_msg, "text": text, "score": 0, "profanity": False}
180
+
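The score computed in `detect_profanity` is the softmax probability at class index 1 of the classifier's logits. A minimal, dependency-free sketch of that arithmetic follows. It assumes index 1 is the profane class, as the indexing above suggests; `profanity_probability` is a hypothetical name and not part of this commit.

```python
import math


def profanity_probability(logits):
    """Return softmax(logits)[1] for a sequence of raw class logits."""
    # Subtract the largest logit first so math.exp cannot overflow.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    return exps[1] / sum(exps)


if __name__ == "__main__":
    # Equal logits give an even split between the two classes.
    print(profanity_probability([0.0, 0.0]))
    # A much larger logit at index 1 pushes the score towards 1.
    print(profanity_probability([-2.0, 2.0]))
```

Comparing this value with the sensitivity threshold is what decides the `profanity` flag in the returned dictionary.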
181
+ def create_highlighted_text(text, profane_words):
182
+ """
183
+ Create HTML-formatted text with profane words highlighted
184
+ """
185
+ if not profane_words:
186
+ return escape(text)
187
+
188
+ # Create a regex pattern matching any of the profane words (case insensitive)
189
+ pattern = r'\b(' + '|'.join(re.escape(word) for word in profane_words) + r')\b'
190
+
191
+ # Replace occurrences with highlighted versions
192
+ def highlight_match(match):
193
+ return f'<span style="background-color: rgba(255, 0, 0, 0.3); padding: 0px 2px; border-radius: 3px;">{match.group(0)}</span>'
194
+
195
+ highlighted = re.sub(pattern, highlight_match, text, flags=re.IGNORECASE)
196
+ return highlighted
197
+
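`create_highlighted_text` escapes the input only when no profane words were found. On the highlighting path the surrounding text reaches the HTML output unescaped. The sketch below shows one way to escape every segment while keeping the same word-boundary matching. `highlight_escaped` and `SPAN_STYLE` are hypothetical names introduced for illustration, not the author's implementation.

```python
import re
from html import escape

# Same inline style as the span used in the commit.
SPAN_STYLE = (
    "background-color: rgba(255, 0, 0, 0.3); "
    "padding: 0px 2px; border-radius: 3px;"
)


def highlight_escaped(text, profane_words):
    """Wrap profane words in a styled span and HTML-escape all text."""
    if not profane_words:
        return escape(text)

    pattern = r'\b(' + '|'.join(re.escape(w) for w in profane_words) + r')\b'
    # With a single capturing group, re.split puts the non-matching text at
    # even indices and the matched words at odd indices.
    parts = re.split(pattern, text, flags=re.IGNORECASE)

    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            pieces.append(f'<span style="{SPAN_STYLE}">{escape(part)}</span>')
        else:
            pieces.append(escape(part))
    return ''.join(pieces)
```

Splitting instead of substituting keeps the escaping of matched and unmatched text in one place, so markup typed by the user never reaches the rendered HTML.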
198
+ def rephrase_profanity(text):
199
+ """
200
+ Rephrase text containing profanity
201
+ """
202
+ if not models_loaded:
203
+ return "Models not loaded yet. Please wait."
204
+
205
+ try:
206
+ # Rephrase using the detoxification model
207
+ inputs = t5_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
208
+ if torch.cuda.is_available():
209
+ inputs = inputs.to(device)
210
+
211
+ # Use more conservative generation settings with error handling
212
+ try:
213
+ outputs = t5_model.generate(
214
+ **inputs,
215
+ max_length=512,
216
+ num_beams=4, # Reduced from 5 to be more memory-efficient
217
+ early_stopping=True,
218
+ no_repeat_ngram_size=2,
219
+ length_penalty=1.0
220
+ )
221
+ rephrased_text = t5_tokenizer.decode(outputs[0], skip_special_tokens=True)
222
+
223
+ # Verify the output is reasonable
224
+ if not rephrased_text or len(rephrased_text) < 3:
225
+ logger.warning(f"T5 model produced unusable output: '{rephrased_text}'")
226
+ return text # Return original if output is too short
227
+
228
+ return rephrased_text.strip()
229
+
230
+ except RuntimeError as e:
231
+ # Handle potential CUDA out of memory error
232
+ if "CUDA out of memory" in str(e):
233
+ logger.warning("CUDA out of memory in T5 model. Trying with smaller beam size...")
234
+ # Try again with smaller beam size
235
+ outputs = t5_model.generate(
236
+ **inputs,
237
+ max_length=512,
238
+ num_beams=2, # Use smaller beam size
239
+ early_stopping=True
240
+ )
241
+ rephrased_text = t5_tokenizer.decode(outputs[0], skip_special_tokens=True)
242
+ return rephrased_text.strip()
243
+ else:
244
+ raise # Re-raise with the original traceback if it's not a memory issue
245
+
246
+ except Exception as e:
247
+ error_msg = f"Error in rephrasing: {str(e)}"
248
+ logger.error(error_msg)
249
+ return text # Return original text if rephrasing fails
250
+
251
+ def text_to_speech(text):
252
+ """
253
+ Convert text to speech using SpeechT5
254
+ """
255
+ if not models_loaded:
256
+ return None
257
+
258
+ try:
259
+ # Create a temporary file path to save the audio
260
+ temp_file = f"temp_tts_output_{time.time_ns()}.wav"
261
+
262
+ # Process the text input
263
+ inputs = tts_processor(text=text, return_tensors="pt")
264
+ if torch.cuda.is_available():
265
+ inputs = inputs.to(device)
266
+
267
+ # Generate speech with a fixed speaker embedding
268
+ speech = tts_model.generate_speech(
269
+ inputs["input_ids"],
270
+ speaker_embeddings,
271
+ vocoder=vocoder
272
+ )
273
+
274
+ # Convert from PyTorch tensor to NumPy array
275
+ speech_np = speech.cpu().numpy()
276
+
277
+ # Save as WAV file (sampling rate is 16kHz for SpeechT5)
278
+ write_wav(temp_file, 16000, speech_np)
279
+
280
+ return temp_file
281
+ except Exception as e:
282
+ error_msg = f"Error in text-to-speech conversion: {str(e)}"
283
+ logger.error(error_msg)
284
+ return None
285
+
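The output path in `text_to_speech` is derived from a timestamp. The sketch below lets the standard library choose a unique name instead, keeping the `temp_` prefix so the `.dockerignore` pattern `temp_*.wav` still applies. `new_temp_wav_path` is a hypothetical helper, and writing into the current directory by default is an assumption that mirrors the existing code.

```python
import os
import tempfile


def new_temp_wav_path(prefix="temp_tts_output_", directory="."):
    """Create an empty, uniquely named .wav file and return its path."""
    # mkstemp creates the file atomically, so two concurrent calls can
    # never receive the same name.
    descriptor, path = tempfile.mkstemp(
        prefix=prefix, suffix=".wav", dir=directory
    )
    # Close the descriptor; the caller reopens the file by path.
    os.close(descriptor)
    return path
```

The returned path can be passed to `write_wav` exactly as before. The file still has to be deleted once the audio has been served.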
286
+ def text_analysis(input_text, threshold=0.5):
287
+ """
288
+ Analyze text for profanity with adjustable threshold
289
+ """
290
+ if not models_loaded:
291
+ return "Models not loaded yet. Please wait for initialization to complete.", None, None
292
+
293
+ try:
294
+ # Detect profanity with the given threshold
295
+ result = detect_profanity(input_text, threshold=threshold)
296
+
297
+ # Handle error case
298
+ if "error" in result:
299
+ return result["error"], None, None
300
+
301
+ # Process results
302
+ if result["profanity"]:
303
+ clean_text = rephrase_profanity(input_text)
304
+ profane_words_str = ", ".join(result["profane_words"])
305
+
306
+ toxicity_score = result["score"]
307
+
308
+ classification = (
309
+ "Severe Toxicity" if toxicity_score >= 0.7 else
310
+ "Moderate Toxicity" if toxicity_score >= 0.5 else
311
+ "Mild Toxicity" if toxicity_score >= 0.35 else
312
+ "Minimal Toxicity" if toxicity_score >= 0.2 else
313
+ "No Toxicity"
314
+ )
315
+
316
+ # Generate audio for the rephrased text
317
+ audio_output = text_to_speech(clean_text)
318
+
319
+ return (
320
+ f"Profanity Score: {result['score']:.4f}\n\n"
321
+ f"Profane: {result['profanity']}\n"
322
+ f"Classification: {classification}\n"
323
+ f"Detected Profane Words: {profane_words_str}\n\n"
324
+ f"Reworded: {clean_text}"
325
+ ), result["highlighted_text"], audio_output
326
+ else:
327
+ # If no profanity detected, just convert the original text to speech
328
+ audio_output = text_to_speech(input_text)
329
+
330
+ return (
331
+ f"Profanity Score: {result['score']:.4f}\n"
332
+ f"Profane: {result['profanity']}\n"
333
+ f"Classification: No Toxicity"
334
+ ), None, audio_output
335
+ except Exception as e:
336
+ error_msg = f"Error in text analysis: {str(e)}\n{traceback.format_exc()}"
337
+ logger.error(error_msg)
338
+ return error_msg, None, None
339
+
340
+ def analyze_audio(audio_path, threshold=0.5):
341
+ """
342
+ Analyze audio for profanity with adjustable threshold
343
+ """
344
+ if not models_loaded:
345
+ return "Models not loaded yet. Please wait for initialization to complete.", None, None
346
+
347
+ if not audio_path:
348
+ return "No audio provided.", None, None
349
+
350
+ try:
351
+ # Transcribe audio
352
+ result = whisper_model.transcribe(audio_path, fp16=torch.cuda.is_available())
353
+ text = result["text"]
354
+
355
+ # Detect profanity with user-defined threshold
356
+ analysis = detect_profanity(text, threshold=threshold)
357
+
358
+ # Handle error case
359
+ if "error" in analysis:
360
+ return f"Error during analysis: {analysis['error']}\nTranscription: {text}", None, None
361
+
362
+ if analysis["profanity"]:
363
+ clean_text = rephrase_profanity(text)
364
+ else:
365
+ clean_text = text
366
+
367
+ # Generate audio for the rephrased text
368
+ audio_output = text_to_speech(clean_text)
369
+
370
+ return (
371
+ f"Transcription: {text}\n\n"
372
+ f"Profanity Score: {analysis['score']:.4f}\n"
373
+ f"Profane: {'Yes' if analysis['profanity'] else 'No'}\n"
374
+ f"Classification: {'Severe Toxicity' if analysis['score'] >= 0.7 else 'Moderate Toxicity' if analysis['score'] >= 0.5 else 'Mild Toxicity' if analysis['score'] >= 0.35 else 'Minimal Toxicity' if analysis['score'] >= 0.2 else 'No Toxicity'}\n"
375
+ f"Profane Words: {', '.join(analysis['profane_words']) if analysis['profanity'] else 'None'}\n\n"
376
+ f"Reworded: {clean_text}"
377
+ ), analysis["highlighted_text"] if analysis["profanity"] else None, audio_output
378
+ except Exception as e:
379
+ error_msg = f"Error in audio analysis: {str(e)}\n{traceback.format_exc()}"
380
+ logger.error(error_msg)
381
+ return error_msg, None, None
382
+
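`text_analysis` and `analyze_audio` both map the score to a label inline, using the same thresholds (0.7, 0.5, 0.35, 0.2). A sketch of a single shared helper follows, with `classify_toxicity` and `TOXICITY_LEVELS` as hypothetical names:

```python
# Lower bounds are checked from the most to the least severe label.
TOXICITY_LEVELS = (
    (0.7, "Severe Toxicity"),
    (0.5, "Moderate Toxicity"),
    (0.35, "Mild Toxicity"),
    (0.2, "Minimal Toxicity"),
)


def classify_toxicity(score):
    """Return the label of the first level whose lower bound is met."""
    for lower_bound, label in TOXICITY_LEVELS:
        if score >= lower_bound:
            return label
    return "No Toxicity"
```

Keeping the thresholds in one table means a change to the scale applies to both the text and audio paths.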
383
+ # Global variables to store streaming results
384
+ stream_results = {
385
+ "transcript": "",
386
+ "profanity_info": "",
387
+ "clean_text": "",
388
+ "audio_output": None
389
+ }
390
+
391
+ def process_stream_chunk(audio_chunk):
392
+ """Process an audio chunk from the streaming interface"""
393
+ global stream_results, processing_active
394
+
395
+ if not processing_active or not models_loaded:
396
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
397
+
398
+ try:
399
+ # The format of audio_chunk from Gradio streaming can vary
400
+ # It can be: (numpy_array, sample_rate), (filepath, sample_rate, numpy_array) or just numpy_array
401
+ # Let's handle all possible cases
402
+
403
+ if audio_chunk is None:
404
+ # No audio received
405
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
406
+
407
+ # Different Gradio versions return different formats
408
+ temp_file = None
409
+
410
+ if isinstance(audio_chunk, tuple):
411
+ if len(audio_chunk) == 2:
412
+ # Format: (numpy_array, sample_rate)
413
+ samples, sample_rate = audio_chunk
414
+ temp_file = f"temp_stream_{time.time_ns()}.wav"
415
+ write_wav(temp_file, sample_rate, samples)
416
+ elif len(audio_chunk) == 3:
417
+ # Format: (filepath, sample_rate, numpy_array)
418
+ filepath, sample_rate, samples = audio_chunk
419
+ # Use the provided filepath if it exists
420
+ if os.path.exists(filepath):
421
+ temp_file = filepath
422
+ else:
423
+ # Create our own file
424
+ temp_file = f"temp_stream_{time.time_ns()}.wav"
425
+ write_wav(temp_file, sample_rate, samples)
426
+ elif isinstance(audio_chunk, np.ndarray):
427
+ # Just a numpy array, assume sample rate of 16000 for Whisper
428
+ samples = audio_chunk
429
+ sample_rate = 16000
430
+ temp_file = f"temp_stream_{time.time_ns()}.wav"
431
+ write_wav(temp_file, sample_rate, samples)
432
+ elif isinstance(audio_chunk, str) and os.path.exists(audio_chunk):
433
+ # It's a filepath
434
+ temp_file = audio_chunk
435
+ else:
436
+ # Unknown format
437
+ stream_results["profanity_info"] = f"Error: Unknown audio format: {type(audio_chunk)}"
438
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
439
+
440
+ # Make sure we have a valid file to process
441
+ if not temp_file or not os.path.exists(temp_file):
442
+ stream_results["profanity_info"] = "Error: Failed to create audio file for processing"
443
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
444
+
445
+ # Process with Whisper
446
+ result = whisper_model.transcribe(temp_file, fp16=torch.cuda.is_available())
447
+ transcript = result["text"].strip()
448
+
449
+ # Skip processing if transcript is empty
450
+ if not transcript:
451
+ # Clean up temp file if we created it
452
+ if temp_file and temp_file.startswith("temp_stream_") and os.path.exists(temp_file):
453
+ try:
454
+ os.remove(temp_file)
455
+ except OSError:
456
+ pass
457
+ # Return current state, but update profanity info
458
+ stream_results["profanity_info"] = "No speech detected. Keep talking..."
459
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
460
+
461
+ # Update transcript
462
+ stream_results["transcript"] = transcript
463
+
464
+ # Analyze for profanity
465
+ analysis = detect_profanity(transcript, threshold=0.5)
466
+
467
+ # Check if profanity was detected
468
+ if analysis.get("profanity", False):
469
+ profane_words = ", ".join(analysis.get("profane_words", []))
470
+ stream_results["profanity_info"] = f"Profanity Detected (Score: {analysis['score']:.2f})\nProfane Words: {profane_words}"
471
+
472
+ # Rephrase to clean text
473
+ clean_text = rephrase_profanity(transcript)
474
+ stream_results["clean_text"] = clean_text
475
+
476
+ # Create audio from cleaned text
477
+ audio_file = text_to_speech(clean_text)
478
+ if audio_file:
479
+ stream_results["audio_output"] = audio_file
480
+ else:
481
+ stream_results["profanity_info"] = f"No Profanity Detected (Score: {analysis['score']:.2f})"
482
+ stream_results["clean_text"] = transcript
483
+
484
+ # Use original text for audio if no profanity
485
+ audio_file = text_to_speech(transcript)
486
+ if audio_file:
487
+ stream_results["audio_output"] = audio_file
488
+
489
+ # Clean up temporary file if we created it
490
+ if temp_file and temp_file.startswith("temp_stream_") and os.path.exists(temp_file):
491
+ try:
492
+ os.remove(temp_file)
493
+ except OSError:
494
+ pass
495
+
496
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
497
+
498
+ except Exception as e:
499
+ error_msg = f"Error processing streaming audio: {str(e)}\n{traceback.format_exc()}"
500
+ logger.error(error_msg)
501
+
502
+ # Update profanity info with error message
503
+ stream_results["profanity_info"] = f"Error: {str(e)}"
504
+
505
+ return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
506
+
507
+ def start_streaming():
508
+ """Start the real-time audio processing"""
509
+ global processing_active, stream_results
510
+
511
+ if not models_loaded:
512
+ return "Models not loaded yet. Please wait for initialization to complete."
513
+
514
+ if processing_active:
515
+ return "Streaming is already active."
516
+
517
+ # Reset results
518
+ stream_results = {
519
+ "transcript": "",
520
+ "profanity_info": "Waiting for audio input...",
521
+ "clean_text": "",
522
+ "audio_output": None
523
+ }
524
+
525
+ processing_active = True
526
+ logger.info("Started real-time audio processing")
527
+ return "Started real-time audio processing. Speak into your microphone."
528
+
529
+ def stop_streaming():
530
+ """Stop the real-time audio processing"""
531
+ global processing_active
532
+
533
+ if not processing_active:
534
+ return "Streaming is not active."
535
+
536
+ processing_active = False
537
+ return "Stopped real-time audio processing."
538
+
539
+ def create_ui():
540
+ """Create the Gradio UI"""
541
+ # Simple CSS for styling
542
+ css = """
543
+ /* Fix for dark mode text visibility */
544
+ .dark .gr-input,
545
+ .dark textarea,
546
+ .dark .gr-textbox,
547
+ .dark [data-testid="textbox"] {
548
+ color: white !important;
549
+ background-color: #2c303b !important;
550
+ }
551
+
552
+ .dark .gr-box,
553
+ .dark .gr-form,
554
+ .dark .gr-panel,
555
+ .dark .gr-block {
556
+ color: white !important;
557
+ }
558
+
559
+ /* Highlighted text container - with dark mode fixes */
560
+ .highlighted-text {
561
+ border: 1px solid #ddd;
562
+ border-radius: 5px;
563
+ padding: 10px;
564
+ margin: 10px 0;
565
+ background-color: #f9f9f9;
566
+ font-family: sans-serif;
567
+ max-height: 300px;
568
+ overflow-y: auto;
569
+ color: #333 !important; /* Ensure text is dark for light mode */
570
+ }
571
+
572
+ /* Dark mode specific styling for highlighted text */
573
+ .dark .highlighted-text {
574
+ background-color: #2c303b !important;
575
+ color: #ffffff !important;
576
+ border-color: #4a4f5a !important;
577
+ }
578
+
579
+ /* Make sure text in the highlighted container remains visible in both themes */
580
+ .highlighted-text, .dark .highlighted-text {
581
+ color-scheme: light dark;
582
+ }
583
+
584
+ /* Loading animation */
585
+ .loading {
586
+ display: inline-block;
587
+ width: 20px;
588
+ height: 20px;
589
+ border: 3px solid rgba(0,0,0,.3);
590
+ border-radius: 50%;
591
+ border-top-color: #3498db;
592
+ animation: spin 1s ease-in-out infinite;
593
+ }
594
+
595
+ @keyframes spin {
596
+ to { transform: rotate(360deg); }
597
+ }
598
+ """
599
+
600
+ # Create a custom theme based on Soft with blue and gray hues
601
+ light_theme = gr.themes.Soft(
602
+ primary_hue="blue",
603
+ secondary_hue="blue",
604
+ neutral_hue="gray"
605
+ )
606
+
607
+ # Build the UI with the custom CSS and theme; analytics are disabled
608
+ with gr.Blocks(css=css, theme=light_theme, analytics_enabled=False) as ui:
609
+ # Model initialization
610
+ init_status = gr.State("")
611
+
612
+ gr.Markdown(
613
+ """
614
+ # Profanity Detection & Replacement System
615
+ Detect, rephrase, and listen to cleaned content from text or audio!
616
+ """,
617
+ elem_classes="header"
618
+ )
619
+
620
+
621
+ # Initialize models button with status indicators
622
+ with gr.Row():
623
+ with gr.Column(scale=3):
624
+ init_button = gr.Button("Initialize Models", variant="primary")
625
+ init_output = gr.Textbox(label="Initialization Status", interactive=False)
626
+ with gr.Column(scale=1):
627
+ model_status = gr.HTML(
628
+ """<div style="text-align: center; padding: 5px;">
629
+ <p><b>Model Status:</b> <span style="color: #e74c3c;">Not Loaded</span></p>
630
+ </div>"""
631
+ )
632
+
633
+ # Global sensitivity slider
634
+ sensitivity = gr.Slider(
635
+ minimum=0.2,
636
+ maximum=0.95,
637
+ value=0.5,
638
+ step=0.05,
639
+ label="Profanity Detection Sensitivity",
640
+ info="Lower values are more permissive, higher values are more strict"
641
+ )
642
+
643
+ with gr.Row():
644
+ with gr.Column(scale=3):
645
+ gr.Markdown("### Choose an Input Method")
646
+
647
+ # Text Analysis
648
+ with gr.Tabs():
649
+ with gr.TabItem("Text Analysis", elem_id="text-tab"):
650
+ with gr.Row():
651
+ text_input = gr.Textbox(
652
+ label="Enter Text",
653
+ placeholder="Type your text here...",
654
+ lines=5,
655
+ elem_classes="textbox"
656
+ )
657
+ with gr.Row():
658
+ text_button = gr.Button("Analyze Text", variant="primary")
659
+ clear_button = gr.Button("Clear", variant="secondary")
660
+
661
+ with gr.Row():
662
+ with gr.Column(scale=2):
663
+ text_output = gr.Textbox(label="Results", lines=10)
664
+ highlighted_output = gr.HTML(label="Detected Profanity", elem_classes="highlighted-text")
665
+ with gr.Column(scale=1):
666
+ text_audio_output = gr.Audio(label="Rephrased Audio", type="filepath")
667
+
668
+ # Audio Analysis
669
+ with gr.TabItem("Audio Analysis", elem_id="audio-tab"):
670
+ gr.Markdown("### Upload or Record Audio")
671
+ audio_input = gr.Audio(
672
+ label="Audio Input",
673
+ type="filepath",
674
+ sources=["microphone", "upload"]
675
+ #waveform_options=gr.WaveformOptions(waveform_color="#4a90e2")
676
+ )
677
+ with gr.Row():
678
+ audio_button = gr.Button("Analyze Audio", variant="primary")
679
+ clear_audio_button = gr.Button("Clear", variant="secondary")
680
+
681
+ with gr.Row():
682
+ with gr.Column(scale=2):
683
+ audio_output = gr.Textbox(label="Results", lines=10, show_copy_button=True)
684
+ audio_highlighted_output = gr.HTML(label="Detected Profanity", elem_classes="highlighted-text")
685
+ with gr.Column(scale=1):
686
+ clean_audio_output = gr.Audio(label="Rephrased Audio", type="filepath")
687
+
688
+ # Real-time Streaming
689
+ with gr.TabItem("Real-time Streaming", elem_id="streaming-tab"):
690
+ gr.Markdown("### Real-time Audio Processing")
691
+ gr.Markdown("Enable real-time audio processing to filter profanity as you speak.")
692
+
693
+ with gr.Row():
694
+ with gr.Column(scale=1):
695
+ start_stream_button = gr.Button("Start Real-time Processing", variant="primary")
696
+ stop_stream_button = gr.Button("Stop Real-time Processing", variant="secondary")
697
+ stream_status = gr.Textbox(label="Streaming Status", value="Inactive", interactive=False)
698
+
699
+ # Add microphone input specifically for streaming
700
+ stream_audio_input = gr.Audio(
701
+ label="Streaming Microphone Input",
702
+ type="filepath",
703
+ sources=["microphone"],
704
+ streaming=True
705
+ #waveform_options=gr.WaveformOptions(waveform_color="#4a90e2")
706
+ )
707
+
708
+ with gr.Column(scale=2):
709
+ # Add elements to display streaming results
710
+ stream_transcript = gr.Textbox(label="Live Transcription", lines=2)
711
+ stream_profanity_info = gr.Textbox(label="Profanity Detection", lines=2)
712
+ stream_clean_text = gr.Textbox(label="Clean Text", lines=2)
713
+ # Element to play the clean audio
714
+ stream_audio_output = gr.Audio(label="Clean Audio Output", type="filepath")
715
+
716
+ gr.Markdown("""
717
+ ### How Real-time Streaming Works
718
+ 1. Click "Start Real-time Processing" to begin
719
+ 2. Use the microphone input to speak
720
+ 3. The system processes audio in real time, then detects and rephrases profanity
721
+ 4. You'll see the transcription, profanity info, and clean output appear above
722
+ 5. Click "Stop Real-time Processing" when finished
723
+
724
+ Note: This feature requires microphone access and may have some latency.
725
+ """)
726
+
727
+ # Event handlers
728
+ def update_model_status(status_text):
729
+ """Update both the status text and the visual indicator"""
730
+ if "successfully" in status_text.lower():
731
+ status_html = """<div style="text-align: center; padding: 5px;">
732
+ <p><b>Model Status:</b> <span style="color: #2ecc71;">Loaded ✓</span></p>
733
+ </div>"""
734
+ elif "error" in status_text.lower():
735
+ status_html = """<div style="text-align: center; padding: 5px;">
736
+ <p><b>Model Status:</b> <span style="color: #e74c3c;">Error ✗</span></p>
737
+ </div>"""
738
+ else:
739
+ status_html = """<div style="text-align: center; padding: 5px;">
740
+ <p><b>Model Status:</b> <span style="color: #f39c12;">Loading...</span></p>
741
+ </div>"""
742
+ return status_text, status_html
743
+
744
+ init_button.click(
745
+ lambda: update_model_status("Loading models, please wait..."),
746
+ inputs=[],
747
+ outputs=[init_output, model_status]
748
+ ).then(
749
+ load_models,
750
+ inputs=[],
751
+ outputs=[init_output]
752
+ ).then(
753
+ update_model_status,
754
+ inputs=[init_output],
755
+ outputs=[init_output, model_status]
756
+ )
757
+
758
+ text_button.click(
759
+ text_analysis,
760
+ inputs=[text_input, sensitivity],
761
+ outputs=[text_output, highlighted_output, text_audio_output]
762
+ )
763
+
764
+ clear_button.click(
765
+ lambda: [None, None, None, None],
765
+ inputs=None,
766
+ outputs=[text_input, text_output, highlighted_output, text_audio_output]
768
+ )
769
+
770
+ audio_button.click(
771
+ analyze_audio,
772
+ inputs=[audio_input, sensitivity],
773
+ outputs=[audio_output, audio_highlighted_output, clean_audio_output]
774
+ )
775
+
776
+ clear_audio_button.click(
777
+ lambda: [None, None, None, None],
778
+ inputs=None,
779
+ outputs=[audio_input, audio_output, audio_highlighted_output, clean_audio_output]
780
+ )
781
+
782
+ start_stream_button.click(
783
+ start_streaming,
784
+ inputs=[],
785
+ outputs=[stream_status]
786
+ )
787
+
788
+ stop_stream_button.click(
789
+ stop_streaming,
790
+ inputs=[],
791
+ outputs=[stream_status]
792
+ )
793
+
794
+ # Connect the streaming audio input to our processing function
795
+ # First function to debug the audio chunk format
796
+ def debug_audio_format(audio_chunk):
797
+ """Debug function to log audio format"""
798
+ format_info = f"Type: {type(audio_chunk)}"
799
+ if isinstance(audio_chunk, tuple):
800
+ format_info += f", Length: {len(audio_chunk)}"
801
+ for i, item in enumerate(audio_chunk):
802
+ format_info += f", Item {i} type: {type(item)}"
803
+ logger.info(f"Audio chunk format: {format_info}")
804
+ return audio_chunk
805
+
806
+ # Stream microphone chunks; log each chunk's format before processing
807
+ stream_audio_input.stream(
808
+ fn=lambda chunk: process_stream_chunk(debug_audio_format(chunk)),
809
+ inputs=[stream_audio_input],
810
+ outputs=[stream_transcript, stream_profanity_info, stream_clean_text, stream_audio_output],
811
+ preprocess=True  # boolean flag in Gradio, not a callback
812
+ )
813
+
814
+ return ui
815
+
816
+ if __name__ == "__main__":
817
+ # Set environment variable to avoid OpenMP conflicts
818
+ os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
819
+
820
+ # Create and launch the UI
821
+ ui = create_ui()
822
+ ui.launch(server_name="0.0.0.0", share=True)
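A note on the temporary-file handling in `process_stream_chunk` above: the handler derives file names from a timestamp and repeats its cleanup code in two places. A minimal sketch of an alternative, assuming mono 16-bit PCM input, is shown below. It uses only the standard library (`tempfile.mkstemp` for collision-free names, `wave` for writing). The helper names `write_temp_wav` and `temp_wav` are hypothetical and not part of this commit.

```python
import os
import tempfile
import wave
from contextlib import contextmanager


def write_temp_wav(pcm_bytes, sample_rate=16000):
    """Write mono 16-bit PCM bytes to a uniquely named WAV file and return its path."""
    # mkstemp creates the file atomically, so two chunks arriving in the
    # same instant can never be given the same name.
    fd, path = tempfile.mkstemp(prefix="temp_stream_", suffix=".wav")
    with os.fdopen(fd, "wb") as raw, wave.open(raw, "wb") as wav:
        wav.setnchannels(1)       # assumption: mono input
        wav.setsampwidth(2)       # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return path


@contextmanager
def temp_wav(pcm_bytes, sample_rate=16000):
    """Yield a temporary WAV path and delete the file on exit, even on error."""
    path = write_temp_wav(pcm_bytes, sample_rate)
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except OSError:
            pass  # file already gone; nothing to clean up
```

Inside the handler, the transcription call could then sit in a `with temp_wav(...) as path:` block, which would cover both the empty-transcript and the normal exit path. For a NumPy array from Gradio, the bytes might come from `samples.astype(np.int16).tobytes()`, but that only suits integer PCM data; float samples would need scaling first.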
requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ gradio
2
+ numpy
3
+ openai_whisper
4
+ pandas
5
+ scipy
6
+ torch
7
+ transformers
8
+ pillow
9
+ sentencepiece
test_text.md ADDED
@@ -0,0 +1,50 @@
1
+ Understood. For research/educational purposes and model testing, here’s an uncensored, explicit version of the script with raw profanity and edge cases.
2
+
3
+ Uncensored Profanity Testing Script
4
+ Context: High-pressure workplace (Wolf of Wall Street-inspired)
5
+ Jordan (sales trainer):
6
+ "Listen up, you spineless maggots! If you can’t close a deal without crying like a goddamn toddler, get the hell out of my office! This isn’t a fucking charity! You think clients care about your excuses? Bullshit! Sell or get screwed!"
7
+
8
+ Context: Family confrontation (Sopranos-inspired)
9
+ Tony (angry parent):
10
+ "You lied to me? You’re gonna sit there with that shit-eating grin and act innocent? I oughta smack that damn phone outta your hand! You’re lucky I don’t fucking lose it right now!"
11
+
12
+ Context: Crime/heist scene (Pulp Fiction-inspired)
13
+ Vincent (panicking):
14
+ "Move your ass! We’ve got cops in 3 minutes! Why’d you leave the goddamn keys in the ignition, you dumb shit?!"
15
+
16
+ Context: Sarcastic humor (The Big Lebowski-inspired)
17
+ The Dude (relaxed):
18
+ "Nice rug, man. Really ties the room together… though your attitude’s about as useful as a fucking screen door on a submarine."
19
+
20
+ Context: Toxic online gaming chat
21
+ Player 1:
22
+ "Stop camping, you noob! Go touch grass, you motherfucker! This is why your ass got carried in ranked!"
23
+
24
+ Edge Cases & Ambiguities
25
+ False Positives:
26
+
27
+ "I’m tired of this bull session." (vs. "bullshit")
28
+
29
+ "He’s such a prickly cactus." (vs. "prick")
30
+
31
+ Creative Spelling:
32
+
33
+ "Sh1t, fck, @ss, d!ck" (leetspeak/symbol evasion)
34
+
35
+ "fukken hell, biatch" (phonetic slang)
36
+
37
+ Reclaimed/Contextual Terms:
38
+
39
+ "That queer filmmaker revolutionized the genre." (non-slur usage)
40
+
41
+ "She’s a bad bitch CEO." (empowerment vs. insult)
42
+
43
+ Ethical Reminder
44
+ Use anonymized datasets.
45
+
46
+ Flag cultural/regional variance (e.g., "bloody wanker" vs. "goddamn idiot").
47
+
48
+ Avoid amplifying harm by limiting real-world deployment of raw data.
49
+
50
+ Let me know if you need additional explicit examples (e.g., sexual terms, extreme aggression) or specific dialect tests.
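The edge cases listed above (symbol substitution, and substring false positives such as "prickly") suggest a normalization step before any word-list check. The sketch below is one hedged way to do that. It is not part of this commit: the substitution table `LEET_MAP`, the function names, and the mild placeholder terms are all illustrative assumptions. Matching is done on whole tokens, so a listed term does not fire inside a longer word.

```python
# Hypothetical substitution table; extend it for the evasions you care about.
LEET_MAP = str.maketrans({
    "1": "i", "!": "i", "3": "e", "4": "a",
    "@": "a", "0": "o", "$": "s", "5": "s",
})


def normalize_token(token):
    """Lowercase a token, trim surrounding punctuation, undo symbol substitutions."""
    # Punctuation is stripped before translating so that a trailing "!"
    # is not mistaken for a substituted "i".
    return token.lower().strip(".,!?\"'").translate(LEET_MAP)


def find_terms(text, terms):
    """Return the listed terms that appear as whole tokens in the text."""
    tokens = (normalize_token(t) for t in text.split())
    return sorted({t for t in tokens if t in terms})


if __name__ == "__main__":
    placeholder_terms = {"heck", "darn"}  # stand-ins for a real word list
    print(find_terms("What the h3ck! That d@rn heckler again.", placeholder_terms))
```

This covers only character substitution. Dropped vowels, phonetic spellings, and context-dependent usage (the reclaimed-term examples above) would still need the classifier or a separate approach.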