Spaces: nightey3s/profanity-detection

nightey3s committed on Mar 15

Commit 80f71e5 (unverified) · 1 Parent(s): 005a4bc

Add application file

Files changed (8)

.dockerignore +13 -0
Dockerfile +38 -0
README.md +276 -12
docker-compose.yml +45 -0
environment.yml +15 -0
profanity_detector.py +822 -0
requirements.txt +9 -0
test_text.md +50 -0

.dockerignore ADDED

	@@ -0,0 +1,13 @@

+.git
+.gitignore
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.env
+.venv
+env/
+venv/
+ENV/
+*.log
+temp_*.wav

Dockerfile ADDED

	@@ -0,0 +1,38 @@

+# Use PyTorch as base image (comes with CUDA support in the GPU variant)
+FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
+# Set working directory
+WORKDIR /app
+# Set environment variables
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    KMP_DUPLICATE_LIB_OK=TRUE \
+    DEBIAN_FRONTEND=noninteractive \
+    TZ=UTC
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    libsndfile1 \
+    build-essential \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+# Create directory for model caching
+RUN mkdir -p /root/.cache/huggingface
+# Copy requirements file
+COPY requirements.txt .
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY profanity_detector.py .
+# Expose the Gradio port
+EXPOSE 7860
+# Command to run the application
+CMD ["python", "profanity_detector.py"]

README.md CHANGED

@@ -1,12 +1,276 @@
----
-title: Profanity Detection
-emoji: 🐠
-colorFrom: pink
-colorTo: indigo
-sdk: docker
-pinned: false
-license: mit
-short_description: A multimodal AI system that detects and rephrases profanity.
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Profanity Detection in Speech and Text
+A robust multimodal system for detecting and rephrasing profanity in both speech and text, leveraging advanced NLP models to ensure accurate filtering while preserving conversational context.
+![Profanity Detection System](https://img.shields.io/badge/AI-NLP%20System-blue)
+![Python](https://img.shields.io/badge/Python-3.10%2B-green)
+![Transformers](https://img.shields.io/badge/HuggingFace-Transformers-yellow)
+## 📋 Features
+- **Multimodal Analysis**: Process both written text and spoken audio
+- **Context-Aware Detection**: Goes beyond simple keyword matching
+- **Automatic Content Refinement**: Intelligently rephrases content while preserving meaning
+- **Audio Synthesis**: Converts rephrased content into high-quality spoken audio
+- **Classification System**: Categorises content by toxicity levels
+- **User-Friendly Interface**: Intuitive Gradio-based UI
+- **Real-time Streaming**: Process audio in real-time as you speak
+- **Adjustable Sensitivity**: Fine-tune profanity detection threshold
+- **Visual Highlighting**: Instantly identify problematic words with visual highlighting
+- **Toxicity Classification**: Automatically categorize content from "No Toxicity" to "Severe Toxicity"
+- **Performance Optimization**: Half-precision support for improved GPU memory efficiency
+## 🧠 Models Used
+The system leverages four powerful models:
+1. **Profanity Detection**: `parsawar/profanity_model_3.1` - A RoBERTa-based model trained for offensive language detection
+2. **Content Refinement**: `s-nlp/t5-paranmt-detox` - A T5-based model for rephrasing offensive language
+3. **Speech-to-Text**: OpenAI's `Whisper` (large) - For transcribing spoken audio
+4. **Text-to-Speech**: Microsoft's `SpeechT5` - For converting rephrased text back to audio
+## 🔧 Installation
+### Prerequisites
+- Python 3.10+
+- CUDA-compatible GPU recommended (but CPU mode works too)
+- FFmpeg for audio processing
+### Option 1: Using Conda (Recommended for Local Development)
+```bash
+# Clone the repository
+git clone https://github.com/yourusername/profanity-detection.git
+cd profanity-detection
+# Method A: Create environment from environment.yml (recommended)
+conda env create -f environment.yml
+conda activate profanity-detection
+# Method B: Create a new conda environment manually
+conda create -n profanity-detection python=3.10
+conda activate profanity-detection
+# Install PyTorch with CUDA support (adjust CUDA version if needed)
+conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
+# Install FFmpeg for audio processing
+conda install -c conda-forge ffmpeg
+# Install Pillow properly to avoid DLL errors
+conda install -c conda-forge pillow
+# Install additional dependencies
+pip install -r requirements.txt
+# Set environment variable to avoid OpenMP conflicts (recommended)
+conda env config vars set KMP_DUPLICATE_LIB_OK=TRUE
+conda activate profanity-detection  # Re-activate to apply the variable
+```
+### Option 2: Using Docker
+```bash
+# Clone the repository
+git clone https://github.com/yourusername/profanity-detection.git
+cd profanity-detection
+# Build and run the Docker container
+docker-compose build --no-cache
+docker-compose up
+```
+## 🚀 Usage
+### Running the Application
+```bash
+# Set environment variable to avoid OpenMP conflicts (if not set in conda config)
+# For Windows:
+set KMP_DUPLICATE_LIB_OK=TRUE
+# For Linux/Mac:
+export KMP_DUPLICATE_LIB_OK=TRUE
+# Run the application
+python profanity_detector.py
+```
+The Gradio interface will be accessible at http://127.0.0.1:7860 in your browser.
+### Using the Interface
+1. **Initialise Models**
+   - Click the "Initialize Models" button when you first open the interface
+   - Wait for all models to load (this may take a few minutes on first run)
+2. **Text Analysis Tab**
+   - Enter text into the text box
+   - Adjust the "Profanity Detection Sensitivity" slider if needed
+   - Click "Analyze Text"
+   - View results including profanity score, toxicity classification, and rephrased content
+   - See highlighted profane words in the text
+   - Listen to the audio version of the rephrased content
+3. **Audio Analysis Tab**
+   - Upload an audio file or record directly using your microphone
+   - Click "Analyze Audio"
+   - View transcription, profanity analysis, and rephrased content
+   - Listen to the cleaned audio version of the rephrased content
+4. **Real-time Streaming Tab**
+   - Click "Start Real-time Processing"
+   - Speak into your microphone
+   - Watch as your speech is transcribed, analyzed, and rephrased in real-time
+   - Listen to the clean audio output
+   - Click "Stop Real-time Processing" when finished
+## 🔧 Deployment Options
+### Local Deployment with Conda
+For the best development experience with fine-grained control:
+```bash
+# Create and configure environment
+conda env create -f environment.yml
+conda activate profanity-detection
+# Run the application (Gradio serves the UI on port 7860)
+python profanity_detector.py
+```
+### Docker Deployment (Production)
+For containerised deployment with predictable environment:
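+#### Basic CPU Deployment
+```bash
+# The CPU-only service sits behind the "cpu-only" profile in docker-compose.yml
+docker-compose --profile cpu-only up --build profanity-detector-cpu
+```
+#### GPU-Accelerated Deployment
+```bash
+# The default service reserves one NVIDIA GPU through its device reservation
+docker-compose up --build profanity-detector
+```
+No configuration edits are needed to pick the device: the application uses CUDA when `torch.cuda.is_available()` is true and falls back to CPU otherwise.
+The default service's GPU reservation does require the NVIDIA Container Toolkit on the host; without it, use the CPU-only service.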
+#### Custom Port Configuration
+To change the default port (7860):
+1. Edit docker-compose.yml and change the port mapping (e.g., "8080:7860")
+2. Run `docker-compose up --build`
+## ⚠️ Troubleshooting
+### OpenMP Runtime Conflict
+If you encounter this error:
+```
+OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
+```
+**Solutions:**
+1. **Temporary fix**: Set environment variable before running:
+   ```bash
+   set KMP_DUPLICATE_LIB_OK=TRUE  # Windows
+   export KMP_DUPLICATE_LIB_OK=TRUE  # Linux/Mac
+   ```
+2. **Code-based fix**: Add to the beginning of your script:
+   ```python
+   import os
+   os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
+   ```
+3. **Permanent fix for Conda environment**:
+   ```bash
+   conda env config vars set KMP_DUPLICATE_LIB_OK=TRUE -n profanity-detection
+   conda deactivate
+   conda activate profanity-detection
+   ```
+### GPU Memory Issues
+If you encounter CUDA out of memory errors:
+1. Use smaller models:
+   ```python
+   # Change Whisper from "large" to "medium" or "small"
+   whisper_model = whisper.load_model("medium").to(device)
+   # Keep the TTS model on CPU to save GPU memory
+   tts_model = SpeechT5ForTextToSpeech.from_pretrained(TTS_MODEL)  # CPU mode
+   ```
+2. Run some models on CPU instead of GPU:
+   ```python
+   # Remove .to(device) to keep model on CPU
+   t5_model = AutoModelForSeq2SeqLM.from_pretrained(T5_MODEL)  # CPU mode
+   ```
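+3. Reserve a GPU for the container in Docker. Note that Compose device reservations select whole GPUs and do not cap GPU memory:
+   ```yaml
+   # In docker-compose.yml
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+             count: 1
+             capabilities: [gpu]
+   ```
+   To cap memory from within the process, PyTorch provides `torch.cuda.set_per_process_memory_fraction`.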
+### Docker-Specific Issues
+1. **Permission issues with mounted volumes**:
+   ```bash
+   # Fix permissions (Linux/Mac)
+   sudo chown -R $USER:$USER .
+   ```
+2. **No GPU access in container**:
+   - Verify NVIDIA Container Toolkit installation
+   - Check GPU driver compatibility
+   - Run `nvidia-smi` on the host to confirm GPU availability
+### First-Time Slowness
+When first run, the application downloads all models, which may take time. Subsequent runs will be faster as models are cached locally. The text-to-speech model requires additional download time on first use.
+## 📄 Project Structure
+```
+profanity-detection/
+├── profanity_detector.py    # Main application file
+├── Dockerfile               # For containerised deployment
+├── docker-compose.yml       # Container orchestration
+├── requirements.txt         # Python dependencies
+├── environment.yml          # Conda environment specification
+└── README.md                # This file
+```
+## 📚 References
+- [HuggingFace Transformers](https://huggingface.co/docs/transformers/index)
+- [OpenAI Whisper](https://github.com/openai/whisper)
+- [Microsoft SpeechT5](https://huggingface.co/microsoft/speecht5_tts)
+- [Gradio Documentation](https://gradio.app/docs/)
+## 📝 License
+This project is licensed under the MIT License - see the LICENSE file for details.
+## 🙏 Acknowledgments
+- This project utilises models from HuggingFace Hub, Microsoft, and OpenAI
+- Inspired by research in content moderation and responsible AI

docker-compose.yml ADDED

	@@ -0,0 +1,45 @@

+version: '3.8'
+services:
+  # Main service configuration - reserves one NVIDIA GPU (needs the NVIDIA Container Toolkit on the host)
+  profanity-detector:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports:
+      - "7860:7860"
+    volumes:
+      - huggingface-cache:/root/.cache/huggingface
+      - ./:/app  # Mount current directory for development
+    environment:
+      - KMP_DUPLICATE_LIB_OK=TRUE
+    command: python profanity_detector.py
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    restart: unless-stopped
+  # Explicit CPU-only configuration for when GPU causes issues
+  profanity-detector-cpu:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports:
+      - "7860:7860"
+    volumes:
+      - huggingface-cache:/root/.cache/huggingface
+      - ./:/app  # Mount current directory for development
+    environment:
+      - KMP_DUPLICATE_LIB_OK=TRUE
+      - CUDA_VISIBLE_DEVICES=-1  # Disable CUDA
+    command: python profanity_detector.py
+    profiles:
+      - cpu-only
+    restart: unless-stopped
+volumes:
+  huggingface-cache:

environment.yml ADDED

	@@ -0,0 +1,15 @@

+name: profanity-detection
+channels:
+  - pytorch
+  - nvidia
+  - conda-forge
+dependencies:
+  - python=3.10
+  - pytorch
+  - pytorch-cuda=11.8
+  - torchaudio
+  - torchvision
+  - ffmpeg
+variables:
+  KMP_DUPLICATE_LIB_OK: 'TRUE'
+prefix: C:\Users\brian\anaconda3\envs\profanity-detection

profanity_detector.py ADDED

	@@ -0,0 +1,822 @@

+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModelForSeq2SeqLM
+from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
+import whisper
+import gradio as gr
+import re
+import pandas as pd
+import numpy as np
+import os
+import time
+import logging
+import threading
+import queue
+from scipy.io.wavfile import write as write_wav
+from html import escape
+import traceback
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+    handlers=[logging.StreamHandler()]
+)
+logger = logging.getLogger('profanity_detector')
+# Define device at the top of the script (global scope)
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+logger.info(f"Using device: {device}")
+# Global variables for models
+profanity_model = None
+profanity_tokenizer = None
+t5_model = None
+t5_tokenizer = None
+whisper_model = None
+tts_processor = None
+tts_model = None
+vocoder = None
+models_loaded = False
+# Default speaker embeddings for TTS
+speaker_embeddings = None
+# Queue for real-time audio processing
+audio_queue = queue.Queue()
+processing_active = False
+# Model loading with optional FP16 (half-precision) conversion on GPU
+def load_models():
+    global profanity_model, profanity_tokenizer, t5_model, t5_tokenizer, whisper_model
+    global tts_processor, tts_model, vocoder, speaker_embeddings, models_loaded
+    try:
+        logger.info("Loading profanity detection model...")
+        PROFANITY_MODEL = "parsawar/profanity_model_3.1"
+        profanity_tokenizer = AutoTokenizer.from_pretrained(PROFANITY_MODEL)
+        # Load model with memory optimization using half-precision
+        profanity_model = AutoModelForSequenceClassification.from_pretrained(PROFANITY_MODEL)
+        # Move to GPU if available and optimize with half-precision where possible
+        if torch.cuda.is_available():
+            profanity_model = profanity_model.to(device)
+            # Convert to half precision to save memory (if possible)
+            try:
+                profanity_model = profanity_model.half()  # Convert to FP16
+                logger.info("Successfully converted profanity model to half precision")
+            except Exception as e:
+                logger.warning(f"Could not convert to half precision: {str(e)}")
+        logger.info("Loading detoxification model...")
+        T5_MODEL = "s-nlp/t5-paranmt-detox"
+        t5_tokenizer = AutoTokenizer.from_pretrained(T5_MODEL)
+        # Load model with memory optimization
+        t5_model = AutoModelForSeq2SeqLM.from_pretrained(T5_MODEL)
+        # Move to GPU if available and optimize with half-precision where possible
+        if torch.cuda.is_available():
+            t5_model = t5_model.to(device)
+            # Convert to half precision to save memory (if possible)
+            try:
+                t5_model = t5_model.half()  # Convert to FP16
+                logger.info("Successfully converted T5 model to half precision")
+            except Exception as e:
+                logger.warning(f"Could not convert to half precision: {str(e)}")
+        logger.info("Loading Whisper speech-to-text model...")
+        whisper_model = whisper.load_model("large")
+        if torch.cuda.is_available():
+            whisper_model = whisper_model.to(device)
+        logger.info("Loading Text-to-Speech model...")
+        TTS_MODEL = "microsoft/speecht5_tts"
+        tts_processor = SpeechT5Processor.from_pretrained(TTS_MODEL)
+        # Load TTS models without automatic device mapping
+        tts_model = SpeechT5ForTextToSpeech.from_pretrained(TTS_MODEL)
+        vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
+        # Move models to appropriate device
+        if torch.cuda.is_available():
+            tts_model = tts_model.to(device)
+            vocoder = vocoder.to(device)
+        # Speaker embeddings for TTS
+        speaker_embeddings = torch.zeros((1, 512))
+        if torch.cuda.is_available():
+            speaker_embeddings = speaker_embeddings.to(device)
+        models_loaded = True
+        logger.info("All models loaded successfully.")
+        return "Models loaded successfully."
+    except Exception as e:
+        error_msg = f"Error loading models: {str(e)}\n{traceback.format_exc()}"
+        logger.error(error_msg)
+        return error_msg
+def detect_profanity(text: str, threshold: float = 0.5):
+    """
+    Detect profanity in text with adjustable threshold
+    Args:
+        text: The input text to analyze
+        threshold: Profanity detection threshold (0.0-1.0)
+    Returns:
+        Dictionary with analysis results
+    """
+    if not models_loaded:
+        return {"error": "Models not loaded yet. Please wait."}
+    try:
+        # Detect profanity and score
+        inputs = profanity_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+        if torch.cuda.is_available():
+            inputs = inputs.to(device)
+        with torch.no_grad():
+            outputs = profanity_model(**inputs).logits
+        score = torch.nn.functional.softmax(outputs, dim=1)[0][1].item()
+        # Identify specific profane words
+        words = re.findall(r'\b\w+\b', text)
+        profane_words = []
+        word_scores = {}
+        if score > threshold:
+            for word in words:
+                if len(word) < 2:  # Skip very short words
+                    continue
+                word_inputs = profanity_tokenizer(word, return_tensors="pt", truncation=True, max_length=512)
+                if torch.cuda.is_available():
+                    word_inputs = word_inputs.to(device)
+                with torch.no_grad():
+                    word_outputs = profanity_model(**word_inputs).logits
+                word_score = torch.nn.functional.softmax(word_outputs, dim=1)[0][1].item()
+                word_scores[word] = word_score
+                if word_score > threshold:
+                    profane_words.append(word.lower())
+        # Create highlighted version of the text
+        highlighted_text = create_highlighted_text(text, profane_words)
+        return {
+            "text": text,
+            "score": score,
+            "profanity": score > threshold,
+            "profane_words": profane_words,
+            "highlighted_text": highlighted_text,
+            "word_scores": word_scores
+        }
+    except Exception as e:
+        error_msg = f"Error in profanity detection: {str(e)}"
+        logger.error(error_msg)
+        return {"error": error_msg, "text": text, "score": 0, "profanity": False}
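The per-word loop in `detect_profanity` runs one forward pass for every word, repeats included. A minimal sketch of scoring each distinct word only once, assuming a hypothetical `score_word` callable that stands in for the tokenizer-plus-model call (it is not part of this commit). It also assumes that lowercasing a word before scoring is acceptable, which the original code does not do.

```python
import re
from typing import Callable, Dict, List, Tuple


def find_profane_words(
    text: str,
    score_word: Callable[[str], float],  # hypothetical stand-in for the model call
    threshold: float = 0.5,
) -> Tuple[List[str], Dict[str, float]]:
    """Score each distinct word once and return (profane_words, word_scores)."""
    word_scores: Dict[str, float] = {}
    profane_words: List[str] = []
    for word in re.findall(r"\b\w+\b", text):
        key = word.lower()  # assumption: case does not matter for scoring
        # Skip very short words and words that were already scored
        if len(key) < 2 or key in word_scores:
            continue
        word_scores[key] = score_word(key)
        if word_scores[key] > threshold:
            profane_words.append(key)
    return profane_words, word_scores


# Toy scorer for demonstration only; real scores would come from the classifier
def toy_scorer(word: str) -> float:
    return 0.9 if word == "darn" else 0.1


print(find_profane_words("Darn it, darn it all", toy_scorer))
```

With five words in the input, the scorer is invoked three times, once per distinct word.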
+def create_highlighted_text(text, profane_words):
+    """
+    Create HTML-formatted text with profane words highlighted
+    """
+    if not profane_words:
+        return escape(text)
+    # Create a regex pattern matching any of the profane words (case insensitive)
+    pattern = r'\b(' + '|'.join(re.escape(word) for word in profane_words) + r')\b'
+    # Wrap each occurrence in a highlight span
+    def highlight_match(match):
+        return f'<span style="background-color: rgba(255, 0, 0, 0.3); padding: 0px 2px; border-radius: 3px;">{match.group(0)}</span>'
+    # Escape the input first so raw HTML in it is never rendered, then highlight
+    return re.sub(pattern, highlight_match, escape(text), flags=re.IGNORECASE)
+def rephrase_profanity(text):
+    """
+    Rephrase text containing profanity
+    """
+    if not models_loaded:
+        return "Models not loaded yet. Please wait."
+    try:
+        # Rephrase using the detoxification model
+        inputs = t5_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+        if torch.cuda.is_available():
+            inputs = inputs.to(device)
+        # Use more conservative generation settings with error handling
+        try:
+            outputs = t5_model.generate(
+                **inputs,
+                max_length=512,
+                num_beams=4,         # Reduced from 5 to be more memory-efficient
+                early_stopping=True,
+                no_repeat_ngram_size=2,
+                length_penalty=1.0
+            )
+            rephrased_text = t5_tokenizer.decode(outputs[0], skip_special_tokens=True)
+            # Verify the output is reasonable
+            if not rephrased_text or len(rephrased_text) < 3:
+                logger.warning(f"T5 model produced unusable output: '{rephrased_text}'")
+                return text  # Return original if output is too short
+            return rephrased_text.strip()
+        except RuntimeError as e:
+            # Handle potential CUDA out of memory error
+            if "CUDA out of memory" in str(e):
+                logger.warning("CUDA out of memory in T5 model. Trying with smaller beam size...")
+                # Try again with smaller beam size
+                outputs = t5_model.generate(
+                    **inputs,
+                    max_length=512,
+                    num_beams=2,  # Use smaller beam size
+                    early_stopping=True
+                )
+                rephrased_text = t5_tokenizer.decode(outputs[0], skip_special_tokens=True)
+                return rephrased_text.strip()
+            else:
+                raise  # Re-raise with the original traceback if it's not a memory issue
+    except Exception as e:
+        error_msg = f"Error in rephrasing: {str(e)}"
+        logger.error(error_msg)
+        return text  # Return original text if rephrasing fails
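The out-of-memory retry in `rephrase_profanity` can also be written as a loop over progressively smaller beam sizes. A minimal sketch, assuming a hypothetical `generate` callable that wraps `t5_model.generate` and takes only the beam count; the helper name and the `(4, 2, 1)` schedule are illustrative and not from the commit.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def generate_with_beam_fallback(
    generate: Callable[[int], T],  # hypothetical wrapper: num_beams -> output
    beam_sizes: Sequence[int] = (4, 2, 1),
) -> T:
    """Try each beam size in turn, retrying only on out-of-memory errors."""
    if not beam_sizes:
        raise ValueError("beam_sizes must not be empty")
    for index, num_beams in enumerate(beam_sizes):
        try:
            return generate(num_beams)
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            is_last = index == len(beam_sizes) - 1
            # Re-raise anything that is not an OOM, or an OOM with no smaller size left
            if not is_oom or is_last:
                raise
    raise AssertionError("unreachable")


# Fake generator for demonstration: fails with an OOM above two beams
def fake_generate(num_beams: int) -> str:
    if num_beams > 2:
        raise RuntimeError("CUDA out of memory. Tried to allocate 20.00 MiB")
    return f"ok with {num_beams} beams"


print(generate_with_beam_fallback(fake_generate))  # prints "ok with 2 beams"
```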
+def text_to_speech(text):
+    """
+    Convert text to speech using SpeechT5
+    """
+    if not models_loaded:
+        return None
+    try:
+        # Create a temporary file path to save the audio
+        temp_file = f"temp_tts_output_{int(time.time())}.wav"
+        # Process the text input
+        inputs = tts_processor(text=text, return_tensors="pt")
+        if torch.cuda.is_available():
+            inputs = inputs.to(device)
+        # Generate speech with a fixed speaker embedding
+        speech = tts_model.generate_speech(
+            inputs["input_ids"],
+            speaker_embeddings,
+            vocoder=vocoder
+        )
+        # Convert from PyTorch tensor to NumPy array
+        speech_np = speech.cpu().numpy()
+        # Save as WAV file (sampling rate is 16kHz for SpeechT5)
+        write_wav(temp_file, 16000, speech_np)
+        return temp_file
+    except Exception as e:
+        error_msg = f"Error in text-to-speech conversion: {str(e)}"
+        logger.error(error_msg)
+        return None
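`text_to_speech` derives its output name from `int(time.time())`, so two requests within the same second write to the same file. A minimal sketch of collision-free naming with the standard library; the helper name `new_wav_path` is hypothetical. The `temp_tts_output_` prefix is kept so that the `temp_*.wav` pattern in `.dockerignore` still matches when the files land in the working directory.

```python
import os
import tempfile


def new_wav_path(prefix: str = "temp_tts_output_", directory: str = ".") -> str:
    """Create an empty, uniquely named .wav file and return its path."""
    # mkstemp creates the file atomically, so concurrent callers never share a name
    fd, path = tempfile.mkstemp(suffix=".wav", prefix=prefix, dir=directory)
    os.close(fd)  # the caller reopens the path, e.g. with scipy's wavfile.write
    return path
```

The returned path can be passed to `write_wav` in place of the timestamp-based name.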
+def text_analysis(input_text, threshold=0.5):
+    """
+    Analyze text for profanity with adjustable threshold
+    """
+    if not models_loaded:
+        return "Models not loaded yet. Please wait for initialization to complete.", None, None
+    try:
+        # Detect profanity with the given threshold
+        result = detect_profanity(input_text, threshold=threshold)
+        # Handle error case
+        if "error" in result:
+            return result["error"], None, None
+        # Process results
+        if result["profanity"]:
+            clean_text = rephrase_profanity(input_text)
+            profane_words_str = ", ".join(result["profane_words"])
+            toxicity_score = result["score"]
+            classification = (
+                "Severe Toxicity" if toxicity_score >= 0.7 else
+                "Moderate Toxicity" if toxicity_score >= 0.5 else
+                "Mild Toxicity" if toxicity_score >= 0.35 else
+                "Minimal Toxicity" if toxicity_score >= 0.2 else
+                "No Toxicity"
+            )
+            # Generate audio for the rephrased text
+            audio_output = text_to_speech(clean_text)
+            return (
+                f"Profanity Score: {result['score']:.4f}\n\n"
+                f"Profane: {result['profanity']}\n"
+                f"Classification: {classification}\n"
+                f"Detected Profane Words: {profane_words_str}\n\n"
+                f"Reworded: {clean_text}"
+            ), result["highlighted_text"], audio_output
+        else:
+            # If no profanity detected, just convert the original text to speech
+            audio_output = text_to_speech(input_text)
+            return (
+                f"Profanity Score: {result['score']:.4f}\n"
+                f"Profane: {result['profanity']}\n"
+                f"Classification: No Toxicity"
+            ), None, audio_output
+    except Exception as e:
+        error_msg = f"Error in text analysis: {str(e)}\n{traceback.format_exc()}"
+        logger.error(error_msg)
+        return error_msg, None, None
+def analyze_audio(audio_path, threshold=0.5):
+    """
+    Analyze audio for profanity with adjustable threshold
+    """
+    if not models_loaded:
+        return "Models not loaded yet. Please wait for initialization to complete.", None, None
+    if not audio_path:
+        return "No audio provided.", None, None
+    try:
+        # Transcribe audio
+        result = whisper_model.transcribe(audio_path, fp16=torch.cuda.is_available())
+        text = result["text"]
+        # Detect profanity with user-defined threshold
+        analysis = detect_profanity(text, threshold=threshold)
+        # Handle error case
+        if "error" in analysis:
+            return f"Error during analysis: {analysis['error']}\nTranscription: {text}", None, None
+        if analysis["profanity"]:
+            clean_text = rephrase_profanity(text)
+        else:
+            clean_text = text
+        # Generate audio for the rephrased text
+        audio_output = text_to_speech(clean_text)
+        return (
+            f"Transcription: {text}\n\n"
+            f"Profanity Score: {analysis['score']:.4f}\n"
+            f"Profane: {'Yes' if analysis['profanity'] else 'No'}\n"
+            f"Classification: {'Severe Toxicity' if analysis['score'] >= 0.7 else 'Moderate Toxicity' if analysis['score'] >= 0.5 else 'Mild Toxicity' if analysis['score'] >= 0.35 else 'Minimal Toxicity' if analysis['score'] >= 0.2 else 'No Toxicity'}\n"
+            f"Profane Words: {', '.join(analysis['profane_words']) if analysis['profanity'] else 'None'}\n\n"
+            f"Reworded: {clean_text}"
+        ), analysis["highlighted_text"] if analysis["profanity"] else None, audio_output
+    except Exception as e:
+        error_msg = f"Error in audio analysis: {str(e)}\n{traceback.format_exc()}"
+        logger.error(error_msg)
+        return error_msg, None, None
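`text_analysis` and `analyze_audio` each spell out the same score-to-label ladder. A minimal sketch of a shared helper that uses the thresholds from the code above; the names `TOXICITY_LEVELS` and `classify_toxicity` are hypothetical.

```python
from typing import List, Tuple

# (lower bound, label) pairs in descending order, copied from the thresholds above
TOXICITY_LEVELS: List[Tuple[float, str]] = [
    (0.7, "Severe Toxicity"),
    (0.5, "Moderate Toxicity"),
    (0.35, "Mild Toxicity"),
    (0.2, "Minimal Toxicity"),
]


def classify_toxicity(score: float) -> str:
    """Return the label of the first level whose lower bound the score reaches."""
    for lower_bound, label in TOXICITY_LEVELS:
        if score >= lower_bound:
            return label
    return "No Toxicity"
```

Calling the same helper in the non-profane branch of `text_analysis` would keep the label tied to the score rather than to the slider threshold.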
+# Global variables to store streaming results
+stream_results = {
+    "transcript": "",
+    "profanity_info": "",
+    "clean_text": "",
+    "audio_output": None
+}
+def process_stream_chunk(audio_chunk):
+    """Process an audio chunk from the streaming interface"""
+    global stream_results, processing_active
+    if not processing_active or not models_loaded:
+        return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+    try:
+        # The format of audio_chunk from Gradio streaming can vary
+        # It can be: (numpy_array, sample_rate), (filepath, sample_rate, numpy_array) or just numpy_array
+        # Let's handle all possible cases
+        if audio_chunk is None:
+            # No audio received
+            return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+        # Different Gradio versions return different formats
+        temp_file = None
+        if isinstance(audio_chunk, tuple):
+            if len(audio_chunk) == 2:
+                # Format: (numpy_array, sample_rate)
+                samples, sample_rate = audio_chunk
+                temp_file = f"temp_stream_{int(time.time())}.wav"
+                write_wav(temp_file, sample_rate, samples)
+            elif len(audio_chunk) == 3:
+                # Format: (filepath, sample_rate, numpy_array)
+                filepath, sample_rate, samples = audio_chunk
+                # Use the provided filepath if it exists
+                if os.path.exists(filepath):
+                    temp_file = filepath
+                else:
+                    # Create our own file
+                    temp_file = f"temp_stream_{int(time.time())}.wav"
+                    write_wav(temp_file, sample_rate, samples)
+        elif isinstance(audio_chunk, np.ndarray):
+            # Just a numpy array, assume sample rate of 16000 for Whisper
+            samples = audio_chunk
+            sample_rate = 16000
+            temp_file = f"temp_stream_{int(time.time())}.wav"
+            write_wav(temp_file, sample_rate, samples)
+        elif isinstance(audio_chunk, str) and os.path.exists(audio_chunk):
+            # It's a filepath
+            temp_file = audio_chunk
+        else:
+            # Unknown format
+            stream_results["profanity_info"] = f"Error: Unknown audio format: {type(audio_chunk)}"
+            return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+        # Make sure we have a valid file to process
+        if not temp_file or not os.path.exists(temp_file):
+            stream_results["profanity_info"] = "Error: Failed to create audio file for processing"
+            return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+        # Process with Whisper
+        result = whisper_model.transcribe(temp_file, fp16=torch.cuda.is_available())
+        transcript = result["text"].strip()
+        # Skip processing if transcript is empty
+        if not transcript:
+            # Clean up temp file if we created it
+            if temp_file and temp_file.startswith("temp_stream_") and os.path.exists(temp_file):
+                try:
+                    os.remove(temp_file)
+                except OSError:
+                    pass
+            # Return current state, but update profanity info
+            stream_results["profanity_info"] = "No speech detected. Keep talking..."
+            return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+        # Update transcript
+        stream_results["transcript"] = transcript
+        # Analyze for profanity
+        analysis = detect_profanity(transcript, threshold=0.5)
+        # Check if profanity was detected
+        if analysis.get("profanity", False):
+            profane_words = ", ".join(analysis.get("profane_words", []))
+            stream_results["profanity_info"] = f"Profanity Detected (Score: {analysis['score']:.2f})\nProfane Words: {profane_words}"
+            # Rephrase to clean text
+            clean_text = rephrase_profanity(transcript)
+            stream_results["clean_text"] = clean_text
+            # Create audio from cleaned text
+            audio_file = text_to_speech(clean_text)
+            if audio_file:
+                stream_results["audio_output"] = audio_file
+        else:
+            stream_results["profanity_info"] = f"No Profanity Detected (Score: {analysis['score']:.2f})"
+            stream_results["clean_text"] = transcript
+            # Use original text for audio if no profanity
+            audio_file = text_to_speech(transcript)
+            if audio_file:
+                stream_results["audio_output"] = audio_file
+        # Clean up temporary file if we created it
+        if temp_file and temp_file.startswith("temp_stream_") and os.path.exists(temp_file):
+            try:
+                os.remove(temp_file)
+            except OSError:
+                # Best-effort cleanup; a leftover temp file is not fatal
+                pass
+        return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+    except Exception as e:
+        error_msg = f"Error processing streaming audio: {str(e)}\n{traceback.format_exc()}"
+        logger.error(error_msg)
+        # Update profanity info with error message
+        stream_results["profanity_info"] = f"Error: {str(e)}"
+        return stream_results["transcript"], stream_results["profanity_info"], stream_results["clean_text"], stream_results["audio_output"]
+def start_streaming():
+    """Start the real-time audio processing"""
+    global processing_active, stream_results
+    if not models_loaded:
+        return "Models not loaded yet. Please wait for initialization to complete."
+    if processing_active:
+        return "Streaming is already active."
+    # Reset results
+    stream_results = {
+        "transcript": "",
+        "profanity_info": "Waiting for audio input...",
+        "clean_text": "",
+        "audio_output": None
+    }
+    processing_active = True
+    logger.info("Started real-time audio processing")
+    return "Started real-time audio processing. Speak into your microphone."
+def stop_streaming():
+    """Stop the real-time audio processing"""
+    global processing_active
+    if not processing_active:
+        return "Streaming is not active."
+    processing_active = False
+    return "Stopped real-time audio processing."
+def create_ui():
+    """Create the Gradio UI"""
+    # Simple CSS for styling
+    css = """
+    /* Fix for dark mode text visibility */
+    .dark .gr-input,
+    .dark textarea,
+    .dark .gr-textbox,
+    .dark [data-testid="textbox"] {
+        color: white !important;
+        background-color: #2c303b !important;
+    }
+    .dark .gr-box,
+    .dark .gr-form,
+    .dark .gr-panel,
+    .dark .gr-block {
+        color: white !important;
+    }
+    /* Highlighted text container - with dark mode fixes */
+    .highlighted-text {
+        border: 1px solid #ddd;
+        border-radius: 5px;
+        padding: 10px;
+        margin: 10px 0;
+        background-color: #f9f9f9;
+        font-family: sans-serif;
+        max-height: 300px;
+        overflow-y: auto;
+        color: #333 !important; /* Ensure text is dark for light mode */
+    }
+    /* Dark mode specific styling for highlighted text */
+    .dark .highlighted-text {
+        background-color: #2c303b !important;
+        color: #ffffff !important;
+        border-color: #4a4f5a !important;
+    }
+    /* Make sure text in the highlighted container remains visible in both themes */
+    .highlighted-text, .dark .highlighted-text {
+        color-scheme: light dark;
+    }
+    /* Loading animation */
+    .loading {
+        display: inline-block;
+        width: 20px;
+        height: 20px;
+        border: 3px solid rgba(0,0,0,.3);
+        border-radius: 50%;
+        border-top-color: #3498db;
+        animation: spin 1s ease-in-out infinite;
+    }
+    @keyframes spin {
+        to { transform: rotate(360deg); }
+    }
+    """
+    # Create a custom theme based on Soft with blue/gray hues (this alone does not force light mode)
+    light_theme = gr.themes.Soft(
+        primary_hue="blue",
+        secondary_hue="blue",
+        neutral_hue="gray"
+    )
+    # Build the UI with the custom CSS and theme; analytics are disabled
+    with gr.Blocks(css=css, theme=light_theme, analytics_enabled=False) as ui:
+        # Model initialization
+        init_status = gr.State("")
+        gr.Markdown(
+            """
+            # Profanity Detection & Replacement System
+            Detect, rephrase, and listen to cleaned content from text or audio!
+            """,
+            elem_classes="header"
+        )
+        # Initialize models button with status indicators
+        with gr.Row():
+            with gr.Column(scale=3):
+                init_button = gr.Button("Initialize Models", variant="primary")
+                init_output = gr.Textbox(label="Initialization Status", interactive=False)
+            with gr.Column(scale=1):
+                model_status = gr.HTML(
+                    """<div style="text-align: center; padding: 5px;">
+                    <p><b>Model Status:</b> <span style="color: #e74c3c;">Not Loaded</span></p>
+                    </div>"""
+                )
+        # Global sensitivity slider
+        sensitivity = gr.Slider(
+            minimum=0.2,
+            maximum=0.95,
+            value=0.5,
+            step=0.05,
+            label="Profanity Detection Sensitivity",
+            info="Lower values are more permissive, higher values are more strict"
+        )
+        with gr.Row():
+            with gr.Column(scale=3):
+                gr.Markdown("### Choose an Input Method")
+        # Text Analysis
+        with gr.Tabs():
+            with gr.TabItem("Text Analysis", elem_id="text-tab"):
+                with gr.Row():
+                    text_input = gr.Textbox(
+                        label="Enter Text",
+                        placeholder="Type your text here...",
+                        lines=5,
+                        elem_classes="textbox"
+                    )
+                with gr.Row():
+                    text_button = gr.Button("Analyze Text", variant="primary")
+                    clear_button = gr.Button("Clear", variant="secondary")
+                with gr.Row():
+                    with gr.Column(scale=2):
+                        text_output = gr.Textbox(label="Results", lines=10)
+                        highlighted_output = gr.HTML(label="Detected Profanity", elem_classes="highlighted-text")
+                    with gr.Column(scale=1):
+                        text_audio_output = gr.Audio(label="Rephrased Audio", type="filepath")
+            # Audio Analysis
+            with gr.TabItem("Audio Analysis", elem_id="audio-tab"):
+                gr.Markdown("### Upload or Record Audio")
+                audio_input = gr.Audio(
+                    label="Audio Input",
+                    type="filepath",
+                    sources=["microphone", "upload"]
+                    #waveform_options=gr.WaveformOptions(waveform_color="#4a90e2")
+                )
+                with gr.Row():
+                    audio_button = gr.Button("Analyze Audio", variant="primary")
+                    clear_audio_button = gr.Button("Clear", variant="secondary")
+                with gr.Row():
+                    with gr.Column(scale=2):
+                        audio_output = gr.Textbox(label="Results", lines=10, show_copy_button=True)
+                        audio_highlighted_output = gr.HTML(label="Detected Profanity", elem_classes="highlighted-text")
+                    with gr.Column(scale=1):
+                        clean_audio_output = gr.Audio(label="Rephrased Audio", type="filepath")
+            # Real-time Streaming
+            with gr.TabItem("Real-time Streaming", elem_id="streaming-tab"):
+                gr.Markdown("### Real-time Audio Processing")
+                gr.Markdown("Enable real-time audio processing to filter profanity as you speak.")
+                with gr.Row():
+                    with gr.Column(scale=1):
+                        start_stream_button = gr.Button("Start Real-time Processing", variant="primary")
+                        stop_stream_button = gr.Button("Stop Real-time Processing", variant="secondary")
+                        stream_status = gr.Textbox(label="Streaming Status", value="Inactive", interactive=False)
+                        # Add microphone input specifically for streaming
+                        stream_audio_input = gr.Audio(
+                            label="Streaming Microphone Input",
+                            type="filepath",
+                            sources=["microphone"],
+                            streaming=True
+                            #waveform_options=gr.WaveformOptions(waveform_color="#4a90e2")
+                        )
+                    with gr.Column(scale=2):
+                        # Add elements to display streaming results
+                        stream_transcript = gr.Textbox(label="Live Transcription", lines=2)
+                        stream_profanity_info = gr.Textbox(label="Profanity Detection", lines=2)
+                        stream_clean_text = gr.Textbox(label="Clean Text", lines=2)
+                        # Element to play the clean audio
+                        stream_audio_output = gr.Audio(label="Clean Audio Output", type="filepath")
+                gr.Markdown("""
+                ### How Real-time Streaming Works
+                1. Click "Start Real-time Processing" to begin
+                2. Use the microphone input to speak
+                3. The system will process audio in real-time, detect and clean profanity
+                4. You'll see the transcription, profanity info, and clean output appear above
+                5. Click "Stop Real-time Processing" when finished
+                Note: This feature requires microphone access and may have some latency.
+                """)
+        # Event handlers
+        def update_model_status(status_text):
+            """Update both the status text and the visual indicator"""
+            if "successfully" in status_text.lower():
+                status_html = """<div style="text-align: center; padding: 5px;">
+                <p><b>Model Status:</b> <span style="color: #2ecc71;">Loaded ✓</span></p>
+                </div>"""
+            elif "error" in status_text.lower():
+                status_html = """<div style="text-align: center; padding: 5px;">
+                <p><b>Model Status:</b> <span style="color: #e74c3c;">Error ✗</span></p>
+                </div>"""
+            else:
+                status_html = """<div style="text-align: center; padding: 5px;">
+                <p><b>Model Status:</b> <span style="color: #f39c12;">Loading...</span></p>
+                </div>"""
+            return status_text, status_html
+        init_button.click(
+            lambda: update_model_status("Loading models, please wait..."),
+            inputs=[],
+            outputs=[init_output, model_status]
+        ).then(
+            load_models,
+            inputs=[],
+            outputs=[init_output]
+        ).then(
+            update_model_status,
+            inputs=[init_output],
+            outputs=[init_output, model_status]
+        )
+        text_button.click(
+            text_analysis,
+            inputs=[text_input, sensitivity],
+            outputs=[text_output, highlighted_output, text_audio_output]
+        )
+        clear_button.click(
+            lambda: [None, None, None, None],
+            inputs=None,
+            outputs=[text_input, text_output, highlighted_output, text_audio_output]
+        )
+        audio_button.click(
+            analyze_audio,
+            inputs=[audio_input, sensitivity],
+            outputs=[audio_output, audio_highlighted_output, clean_audio_output]
+        )
+        clear_audio_button.click(
+            lambda: [None, None, None, None],
+            inputs=None,
+            outputs=[audio_input, audio_output, audio_highlighted_output, clean_audio_output]
+        )
+        start_stream_button.click(
+            start_streaming,
+            inputs=[],
+            outputs=[stream_status]
+        )
+        stop_stream_button.click(
+            stop_streaming,
+            inputs=[],
+            outputs=[stream_status]
+        )
+        # Connect the streaming audio input to our processing function
+        # First function to debug the audio chunk format
+        def debug_audio_format(audio_chunk):
+            """Debug function to log audio format"""
+            format_info = f"Type: {type(audio_chunk)}"
+            if isinstance(audio_chunk, tuple):
+                format_info += f", Length: {len(audio_chunk)}"
+                for i, item in enumerate(audio_chunk):
+                    format_info += f", Item {i} type: {type(item)}"
+            logger.info(f"Audio chunk format: {format_info}")
+            return audio_chunk
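+        # Log each chunk's format, then pass it on to the processing function.
+        # Note: `preprocess` on Gradio event listeners is a boolean flag, not a callback hook,
+        # so the debug function is called explicitly here.
+        stream_audio_input.stream(
+            fn=lambda audio_chunk: process_stream_chunk(debug_audio_format(audio_chunk)),
+            inputs=[stream_audio_input],
+            outputs=[stream_transcript, stream_profanity_info, stream_clean_text, stream_audio_output]
+        )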
+    return ui
+if __name__ == "__main__":
+    # Set environment variable to avoid OpenMP conflicts
+    os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
+    # Create and launch the UI
+    ui = create_ui()
+    ui.launch(server_name="0.0.0.0", share=True)
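
Review note: the streaming handler in this file repeats the same temp-file cleanup on two return paths. A minimal sketch of one way to centralize it follows. The helper name `remove_temp_stream_file` is hypothetical and not part of this commit, and matching the `temp_stream_` prefix against the file's basename (rather than the whole path, as the handler does) is an assumption.

```python
import contextlib
import os
import tempfile


def remove_temp_stream_file(path, prefix="temp_stream_"):
    """Delete `path` if it looks like a streaming temp file.

    Returns True when a file was removed, False otherwise.
    """
    # Ignore empty paths and files that do not carry our prefix
    if not path or not os.path.basename(path).startswith(prefix):
        return False
    if not os.path.exists(path):
        return False
    # Suppress only filesystem errors so real bugs still surface
    with contextlib.suppress(OSError):
        os.remove(path)
        return True
    return False


# Small demonstration with a throwaway file in the system temp directory
fd, demo_path = tempfile.mkstemp(prefix="temp_stream_", suffix=".wav")
os.close(fd)
removed = remove_temp_stream_file(demo_path)
```

Calling such a helper from a `finally` block would cover the early-return, normal, and error paths with a single call.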

requirements.txt ADDED Viewed

	@@ -0,0 +1,9 @@

+gradio
+numpy
+openai-whisper
+pandas
+scipy
+torch
+transformers
+pillow
+sentencepiece

test_text.md ADDED Viewed

	@@ -0,0 +1,50 @@

+Understood. For research/educational purposes and model testing, here’s an uncensored, explicit version of the script with raw profanity and edge cases.
+Uncensored Profanity Testing Script
+Context: High-pressure workplace (Wolf of Wall Street-inspired)
+Jordan (sales trainer):
+"Listen up, you spineless maggots! If you can’t close a deal without crying like a goddamn toddler, get the hell out of my office! This isn’t a fucking charity! You think clients care about your excuses? Bullshit! Sell or get screwed!"
+Context: Family confrontation (Sopranos-inspired)
+Tony (angry parent):
+"You lied to me? You’re gonna sit there with that shit-eating grin and act innocent? I oughta smack that damn phone outta your hand! You’re lucky I don’t fucking lose it right now!"
+Context: Crime/heist scene (Pulp Fiction-inspired)
+Vincent (panicking):
+"Move your ass! We’ve got cops in 3 minutes! Why’d you leave the goddamn keys in the ignition, you dumb shit?!"
+Context: Sarcastic humor (The Big Lebowski-inspired)
+The Dude (relaxed):
+"Nice rug, man. Really ties the room together… though your attitude’s about as useful as a fucking screen door on a submarine."
+Context: Toxic online gaming chat
+Player 1:
+"Stop camping, you noob! Go touch grass, you motherfucker! This is why your ass got carried in ranked!"
+Edge Cases & Ambiguities
+False Positives:
+"I’m tired of this bull session." (vs. "bullshit")
+"He’s such a prickly cactus." (vs. "prick")
+Creative Spelling:
+"Sh1t, fck, @ss, d!ck" (leetspeak/symbol evasion)
+"fukken hell, biatch" (phonetic slang)
+Reclaimed/Contextual Terms:
+"That queer filmmaker revolutionized the genre." (non-slur usage)
+"She’s a bad bitch CEO." (empowerment vs. insult)
+Ethical Reminder
+Use anonymized datasets.
+Flag cultural/regional variance (e.g., "bloody wanker" vs. "goddamn idiot").
+Avoid amplifying harm by limiting real-world deployment of raw data.
+Let me know if you need additional explicit examples (e.g., sexual terms, extreme aggression) or specific dialect tests.
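
Review note: the creative-spelling cases in this test file (`Sh1t`, `@ss`, `d!ck`) could be probed with a small normalization pass before text reaches the classifier. The following is a minimal sketch under stated assumptions: the helper `normalize_leetspeak` and the substitution table `LEET_MAP` are hypothetical, not part of this commit, and the table is illustrative rather than exhaustive.

```python
import re

# Hypothetical, hand-picked substitutions; extend as test cases require
LEET_MAP = {"1": "i", "!": "i", "3": "e", "0": "o", "@": "a", "$": "s", "5": "s"}

# A symbol is treated as a letter stand-in only when it sits between two letters,
# or when "@" or "$" starts a word. Trailing "!" and standalone digits are left alone.
_INNER = re.compile(r"(?<=[a-z])[1!30@$5](?=[a-z])")
_LEADING = re.compile(r"(?<![a-z0-9])[@$](?=[a-z])")


def normalize_leetspeak(text):
    """Lowercase `text` and undo common symbol-for-letter substitutions."""
    lowered = text.lower()
    lowered = _INNER.sub(lambda m: LEET_MAP[m.group()], lowered)
    return _LEADING.sub(lambda m: LEET_MAP[m.group()], lowered)


print(normalize_leetspeak("Sh1t, fck, @ss, d!ck"))  # shit, fck, ass, dick
print(normalize_leetspeak("Cops in 3 minutes!"))    # cops in 3 minutes!
```

Two limitations are visible in the output and in the rules. Dropped vowels such as `fck` are not restored, and a symbol between letters is always rewritten, so an email-style string like `a@b` would be altered too. A pass like this is a pre-filter to test against, not a replacement for the model.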