docs: expand README with roadmap, feature status, and AI content detection tools; update requirements for transformers
- README.md +116 -1
- forensics/__init__.py +2 -2
- forensics/exif.py +10 -10
- requirements.txt +1 -1
README.md
CHANGED
@@ -186,4 +186,119 @@ When you upload an image for analysis and click the "Predict" or "Augment & Pred
* The final consensus label is prepared with appropriate styling.
* **Data Type Conversion**: Numerical values (like AI Score, Real Score) are converted to standard Python floats to ensure proper JSON serialization.
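That conversion matters because Python's `json` module cannot serialize NumPy scalars such as `np.float32` directly. A minimal sketch of the idea (the helper name is hypothetical, not the app's actual code):

```python
import json
import numpy as np

def to_serializable(scores: dict) -> dict:
    """Cast NumPy scalar values to plain Python floats so json.dumps() accepts them."""
    return {k: float(v) for k, v in scores.items()}

scores = {"ai_score": np.float32(0.87), "real_score": np.float32(0.13)}
# json.dumps(scores) would raise TypeError; the converted dict serializes fine
payload = json.dumps(to_serializable(scores))
```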
---

## Roadmap & Features

### Task Status

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| [x] Set up basic ensemble model architecture | ✅ Completed | High | Core framework established |
| [x] Implement initial forensic analysis tools | ✅ Completed | High | ELA, Gradient, MinMax processing |
| [x] Create intelligent agent system | ✅ Completed | High | All monitoring agents implemented |
| [x] Refactor Gradio interface for MCP | ✅ Completed | Medium | User-friendly web interface |
| [x] Integrate multiple deepfake detection models | ✅ Completed | High | 7 models successfully integrated |
| [x] Implement weighted consensus algorithm | ✅ Completed | High | Dynamic weight adjustment working |
| [x] Add image augmentation capabilities | ✅ Completed | Medium | Rotation, noise, sharpening features |
| [x] Set up data logging to Hugging Face | ✅ Completed | Medium | Continuous improvement pipeline |
| [x] Create system health monitoring | ✅ Completed | Medium | Resource usage tracking |
| [x] Implement contextual intelligence analysis | ✅ Completed | Medium | Context tag inference system |
| [ ] Implement real-time model performance monitoring | 🔄 In Progress | High | Add live metrics dashboard |
| [ ] Add support for video deepfake detection | Pending | Medium | Extend current image-based system |
| [ ] Optimize forensic analysis processing speed | 🔄 In Progress | High | Current ELA processing is slow |
| [ ] Implement batch processing for multiple images | 🔄 In Progress | Medium | Improve throughput for bulk analysis |
| [ ] Add model confidence threshold configuration | Pending | Low | Allow users to adjust sensitivity |
| [ ] Create test suite | Pending | High | Unit tests for all agents and models |
| [ ] Implement model versioning and rollback | Pending | Medium | Track model performance over time |
| [ ] Add export functionality for analysis reports | Pending | Low | PDF/CSV export options |
| [ ] Optimize memory usage for large images | 🔄 In Progress | High | Handle 4K+ resolution images |
| [ ] Add support for additional forensic techniques | 🔄 In Progress | Medium | Consider adding noise analysis |
| [ ] Implement user authentication system | Pending | Low | For enterprise deployment |
| [ ] Create API documentation | 🔄 In Progress | Medium | OpenAPI/Swagger specs |
| [ ] Add model ensemble validation metrics | Pending | High | Cross-validation for weight optimization |
| [ ] Implement caching for repeated analyses | Pending | Medium | Reduce redundant processing |
| [ ] Add support for custom model integration | Pending | Low | Plugin architecture for new models |

### Legend

- **Priority**: High (Critical), Medium (Important), Low (Nice to have)
- **Status**: Pending, 🔄 In Progress, ✅ Completed, ⛔ Blocked

---
## Digital Forensics Implementation

The table below pairs each forensic tool with **instructions on how to use it with vision LLMs** (e.g., CLIP, Vision Transformers, or CNNs) for effective AI content detection:

---

### **Top 20 Tools for AI Content Detection (with Vision LLM Integration Guidance)**

| Status | Rank | Tool/Algorithm | Reason | **Agent Guidance / Instructions** |
|--------|------|----------------|--------|-----------------------------------|
| ✅ | 1 | Noise Separation | Detects synthetic noise patterns absent in natural images. | Train the LLM on noise-separated image patches to recognize AI-specific noise textures (e.g., overly smooth or missing thermal noise). |
| 🔄 | 2 | EXIF Full Dump | AI-generated images lack valid metadata (e.g., camera model, geolocation). | Input the image *and its metadata as text* to a **multimodal LLM** (e.g., image + metadata caption). Flag inconsistencies (e.g., missing GPS, invalid timestamps). |
| ✅ | 3 | Error Level Analysis (ELA) | Reveals compression artifacts unique to AI-generated images. | Preprocess images via ELA before input to the LLM. Train the model to detect high-error regions indicative of synthetic content. |
| 🔄 | 4 | JPEG Ghost Maps | Identifies compression history anomalies. | Use ghost maps as a separate input channel (e.g., overlay ELA results on the RGB image) to train the LLM on synthetic vs. natural compression traces. |
| 🔄 | 5 | Copy-Move Forgery | AI models often clone/reuse elements. | Train the LLM to detect duplicated regions via frequency analysis or gradient-based saliency maps (e.g., using a Siamese network to compare image segments). |
| ✅ | 6 | Channel Histograms | Skewed color distributions in AI-generated images. | Feed the **histogram plots** as additional input (e.g., as a grayscale image) to highlight unnatural color profiles in the LLM. |
| 🔄 | 7 | Pixel Statistics | Unnatural RGB value deviations in AI-generated images. | Train the LLM on datasets with metadata tags indicating mean/max/min RGB values, using these stats as part of the training signal. |
| 🔄 | 8 | JPEG Quality Estimation | AI-generated content may have atypical JPEG quality settings. | Preprocess the image to expose JPEG quality artifacts (e.g., blockiness) and train the LLM to identify these patterns via loss functions tuned to compression. |
| 🔄 | 9 | Resampling Detection | AI tools may upscale/rotate images, leaving subpixel-level artifacts. | Use **frequency analysis** modules in the LLM (e.g., Fourier-transformed images) to detect Moiré patterns or grid distortions from resampling. |
| ✅ | 10 | PCA Projection | Highlights synthetic color distributions. | Apply PCA to reduce color dimensions and input the 2D/3D projection to the LLM as a simplified feature space. |
| ✅ | 11 | Bit Plane Values | Detects synthetic noise patterns absent in natural images. | Analyze individual bit planes (e.g., bit planes 1–8) and feed the binary images to the LLM to train on AI-specific bit-plane anomalies. |
| 🔄 | 12 | Median Filtering Traces | AI pre/post-processing steps mimic median filtering. | Train the LLM on synthetically filtered images to recognize AI-applied diffusion artifacts. |
| ✅ | 13 | Wavelet Threshold | Identifies AI-generated texture inconsistencies. | Use wavelet-decomposed images as input channels to the LLM to isolate synthetic vs. natural textures. |
| ✅ | 14 | Frequency Split | AI may generate unnatural gradients or sharpness. | Separate high/low frequencies and train the LLM to detect missing high-frequency content in AI-generated regions (e.g., over-smoothed edges). |
| 🔄 | 15 | PRNU Identification | Absence of sensor-specific noise in AI-generated images. | Train the LLM on PRNU-noise databases to detect the absence or mismatch of sensor-specific noise in unlabeled images. |
| 🔄 | 16 | EXIF Tampering Detection | AI may falsify metadata. | Flag images with inconsistent EXIF hashes (e.g., mismatched EXIF/visual content) and use metadata tags as training labels. |
| 🔄 | 17 | Composite Splicing | AI-generated images often stitch elements with inconsistencies. | Use **edge-aware models** (e.g., CRFL-like architectures) to detect lighting/shadow mismatches in spliced regions. |
| 🔄 | 18 | RGB/HSV Plots | AI-generated images have unnatural color distributions. | Input RGB/HSV channel plots as 1D signals to the LLM's classifier head, along with the original image. |
| 🔄 | 19 | Dead/Hot Pixel Analysis | Absence of sensor-level imperfections in AI-generated images. | Use pre-trained sensor noise databases to train the LLM to flag images missing dead/hot pixels. |
| 🔄 | 20 | File Digest (Hashing) | Compares to known AI-generated image hashes for rapid detection. | Use hash values as binary tags in a training dataset (e.g., "hash matches known AI model" → label as synthetic). |
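As a concrete illustration of row 3, ELA can be sketched with Pillow alone: recompress the image as JPEG at a fixed quality and amplify the pixel-wise difference. The quality value below is an assumption for illustration, not necessarily this project's setting:

```python
import io
from PIL import Image, ImageChops

def ela(image: Image.Image, quality: int = 90) -> Image.Image:
    """Error Level Analysis: JPEG-recompress the image and return the amplified difference."""
    rgb = image.convert("RGB")
    buf = io.BytesIO()
    rgb.save(buf, "JPEG", quality=quality)  # single extra compression pass
    buf.seek(0)
    diff = ImageChops.difference(rgb, Image.open(buf))
    # Scale the difference so subtle error levels become visible
    max_diff = max(hi for _, hi in diff.getextrema()) or 1
    return diff.point(lambda px: min(255, px * (255 // max_diff)))
```

Uniform error levels suggest one consistent compression history; bright, localized regions have a different history, which is one signal of editing or synthesis.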

### Legend

- **Status**: 🔄 In Progress, ✅ Completed, ⛔ Blocked

---
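Row 11's bit-plane analysis reduces to a few NumPy operations. A sketch of the core idea (the repo's `bit_plane_extractor` may differ in detail):

```python
import numpy as np

def bit_planes(gray: np.ndarray) -> list:
    """Split an 8-bit grayscale image into its 8 binary bit planes (LSB first).
    In camera photos the low-order planes look like random noise; unusually
    flat or structured low planes are a hint of synthetic content."""
    assert gray.dtype == np.uint8
    return [((gray >> b) & 1).astype(np.uint8) for b in range(8)]
```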

### **Hybrid Input Table for AI Content Detection (Planned)**

| **Strategy #** | **Description** | **Input Components** | **Agent Guidance / Instructions** |
|----------------|-----------------|----------------------|-----------------------------------|
| 1 | Combine ELA (Error Level Analysis) with RGB images for texture discrimination. | ELA-processed image + original RGB image (stacked as a 4D batch tensor). | Use a **multi-input CNN** to process ELA maps and RGB images in parallel, or concatenate them into a 6-channel input (3 RGB + 3 ELA). |
| 2 | Use metadata (EXIF) and visual content as a **multimodal pair**. | Visual image + EXIF metadata (as text caption). | Feed the image and metadata text into a **multimodal LLM** (e.g., CLIP or MMBT). Use a cross-attention module to align metadata with visual features. |
| 3 | Add **histogram plots** as a 1D auxiliary input for color distribution analysis. | Image (3D input) + histogram plots (1D vector or 2D grayscale image). | Train a **dual-stream model** (CNN for image + LSTM/Transformer for histogram data) to learn the relationship between visual and statistical features. |
| 4 | Combine **frequency split images** (high/low) with RGB for texture detection. | High-frequency image + low-frequency image + RGB image (as 3+3+3 input channels). | Use a **frequency-aware CNN** to process each frequency band with separate filters, then merge features for classification. |
| 5 | Train a model on **bit plane values** alongside the original image. | Bit plane images (binary black-and-white layers) + original RGB image. | Stack or concatenate bit plane images with RGB channels before inputting to the LLM. For example, combine 3 bit planes with 3 RGB channels. |
| 6 | Use **PRNU noise maps** and visual features to detect synthetic content. | PRNU-noise map (grayscale) + RGB image (3D input). | Train a **Siamese network** to compare PRNU maps with real-world noise databases. If PRNU is absent or mismatched, flag the image as synthetic. |
| 7 | Stack **hex-editor-derived metadata** (e.g., file header signatures) as a channel. | Hex-derived binary patterns (encoded as 1D or 2D data) + RGB image. | Use a **transformer with 1D hex embeddings** as a metadata input, cross-attending with a ViT (Vision Transformer) for RGB analysis. |
| 8 | Add **dead/hot pixel detection maps** as a mask to highlight sensor artifacts. | Dead/hot pixel mask (binary 2D map) + RGB image. | Concatenate the mask with the RGB image as a 4th channel. Train a U-Net-style model to detect synthetic regions where the mask lacks sensor patterns. |
| 9 | Use **PCA-reduced color projections** as a simplified input for LLMs. | PCA-transformed color embeddings (2D/3D projection) + original image. | Train a **transformer** to learn how PCA-projected color distributions differ between natural and synthetic images. |
| 10 | Integrate **wavelet-decomposed subbands** with RGB for texture discrimination. | Wavelet subbands (LL, LH, HL, HH) + RGB image (stacked as a 7-channel input). | Design a **wavelet-aware CNN** to process each subband separately before global pooling and classification. |
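Strategy 1's early fusion amounts to channel concatenation before the first convolution, so the only architectural change is `in_channels=6`. A minimal PyTorch sketch (layer sizes are illustrative, not this project's model):

```python
import torch
import torch.nn as nn

class EarlyFusionCNN(nn.Module):
    """Toy classifier that accepts RGB + ELA stacked as a 6-channel input."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1),  # 6 = 3 RGB + 3 ELA
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, rgb: torch.Tensor, ela: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, ela], dim=1)  # (B, 6, H, W)
        return self.head(self.features(x).flatten(1))
```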

---

### **Key Integration Tips for Hybrid Inputs**

1. **Multimodal Models**
   - Use models like **CLIP**, **BLIP**, or **MBT** to align metadata (text) with visual features (images).
   - For example, combine a **ViT** (for image processing) with a **Transformer** (for EXIF metadata or histograms).

2. **Feature Fusion Techniques**
   - **Early fusion**: Concatenate inputs (e.g., ELA + RGB) before the first layer.
   - **Late fusion**: Process inputs separately and merge features before final classification.
   - **Cross-modal attention**: Use cross-attention to align metadata with visual features (e.g., EXIF text and PRNU noise maps).

3. **Preprocessing for Hybrid Inputs**
   - Normalize metadata and image data to the same scale (e.g., 0–1).
   - Convert 1D histogram data into 2D images (e.g., heatmap-like plots) for consistent input formats.

4. **Loss Functions for Hybrid Tasks**
   - Use a **multi-task loss** (e.g., classification + regression) if metadata is involved.
   - For consistency checks (e.g., metadata vs. visual content), use **triplet loss** or **contrastive loss**.
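Tip 3's histogram preprocessing (compute per-channel histograms and normalize them to the same scale as the image tensor) can be sketched as:

```python
import numpy as np

def channel_histograms(img: np.ndarray, bins: int = 64) -> np.ndarray:
    """Per-channel intensity histograms scaled to [0, 1] so they share the
    range of a normalized image tensor (tip 3)."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0]
             for c in range(img.shape[-1])]
    h = np.stack(hists).astype(np.float32)
    return h / max(float(h.max()), 1.0)
```

The resulting `(channels, bins)` array can feed the histogram stream of the dual-stream model from strategy 3.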

---

forensics/__init__.py
CHANGED
@@ -1,6 +1,6 @@
 from .bitplane import bit_plane_extractor
 from .ela import ELA
-from .exif import exif_full_dump
+# from .exif import exif_full_dump
 from .gradient import gradient_processing
 from .minmax import minmax_process
 from .wavelet import wavelet_blocking_noise_estimation
@@ -8,7 +8,7 @@ from .wavelet import wavelet_blocking_noise_estimation
 __all__ = [
     'bit_plane_extractor',
     'ELA',
-    'exif_full_dump',
+    # 'exif_full_dump',
     'gradient_processing',
     'minmax_process',
     'wavelet_blocking_noise_estimation'

forensics/exif.py
CHANGED
@@ -1,11 +1,11 @@
-import tempfile
-import exiftool
-from PIL import Image
+# import tempfile
+# import exiftool
+# from PIL import Image
 
-def exif_full_dump(image: Image.Image) -> dict:
-    """Extract all EXIF metadata from an image using exiftool."""
-    with tempfile.NamedTemporaryFile(suffix='.jpg', delete=True) as tmp:
-        image.save(tmp.name)
-        with exiftool.ExifTool() as et:
-            metadata = et.get_metadata(tmp.name)
-    return metadata
+# def exif_full_dump(image: Image.Image) -> dict:
+#     """Extract all EXIF metadata from an image using exiftool."""
+#     with tempfile.NamedTemporaryFile(suffix='.jpg', delete=True) as tmp:
+#         image.save(tmp.name)
+#         with exiftool.ExifTool() as et:
+#             metadata = et.get_metadata(tmp.name)
+#     return metadata
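With `exif_full_dump` and its `exiftool` dependency disabled, a lighter partial substitute is Pillow's built-in `Image.getexif()`. It recovers far fewer tags than exiftool but needs no external binary; a sketch, not a drop-in replacement (the function name is hypothetical):

```python
from PIL import Image
from PIL.ExifTags import TAGS

def exif_basic_dump(image: Image.Image) -> dict:
    """Best-effort EXIF dump using only Pillow (a subset of exiftool's output)."""
    return {TAGS.get(tag_id, str(tag_id)): str(value)
            for tag_id, value in image.getexif().items()}
```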

requirements.txt
CHANGED
@@ -1,7 +1,7 @@
 --index-url https://download.pytorch.org/whl/nightly/cpu
 
 # Core ML/AI libraries
-transformers
+transformers
 torch
 torchvision
 torchaudio