docs: expand README with roadmap, feature status, and AI content detection tools; update requirements for transformers
- README.md +116 -1
- forensics/__init__.py +2 -2
- forensics/exif.py +10 -10
- requirements.txt +1 -1
README.md
CHANGED
@@ -186,4 +186,119 @@ When you upload an image for analysis and click the "Predict" or "Augment & Pred
* The final consensus label is prepared with appropriate styling.
* **Data Type Conversion**: Numerical values (like AI Score, Real Score) are converted to standard Python floats to ensure proper JSON serialization.
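That conversion matters because Python's `json` module cannot serialize NumPy scalars such as `np.float32` directly. A minimal sketch of the idea (the helper name is hypothetical, not the app's actual code):

```python
import json
import numpy as np

def to_serializable(scores: dict) -> dict:
    """Cast NumPy scalar values to plain Python floats so json.dumps() accepts them."""
    return {k: float(v) for k, v in scores.items()}

scores = {"ai_score": np.float32(0.87), "real_score": np.float32(0.13)}
# json.dumps(scores) would raise TypeError; the converted dict serializes fine
payload = json.dumps(to_serializable(scores))
```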
---

## Roadmap & Features

### Task Status

| Task | Status | Priority | Notes |
|------|--------|----------|-------|
| [x] Set up basic ensemble model architecture | ✅ Completed | High | Core framework established |
| [x] Implement initial forensic analysis tools | ✅ Completed | High | ELA, Gradient, MinMax processing |
| [x] Create intelligent agent system | ✅ Completed | High | All monitoring agents implemented |
| [x] Refactor Gradio interface for MCP | ✅ Completed | Medium | User-friendly web interface |
| [x] Integrate multiple deepfake detection models | ✅ Completed | High | 7 models successfully integrated |
| [x] Implement weighted consensus algorithm | ✅ Completed | High | Dynamic weight adjustment working |
| [x] Add image augmentation capabilities | ✅ Completed | Medium | Rotation, noise, sharpening features |
| [x] Set up data logging to Hugging Face | ✅ Completed | Medium | Continuous improvement pipeline |
| [x] Create system health monitoring | ✅ Completed | Medium | Resource usage tracking |
| [x] Implement contextual intelligence analysis | ✅ Completed | Medium | Context tag inference system |
| [ ] Implement real-time model performance monitoring | 🔄 In Progress | High | Add live metrics dashboard |
| [ ] Add support for video deepfake detection | Pending | Medium | Extend current image-based system |
| [ ] Optimize forensic analysis processing speed | 🔄 In Progress | High | Current ELA processing is slow |
| [ ] Implement batch processing for multiple images | 🔄 In Progress | Medium | Improve throughput for bulk analysis |
| [ ] Add model confidence threshold configuration | Pending | Low | Allow users to adjust sensitivity |
| [ ] Create test suite | Pending | High | Unit tests for all agents and models |
| [ ] Implement model versioning and rollback | Pending | Medium | Track model performance over time |
| [ ] Add export functionality for analysis reports | Pending | Low | PDF/CSV export options |
| [ ] Optimize memory usage for large images | 🔄 In Progress | High | Handle 4K+ resolution images |
| [ ] Add support for additional forensic techniques | 🔄 In Progress | Medium | Consider adding noise analysis |
| [ ] Implement user authentication system | Pending | Low | For enterprise deployment |
| [ ] Create API documentation | 🔄 In Progress | Medium | OpenAPI/Swagger specs |
| [ ] Add model ensemble validation metrics | Pending | High | Cross-validation for weight optimization |
| [ ] Implement caching for repeated analyses | Pending | Medium | Reduce redundant processing |
| [ ] Add support for custom model integration | Pending | Low | Plugin architecture for new models |

### Legend

- **Priority**: High (Critical), Medium (Important), Low (Nice to have)
- **Status**: Pending, 🔄 In Progress, ✅ Completed, ⛔ Blocked

---
## Digital Forensics Implementation

The table below pairs each forensic tool with **instructions on how to use it with vision LLMs** (e.g., CLIP, Vision Transformers, or CNNs) for effective AI content detection:

---

### **Top 20 Tools for AI Content Detection (with Vision LLM Integration Guidance)**

| Status | Rank | Tool/Algorithm | Reason | **Agent Guidance / Instructions** |
|--------|------|----------------|--------|-----------------------------------|
| ✅ | 1 | Noise Separation | Detects synthetic noise patterns absent in natural images. | Train the LLM on noise-separated image patches to recognize AI-specific noise textures (e.g., overly smooth or missing thermal noise). |
| 🔄 | 2 | EXIF Full Dump | AI-generated images lack valid metadata (e.g., camera model, geolocation). | Input the image *and its metadata as text* to a **multimodal LLM** (e.g., image + metadata caption). Flag inconsistencies (e.g., missing GPS, invalid timestamps). |
| ✅ | 3 | Error Level Analysis (ELA) | Reveals compression artifacts unique to AI-generated images. | Preprocess images via ELA before input to the LLM. Train the model to detect high-error regions indicative of synthetic content. |
| 🔄 | 4 | JPEG Ghost Maps | Identifies compression history anomalies. | Use ghost maps as a separate input channel (e.g., overlay ELA results on the RGB image) to train the LLM on synthetic vs. natural compression traces. |
| 🔄 | 5 | Copy-Move Forgery | AI models often clone/reuse elements. | Train the LLM to detect duplicated regions via frequency analysis or gradient-based saliency maps (e.g., using a Siamese network to compare image segments). |
| ✅ | 6 | Channel Histograms | Skewed color distributions in AI-generated images. | Feed the **histogram plots** as additional input (e.g., as a grayscale image) to highlight unnatural color profiles in the LLM. |
| 🔄 | 7 | Pixel Statistics | Unnatural RGB value deviations in AI-generated images. | Train the LLM on datasets with metadata tags indicating mean/max/min RGB values, using these stats as part of the training signal. |
| 🔄 | 8 | JPEG Quality Estimation | AI-generated content may have atypical JPEG quality settings. | Preprocess the image to expose JPEG quality artifacts (e.g., blockiness) and train the LLM to identify these patterns via loss functions tuned to compression. |
| 🔄 | 9 | Resampling Detection | AI tools may upscale/rotate images, leaving subpixel-level artifacts. | Use **frequency analysis** modules in the LLM (e.g., Fourier-transformed images) to detect Moiré patterns or grid distortions from resampling. |
| ✅ | 10 | PCA Projection | Highlights synthetic color distributions. | Apply PCA to reduce color dimensions and input the 2D/3D projection to the LLM as a simplified feature space. |
| ✅ | 11 | Bit Plane Values | Detects synthetic noise patterns absent in natural images. | Analyze individual bit planes (e.g., bit planes 1–8) and feed the binary images to the LLM to train on AI-specific bit-plane anomalies. |
| 🔄 | 12 | Median Filtering Traces | AI pre/post-processing steps mimic median filtering. | Train the LLM on synthetically filtered images to recognize AI-applied diffusion artifacts. |
| ✅ | 13 | Wavelet Threshold | Identifies AI-generated texture inconsistencies. | Use wavelet-decomposed images as input channels to the LLM to isolate synthetic vs. natural textures. |
| ✅ | 14 | Frequency Split | AI may generate unnatural gradients or sharpness. | Separate high/low frequencies and train the LLM to detect missing high-frequency content in AI-generated regions (e.g., over-smoothed edges). |
| 🔄 | 15 | PRNU Identification | Absence of sensor-specific noise in AI-generated images. | Train the LLM on PRNU-noise databases to detect the absence or mismatch of sensor-specific noise in unlabeled images. |
| 🔄 | 16 | EXIF Tampering Detection | AI may falsify metadata. | Flag images with inconsistent EXIF hashes (e.g., mismatched EXIF/visual content) and use metadata tags as training labels. |
| 🔄 | 17 | Composite Splicing | AI-generated images often stitch elements with inconsistencies. | Use **edge-aware models** (e.g., CRFL-like architectures) to detect lighting/shadow mismatches in spliced regions. |
| 🔄 | 18 | RGB/HSV Plots | AI-generated images have unnatural color distributions. | Input RGB/HSV channel plots as 1D signals to the LLM's classifier head, along with the original image. |
| 🔄 | 19 | Dead/Hot Pixel Analysis | Absence of sensor-level imperfections in AI-generated images. | Use pre-trained sensor noise databases to train the LLM to flag images missing dead/hot pixels. |
| 🔄 | 20 | File Digest (Hashing) | Compares to known AI-generated image hashes for rapid detection. | Use hash values as binary tags in a training dataset (e.g., "hash matches known AI model" → label as synthetic). |
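As a concrete illustration of row 3, ELA can be sketched with Pillow alone: recompress the image as JPEG at a fixed quality and amplify the pixel-wise difference. The quality value below is an assumption for illustration, not necessarily this project's setting:

```python
import io
from PIL import Image, ImageChops

def ela(image: Image.Image, quality: int = 90) -> Image.Image:
    """Error Level Analysis: JPEG-recompress the image and return the amplified difference."""
    rgb = image.convert("RGB")
    buf = io.BytesIO()
    rgb.save(buf, "JPEG", quality=quality)  # single extra compression pass
    buf.seek(0)
    diff = ImageChops.difference(rgb, Image.open(buf))
    # Scale the difference so subtle error levels become visible
    max_diff = max(hi for _, hi in diff.getextrema()) or 1
    return diff.point(lambda px: min(255, px * (255 // max_diff)))
```

Uniform error levels suggest one consistent compression history; bright, localized regions have a different history, which is one signal of editing or synthesis.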

### Legend

- **Status**: 🔄 In Progress, ✅ Completed, ⛔ Blocked

---
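Row 11's bit-plane analysis reduces to a few NumPy operations. A sketch of the core idea (the repo's `bit_plane_extractor` may differ in detail):

```python
import numpy as np

def bit_planes(gray: np.ndarray) -> list:
    """Split an 8-bit grayscale image into its 8 binary bit planes (LSB first).
    In camera photos the low-order planes look like random noise; unusually
    flat or structured low planes are a hint of synthetic content."""
    assert gray.dtype == np.uint8
    return [((gray >> b) & 1).astype(np.uint8) for b in range(8)]
```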

### **Hybrid Input Table for AI Content Detection (Planned)**

| **Strategy #** | **Description** | **Input Components** | **Agent Guidance / Instructions** |
|----------------|-----------------|----------------------|-----------------------------------|
| 1 | Combine ELA (Error Level Analysis) with RGB images for texture discrimination. | ELA-processed image + original RGB image (stacked as a 4D batch tensor). | Use a **multi-input CNN** to process ELA maps and RGB images in parallel, or concatenate them into a 6-channel input (3 RGB + 3 ELA). |
| 2 | Use metadata (EXIF) and visual content as a **multimodal pair**. | Visual image + EXIF metadata (as text caption). | Feed the image and metadata text into a **multimodal LLM** (e.g., CLIP or MMBT). Use a cross-attention module to align metadata with visual features. |
| 3 | Add **histogram plots** as a 1D auxiliary input for color distribution analysis. | Image (3D input) + histogram plots (1D vector or 2D grayscale image). | Train a **dual-stream model** (CNN for image + LSTM/Transformer for histogram data) to learn the relationship between visual and statistical features. |
| 4 | Combine **frequency split images** (high/low) with RGB for texture detection. | High-frequency image + low-frequency image + RGB image (as 3+3+3 input channels). | Use a **frequency-aware CNN** to process each frequency band with separate filters, then merge features for classification. |
| 5 | Train a model on **bit plane values** alongside the original image. | Bit plane images (binary black-and-white layers) + original RGB image. | Stack or concatenate bit plane images with RGB channels before inputting to the LLM. For example, combine 3 bit planes with 3 RGB channels. |
| 6 | Use **PRNU noise maps** and visual features to detect synthetic content. | PRNU-noise map (grayscale) + RGB image (3D input). | Train a **Siamese network** to compare PRNU maps with real-world noise databases. If PRNU is absent or mismatched, flag the image as synthetic. |
| 7 | Stack **hex-editor-derived metadata** (e.g., file header signatures) as a channel. | Hex-derived binary patterns (encoded as 1D or 2D data) + RGB image. | Use a **transformer with 1D hex embeddings** as a metadata input, cross-attending with a ViT (Vision Transformer) for RGB analysis. |
| 8 | Add **dead/hot pixel detection maps** as a mask to highlight sensor artifacts. | Dead/hot pixel mask (binary 2D map) + RGB image. | Concatenate the mask with the RGB image as a 4th channel. Train a U-Net-style model to detect synthetic regions where the mask lacks sensor patterns. |
| 9 | Use **PCA-reduced color projections** as a simplified input for LLMs. | PCA-transformed color embeddings (2D/3D projection) + original image. | Train a **transformer** to learn how PCA-projected color distributions differ between natural and synthetic images. |
| 10 | Integrate **wavelet-decomposed subbands** with RGB for texture discrimination. | Wavelet subbands (LL, LH, HL, HH) + RGB image (stacked as a 7-channel input). | Design a **wavelet-aware CNN** to process each subband separately before global pooling and classification. |
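Strategy 1's early fusion amounts to channel concatenation before the first convolution, so the only architectural change is `in_channels=6`. A minimal PyTorch sketch (layer sizes are illustrative, not this project's model):

```python
import torch
import torch.nn as nn

class EarlyFusionCNN(nn.Module):
    """Toy classifier that accepts RGB + ELA stacked as a 6-channel input."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1),  # 6 = 3 RGB + 3 ELA
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, rgb: torch.Tensor, ela: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, ela], dim=1)  # (B, 6, H, W)
        return self.head(self.features(x).flatten(1))
```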

---

### **Key Integration Tips for Hybrid Inputs**

1. **Multimodal Models**
   - Use models like **CLIP**, **BLIP**, or **MBT** to align metadata (text) with visual features (images).
   - For example, combine a **ViT** (for image processing) with a **Transformer** (for EXIF metadata or histograms).

2. **Feature Fusion Techniques**
   - **Early fusion**: Concatenate inputs (e.g., ELA + RGB) before the first layer.
   - **Late fusion**: Process inputs separately and merge features before final classification.
   - **Cross-modal attention**: Use cross-attention to align metadata with visual features (e.g., EXIF text and PRNU noise maps).

3. **Preprocessing for Hybrid Inputs**
   - Normalize metadata and image data to the same scale (e.g., 0–1).
   - Convert 1D histogram data into 2D images (e.g., heatmap-like plots) for consistent input formats.

4. **Loss Functions for Hybrid Tasks**
   - Use a **multi-task loss** (e.g., classification + regression) if metadata is involved.
   - For consistency checks (e.g., metadata vs. visual content), use **triplet loss** or **contrastive loss**.
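Tip 3's histogram preprocessing (compute per-channel histograms and normalize them to the same scale as the image tensor) can be sketched as:

```python
import numpy as np

def channel_histograms(img: np.ndarray, bins: int = 64) -> np.ndarray:
    """Per-channel intensity histograms scaled to [0, 1] so they share the
    range of a normalized image tensor (tip 3)."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0]
             for c in range(img.shape[-1])]
    h = np.stack(hists).astype(np.float32)
    return h / max(float(h.max()), 1.0)
```

The resulting `(channels, bins)` array can feed the histogram stream of the dual-stream model from strategy 3.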

---

forensics/__init__.py
CHANGED
@@ -1,6 +1,6 @@
 from .bitplane import bit_plane_extractor
 from .ela import ELA
-from .exif import exif_full_dump
+# from .exif import exif_full_dump
 from .gradient import gradient_processing
 from .minmax import minmax_process
 from .wavelet import wavelet_blocking_noise_estimation
@@ -8,7 +8,7 @@ from .wavelet import wavelet_blocking_noise_estimation
 __all__ = [
     'bit_plane_extractor',
     'ELA',
-    'exif_full_dump',
+    # 'exif_full_dump',
     'gradient_processing',
     'minmax_process',
     'wavelet_blocking_noise_estimation'

forensics/exif.py
CHANGED
@@ -1,11 +1,11 @@
-import tempfile
-import exiftool
-from PIL import Image
+# import tempfile
+# import exiftool
+# from PIL import Image
 
-def exif_full_dump(image: Image.Image) -> dict:
-    """Extract all EXIF metadata from an image using exiftool."""
-    with tempfile.NamedTemporaryFile(suffix='.jpg', delete=True) as tmp:
-        image.save(tmp.name)
-        with exiftool.ExifTool() as et:
-            metadata = et.get_metadata(tmp.name)
-    return metadata
+# def exif_full_dump(image: Image.Image) -> dict:
+#     """Extract all EXIF metadata from an image using exiftool."""
+#     with tempfile.NamedTemporaryFile(suffix='.jpg', delete=True) as tmp:
+#         image.save(tmp.name)
+#         with exiftool.ExifTool() as et:
+#             metadata = et.get_metadata(tmp.name)
+#     return metadata
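With `exif_full_dump` and its `exiftool` dependency disabled, a lighter partial substitute is Pillow's built-in `Image.getexif()`. It recovers far fewer tags than exiftool but needs no external binary; a sketch, not a drop-in replacement (the function name is hypothetical):

```python
from PIL import Image
from PIL.ExifTags import TAGS

def exif_basic_dump(image: Image.Image) -> dict:
    """Best-effort EXIF dump using only Pillow (a subset of exiftool's output)."""
    return {TAGS.get(tag_id, str(tag_id)): str(value)
            for tag_id, value in image.getexif().items()}
```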

requirements.txt
CHANGED
@@ -1,7 +1,7 @@
 --index-url https://download.pytorch.org/whl/nightly/cpu
 
 # Core ML/AI libraries
-transformers
+transformers
 torch
 torchvision
 torchaudio