---
license: apache-2.0
datasets:
- Hemg/cifake-real-and-ai-generated-synthetic-images
language:
- en
metrics:
- accuracy
library_name: transformers
tags:
- Diffusors
- GanDetectors
- Cifake
base_model:
- google/vit-base-patch16-224
inference: True
---
# AI Guard Vision Model Card
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)
## Overview
This model, **AI Guard Vision**, is a Vision Transformer (ViT)-based architecture designed for image classification tasks. Its primary objective is to accurately distinguish between real and AI-generated synthetic images. The model addresses the growing challenge of detecting manipulated or fake visual content to preserve trust and integrity in digital media.
## Model Summary
- **Model Type:** Vision Transformer (ViT) – `vit-base-patch16-224`
- **Objective:** Real vs. AI-generated image classification
- **License:** Apache 2.0
- **Fine-tuned From:** `google/vit-base-patch16-224`
- **Training Dataset:** [CIFake Dataset](https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images)
- **Developer:** Aashish Kumar, IIIT Manipur
## Applications & Use Cases
- **Content Moderation:** Identifying AI-generated images across media platforms.
- **Digital Forensics:** Verifying the authenticity of visual content for investigative purposes.
- **Trust Preservation:** Helping maintain the integrity of digital ecosystems by combating misinformation spread through fake images.
## How to Use the Model
```python
from transformers import AutoImageProcessor, ViTForImageClassification
import torch
from PIL import Image
from pillow_heif import register_heif_opener, register_avif_opener

# Optional: enable HEIF/AVIF decoding in Pillow (e.g. for iPhone photos)
register_heif_opener()
register_avif_opener()

# Load the processor and model once so repeated calls don't reload them
image_processor = AutoImageProcessor.from_pretrained("AashishKumar/AIvisionGuard-v2")
model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")

def get_prediction(img):
    """Return the top-2 (label, score) predictions for an image path or file-like object."""
    image = Image.open(img).convert("RGB")
    inputs = image_processor(image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Convert logits to probabilities and keep the two highest-scoring classes
    probs = logits.softmax(dim=-1)
    top2 = probs.topk(2)
    labels = top2.indices.squeeze().tolist()
    scores = top2.values.squeeze().tolist()
    return [
        {"label": model.config.id2label[label], "score": score}
        for label, score in zip(labels, scores)
    ]
```
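For example (the file name `example.jpg` below is just a placeholder for any local image path):
```python
# "example.jpg" is a placeholder; pass any local image path or file-like object
predictions = get_prediction("example.jpg")
print(predictions)  # top-2 labels from model.config.id2label with their scores
```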
## Dataset Information
The model was fine-tuned on the **CIFake dataset**, which contains both real and AI-generated synthetic images:
- **Real Images:** Collected from the CIFAR-10 dataset.
- **Fake Images:** Generated using Stable Diffusion 1.4.
- **Training Data:** 100,000 images (50,000 per class).
- **Testing Data:** 20,000 images (10,000 per class).
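A mirror of the dataset is available on the Hugging Face Hub as `Hemg/cifake-real-and-ai-generated-synthetic-images` (listed in this card's metadata). A minimal loading sketch with the `datasets` library follows; the split and column names are assumptions, so check the dataset card for the actual layout:
```python
from datasets import load_dataset

# Load the CIFake mirror from the Hub; split/column names below are assumptions
dataset = load_dataset("Hemg/cifake-real-and-ai-generated-synthetic-images")
print(dataset)                 # shows the available splits and features
sample = dataset["train"][0]   # assumes a "train" split with "image" and "label" columns
print(sample["label"], sample["image"].size)
```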
## Model Architecture
- **Transformer Encoder Layers:** Stacked self-attention blocks operating on 16×16 image patches.
- **Positional Encodings:** Added to the patch embeddings so the model retains the spatial layout of the image.
- **Pretrained Weights:** The backbone was pretrained on ImageNet-21k and fine-tuned on ImageNet-1k (2012) before being adapted to this task.
### Why Vision Transformer?
- **Scalability and Performance:** Self-attention captures global image context rather than only local features.
- **Strong Accuracy:** When pretrained on large datasets, ViTs match or outperform comparable CNN baselines on image classification.
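The encoder layout inherited from the `vit-base-patch16-224` backbone can be inspected directly from the checkpoint's configuration:
```python
from transformers import ViTConfig

# Inspect the ViT-Base/16 backbone configuration of this checkpoint
config = ViTConfig.from_pretrained("AashishKumar/AIvisionGuard-v2")
print(config.patch_size)           # 16  -> images are split into 16x16 patches
print(config.image_size)           # 224 -> (224/16)^2 = 196 patch tokens + 1 [CLS] token
print(config.num_hidden_layers)    # 12 transformer encoder layers
print(config.num_attention_heads)  # 12 self-attention heads per layer
print(config.hidden_size)          # 768-dimensional token embeddings
```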
## Training Details
- **Learning Rate:** 1e-7
- **Batch Size:** 64
- **Epochs:** 100
- **Training Time:** 1 hr 36 min
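For reference, here is a sketch of how these hyperparameters map onto `transformers.TrainingArguments`; the full training script (data pipeline, optimizer schedule, evaluation loop) is not part of this card, and `output_dir` is a placeholder:
```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above; not the original training script
training_args = TrainingArguments(
    output_dir="aivisionguard-v2",   # placeholder output directory
    learning_rate=1e-7,              # Learning Rate
    per_device_train_batch_size=64,  # Batch Size
    num_train_epochs=100,            # Epochs
)
```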
## Evaluation Metrics
The model was evaluated using the CIFake test dataset, with the following metrics:
- **Accuracy:** 92%
- **F1 Score:** 0.89
- **Precision:** 0.85
- **Recall:** 0.88
| Model | Accuracy | F1-Score | Precision | Recall |
|---------------|----------|----------|-----------|--------|
| Baseline | 85% | 0.82 | 0.78 | 0.80 |
| Augmented | 88% | 0.85 | 0.83 | 0.84 |
| Fine-tuned ViT| **92%** | **0.89** | **0.85** | **0.88**|
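These metrics can be reproduced with scikit-learn, assuming `y_true` and `y_pred` hold the ground-truth and predicted class indices for the 20,000 test images (a sketch, not the original evaluation script):
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report_metrics(y_true, y_pred):
    # y_true / y_pred: ground-truth and predicted class ids for the test set
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary"
    )
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}
```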
## Evaluation Figure
![image/png](https://cdn-uploads.huggingface.co/production/uploads/640ed1fb06c3b5ca883d5ad5/vmiE8IhMLUwJIOLK-Q9dT.png)
## System Workflow
- **Frontend:** ReactJS
- **Backend:** Python Flask
- **Database:** PostgreSQL (Supabase)
- **Model:** Deployed via the PyTorch and TensorFlow frameworks
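A minimal sketch of the Flask side of this workflow, reusing the `get_prediction` helper from the usage section above; the route name and the `image` form field are illustrative assumptions, not the deployed API:
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Illustrative endpoint only; the deployed backend's routes and payload format may differ.
# Expects a multipart/form-data upload with an "image" file field.
@app.route("/predict", methods=["POST"])
def predict():
    uploaded = request.files["image"]         # werkzeug FileStorage, accepted by Image.open
    return jsonify(get_prediction(uploaded))  # reuses get_prediction defined earlier

if __name__ == "__main__":
    app.run()
```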
## Strengths and Limitations
### Strengths:
- **High Accuracy:** Achieves state-of-the-art performance in distinguishing real and synthetic images.
- **Pretrained on ImageNet-21k:** Allows for efficient transfer learning and robust generalization.
### Limitations:
- **Synthetic Image Diversity:** The model may underperform on novel or unseen synthetic images that are significantly different from the training data.
- **Data Bias:** Like all machine learning models, its predictions may reflect biases present in the training data.
## Conclusion and Future Work
This model provides a highly effective tool for detecting AI-generated synthetic images and has promising applications in content moderation, digital forensics, and trust preservation. Future improvements may include:
- **Hybrid Architectures:** Combining transformers with convolutional layers for improved performance.
- **Multimodal Detection:** Incorporating additional modalities (e.g., metadata or contextual information) for more comprehensive detection.