|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- Hemg/cifake-real-and-ai-generated-synthetic-images |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
library_name: transformers |
|
tags: |
|
- Diffusers
|
- GanDetectors |
|
- Cifake |
|
base_model: |
|
- google/vit-base-patch16-224 |
|
inference: true
|
--- |
|
# AI Guard Vision Model Card |
|
|
|
[Apache 2.0 License](LICENSE)
|
|
|
## Overview |
|
|
|
This model, **AI Guard Vision**, is a Vision Transformer (ViT)-based image classifier. Its primary objective is to accurately distinguish real images from AI-generated synthetic ones, addressing the growing challenge of detecting manipulated or fake visual content and helping preserve trust and integrity in digital media.
|
|
|
## Model Summary |
|
|
|
- **Model Type:** Vision Transformer (ViT) – `vit-base-patch16-224` |
|
- **Objective:** Real vs. AI-generated image classification |
|
- **License:** Apache 2.0 |
|
- **Fine-tuned From:** `google/vit-base-patch16-224` |
|
- **Training Dataset:** [CIFake Dataset](https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images) |
|
- **Developer:** Aashish Kumar, IIIT Manipur |
|
|
|
## Applications & Use Cases |
|
|
|
- **Content Moderation:** Identifying AI-generated images across media platforms. |
|
- **Digital Forensics:** Verifying the authenticity of visual content for investigative purposes. |
|
- **Trust Preservation:** Helping maintain the integrity of digital ecosystems by combating misinformation spread through fake images. |
|
|
|
## How to Use the Model |
|
|
|
```python
from transformers import AutoImageProcessor, ViTForImageClassification
import torch
from PIL import Image
from pillow_heif import register_heif_opener, register_avif_opener

# Register HEIF/AVIF openers so Pillow can also decode those formats
register_heif_opener()
register_avif_opener()

# Load the processor and model once so repeated predictions stay fast
image_processor = AutoImageProcessor.from_pretrained("AashishKumar/AIvisionGuard-v2")
model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")


def get_prediction(img):
    image = Image.open(img).convert("RGB")
    inputs = image_processor(image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Rank both classes by logit value, highest first
    top2 = logits.topk(2)
    top2_labels = top2.indices.squeeze().tolist()
    top2_scores = top2.values.squeeze().tolist()

    return [
        {"label": model.config.id2label[label], "score": score}
        for label, score in zip(top2_labels, top2_scores)
    ]
```
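
For example, assuming a local file named `example.jpg` (a placeholder path), the helper can be called directly:

```python
# "example.jpg" is a placeholder; any format Pillow (plus the registered
# HEIF/AVIF openers) can decode will work.
predictions = get_prediction("example.jpg")
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```

Note that the returned scores are raw logits rather than softmax probabilities.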
|
|
|
## Dataset Information |
|
|
|
The model was fine-tuned on the **CIFake dataset**, which contains both real and AI-generated synthetic images (a loading sketch follows this list):
|
- **Real Images:** Collected from the CIFAR-10 dataset. |
|
- **Fake Images:** Generated using Stable Diffusion 1.4. |
|
- **Training Data:** 100,000 images (50,000 per class). |
|
- **Testing Data:** 20,000 images (10,000 per class). |
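
As a minimal sketch, the Hugging Face mirror of the dataset listed in this card's metadata can be loaded with the `datasets` library (split and feature names are assumptions and may differ from the original Kaggle release):

```python
from datasets import load_dataset

# Hugging Face mirror referenced in this card's metadata.
ds = load_dataset("Hemg/cifake-real-and-ai-generated-synthetic-images")
print(ds)  # inspect the available splits, image column, and label names
```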
|
|
|
## Model Architecture |
|
|
|
- **Transformer Encoder Layers:** Utilize self-attention mechanisms over image patches.

- **Positional Encodings:** Help the model retain the spatial layout of the patches (the patch arithmetic is sketched after this list).

- **Pretrained Weights:** Pretrained on ImageNet-21k and fine-tuned on ImageNet 2012 for enhanced performance.
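
To make the patch layout concrete, the arithmetic below shows how a 224x224 input becomes a token sequence in the `vit-base-patch16-224` configuration (these are standard ViT-Base numbers, not anything specific to this fine-tune):

```python
image_size, patch_size = 224, 16

# The image is cut into non-overlapping 16x16 patches ...
num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196 patch tokens
# ... and a [CLS] token is prepended, so the positional encodings
# cover 197 sequence positions.
seq_len = num_patches + 1
print(num_patches, seq_len)  # 196 197
```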
|
|
|
### Why Vision Transformer? |
|
|
|
- **Scalability and Performance:** Excels at high-level global feature extraction. |
|
- **State-of-the-Art Accuracy:** Transformer-based models can match or exceed traditional CNNs on this task when sufficient training data is available.
|
|
|
## Training Details |
|
|
|
- **Learning Rate:** 1e-7 (0.0000001; see the `TrainingArguments` sketch after this list)
|
- **Batch Size:** 64 |
|
- **Epochs:** 100 |
|
- **Training Time:** 1 hr 36 min |
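
A minimal sketch of these settings as Hugging Face `TrainingArguments`; the authors' training script is not published, so everything beyond the three reported values (learning rate, batch size, epochs) is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aivisionguard-vit",  # placeholder output path
    learning_rate=1e-7,              # reported learning rate (0.0000001)
    per_device_train_batch_size=64,  # reported batch size
    num_train_epochs=100,            # reported epoch count
)
```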
|
|
|
## Evaluation Metrics |
|
|
|
The model was evaluated on the CIFake test set, with the following metrics (a computation sketch follows the comparison table):
|
|
|
- **Accuracy:** 92% |
|
- **F1 Score:** 0.89 |
|
- **Precision:** 0.85 |
|
- **Recall:** 0.88 |
|
|
|
| Model | Accuracy | F1-Score | Precision | Recall | |
|
|---------------|----------|----------|-----------|--------| |
|
| Baseline | 85% | 0.82 | 0.78 | 0.80 | |
|
| Augmented | 88% | 0.85 | 0.83 | 0.84 | |
|
| Fine-tuned ViT | **92%**  | **0.89** | **0.85**  | **0.88** |
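
For reference, metrics like these can be computed from per-image predictions with scikit-learn; below is a minimal sketch with placeholder arrays (the label convention, 1 = fake, is an assumption):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder arrays standing in for the CIFake test labels and the
# model's predicted classes (1 = fake, 0 = real is assumed here).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```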
|
|
|
|
|
|
## System Workflow |
|
|
|
- **Frontend:** ReactJS |
|
- **Backend:** Python Flask |
|
- **Database:** PostgreSQL (Supabase)

- **Model:** Deployed via the PyTorch and TensorFlow frameworks (a minimal Flask sketch follows this list)
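
As an illustration of how the pieces fit together, a minimal Flask endpoint wrapping `get_prediction` from the usage section might look like the sketch below; the route name and request format are assumptions, since the deployed service itself is not published:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # The ReactJS frontend is assumed to upload the image as multipart form data.
    img_file = request.files["image"]
    # get_prediction is the helper defined in "How to Use the Model" above.
    return jsonify(get_prediction(img_file))

if __name__ == "__main__":
    app.run()
```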
|
|
|
## Strengths and Limitations |
|
|
|
### Strengths: |
|
- **High Accuracy:** Achieves state-of-the-art performance in distinguishing real and synthetic images. |
|
- **Pretrained on ImageNet-21k:** Allows for efficient transfer learning and robust generalization. |
|
|
|
### Limitations: |
|
- **Synthetic Image Diversity:** The model may underperform on novel or unseen synthetic images that are significantly different from the training data. |
|
- **Data Bias:** Like all machine learning models, its predictions may reflect biases present in the training data. |
|
|
|
## Conclusion and Future Work |
|
|
|
This model provides a highly effective tool for detecting AI-generated synthetic images and has promising applications in content moderation, digital forensics, and trust preservation. Future improvements may include: |
|
- **Hybrid Architectures:** Combining transformers with convolutional layers for improved performance. |
|
- **Multimodal Detection:** Incorporating additional modalities (e.g., metadata or contextual information) for more comprehensive detection. |