---
license: apache-2.0
datasets:
- Hemg/cifake-real-and-ai-generated-synthetic-images
language:
- en
metrics:
- accuracy
library_name: transformers
tags:
- Diffusors
- GanDetectors
- Cifake
base_model:
- google/vit-base-patch16-224
inference: true
---

# AI Guard Vision Model Card

[![License: Apache 2.0](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)

## Overview

This model, **AI Guard Vision**, is a Vision Transformer (ViT)-based architecture designed for image classification tasks. Its primary objective is to accurately distinguish between real and AI-generated synthetic images, addressing the growing challenge of detecting manipulated or fake visual content and helping preserve trust and integrity in digital media.

## Model Summary

- **Model Type:** Vision Transformer (ViT) – `vit-base-patch16-224`
- **Objective:** Real vs. AI-generated image classification
- **License:** Apache 2.0
- **Fine-tuned From:** `google/vit-base-patch16-224`
- **Training Dataset:** [CIFake Dataset](https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images)
- **Developer:** Aashish Kumar, IIIT Manipur

## Applications & Use Cases

- **Content Moderation:** Identifying AI-generated images across media platforms.
- **Digital Forensics:** Verifying the authenticity of visual content for investigative purposes.
- **Trust Preservation:** Helping maintain the integrity of digital ecosystems by combating misinformation spread through fake images.

## How to Use the Model

```python
from transformers import AutoImageProcessor, ViTForImageClassification
import torch
from PIL import Image
from pillow_heif import register_heif_opener, register_avif_opener

# Register HEIF/AVIF openers so Pillow can also read those formats
register_heif_opener()
register_avif_opener()

def get_prediction(img):
    image = Image.open(img).convert('RGB')

    # Load the processor and the fine-tuned classifier from the Hub
    image_processor = AutoImageProcessor.from_pretrained("AashishKumar/AIvisionGuard-v2")
    model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")

    inputs = image_processor(image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert logits to probabilities so the reported scores are comparable across images
    probs = logits.softmax(dim=-1)
    top2 = probs.topk(2)
    top2_labels = top2.indices.squeeze().tolist()
    top2_scores = top2.values.squeeze().tolist()

    response = [
        {"label": model.config.id2label[label], "score": score}
        for label, score in zip(top2_labels, top2_scores)
    ]
    return response
```

## Dataset Information

The model was fine-tuned on the **CIFake dataset**, which contains both real and AI-generated synthetic images:

- **Real Images:** Collected from the CIFAR-10 dataset.
- **Fake Images:** Generated using Stable Diffusion 1.4.
- **Training Data:** 100,000 images (50,000 per class).
- **Testing Data:** 20,000 images (10,000 per class).

## Model Architecture

- **Transformer Encoder Layers:** Use self-attention mechanisms to model relationships between image patches.
- **Positional Encodings:** Help the model retain the spatial structure of the input image.
- **Pretrained Weights:** Pretrained on ImageNet-21k and fine-tuned on ImageNet 2012 for enhanced performance.

### Why Vision Transformer?

- **Scalability and Performance:** Excels at extracting high-level global features.
- **State-of-the-Art Accuracy:** Leverages transformer attention to outperform traditional CNN models.
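
The hyperparameters used for this model are listed under Training Details below. As a reference point only, the following is a minimal sketch of how a comparable fine-tuning run could be set up with the Hugging Face `Trainer`; the dataset split and column names, the label mapping, and the output directory are assumptions, not the original training code.

```python
# Illustrative fine-tuning sketch (not the released training script).
# Assumes the CIFake dataset on the Hub exposes "train"/"test" splits
# with "image" (PIL) and "label" (0/1) columns.
import torch
from datasets import load_dataset
from transformers import (AutoImageProcessor, Trainer, TrainingArguments,
                          ViTForImageClassification)

dataset = load_dataset("Hemg/cifake-real-and-ai-generated-synthetic-images")
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Put a two-class head on top of the pretrained backbone
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,
    id2label={0: "REAL", 1: "FAKE"},      # assumed label mapping
    label2id={"REAL": 0, "FAKE": 1},
    ignore_mismatched_sizes=True,         # replace the 1000-class ImageNet head
)

def preprocess(batch):
    # Resize/normalize images to the 224x224 format expected by the ViT
    pixel_values = processor(
        [img.convert("RGB") for img in batch["image"]], return_tensors="pt"
    )["pixel_values"]
    batch["pixel_values"] = list(pixel_values)
    return batch

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["label"] for ex in examples]),
    }

train_ds = dataset["train"].with_transform(preprocess)
eval_ds = dataset["test"].with_transform(preprocess)

args = TrainingArguments(
    output_dir="aivisionguard-finetune",  # hypothetical path
    learning_rate=1e-7,                   # values from the Training Details section
    per_device_train_batch_size=64,
    num_train_epochs=100,
    remove_unused_columns=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=collate_fn,
)
trainer.train()
```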
## Training Details

- **Learning Rate:** 0.0000001
- **Batch Size:** 64
- **Epochs:** 100
- **Training Time:** 1 hr 36 min

## Evaluation Metrics

The model was evaluated on the CIFake test dataset, with the following results:

- **Accuracy:** 92%
- **F1 Score:** 0.89
- **Precision:** 0.85
- **Recall:** 0.88

| Model          | Accuracy | F1-Score | Precision | Recall   |
|----------------|----------|----------|-----------|----------|
| Baseline       | 85%      | 0.82     | 0.78      | 0.80     |
| Augmented      | 88%      | 0.85     | 0.83      | 0.84     |
| Fine-tuned ViT | **92%**  | **0.89** | **0.85**  | **0.88** |

## Evaluation Figure

![image/png](https://cdn-uploads.huggingface.co/production/uploads/640ed1fb06c3b5ca883d5ad5/vmiE8IhMLUwJIOLK-Q9dT.png)

## System Workflow

- **Frontend:** ReactJS
- **Backend:** Python Flask (a minimal endpoint sketch is included at the end of this card)
- **Database:** PostgreSQL (Supabase)
- **Model:** Deployed via PyTorch and TensorFlow frameworks

## Strengths and Limitations

### Strengths:

- **High Accuracy:** Achieves state-of-the-art performance in distinguishing real from synthetic images.
- **Pretrained on ImageNet-21k:** Allows for efficient transfer learning and robust generalization.

### Limitations:

- **Synthetic Image Diversity:** The model may underperform on novel or unseen synthetic images that differ significantly from the training data.
- **Data Bias:** Like all machine learning models, its predictions may reflect biases present in the training data.

## Conclusion and Future Work

This model provides a highly effective tool for detecting AI-generated synthetic images and has promising applications in content moderation, digital forensics, and trust preservation. Future improvements may include:

- **Hybrid Architectures:** Combining transformers with convolutional layers for improved performance.
- **Multimodal Detection:** Incorporating additional modalities (e.g., metadata or contextual information) for more comprehensive detection.
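
## Appendix: Example Flask Inference Endpoint

As a complement to the System Workflow section, the snippet below is a minimal sketch of how the Flask backend could wrap the classifier; the `/predict` route name, the `file` form field, and the host/port settings are assumptions rather than the deployed backend code.

```python
# Illustrative Flask wrapper around the classifier (not the production backend).
import io

import torch
from flask import Flask, jsonify, request
from PIL import Image
from transformers import AutoImageProcessor, ViTForImageClassification

app = Flask(__name__)

# Load the processor and model once at startup rather than per request
processor = AutoImageProcessor.from_pretrained("AashishKumar/AIvisionGuard-v2")
model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")
model.eval()

@app.route("/predict", methods=["POST"])  # assumed route name
def predict():
    # Expects an uploaded image under the (assumed) "file" form field
    image = Image.open(io.BytesIO(request.files["file"].read())).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()
    return jsonify([
        {"label": model.config.id2label[i], "score": probs[i].item()}
        for i in range(len(probs))
    ])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```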