AashishKumar commited on
Commit
12fe065
·
verified ·
1 Parent(s): 6569276

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +81 -73
README.md CHANGED
@@ -11,111 +11,119 @@ tags:
11
  - Diffusors
12
  - GanDetectors
13
  - Cifake
 
 
14
  ---
15
- # Model Card for Model ID
16
 
17
- <!-- Provide a quick summary of what the model is/does. -->
18
 
19
- This model card provides comprehensive information about the model's architecture, training data, evaluation metrics, and environmental impact.
20
 
21
- ## Model Details
22
 
23
- ### Model Description
24
 
25
- This model is a pre-trained model for image classification, specifically designed for detecting fake images, including both real and AI-generated synthetic images. It utilizes the ViT (Vision Transformer) architecture for image classification tasks.
 
 
 
 
 
26
 
27
- - **Developed by:** [Author(s) Name(s)]
28
- - **Funded by [optional]:** [Funding Source(s)]
29
- - **Shared by [optional]:** [Organization/Individual(s) Sharing the Model]
30
- - **Model type:** Vision Transformer (ViT)
31
- - **Language(s) (NLP):** N/A
32
- - **License:** Apache License 2.0
33
- - **Finetuned from model [optional]:** [Base Pre-trained Model]
34
 
35
- ### Model Sources [optional]
 
 
36
 
37
- - **Repository:** https://github.com/AashishKumar-3002/AIGuardVision.git
38
 
39
- ## Uses
 
 
 
 
40
 
41
- ### Direct Use
 
42
 
43
- This model can be directly used for classifying images as real or AI-generated synthetic images.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
44
 
45
- ### Downstream Use [optional]
46
 
47
- This model can be fine-tuned for specific image classification tasks related to detecting fake images in various domains.
 
 
 
 
48
 
49
- ### Out-of-Scope Use
50
 
51
- The model may not perform well on tasks outside the scope of image classification, such as object detection or segmentation.
 
 
52
 
53
- ## Bias, Risks, and Limitations
54
 
55
- The model's performance may be influenced by biases in the training data, leading to potential inaccuracies in classification.
56
-
57
- ### Recommendations
58
-
59
- Users should be aware of potential biases and limitations when using the model for classification tasks, and additional data sources may be necessary to mitigate biases.
60
-
61
- ## How to Get Started with the Model
62
-
63
- Use the code below to get started with the model:
64
-
65
- [Code Snippet for Model Usage]
66
 
67
  ## Training Details
68
 
69
- ### Training Data
70
-
71
- The model was trained on the CIFake dataset, which contains real and AI-generated synthetic images for training the classification model.
72
-
73
- ### Training Procedure
74
-
75
- #### Preprocessing [optional]
76
-
77
- Data preprocessing techniques were applied to the training data, including normalization and data augmentation to improve model generalization.
78
-
79
- #### Training Hyperparameters
80
-
81
- - **Training regime:** Fine-tuning with a learning rate of 0.0000001
82
  - **Batch Size:** 64
83
  - **Epochs:** 100
84
-
85
- #### Speeds, Sizes, Times [optional]
86
-
87
  - **Training Time:** 1 hr 36 min
88
 
89
- ## Evaluation
90
-
91
- ### Testing Data, Factors & Metrics
92
-
93
- #### Testing Data
94
-
95
- The model was evaluated on a separate test set from the CIFake dataset.
96
-
97
- #### Factors
98
-
99
- The evaluation considered factors such as class imbalance and dataset diversity.
100
-
101
- #### Metrics
102
 
103
- Evaluation metrics included accuracy, precision, recall, and F1-score.
104
 
105
- ### Results
 
 
 
106
 
107
- The model achieved an accuracy of [Accuracy] on the test set, with detailed metrics summarized in the following table:
 
 
 
 
108
 
109
- [Metrics Table]
110
 
111
- ## Model Examination [optional]
 
 
 
112
 
113
- <!-- Relevant interpretability work for the model goes here -->
114
 
115
- [Information on Model Examination, if available]
 
 
116
 
117
- ## Technical Specifications [optional]
 
 
118
 
119
- ### Model Architecture and Objective
120
 
121
- The model architecture is based on the Vision Transformer (ViT) architecture, which uses self-attention mechanisms for image classification tasks.
 
 
 
11
  - Diffusors
12
  - GanDetectors
13
  - Cifake
14
+ base_model:
15
+ - google/vit-base-patch16-224
16
  ---
17
+ # AI Guard Vision Model Card
18
 
19
+ [![License: Apache 2.0](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)
20
 
21
+ ## Overview
22
 
23
+ This model, **AI Guard Vision**, is a Vision Transformer (ViT)-based architecture designed for image classification tasks. Its primary objective is to accurately distinguish between real and AI-generated synthetic images. The model addresses the growing challenge of detecting manipulated or fake visual content to preserve trust and integrity in digital media.
24
 
25
+ ## Model Summary
26
 
27
+ - **Model Type:** Vision Transformer (ViT) `vit-base-patch16-224`
28
+ - **Objective:** Real vs. AI-generated image classification
29
+ - **License:** Apache 2.0
30
+ - **Fine-tuned From:** `google/vit-base-patch16-224`
31
+ - **Training Dataset:** [CIFake Dataset](https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images)
32
+ - **Developer:** Aashish Kumar, IIIT Manipur
33
 
34
+ ## Applications & Use Cases
 
 
 
 
 
 
35
 
36
+ - **Content Moderation:** Identifying AI-generated images across media platforms.
37
+ - **Digital Forensics:** Verifying the authenticity of visual content for investigative purposes.
38
+ - **Trust Preservation:** Helping maintain the integrity of digital ecosystems by combating misinformation spread through fake images.
39
 
40
+ ## How to Use the Model
41
 
42
+ ```python
43
+ from transformers import AutoImageProcessor, ViTForImageClassification
44
+ import torch
45
+ from PIL import Image
46
+ from pillow_heif import register_heif_opener, register_avif_opener
47
 
48
+ register_heif_opener()
49
+ register_avif_opener()
50
 
51
+ def get_prediction(img):
52
+ image = Image.open(img).convert('RGB')
53
+ image_processor = AutoImageProcessor.from_pretrained("AashishKumar/AIvisionGuard-v2")
54
+ model = ViTForImageClassification.from_pretrained("AashishKumar/AIvisionGuard-v2")
55
+ inputs = image_processor(image, return_tensors="pt")
56
+
57
+ with torch.no_grad():
58
+ logits = model(**inputs).logits
59
+
60
+ top2_labels = logits.topk(2).indices.squeeze().tolist()
61
+ top2_scores = logits.topk(2).values.squeeze().tolist()
62
+
63
+ response = [{"label": model.config.id2label[label], "score": score} for label, score in zip(top2_labels, top2_scores)]
64
+ return response
65
+ ```
66
 
67
+ ## Dataset Information
68
 
69
+ The model was fine-tuned on the **CIFake dataset**, which contains both real and AI-generated synthetic images:
70
+ - **Real Images:** Collected from the CIFAR-10 dataset.
71
+ - **Fake Images:** Generated using Stable Diffusion 1.4.
72
+ - **Training Data:** 100,000 images (50,000 per class).
73
+ - **Testing Data:** 20,000 images (10,000 per class).
74
 
75
+ ## Model Architecture
76
 
77
+ - **Transformer Encoder Layers:** Utilizes self-attention mechanisms.
78
+ - **Positional Encodings:** Helps the model understand image structure.
79
+ - **Pretrained Weights:** Pretrained on ImageNet-21k and fine-tuned on ImageNet 2012 for enhanced performance.
80
 
81
+ ### Why Vision Transformer?
82
 
83
+ - **Scalability and Performance:** Excels at high-level global feature extraction.
84
+ - **State-of-the-Art Accuracy:** Leverages transformers to outperform traditional CNN models.
 
 
 
 
 
 
 
 
 
85
 
86
  ## Training Details
87
 
88
+ - **Learning Rate:** 0.0000001
 
 
 
 
 
 
 
 
 
 
 
 
89
  - **Batch Size:** 64
90
  - **Epochs:** 100
 
 
 
91
  - **Training Time:** 1 hr 36 min
92
 
93
+ ## Evaluation Metrics
 
 
 
 
 
 
 
 
 
 
 
 
94
 
95
+ The model was evaluated using the CIFake test dataset, with the following metrics:
96
 
97
+ - **Accuracy:** 92%
98
+ - **F1 Score:** 0.89
99
+ - **Precision:** 0.85
100
+ - **Recall:** 0.88
101
 
102
+ | Model | Accuracy | F1-Score | Precision | Recall |
103
+ |---------------|----------|----------|-----------|--------|
104
+ | Baseline | 85% | 0.82 | 0.78 | 0.80 |
105
+ | Augmented | 88% | 0.85 | 0.83 | 0.84 |
106
+ | Fine-tuned ViT| **92%** | **0.89** | **0.85** | **0.88**|
107
 
108
+ ## System Workflow
109
 
110
+ - **Frontend:** ReactJS
111
+ - **Backend:** Python Flask
112
+ - **Database:** PostgreSQL
113
+ - **Model:** Deployed via Pytorch and TensorFlow frameworks
114
 
115
+ ## Strengths and Limitations
116
 
117
+ ### Strengths:
118
+ - **High Accuracy:** Achieves state-of-the-art performance in distinguishing real and synthetic images.
119
+ - **Pretrained on ImageNet-21k:** Allows for efficient transfer learning and robust generalization.
120
 
121
+ ### Limitations:
122
+ - **Synthetic Image Diversity:** The model may underperform on novel or unseen synthetic images that are significantly different from the training data.
123
+ - **Data Bias:** Like all machine learning models, its predictions may reflect biases present in the training data.
124
 
125
+ ## Conclusion and Future Work
126
 
127
+ This model provides a highly effective tool for detecting AI-generated synthetic images and has promising applications in content moderation, digital forensics, and trust preservation. Future improvements may include:
128
+ - **Hybrid Architectures:** Combining transformers with convolutional layers for improved performance.
129
+ - **Multimodal Detection:** Incorporating additional modalities (e.g., metadata or contextual information) for more comprehensive detection.