DeepakKumarMSL committed `9ff2532` (verified; parent `fc891ec`): Create README.md (+87 -0)

# 🧠 Image Classification AI Model (CIFAR-100)

This repository contains a Vision Transformer (ViT) model fine-tuned for **image classification** on the CIFAR-100 dataset. It is built on `google/vit-base-patch16-224` and converted to **FP16** (half precision) for efficient inference. It reaches roughly 70–80% accuracy on this 100-class task, depending on the training configuration.

---

## 🚀 Features

- 🖼️ **Task**: Image Classification
- 🧠 **Base Model**: `google/vit-base-patch16-224` (Vision Transformer)
- 🧪 **Quantized**: FP16 weights for faster, more memory-efficient inference
- 🎯 **Dataset**: CIFAR-100, with 100 fine-grained object categories
- ⚡ **CUDA Enabled**: Optimized for GPU acceleration
- 📈 **High Accuracy**: Fine-tuned and evaluated on the validation split (~70–80%)

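Converting to FP16 stores each weight in 2 bytes instead of 4, which roughly halves the model's weight memory. The plain-Python sketch below checks this by counting the parameters of ViT-Base/16 with a 100-class head. It assumes the standard ViT-Base settings (hidden size 768, 12 encoder layers, MLP size 3072, 16×16 patches at 224×224, biases on every linear layer, no pooler) and does not read the actual checkpoint. It also counts weights only, not activations. With `num_labels=1000` the same function gives 86,567,656, in line with the roughly 86.6M parameters usually quoted for the ImageNet-1k ViT-Base/16.

```python
# Rough weight-memory estimate for ViT-Base/16 with a 100-class head.
# Assumes the standard ViT-Base configuration; activations, optimizer
# state and framework overhead are not included.

def vit_param_count(hidden=768, layers=12, mlp=3072, patch=16,
                    image=224, channels=3, num_labels=100):
    num_patches = (image // patch) ** 2                       # 14 * 14 = 196
    patch_embed = channels * patch * patch * hidden + hidden  # conv weight + bias
    cls_token = hidden
    pos_embed = (num_patches + 1) * hidden                    # +1 for [CLS]

    def linear(n_in, n_out):
        return n_in * n_out + n_out                           # weight + bias

    layer_norm = 2 * hidden                                   # scale + shift
    per_layer = (
        layer_norm                      # pre-attention LayerNorm
        + 3 * linear(hidden, hidden)    # query, key, value projections
        + linear(hidden, hidden)        # attention output projection
        + layer_norm                    # pre-MLP LayerNorm
        + linear(hidden, mlp)           # MLP expansion
        + linear(mlp, hidden)           # MLP projection back to hidden size
    )
    final_norm = layer_norm
    head = linear(hidden, num_labels)   # classification head
    return (patch_embed + cls_token + pos_embed
            + layers * per_layer + final_norm + head)


params = vit_param_count()
fp32_mb = params * 4 / 1e6  # 4 bytes per FP32 weight
fp16_mb = params * 2 / 1e6  # 2 bytes per FP16 weight
print(f"{params:,} parameters: {fp32_mb:.1f} MB in FP32, {fp16_mb:.1f} MB in FP16")
# → 85,875,556 parameters: 343.5 MB in FP32, 171.8 MB in FP16
```
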

---

## 📊 Dataset Used

**Hugging Face Dataset**: [`tanganke/cifar100`](https://huggingface.co/datasets/tanganke/cifar100)

- **Description**: CIFAR-100 contains 60,000 32×32 color images in 100 classes (600 images per class)
- **Split**: 50,000 training images and 10,000 test images (500 and 100 per class)
- **Categories**: 100 fine-grained classes grouped into 20 superclasses, such as aquatic mammals, flowers, household furniture, and vehicles
- **License**: MIT License (from source)

```python
from datasets import load_dataset

dataset = load_dataset("tanganke/cifar100")
```

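`google/vit-base-patch16-224` expects 224×224 inputs split into 16×16 patches. Each 32×32 CIFAR-100 image is therefore upscaled by a factor of 7 before it reaches the model. The short plain-Python calculation below shows what that means for the token sequence. It assumes the standard ViT layout of one token per patch plus a single [CLS] token.

```python
# How a 32x32 CIFAR-100 image maps onto ViT-Base/16 tokens after resizing.
SOURCE_SIZE = 32    # CIFAR-100 image side length in pixels
TARGET_SIZE = 224   # input side length expected by vit-base-patch16-224
PATCH_SIZE = 16     # ViT-B/16 patch side length

scale = TARGET_SIZE / SOURCE_SIZE                 # upscaling factor
patches_per_side = TARGET_SIZE // PATCH_SIZE      # patches along each axis
num_tokens = patches_per_side ** 2 + 1            # patch tokens + [CLS] token
source_pixels_per_patch = PATCH_SIZE / scale      # original pixels per patch side

print(f"scale x{scale:g}, {patches_per_side}x{patches_per_side} patches, "
      f"{num_tokens} tokens, each patch covers ~{source_pixels_per_patch:.2f} "
      f"original pixels per side")
# → scale x7, 14x14 patches, 197 tokens, each patch covers ~2.29 original pixels per side
```

Each patch therefore covers only about a 2×2 block of original pixels. The rest of the detail in the 224×224 input comes from interpolation.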
## 🛠️ Model & Training Configuration

- **Model**: `google/vit-base-patch16-224`
- **Image Size**: 224×224 (upscaled from 32×32)
- **Framework**: Hugging Face Transformers & Datasets
- **Training Environment**: Kaggle Notebook with CUDA
- **Epochs**: 5–10 (with early stopping)
- **Batch Size**: 32
- **Optimizer**: AdamW
- **Loss Function**: CrossEntropyLoss

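Training runs for 5–10 epochs with early stopping. The plain-Python helper below sketches that idea. It is a hypothetical `EarlyStopping` class, not code from this repository, and it stops once validation accuracy has failed to improve for `patience` consecutive epochs. The patience used for this model is not stated, and the accuracy values in the loop are made-up inputs.

In a Hugging Face `Trainer` setup, the built-in equivalent is `transformers.EarlyStoppingCallback`. It requires `load_best_model_at_end=True` and a `metric_for_best_model` in `TrainingArguments`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Hypothetical helper: stop when a higher-is-better metric stalls."""
    patience: int = 2          # epochs to wait without improvement
    min_delta: float = 0.0     # minimum gain that counts as an improvement
    best: Optional[float] = None
    epochs_without_improvement: int = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's validation metric; return True to stop training."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up per-epoch validation accuracies, for illustration only.
val_accuracy = [0.40, 0.55, 0.60, 0.59, 0.60, 0.58]
stopper = EarlyStopping(patience=2)
stopped_epoch = None
for epoch, acc in enumerate(val_accuracy, start=1):
    if stopper.step(acc):
        stopped_epoch = epoch
        break
print(f"Stopped after epoch {stopped_epoch}; best accuracy {stopper.best:.2f}")
# → Stopped after epoch 5; best accuracy 0.60
```

Only strict improvements larger than `min_delta` reset the counter. The repeated 0.60 in epoch 5 therefore counts as a stalled epoch.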
## ✅ Evaluation & Scoring

- **Accuracy**: ~70–80% (varies by configuration)
- **Validation Tool**: `evaluate` or `sklearn.metrics`
- **Metrics**: Top-1 and Top-5 accuracy
- **Inference Speed**: Faster than FP32 after FP16 quantization

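Top-1 and Top-5 accuracy differ only in how many of the highest-scoring classes may contain the true label. Below is a minimal, dependency-free sketch of the computation. The tiny score matrix is made up for illustration. `scikit-learn` provides the same metric as `sklearn.metrics.top_k_accuracy_score`.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores (logits or probabilities both work).
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (a CIFAR-100 run has 100 columns).
scores = [
    [0.10, 0.70, 0.15, 0.05],  # true class 1 ranks 1st
    [0.50, 0.20, 0.25, 0.05],  # true class 2 ranks 2nd
    [0.40, 0.30, 0.20, 0.10],  # true class 3 ranks 4th
]
labels = [1, 2, 3]
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 correct
```

For this model, `scores` would hold the logits for each evaluation image, and `k=5` gives the Top-5 score.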
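## 🔍 Inference Example

```python
import torch
from datasets import load_dataset
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# ViTImageProcessor replaces the deprecated ViTFeatureExtractor.
feature_extractor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
# Assumed location of the fine-tuned FP16 weights (see the folder structure).
model = ViTForImageClassification.from_pretrained("vit-cifar100-fp16", torch_dtype=torch.float16).to("cuda").eval()
dataset = load_dataset("tanganke/cifar100")  # used here only for the label names

@torch.no_grad()  # inference only: skip gradient tracking
def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt")
    # Move inputs to the GPU and cast to FP16 to match the model's weights.
    pixel_values = inputs["pixel_values"].to("cuda", dtype=torch.float16)
    logits = model(pixel_values=pixel_values).logits
    predicted_class = logits.argmax(-1).item()
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))
```
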
## 📁 Folder Structure

- 📦 `image-classification-vit`
  - 📂 `vit-cifar100-fp16`
  - 📜 `train.py`
  - 📜 `inference.py`
  - 📜 `README.md`
  - 📜 `requirements.txt`