# 🧠 Image Classification AI Model (CIFAR-100)

This repository contains a Vision Transformer (ViT) model fine-tuned for **image classification** on the CIFAR-100 dataset. The model is built on `google/vit-base-patch16-224`, quantized to **FP16** for efficient inference, and reaches roughly 70–80% accuracy on this 100-class task (see Evaluation & Scoring).

---

## 🚀 Features

- 🖼️ **Task**: Image classification
- 🧠 **Base Model**: `google/vit-base-patch16-224` (Vision Transformer)
- 🧪 **Quantized**: Weights stored in FP16 (half precision) for faster, lower-memory inference
- 🎯 **Dataset**: CIFAR-100, with 100 fine-grained object categories
- ⚡ **CUDA Enabled**: Optimized for GPU acceleration
- 📈 **High Accuracy**: Fine-tuned and evaluated on the validation split

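Storing weights in FP16 uses 2 bytes per value instead of the 4 bytes of FP32. The NumPy sketch below illustrates the effect with a back-of-the-envelope estimate and a round-trip cast. The figure of roughly 86 million parameters for ViT-Base is an approximation used only for illustration, and `weight_memory_mb` is a hypothetical helper, not part of this repository.

```python
import numpy as np

def weight_memory_mb(num_params: int, dtype) -> float:
    """Hypothetical helper: raw weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * np.dtype(dtype).itemsize / 1e6

# Assumption: ~86M parameters, an approximate size for ViT-Base
vit_base_params = 86_000_000
fp32_mb = weight_memory_mb(vit_base_params, np.float32)
fp16_mb = weight_memory_mb(vit_base_params, np.float16)
print(f"FP32: {fp32_mb:.0f} MB, FP16: {fp16_mb:.0f} MB")  # FP32: 344 MB, FP16: 172 MB

# Casting a weight matrix to FP16 halves its bytes but adds a small rounding error
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(768, 768)).astype(np.float32)
weights_fp16 = weights.astype(np.float16)
print(weights.nbytes, weights_fp16.nbytes)  # 2359296 1179648
max_err = np.abs(weights - weights_fp16.astype(np.float32)).max()
print(f"max abs rounding error: {max_err:.2e}")
```

In PyTorch the corresponding conversion is `model.half()` (or loading with `torch_dtype=torch.float16`); the input tensors then need to be cast to FP16 as well.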

---

## 📊 Dataset Used

**Hugging Face Dataset**: [`tanganke/cifar100`](https://huggingface.co/datasets/tanganke/cifar100)

- **Description**: CIFAR-100 is a dataset of 60,000 32×32 color images in 100 classes (600 images per class)
- **Split**: 50,000 training images and 10,000 test images
- **Categories**: Animals, Vehicles, Food, Household items, etc.
- **License**: MIT License (from source)

```python
from datasets import load_dataset

dataset = load_dataset("tanganke/cifar100")
```

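CIFAR-100 images are 32×32, while `google/vit-base-patch16-224` expects 224×224 input. The NumPy sketch below shows the resize-and-normalize step under two simplifying assumptions: it uses nearest-neighbour upsampling (224 = 7 × 32, so each pixel is repeated 7 times per axis), whereas the Hugging Face image processor uses its own interpolation; and it assumes a per-channel mean and standard deviation of 0.5, which should be checked against the checkpoint's preprocessor config. `preprocess_cifar_image` is a hypothetical helper.

```python
import numpy as np

# Assumption: per-channel mean/std of 0.5, as in the ViT checkpoint's preprocessor config
IMAGE_MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
IMAGE_STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)
TARGET_SIZE = 224

def preprocess_cifar_image(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 32x32x3 uint8 RGB image -> 3x224x224 float32 array."""
    height, width, _ = image.shape
    scale = TARGET_SIZE // height  # 224 // 32 == 7
    assert height * scale == TARGET_SIZE and width * scale == TARGET_SIZE
    # Nearest-neighbour upsampling (a simplification): repeat each pixel `scale` times per axis
    upsampled = image.repeat(scale, axis=0).repeat(scale, axis=1)
    # Rescale to [0, 1], then normalize each channel to [-1, 1]
    pixels = upsampled.astype(np.float32) / 255.0
    normalized = (pixels - IMAGE_MEAN) / IMAGE_STD
    # Channels-first layout (C, H, W), as PyTorch vision models expect
    return normalized.transpose(2, 0, 1)

example = np.zeros((32, 32, 3), dtype=np.uint8)
example[0, 0] = 255  # one white pixel in the top-left corner
pixel_values = preprocess_cifar_image(example)
print(pixel_values.shape)  # (3, 224, 224)
```

With the real dataset you would apply this per example (for instance via `dataset.map`), or simply call the checkpoint's `ViTImageProcessor`, which handles resizing and normalization itself.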

## 🛠️ Model & Training Configuration

- **Model**: `google/vit-base-patch16-224`
- **Image Size**: 224×224 (resized from 32×32)
- **Framework**: Hugging Face Transformers & Datasets
- **Training Environment**: Kaggle Notebook with CUDA
- **Epochs**: 5–10 (with early stopping)
- **Batch Size**: 32
- **Optimizer**: AdamW
- **Loss Function**: CrossEntropyLoss

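`CrossEntropyLoss` applies a log-softmax over the 100 class logits and averages the negative log-probability of the true class across the batch. The NumPy sketch below reproduces that computation for intuition only; an actual training loop would use `torch.nn.CrossEntropyLoss`, and `cross_entropy` is a hypothetical helper. A handy sanity check: with uniform logits over 100 classes the loss equals ln(100) ≈ 4.605, roughly where a freshly initialized classification head should start.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical helper: mean cross-entropy for (batch, classes) logits and int labels."""
    # Subtract the row max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example
    picked = log_probs[np.arange(len(labels)), labels]
    return float(-picked.mean())

num_classes, batch_size = 100, 32  # batch size as listed in the configuration
uniform_logits = np.zeros((batch_size, num_classes))
labels = np.arange(batch_size) % num_classes
print(round(cross_entropy(uniform_logits, labels), 3))  # 4.605
```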

## ✅ Evaluation & Scoring

- **Accuracy**: ~70–80% (varies by configuration)
- **Validation Tool**: `evaluate` or `sklearn.metrics`
- **Metrics**: Top-1 and Top-5 accuracy
- **Inference Speed**: Faster after FP16 quantization

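Top-1 accuracy counts a prediction as correct when the highest-scoring class is the true label; Top-5 accepts it when the true label is among the five highest-scoring classes. Below is a minimal NumPy sketch, assuming the logits for the evaluation split have been collected into one array. `top_k_accuracy` is a hypothetical helper; `sklearn.metrics.top_k_accuracy_score` provides a ready-made equivalent.

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Hypothetical helper: fraction of examples whose label is in the top-k logits."""
    # argsort is ascending, so the last k columns hold the k largest logits
    top_k = np.argsort(logits, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 images, 6 classes (the real model outputs 100 logits per image)
logits = np.array([
    [0.1, 0.9, 0.0, 0.3, 0.2, 0.4],  # true label 1 is ranked 1st
    [0.8, 0.1, 0.7, 0.6, 0.5, 0.4],  # true label 3 is ranked 3rd
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # true label 5 is ranked last
])
labels = np.array([1, 3, 5])
print(top_k_accuracy(logits, labels, k=1))  # 1 of 3 correct
print(top_k_accuracy(logits, labels, k=5))  # 2 of 3 correct
```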
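
## 🔍 Inference Example

```python
from PIL import Image
import torch
from datasets import load_dataset
from transformers import ViTForImageClassification, ViTImageProcessor

# Assumes the fine-tuned FP16 weights were saved to ./vit-cifar100-fp16 (see Folder Structure)
model = ViTForImageClassification.from_pretrained("vit-cifar100-fp16", torch_dtype=torch.float16).to("cuda").eval()
# ViTImageProcessor replaces the deprecated ViTFeatureExtractor
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
dataset = load_dataset("tanganke/cifar100")  # used only to map class ids to names

def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    # Move pixels to the GPU and cast to FP16 to match the model weights
    pixel_values = inputs["pixel_values"].to("cuda", dtype=torch.float16)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(pixel_values=pixel_values).logits
    predicted_class = logits.argmax(-1).item()
    # Label column name kept from the original example; check dataset["train"].features if it differs
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))
```
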
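`predict` returns only the single most likely class. To also see the five most likely classes with their probabilities, the logits can be passed through a softmax. The sketch below does this in NumPy on a plain logits vector so it can be tried without a GPU. `top5_with_probs` is a hypothetical helper, and the six class names are a toy list whose indices do not follow the real CIFAR-100 label order; with the model above you would pass `logits[0].float().cpu().numpy()` together with the full list of 100 class names.

```python
import numpy as np

def top5_with_probs(logits: np.ndarray, class_names: list[str]) -> list[tuple[str, float]]:
    """Hypothetical helper: the five most likely (class name, probability) pairs."""
    # Numerically stable softmax over the class dimension
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Indices of the five largest probabilities, highest first
    top5 = np.argsort(probs)[::-1][:5]
    return [(class_names[i], float(probs[i])) for i in top5]

# Toy 6-class example; indices are illustrative, not the real CIFAR-100 order
class_names = ["apple", "bicycle", "cloud", "dolphin", "maple_tree", "rocket"]
logits = np.array([2.0, 0.5, -1.0, 3.0, 0.0, 1.0])
for name, prob in top5_with_probs(logits, class_names):
    print(f"{name}: {prob:.3f}")
```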

## 📁 Folder Structure

- 📦 `image-classification-vit`
  - 📂 `vit-cifar100-fp16`
  - 📜 `train.py`
  - 📜 `inference.py`
  - 📜 `README.md`
  - 📜 `requirements.txt`