Vision Transformer (ViT) for Female Age Classification
This repository contains a Vision Transformer (ViT) model trained to classify the age of women based on facial images. The model predicts the following age groups: Child, Teen, Woman, and Senior.
Model Overview
- Model Type: Vision Transformer (ViT)
- Task: Age classification from female facial images
- Age Categories:
  - Child: Ages 0-12
  - Teen: Ages 13-19
  - Woman: Ages 20-49
  - Senior: Ages 50+
- Input Image Size: 224x224 pixels
- Preprocessing: Images are resized to 224x224 pixels and converted to RGB format.
- Framework: Hugging Face transformers
How to Use the Model
You can run the model using the following Python code:
1. Install Required Packages
Make sure to install the necessary packages:
pip install transformers torch torchvision pillow opencv-python
2. Run the Model
Here's an example of how to load the model and classify an image:
from transformers import pipeline
import cv2
import torch
from PIL import Image

# Load the pre-trained model from Hugging Face Hub
model_dir = 'anggiatm/vit-female-age-classification'
# Use the first CUDA device if available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else -1
female_age = pipeline("image-classification", model=model_dir, device=device)

# Path to your image file
file_path = '/path/to/your/image.jpg'

# Load and preprocess the image (cv2.imread returns None if the file cannot be read)
img = cv2.imread(file_path)
if img is not None:
    # Convert the image from BGR to RGB format
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Resize the image to 224x224
    img_resized = cv2.resize(img_rgb, (224, 224))
    # Convert to PIL image format
    pil_image = Image.fromarray(img_resized)
    # Run the image through the model
    female_age_preds = female_age(pil_image)
    # Get the label and confidence score of the top prediction
    label = female_age_preds[0]['label']
    score = female_age_preds[0]['score'] * 100
    # Print the result
    print(f"{file_path:<23} {label:<10} {score:>10.2f}%")
else:
    print(f"Could not read image: {file_path}")
3. Example Output
/path/to/your/image.jpg Woman 95.72%
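The example above reads only the first entry of the pipeline output. The image-classification pipeline returns a list of dictionaries, each with a `label` and a `score` key, so the remaining class scores can be printed as well. The following is a minimal sketch under that assumption; `format_predictions` is a hypothetical helper, and the sample values are invented for illustration and are not real model output.

```python
def format_predictions(preds):
    """Return one formatted line per prediction, highest score first."""
    ranked = sorted(preds, key=lambda p: p["score"], reverse=True)
    return [f"{p['label']:<10} {p['score'] * 100:>6.2f}%" for p in ranked]


# Illustrative values only (not real model output).
sample_preds = [
    {"label": "Teen", "score": 0.03},
    {"label": "Woman", "score": 0.95},
    {"label": "Senior", "score": 0.02},
]

for line in format_predictions(sample_preds):
    print(line)
```

In the usage example above, `female_age_preds` could be passed to this helper in place of `sample_preds`.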
Dataset Information
The model was trained on a dataset of female facial images categorized into the following age groups:
- Child: 0-12 years
- Teen: 13-19 years
- Woman: 20-49 years
- Senior: 50+ years
Each image is resized to 224x224 pixels before being fed into the model.
You can access the dataset here.
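The age ranges listed above can be written as a small lookup. This is only a sketch of the stated boundaries: `age_to_group` is a hypothetical helper, and it is an assumption that ages are given in whole years. The dataset itself was labeled manually, so this function does not describe how its labels were actually assigned.

```python
def age_to_group(age):
    """Map an age in whole years to one of the four labels in this README."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 12:
        return "Child"   # 0-12
    if age <= 19:
        return "Teen"    # 13-19
    if age <= 49:
        return "Woman"   # 20-49
    return "Senior"      # 50+


# Print the group on each side of every boundary.
for age in (12, 13, 19, 20, 49, 50):
    print(age, age_to_group(age))
```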
Caution
Although this model performs well in most cases, it will occasionally misclassify an image. The dataset was labeled manually, so some images may be labeled inaccurately, and adjacent age groups can overlap in appearance. Keep these limitations in mind when using the model's results.
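One way to act on this caution is to accept a prediction only when its confidence clears a threshold. The sketch below assumes the pipeline's usual output shape, a list of dictionaries with `label` and `score` keys. `confident_label` is a hypothetical helper, and the default threshold of 0.6 is an arbitrary illustrative value, not one recommended by the model's author.

```python
def confident_label(preds, threshold=0.6):
    """Return the top label, or None if its score is below `threshold`."""
    if not preds:
        return None
    top = max(preds, key=lambda p: p["score"])
    return top["label"] if top["score"] >= threshold else None


# Illustrative values only (not real model output).
clear_case = [{"label": "Woman", "score": 0.95}, {"label": "Teen", "score": 0.05}]
borderline = [{"label": "Teen", "score": 0.52}, {"label": "Woman", "score": 0.48}]

print(confident_label(clear_case))   # top score is above the threshold
print(confident_label(borderline))   # top score is below the threshold
```

A suitable threshold depends on the application and is best chosen by checking predictions against images with known ages.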
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements
This model was built using the Hugging Face transformers library and Vision Transformer (ViT) architecture.
Happy Coding!