Back Home
MLApp Repository (Computer Vision) for the Imperial FIFCO - SINAC - UCR Campaign
Developer: Joystick Data Team
First Public Version
Project Description
This application leverages computer vision models as part of the "De vuelta a casa" (Back Home) campaign sponsored by Imperial Beer. The project's goal is to process images of seashells confiscated at airports and predict their origin, so that they can be returned to the appropriate beaches.
Due to the high volume of seashells and the limited number of experts available in the country, manual classification is infeasible. This automated system uses artificial intelligence to provide an efficient and accurate alternative, contributing to the conservation of marine ecosystems and to environmental sustainability.
Repository Elements
- requirements.txt: File containing all necessary dependencies to run the project.
- Readme.txt: File with the model description.
- model_final.pth: File containing the trained model weights (see the loading sketch below).
Frameworks and Libraries Used: PyTorch
Programming Language Used: Python
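If you want to load model_final.pth directly with PyTorch (independently of the Transformers examples below), here is a minimal sketch. It assumes the file is a torchvision ConvNeXt-Tiny state dict with a two-class head; the actual export format may differ, so verify before relying on it:

import torch
from torchvision.models import convnext_tiny

# Assumption: state dict of a ConvNeXt-Tiny with a 2-class head.
# If the file was saved as a whole module (torch.save(model)),
# torch.load already returns the model and load_state_dict is unnecessary.
model = convnext_tiny(num_classes=2)
model.load_state_dict(torch.load("model_final.pth", map_location="cpu"))
model.eval()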
Model Description
This model is based on the ConvNeXt architecture and has been trained to classify images of seashells into two categories: Pacific and Caribbean. It is part of an effort to identify the origin of seashells confiscated at airports and facilitate their return to the corresponding beaches.
Architecture
The model uses an advanced convolution block structure based on the ConvNeXt architecture. This includes convolutional layers, normalization, and residual blocks designed for efficiency and performance.
Model Details
- Total Parameters: 27,819,361
- Trainable Parameters: 14,290,945
- Non-Trainable Parameters: 13,528,416
- Estimated Memory Size: 243.11 MB
- Total Mult-Adds: 321.60 M (multiply-accumulate operations, in millions)
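The figures above are the kind reported by torchinfo; a minimal sketch to reproduce them, assuming model holds the loaded network:

from torchinfo import summary

# Prints parameter counts, estimated size, and total mult-adds
# for a single 224x224 RGB input
summary(model, input_size=(1, 3, 224, 224))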
Model Structure
Expected Input: (1, 3, 224, 224) (1 RGB image with 224x224 resolution)
Main Layers:
- Multiple convolutional layers with normalization (Conv2dNormActivation)
- CN blocks for deep learning (CNBlock)
- Adaptive average pooling (AdaptiveAvgPool2d)
- Final linear classifier with sigmoid activation
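A quick sanity check of the expected input shape above, assuming model is the network loaded earlier:

import torch

dummy = torch.randn(1, 3, 224, 224)  # one RGB image at 224x224 resolution
with torch.no_grad():
    output = model(dummy)
# The Transformers checkpoint returns an object whose scores live in
# output.logits; a plain torchvision module returns the tensor directly.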
Performance
- Designed to run on both CPU and GPU.
- Optimized for computational efficiency in image classification tasks.
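Since the model runs on both CPU and GPU, device selection can be as simple as:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)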
Requirements
- Memory Required: Approximately 243 MB
- Input Size: (3, 224, 224) in tensor format
Additional Details
- Framework Used: PyTorch
- Model File Size: ~111 MB
- Capabilities: Suitable for binary classification with high accuracy
Model Architecture Visualization
ConvNeXt
├─Conv2dNormActivation
├─CNBlock
├─AdaptiveAvgPool2d
├─Dropout
├─Flatten
├─Linear
References
ConvNeXt: Liu et al., "A ConvNet for the 2020s," CVPR 2022.
Usage
Load the model:
import torch
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("FIFCO/De_vuelta_a_casa")
Make predictions:
outputs = model(images)  # images: a preprocessed tensor of shape (N, 3, 224, 224)
predictions = torch.sigmoid(outputs.logits)
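If the classifier head emits a single logit per image, as the sigmoid activation suggests, the probability can be thresholded to pick a class. The 0.5 cutoff and the index-to-name mapping below are assumptions; check model.config.id2label for the actual mapping:

# Assumed mapping; verify against model.config.id2label
class_names = {0: "Pacific", 1: "Caribbean"}
predicted = (predictions.squeeze(-1) > 0.5).long()
for idx in predicted.tolist():
    print(class_names[idx])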
You can also use this model directly with the Transformers library. First, install the library:
pip install transformers
Note: Flash Attention 2 (pip install flash-attn) speeds up attention-based Transformers models on supported GPUs; it does not apply to this convolutional ConvNeXt model, so installing it is not required here.
Use the model with Transformers: You can load the model and make predictions as shown below:
Using AutoModel:
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Set the model id
model_id = "FIFCO/De_vuelta_a_casa_Clasificacion_imagenes_conchas"
image_processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

# Load your image
image = Image.open("path_to_your_image.jpg")

# Preprocess the image and run inference
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Get the class with the highest probability
predicted_class = outputs.logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
Using pipeline:
from transformers import pipeline

# Load the classification pipeline
pipe = pipeline(
    "image-classification",
    model="FIFCO/Back_Home_Image_Classification_Shells"
)

# Perform the classification
results = pipe("path_to_your_image.jpg")
print(results)
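The pipeline returns a list of dictionaries, one per class, each containing a label and a score, sorted from most to least likely.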
=== MODEL CONFIGURATION FOR FINE-TUNING ON A NEW DATASET ===
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
from transformers import AutoImageProcessor
from datasets import load_dataset
from PIL import Image
import torch.nn as nn
import torch

# Load the base pretrained model
base_model = AutoModelForImageClassification.from_pretrained(
    "FIFCO/De_vuelta_a_casa_Clasificacion_imagenes_conchas"
)
# Number of new categories/classes in the dataset
NUM_CLASSES = 10  # Adjust this according to your dataset

# Create a custom model with additional layers
class CustomImageClassifier(nn.Module):
    def __init__(self, base_model, num_classes):
        super().__init__()
        self.base_model = base_model
        self.custom_layers = nn.Sequential(
            nn.Linear(base_model.classifier.out_features, 512),  # Fully connected layer on top of the base logits
            nn.ReLU(),                   # ReLU activation
            nn.Dropout(0.3),             # Regularization
            nn.Linear(512, num_classes)  # Final layer for the new classes
        )

    def forward(self, pixel_values, labels=None):
        x = self.base_model(pixel_values).logits  # Output from the base model
        logits = self.custom_layers(x)            # Pass through the additional layers
        # Trainer expects the model to return its loss when labels are provided
        loss = nn.functional.cross_entropy(logits, labels) if labels is not None else None
        return {"loss": loss, "logits": logits}

# Initialize the model with custom layers
custom_model = CustomImageClassifier(base_model, NUM_CLASSES)
print("Custom model created:")
print(custom_model)
# === TRAINING CONFIGURATION === #
# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",          # Folder to save results
    evaluation_strategy="epoch",     # Evaluate at the end of each epoch
    save_strategy="epoch",           # Save a checkpoint at the end of each epoch
    learning_rate=5e-5,              # Learning rate
    per_device_train_batch_size=16,  # Training batch size
    per_device_eval_batch_size=16,   # Evaluation batch size
    num_train_epochs=10,             # Number of epochs
    weight_decay=0.01,               # L2 regularization
    logging_dir="./logs",            # Folder to save logs
    logging_steps=10,                # Logging frequency
    save_total_limit=2,              # Limit the number of saved checkpoints
    load_best_model_at_end=True,     # Load the best checkpoint at the end
    remove_unused_columns=False,     # Keep the image column for the on-the-fly transform
)
# === LOAD DATASET === #
# Load the dataset from Hugging Face or a local directory
dataset = load_dataset("imagefolder", data_dir="path_to_your_dataset")

# Convert each PIL image into pixel_values on the fly
image_processor = AutoImageProcessor.from_pretrained(
    "FIFCO/De_vuelta_a_casa_Clasificacion_imagenes_conchas"
)

def preprocess(batch):
    inputs = image_processor(images=batch["image"], return_tensors="pt")
    return {"pixel_values": inputs["pixel_values"], "label": batch["label"]}

dataset = dataset.with_transform(preprocess)

# Split the dataset into training and validation (imagefolder only creates
# the splits present on disk; use train_test_split if you only have "train")
train_dataset = dataset["train"]
val_dataset = dataset["validation"]
print("Dataset loaded:")
print(f"Training set: {len(train_dataset)} images")
print(f"Validation set: {len(val_dataset)} images")
# === TRAINING === #
# Configure the Trainer with the custom model and data
trainer = Trainer(
    model=custom_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train the model
print("Starting training...")
trainer.train()
# === SAVE THE FINE-TUNED MODEL === #
# A plain nn.Module has no save_pretrained; save the weights with torch.save
torch.save(custom_model.state_dict(), "./custom_fine_tuned_model.pth")
print("Fine-tuned model saved in './custom_fine_tuned_model.pth'")
# === EVALUATE THE MODEL === #
# Evaluate the model on the validation set
results = trainer.evaluate()
print("Evaluation results:")
print(results)
# === USE THE FINE-TUNED MODEL === #
# Reload the saved weights and classify a new image (the pipeline API only
# works with Hugging Face checkpoints, not with a plain state dict)
custom_model.load_state_dict(torch.load("./custom_fine_tuned_model.pth"))
custom_model.eval()

image_path = "path_to_new_image.jpg"
image = Image.open(image_path)
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = custom_model(inputs["pixel_values"])["logits"]
predicted_class = logits.argmax(dim=-1).item()
print(f"Classification results for {image_path}:")
print(f"Predicted class: {predicted_class}")
# === IMPORTANT NOTES === #
# - Adjust `NUM_CLASSES` according to your dataset.
# - Ensure that your images are organized into folders for each category.
# - If you have limited data, consider using data augmentation techniques to improve performance (see the sketch below).
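As a sketch of the augmentation idea mentioned in the notes (torchvision transforms applied inside the preprocessing function; the specific transforms are illustrative, not tuned for this dataset):

from torchvision import transforms

# Illustrative augmentations; adjust choice and strength to your data
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

def preprocess_with_augmentation(batch):
    images = [augment(img.convert("RGB")) for img in batch["image"]]
    inputs = image_processor(images=images, return_tensors="pt")
    return {"pixel_values": inputs["pixel_values"], "label": batch["label"]}

# Apply to the training split only; keep validation deterministic
train_dataset = dataset["train"].with_transform(preprocess_with_augmentation)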
Setup
Prerequisites
- Docker (if you want to use containers).
- Python 3.7+
Install
Clone the repository:
git clone https://github.com/FIFCO/De_vuelta_a_casa.git
cd De_vuelta_a_casa
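Then install the dependencies listed in requirements.txt:
pip install -r requirements.txt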