metadata
license: mit
language:
  - en
pipeline_tag: image-classification
tags:
  - biology

Back Home

MLApp Repository (Computer Vision) for the Imperial FIFCO - SINAC - UCR Campaign

Developer: Joystick Data Team

First Public Version


Project Description

This application uses computer vision models as part of the "De vuelta a casa" (Back Home) campaign sponsored by Imperial Beer. Its goal is to process images of seashells confiscated at airports and predict their origin, so that the shells can be returned to the appropriate beaches.

Because the volume of confiscated seashells is high and few experts are available in the country, manual classification is not feasible. This automated system uses artificial intelligence to classify the shells efficiently, contributing to the conservation of marine ecosystems and to environmental sustainability.


Repository Elements

  • requirements.txt: File containing all necessary dependencies to run the project.
  • Readme.txt: File with the model description.
  • model_final.pth: File containing the model.

Frameworks and Libraries Used: PyTorch

Programming Language Used: Python


Model Description

This model is based on the ConvNeXt architecture and has been trained to classify images of seashells into two categories: Pacific and Caribbean. It is part of an effort to identify the origin of seashells confiscated at airports and facilitate their return to the corresponding beaches.

Architecture

The model follows the ConvNeXt design: a convolutional stem with normalization, a stack of residual ConvNeXt blocks (CNBlock), global average pooling, and a linear classification head.

Model Details

  • Total Parameters: 27,819,361
  • Trainable Parameters: 14,290,945
  • Non-Trainable Parameters: 13,528,416
  • Estimated Memory Size: 243.11 MB
  • Total Mult-Adds: 321.60 M
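
The figures above can be cross-checked with a little arithmetic. The sketch below assumes the weights are stored as 32-bit floats (4 bytes per parameter), which this card does not state; the variable names are illustrative only.

```python
# Parameter counts as reported in the list above.
total_params = 27_819_361
trainable_params = 14_290_945
non_trainable_params = 13_528_416

# The trainable and frozen partitions should add up to the total.
assert trainable_params + non_trainable_params == total_params

# Assumption: float32 weights, i.e. 4 bytes per parameter.
BYTES_PER_PARAM = 4
weights_mb = total_params * BYTES_PER_PARAM / 1e6  # decimal megabytes

# Fraction of the network that is updated during training.
trainable_share = trainable_params / total_params

print(f"Raw weight size: {weights_mb:.2f} MB")
print(f"Trainable share: {trainable_share:.1%}")
```

Under that assumption the raw weights come to about 111 MB, in line with the model file size listed under Additional Details. The larger 243.11 MB estimate presumably also counts the activations of a forward and backward pass, as model summary tools usually do; the card does not say which tool produced it.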

Model Structure

Expected Input: (1, 3, 224, 224) (1 RGB image with 224x224 resolution)

Main Layers:

  • Multiple convolutional layers with normalization (Conv2dNormActivation)
  • Residual ConvNeXt blocks (CNBlock)
  • Adaptive average pooling (AdaptiveAvgPool2d)
  • Final linear classifier with sigmoid activation
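
Because the head is a single linear unit followed by a sigmoid, each prediction is one probability that still has to be thresholded into a class. A minimal sketch of that step in plain Python follows. The 0.5 threshold and the assignment of "Caribbean" to the positive class are assumptions made for illustration, since this card does not document the label order.

```python
import math
from typing import Tuple

# Assumption: probability >= threshold means "Caribbean" (index 1).
# The card does not state the label order; swap the tuple if needed.
LABELS = ("Pacific", "Caribbean")


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def decide(logit: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map a raw logit to (label, probability of the positive class)."""
    prob = sigmoid(logit)
    return LABELS[int(prob >= threshold)], prob


label, prob = decide(1.2)
print(label, round(prob, 3))
```

Raising the threshold trades recall on the positive class for precision; the right value depends on validation data that is not part of this card.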

Performance

  • Designed to run on both CPU and GPU.
  • Optimized for computational efficiency in image classification tasks.

Requirements

  • Memory Required: Approximately 243 MB
  • Input Size: (3, 224, 224) in tensor format
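
The shape given here and the one under Model Structure differ only by the batch axis: the network consumes batches of shape (N, 3, 224, 224), and a single image is the N = 1 case. The sketch below is a hypothetical pre-flight check written in plain Python; the function names are illustrative, and the 4 bytes per value assume float32 inputs.

```python
EXPECTED_CHW = (3, 224, 224)  # channels, height, width


def batch_size_of(shape):
    """Return N if shape is (N, 3, 224, 224); raise ValueError otherwise."""
    shape = tuple(shape)
    if shape == EXPECTED_CHW:
        raise ValueError("missing batch axis: reshape to (1, 3, 224, 224)")
    if len(shape) != 4 or shape[1:] != EXPECTED_CHW or shape[0] < 1:
        raise ValueError(f"expected (N, 3, 224, 224), got {shape}")
    return shape[0]


def input_bytes(n, bytes_per_value=4):
    """Size of an input batch; float32 (4 bytes per value) is an assumption."""
    channels, height, width = EXPECTED_CHW
    return n * channels * height * width * bytes_per_value


n = batch_size_of((1, 3, 224, 224))
print(n, input_bytes(n))
```

A channels-last array such as (1, 224, 224, 3), which is what most image libraries produce, is rejected by this check and has to be transposed first.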

Additional Details

  • Framework Used: PyTorch
  • Model File Size: ~111 MB
  • Capabilities: Suitable for binary classification with high accuracy

Model Architecture Visualization

ConvNeXt
├─Conv2dNormActivation
├─CNBlock
├─AdaptiveAvgPool2d
├─Dropout
├─Flatten
├─Linear
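
The reported total of 27,819,361 parameters can be reproduced by counting the layers in the tree above. The sketch below assumes the torchvision ConvNeXt-Tiny layout (stage depths 3, 3, 9, 3 and widths 96, 192, 384, 768) with a head made of Dropout, Flatten, and a single-output Linear layer. The card does not name the variant, so treat this as an inference that happens to match the published count, not as a description of how the authors built the model.

```python
# Assumed ConvNeXt-Tiny configuration (not stated in the card).
DEPTHS = (3, 3, 9, 3)
WIDTHS = (96, 192, 384, 768)


def conv2d_params(c_in, c_out, kernel, groups=1):
    """Weights plus biases of a square-kernel Conv2d."""
    return (c_in // groups) * c_out * kernel * kernel + c_out


def layer_norm_params(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def linear_params(n_in, n_out):
    """Weights plus biases of a Linear layer."""
    return n_in * n_out + n_out


def cn_block_params(dim):
    """One CNBlock: 7x7 depthwise conv, LayerNorm, 4x MLP, layer scale."""
    return (
        conv2d_params(dim, dim, 7, groups=dim)
        + layer_norm_params(dim)
        + linear_params(dim, 4 * dim)
        + linear_params(4 * dim, dim)
        + dim  # per-channel layer scale
    )


def backbone_params():
    # Stem (Conv2dNormActivation): 4x4 stride-4 conv followed by LayerNorm.
    total = conv2d_params(3, WIDTHS[0], 4) + layer_norm_params(WIDTHS[0])
    for i, (depth, dim) in enumerate(zip(DEPTHS, WIDTHS)):
        if i > 0:
            # Downsampling between stages: LayerNorm, then 2x2 stride-2 conv.
            prev = WIDTHS[i - 1]
            total += layer_norm_params(prev) + conv2d_params(prev, dim, 2)
        total += depth * cn_block_params(dim)
    return total


# Dropout and Flatten hold no parameters; Linear maps 768 features to 1 logit.
head = linear_params(WIDTHS[-1], 1)
total = backbone_params() + head
print(total)  # 27819361
```

The count says nothing about which layers were frozen during training; the trainable and non-trainable split is taken from the card as reported.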

References

ConvNeXt: Liu et al., "A ConvNet for the 2020s", CVPR 2022

Usage

Load the model:

from transformers import AutoModelForImageClassification
model = AutoModelForImageClassification.from_pretrained("FIFCO/De_vuelta_a_casa")

Make predictions:

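import torch
outputs = model(images)  # images: float tensor of shape (N, 3, 224, 224)
# Sigmoid maps each logit to a probability in [0, 1]
predictions = torch.sigmoid(outputs.logits)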

The examples in this section rely on the Transformers library, together with PyTorch and Pillow. Install them first:

pip install transformers torch pillow

Flash Attention 2 is not needed for this model: ConvNeXt is a convolutional architecture with no attention layers, so installing flash-attn brings no speedup.

With the library installed, load the model and make predictions in one of the following ways.

Using AutoModel:

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Set the model id
model_id = "FIFCO/De_vuelta_a_casa_Clasificacion_imagenes_conchas"
# AutoImageProcessor supersedes the deprecated AutoFeatureExtractor
image_processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

# Load your image and force three RGB channels
image = Image.open("path_to_your_image.jpg").convert("RGB")

# Preprocess the image and run inference without tracking gradients
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Get the class with the highest score
# (with a single-logit head, threshold torch.sigmoid(outputs.logits) instead)
predicted_class = outputs.logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")

Using pipeline:

from transformers import pipeline

# Load the classification pipeline
pipe = pipeline(
    "image-classification",
    model="FIFCO/Back_Home_Image_Classification_Shells"
)

# Perform the classification
results = pipe("path_to_your_image.jpg")
print(results)
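
The image-classification pipeline returns a list of dictionaries, each with a `label` and a `score`. A small helper can pick the top entry and set aside low-confidence predictions, for example to route them to an expert. The helper name, the 0.8 cut-off, and the sample values below are hypothetical; real labels and scores depend on the model configuration.

```python
from typing import Dict, List, Optional, Union

Prediction = Dict[str, Union[str, float]]


def top_prediction(
    results: List[Prediction], min_score: float = 0.0
) -> Optional[Prediction]:
    """Return the highest-scoring entry, or None if it is below min_score."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best if best["score"] >= min_score else None


# Hypothetical pipeline output, for illustration only.
sample = [
    {"label": "Pacific", "score": 0.91},
    {"label": "Caribbean", "score": 0.09},
]
best = top_prediction(sample, min_score=0.8)
print(best)
```

A return value of `None` signals that no class cleared the cut-off, which keeps the decision about uncertain shells explicit instead of silently accepting the top label.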

Setup

Prerequisites

  • Docker (if you want to use containers).
  • Python 3.7+

Install

Clone the repository

git clone https://github.com/FIFCO/De_vuelta_a_casa.git
cd De_vuelta_a_casa