metadata
title: Mila_Global_Moth_Classifier
app_file: gradio_demo.py
sdk: gradio
sdk_version: 4.42.0
Global Moth Model
Research related to the development of a global moth species classification model for automated moth monitoring.
Process
The steps below are carried out to train the global model.
Checklist preparation
- Fetch Leps Checklist: Download the Lepidoptera taxonomy from GBIF (DOI).
- Fetch DwC-A: Fetch the Darwin Core Archive from GBIF for the order Lepidoptera (DOI).
- Curate Moth Checklist (`prepare_gbif_checklist.py`): Clean and curate the Lepidoptera checklist to keep only moth species. Remove all non-species taxa and butterfly families. A curated list is here.
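The curation rule above (keep species-rank taxa, drop butterfly families) can be illustrated with a minimal sketch. This is not the logic of `prepare_gbif_checklist.py`, which may differ. The column names `taxonRank` and `family` are assumed to follow Darwin Core terms, the function name `curate_moth_checklist` is hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io

# The seven families of true butterflies (superfamily Papilionoidea).
BUTTERFLY_FAMILIES = {
    "Papilionidae", "Pieridae", "Lycaenidae", "Nymphalidae",
    "Hesperiidae", "Riodinidae", "Hedylidae",
}


def curate_moth_checklist(rows):
    """Keep rows that are species-rank taxa outside the butterfly families.

    `rows` is an iterable of dicts with (assumed) keys `taxonRank` and `family`.
    """
    kept = []
    for row in rows:
        # Drop genera, families and other non-species taxa.
        if row.get("taxonRank", "").strip().lower() != "species":
            continue
        # Drop butterflies so that only moths remain.
        if row.get("family", "").strip() in BUTTERFLY_FAMILIES:
            continue
        kept.append(row)
    return kept


# Illustrative tab-separated input; a real checklist would be read from disk.
SAMPLE_TSV = (
    "scientificName\ttaxonRank\tfamily\n"
    "Actias luna\tspecies\tSaturniidae\n"
    "Papilio machaon\tspecies\tPapilionidae\n"
    "Noctua\tgenus\tNoctuidae\n"
    "Saturniidae\tfamily\tSaturniidae\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
moths = curate_moth_checklist(reader)
print([row["scientificName"] for row in moths])
```

In this sketch the rank comparison is case-insensitive because rank values may be written as `species` or `SPECIES` depending on the export.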
Dataset download and curation
The data download and curation steps below follow those described here.
- Fetch GBIF images: Download the images from GBIF using the command `ami-dataset fetch-images`. An example Slurm script with the argument options is provided (`job_fetch_images.sh`). Loading the DwC-A file requires about 300 GB of RAM. There are likely more memory-efficient ways to load the archive, for example in multiple smaller parts, but we have not explored them.
- Verify images: Check the downloaded images for corruption (`job_verify_images.sh`).
- Delete corrupted images: `job_delete_images.sh`
- Lifestage prediction: Run the lifestage prediction model on images without a lifestage tag (`job_predict_lifestage.sh`). The purpose is to remove non-adult moth images from the dataset.
- Final clean dataset: Create the final list of images that remain after image verification and lifestage prediction (`job_clean_dataset.sh`).
- Dataset splits: Create dataset splits for model training (`job_split_dataset.sh`).
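As an illustration of the final step, the sketch below shows one common way to build train/validation/test splits so that every species is represented in each split. It is not the implementation behind `job_split_dataset.sh`. The function name `split_by_species`, the `(image_id, species)` record layout, and the 75/10/15 ratios are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def split_by_species(records, train=0.75, val=0.10, seed=42):
    """Split (image_id, species) pairs into train/val/test, species by species.

    The ratios are placeholders; whatever is not assigned to train or val
    goes to test.
    """
    by_species = defaultdict(list)
    for image_id, species in records:
        by_species[species].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for species in sorted(by_species):
        ids = sorted(by_species[species])
        rng.shuffle(ids)
        n_train = int(len(ids) * train)
        n_val = int(len(ids) * val)
        splits["train"] += [(i, species) for i in ids[:n_train]]
        splits["val"] += [(i, species) for i in ids[n_train:n_train + n_val]]
        splits["test"] += [(i, species) for i in ids[n_train + n_val:]]
    return splits


# Made-up records: 20 images for each of two placeholder species labels.
records = [(f"img_{s}_{k}", s) for s in ("species_a", "species_b") for k in range(20)]
splits = split_by_species(records)
print({name: len(items) for name, items in splits.items()})
```

Splitting within each species keeps the class proportions similar across the three sets. Species with very few images would end up with empty validation or test portions under this scheme, so a real pipeline would likely need a minimum-count rule.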