Image-Text-to-Text
PEFT
Safetensors
English

IDEFICS3_ROCO

StageLicenseContributors WelcomeOpen In Colab

Star the project

If you appreciate my work, please consider giving it a like! 🤩
I'm also looking for donations of free GPU time to complete the fine-tuning process.
Please contact me if you can help! 🙏

A Fine-tuned Radiology-focused Model based on Hugging Face's Idefics3 Model

This repository contains a fine-tuned version of the Hugging Face Idefics3-8B-Llama3 model, built on top of the Meta Llama 3.1 8B architecture. Our model, IDEFICS3_ROCO, has been fine-tuned on the Radiology Objects in Context (ROCO) dataset, a large-scale medical and multimodal imaging collection.

TL;DR

For immediate use, you can load the model directly from Hugging Face:

from transformers import AutoProcessor, Idefics3ForConditionalGeneration, image_utils
import torch
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # on CPU it requires ≈ 3h/query 🙈
processor = AutoProcessor.from_pretrained(v)
model = Idefics3ForConditionalGeneration.from_pretrained(
        v, torch_dtype=torch.bfloat16
    ).to(device)

model.load_adapter("eltorio/IDEFICS3_ROCO")

Model Information

  • Base Model: Idefics3-8B-Llama3
  • Fine-tuning Dataset: Radiology Objects in Context (ROCO)
  • License: Apache-2.0
  • Current Status: Fine-tuning process is finished. Contributions to complete the fine-tuning / vallidation / test processes are welcome!

Training Progress Status

  • Current checkpoint: 12267 (100% completed)
  • Estimated remaining GPU time: 0 hours
  • Hardware requirements: T4 GPU with >16GB VRAM
  • Last update: november, 12th 2024

Fine-tuning Code

The fine-tuning code is available as a Jupyter Notebook in the ROCO-radiology dataset repository on Hugging Face:

The Junyper Notebook Open In Colab contains the code to fine-tune the Idefics3-8B-Llama3 model on the ROCO dataset. The fine-tuning process is currently halted at checkpoint 640 (out of 24,000) due to limitations with Colab Free T4 GPU unit. Contributions to complete the fine-tuning process are welcome!

Contributions Welcome

If you have the resources to complete the fine-tuning process, we would appreciate your contribution. Please fork this repository, finish the fine-tuning process, and submit a pull request with your updates.

Citation

If you use this model in your work, please cite the original Idefics3 model and our fine-tuned model:

Contribution Guide

  1. Technical Requirements

    • Access to powerful GPU (T4, V100, A100 or equivalent)
    • Python environment with PyTorch
    • Disk space: ~100GB
  2. Getting Started

  3. Contact

Docker Image

A AI training docker image is available for this model. The image and includes all necessary dependencies to run the fine-tuning process.
You need to set the HF_TOKEN environment variable to your Hugging Face API token.
You also need to have NVidia Docker container runtime installed. Finnaly, you need to run the container with GPU support with --gpus all option. The image is available on Docker Hub:

export HF_TOKEN=hf_some_token
docker run --gpus all --user=42420:42420 -e HF_TOKEN=$HF_TOKEN -it sctg/roco-idefics3:latest bash -i  /start.sh $HF_TOKEN

The Dockerfile is available in the IDEFICS_ROCO repository.

Use this model

According to the Apache license you should cite this model with:

@misc {ronan_l.m._2024,
    author       = { {Ronan L.M.} },
    title        = { IDEFICS3_ROCO (Revision b02598a) },
    year         = 2024,
    url          = { https://huggingface.co/eltorio/IDEFICS3_ROCO },
    doi          = { 10.57967/hf/3504 },
    publisher    = { Hugging Face }
}

Acknowledgments

This work was made possible by the Hugging Face Transformers library and the ROCO-radiology dataset.

Downloads last month
340
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The HF Inference API does not support image-text-to-text models for peft library.

Model tree for eltorio/IDEFICS3_ROCO

Adapter
(15)
this model

Dataset used to train eltorio/IDEFICS3_ROCO

Spaces using eltorio/IDEFICS3_ROCO 7