Model Card for Llama-3.2 11b Vision Medical

This is a vision-language model fine-tuned for radiographic image analysis
Foundation Model: https://huggingface.co/unsloth/Llama-3.2-11B-Vision-Instruct
Dataset: https://huggingface.co/datasets/eltorio/ROCOv2-radiology

The model has been fine-tuned using CUDA-enabled GPU hardware.

Model Details

The model is based upon the foundation model: unsloth/Llama-3.2-11B-Vision-Instruct.
It has been tuned with Supervised Fine-tuning Trainer and PEFT LoRA with vision-language capabilities.

Libraries

unsloth
transformers
torch
datasets
trl
peft

Bias, Risks, and Limitations

To optimize training efficiency, the model has been trained on a subset of the ROCOv2-radiology dataset (1/7th of the total dataset).

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
The model's performance is directly dependent on the quality and diversity of the training data. Medical diagnosis should always be performed by qualified healthcare professionals.
Generation of plausible yet incorrect medical interpretations could occur and should not be used as the sole basis for clinical decisions.

Training Details

Training Parameters

per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_train_epochs = 3
learning_rate = 5e-5
weight_decay = 0.02
lr_scheduler_type = "linear"
max_seq_length = 2048

LoRA Configuration

r = 32
lora_alpha = 32
lora_dropout = 0
bias = "none"

Hardware Requirements

The model was trained using CUDA-enabled GPU hardware.

Training Statistics

Training duration: 40,989 seconds (approximately 683 minutes)
Peak reserved memory: 12.8 GB
Peak reserved memory for training: 3.975 GB
Peak reserved memory % of max memory: 32.3%
Peak reserved memory for training % of max memory: 10.1%

Training Data

The model was trained on the ROCOv2-radiology dataset, which contains radiographic images and their corresponding medical descriptions. .

The training set was reduced to 1/7th of the original size for computational efficiency.

Usage

The model is designed to provide detailed descriptions of radiographic images. It can be prompted with:

instruction = "You are an expert radiographer. Describe accurately what you see in this image."

Model Access

The model is available on Hugging Face Hub at: bouthros/llma32_11b_vision_medical

Citation

If you use this model, please cite the original ROCOv2-radiology dataset and the Llama-3.2-11B-Vision-Instruct base model.

bouthros
/

llama32_11b_vision_medical_finetune