MedM-VL-2D-3B-en

Introduction

MedM-VL-2D-3B-en is a medical large vision-language model (LVLM) trained on English data. It takes text and a single 2D medical image as input and produces text as output, supporting tasks such as report generation, medical VQA, referring expression comprehension, referring expression generation, and image classification.

Here are the evaluation results on Uni-Med:

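| Method | medmnist_derma | medmnist_organs | medpix | mimic | pathvqa | samed_identify | samed_refer | slake_identify | slake_refer | slakevqa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Med-Flamingo | 1.15 | 8.90 | 8.14 | 23.25 | 33.38 | - | - | - | - | 21.51 |
| RadFM | 5.14 | 18.90 | - | 6.81 | 24.83 | - | - | - | - | 81.66 |
| LLaVA-Med | 25.84 | 66.80 | 15.11 | 20.43 | 37.79 | 45.83 | 8.64 | 27.21 | 4.07 | 33.69 |
| MedM-VL-2D-3B-en | 81.05 | 72.14 | 13.16 | 22.63 | 62.86 | 70.97 | 20.46 | 68.94 | 31.92 | 84.45 |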
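
To read the table at a glance, the short sketch below finds the top-scoring method for each task and counts how often MedM-VL-2D-3B-en comes first. The scores are copied from the table. The helper name `best_per_task` is hypothetical, and the code assumes that a higher score is better on every task, which the table itself does not state.

```python
# Scores copied from the Uni-Med table; None marks a "-" (not reported) entry.
TASKS = [
    "medmnist_derma", "medmnist_organs", "medpix", "mimic", "pathvqa",
    "samed_identify", "samed_refer", "slake_identify", "slake_refer", "slakevqa",
]

SCORES = {
    "Med-Flamingo": [1.15, 8.90, 8.14, 23.25, 33.38, None, None, None, None, 21.51],
    "RadFM": [5.14, 18.90, None, 6.81, 24.83, None, None, None, None, 81.66],
    "LLaVA-Med": [25.84, 66.80, 15.11, 20.43, 37.79, 45.83, 8.64, 27.21, 4.07, 33.69],
    "MedM-VL-2D-3B-en": [81.05, 72.14, 13.16, 22.63, 62.86, 70.97, 20.46, 68.94, 31.92, 84.45],
}


def best_per_task(scores, tasks):
    """Return {task: (method, score)} for the highest reported score per task.

    Assumption: higher is better for every column.
    """
    best = {}
    for i, task in enumerate(tasks):
        # Skip methods that did not report a score for this task.
        reported = [(method, vals[i]) for method, vals in scores.items()
                    if vals[i] is not None]
        best[task] = max(reported, key=lambda pair: pair[1])
    return best


best = best_per_task(SCORES, TASKS)
wins = [task for task, (method, _) in best.items() if method == "MedM-VL-2D-3B-en"]

for task, (method, score) in best.items():
    print(f"{task:16s} {method:18s} {score:6.2f}")
print(f"MedM-VL-2D-3B-en ranks first on {len(wins)} of {len(TASKS)} tasks")
```

Under that assumption, the two tasks where another method scores higher are medpix (LLaVA-Med) and mimic (Med-Flamingo).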

Quickstart

Please refer to the MedM-VL repository for usage instructions.

Model size: 3.65B params (Safetensors, tensor type BF16)