---
license: apache-2.0
language:
- en
---

# MedM-VL-2D-3B-en

## Introduction

MedM-VL-2D-3B-en is a medical large vision-language model (LVLM) trained on **English** data. It accepts text and **a single 2D medical image** as input and produces text as output, enabling tasks such as **report generation**, **medical VQA**, **referring expression comprehension**, **referring expression generation**, and **image classification**.

Here are the evaluation results on **Uni-Med**:

| Method | medmnist_derma | medmnist_organs | medpix | mimic | pathvqa | samed_identify | samed_refer | slake_identify | slake_refer | slakevqa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Med-Flamingo | 1.15 | 8.90 | 8.14 | 23.25 | 33.38 | - | - | - | - | 21.51 |
| RadFM | 5.14 | 18.90 | - | 6.81 | 24.83 | - | - | - | - | 81.66 |
| LLaVA-Med | 25.84 | 66.80 | 15.11 | 20.43 | 37.79 | 45.83 | 8.64 | 27.21 | 4.07 | 33.69 |
| MedM-VL-2D-3B-en | 81.05 | 72.14 | 13.16 | 22.63 | 62.86 | 70.97 | 20.46 | 68.94 | 31.92 | 84.45 |
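
To make the table above easier to read at a glance, the following sketch picks the highest reported score per benchmark. It is only an illustration: the helper name `best_per_benchmark` is made up for this example, `None` stands in for the "-" (not reported) entries, and it assumes that a higher score is better on every benchmark, which this card does not state explicitly.

```python
# Scores copied from the results table; None marks a "-" (not reported) entry.
BENCHMARKS = [
    "medmnist_derma", "medmnist_organs", "medpix", "mimic", "pathvqa",
    "samed_identify", "samed_refer", "slake_identify", "slake_refer", "slakevqa",
]

SCORES = {
    "Med-Flamingo": [1.15, 8.90, 8.14, 23.25, 33.38, None, None, None, None, 21.51],
    "RadFM": [5.14, 18.90, None, 6.81, 24.83, None, None, None, None, 81.66],
    "LLaVA-Med": [25.84, 66.80, 15.11, 20.43, 37.79, 45.83, 8.64, 27.21, 4.07, 33.69],
    "MedM-VL-2D-3B-en": [81.05, 72.14, 13.16, 22.63, 62.86, 70.97, 20.46, 68.94, 31.92, 84.45],
}


def best_per_benchmark(scores, benchmarks):
    """Return {benchmark: (method, score)} for the highest reported score.

    Assumption (not stated in the model card): higher is better everywhere.
    Methods with no reported score on a benchmark are skipped for it.
    """
    best = {}
    for i, name in enumerate(benchmarks):
        reported = {m: row[i] for m, row in scores.items() if row[i] is not None}
        method = max(reported, key=reported.get)
        best[name] = (method, reported[method])
    return best


if __name__ == "__main__":
    for name, (method, score) in best_per_benchmark(SCORES, BENCHMARKS).items():
        print(f"{name:16s} {method:18s} {score:6.2f}")
```

Under the higher-is-better assumption, MedM-VL-2D-3B-en has the top reported score on 8 of the 10 benchmarks; LLaVA-Med is ahead on medpix and Med-Flamingo on mimic.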