---
license: apache-2.0
language:
- en
---

# MedM-VL-CT-3B-en

## Introduction

A medical LVLM trained on **English** data. It accepts text and **a single 2D medical image** as input and produces text as output, enabling tasks such as **report generation**, **medical VQA**, **referring expression comprehension**, **referring expression generation**, and **image classification**.

Here are the evaluation results on **Uni-Med**:
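
| Method | medmnist_derma | medmnist_organs | medpix | mimic | pathvqa | samed_identify | samed_refer | slake_identify | slake_refer | slakevqa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Med-Flamingo | 1.15 | 8.90 | 8.14 | 23.25 | 33.38 | - | - | - | - | 21.51 |
| RadFM | 5.14 | 18.90 | - | 6.81 | 24.83 | - | - | - | - | 81.66 |
| LLaVA-Med | 25.84 | 66.80 | 15.11 | 20.43 | 37.79 | 45.83 | 8.64 | 27.21 | 4.07 | 33.69 |
| MedM-VL-2D-3B-en | 81.05 | 72.14 | 13.16 | 22.63 | 62.86 | 70.97 | 20.46 | 68.94 | 31.92 | 84.45 |
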
## Quickstart

Please refer to [MedM-VL](https://github.com/MSIIP/MedM-VL).