---
license: apache-2.0
language:
- en
---

# MedM-VL-2D-3B-en

## Introduction

A medical LVLM trained on **English** data. It accepts text and **a single 2D medical image** as input and produces text as output, enabling tasks such as **report generation**, **medical VQA**, **referring expression comprehension**, **referring expression generation** and **image classification**. Here are the evaluation results on **Uni-Med**:

| Method | medmnist_derma | medmnist_organs | medpix | mimic | pathvqa | samed_identify | samed_refer | slake_identify | slake_refer | slakevqa |
|---|---|---|---|---|---|---|---|---|---|---|
| Med-Flamingo | 1.15 | 8.90 | 8.14 | 23.25 | 33.38 | - | - | - | - | 21.51 |
| RadFM | 5.14 | 18.90 | - | 6.81 | 24.83 | - | - | - | - | 81.66 |
| LLaVA-Med | 25.84 | 66.80 | 15.11 | 20.43 | 37.79 | 45.83 | 8.64 | 27.21 | 4.07 | 33.69 |
| MedM-VL-2D-3B-en | 81.05 | 72.14 | 13.16 | 22.63 | 62.86 | 70.97 | 20.46 | 68.94 | 31.92 | 84.45 |

## Quickstart

Please refer to [MedM-VL](https://github.com/MSIIP/MedM-VL).