MedM-VL-2D-3B-en

Introduction

MedM-VL-2D-3B-en is a medical large vision-language model (LVLM) trained on English data. It accepts text and a single 2D medical image as input and produces text as output, enabling tasks such as report generation, medical VQA, referring expression comprehension, referring expression generation, and image classification.

Here are the evaluation results on Uni-Med:

Method	medmnist_derma	medmnist_organs	medpix	mimic	pathvqa	samed_identify	samed_refer	slake_identify	slake_refer	slakevqa
Med-Flamingo	1.15	8.90	8.14	23.25	33.38	-	-	-	-	21.51
RadFM	5.14	18.90	-	6.81	24.83	-	-	-	-	81.66
LLaVA-Med	25.84	66.80	15.11	20.43	37.79	45.83	8.64	27.21	4.07	33.69
MedM-VL-2D-3B-en	85.49	80.68	14.45	25.50	64.06	74.11	26.42	82.94	33.51	85.86
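
As a reading aid for the table above, the following minimal sketch finds the top-scoring method on each benchmark. It assumes that higher scores are better on every benchmark (the metrics are not stated here) and treats "-" as a missing result; the names `BENCHMARKS`, `RESULTS`, and `best_per_benchmark` are hypothetical and not part of MedM-VL.

```python
# Scores copied from the results table; None stands for "-" (no reported result).
BENCHMARKS = [
    "medmnist_derma", "medmnist_organs", "medpix", "mimic", "pathvqa",
    "samed_identify", "samed_refer", "slake_identify", "slake_refer", "slakevqa",
]

RESULTS = {
    "Med-Flamingo": [1.15, 8.90, 8.14, 23.25, 33.38, None, None, None, None, 21.51],
    "RadFM": [5.14, 18.90, None, 6.81, 24.83, None, None, None, None, 81.66],
    "LLaVA-Med": [25.84, 66.80, 15.11, 20.43, 37.79, 45.83, 8.64, 27.21, 4.07, 33.69],
    "MedM-VL-2D-3B-en": [85.49, 80.68, 14.45, 25.50, 64.06, 74.11, 26.42, 82.94, 33.51, 85.86],
}


def best_per_benchmark(results, benchmarks):
    """Return {benchmark: method with the highest reported score}.

    Assumption: higher is better for every benchmark.
    """
    best = {}
    for i, name in enumerate(benchmarks):
        # Skip methods with no reported score on this benchmark.
        scored = [(scores[i], method)
                  for method, scores in results.items()
                  if scores[i] is not None]
        best[name] = max(scored)[1]
    return best


best = best_per_benchmark(RESULTS, BENCHMARKS)
for name, method in best.items():
    print(f"{name}: {method}")
```

Under that higher-is-better assumption, MedM-VL-2D-3B-en leads on every benchmark except medpix, where LLaVA-Med scores slightly higher (15.11 vs. 14.45).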

Quickstart

For installation and usage instructions, please refer to the MedM-VL project.