MedM-VL-2D-3B-en
Introduction
MedM-VL-2D-3B-en is a medical large vision-language model (LVLM) trained on English data. It accepts text and a single 2D medical image as input and produces text as output, enabling tasks such as report generation, medical visual question answering (VQA), referring expression comprehension, referring expression generation, and image classification.
Here are the evaluation results on Uni-Med:
| Method | medmnist_derma | medmnist_organs | medpix | mimic | pathvqa | samed_identify | samed_refer | slake_identify | slake_refer | slakevqa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Med-Flamingo | 1.15 | 8.90 | 8.14 | 23.25 | 33.38 | - | - | - | - | 21.51 |
| RadFM | 5.14 | 18.90 | - | 6.81 | 24.83 | - | - | - | - | 81.66 |
| LLaVA-Med | 25.84 | 66.80 | 15.11 | 20.43 | 37.79 | 45.83 | 8.64 | 27.21 | 4.07 | 33.69 |
| MedM-VL-2D-3B-en | 81.05 | 72.14 | 13.16 | 22.63 | 62.86 | 70.97 | 20.46 | 68.94 | 31.92 | 84.45 |
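To make the table easier to read at a glance, the following is a small sketch that finds the top-scoring method on each benchmark. The scores are copied from the table; the helper name `best_per_benchmark` is hypothetical, and the comparison assumes that a higher score is better on every benchmark, which the table itself does not state. A `-` entry is treated as "no score listed" and is skipped.

```python
# Scores copied from the Uni-Med table; None marks a "-" entry (no score listed).
BENCHMARKS = [
    "medmnist_derma", "medmnist_organs", "medpix", "mimic", "pathvqa",
    "samed_identify", "samed_refer", "slake_identify", "slake_refer", "slakevqa",
]

SCORES = {
    "Med-Flamingo": [1.15, 8.90, 8.14, 23.25, 33.38, None, None, None, None, 21.51],
    "RadFM": [5.14, 18.90, None, 6.81, 24.83, None, None, None, None, 81.66],
    "LLaVA-Med": [25.84, 66.80, 15.11, 20.43, 37.79, 45.83, 8.64, 27.21, 4.07, 33.69],
    "MedM-VL-2D-3B-en": [81.05, 72.14, 13.16, 22.63, 62.86, 70.97, 20.46, 68.94, 31.92, 84.45],
}


def best_per_benchmark(scores, benchmarks):
    """Return {benchmark: (method, score)} for the highest listed score.

    Assumption: higher is better on every benchmark (not stated in the source).
    """
    best = {}
    for i, name in enumerate(benchmarks):
        # Keep only methods that have a score listed for this benchmark.
        listed = [(vals[i], method) for method, vals in scores.items() if vals[i] is not None]
        top_score, top_method = max(listed)
        best[name] = (top_method, top_score)
    return best


BEST = best_per_benchmark(SCORES, BENCHMARKS)
for name, (method, score) in BEST.items():
    print(f"{name}: {method} ({score})")
```

Under that assumption, MedM-VL-2D-3B-en has the highest listed score on every benchmark except medpix (LLaVA-Med) and mimic (Med-Flamingo).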
Quickstart
Please refer to the MedM-VL project for usage instructions.