Alibaba-NLP/gme-Qwen2-VL-7B-Instruct

Dec 25, 2024

Great work!!!!
@zyznull it would be better if you could release the UMR benchmark, thanks.

Alibaba-NLP org Dec 25, 2024

Thanks for your attention. We are in the process of integrating the UMRB benchmark into MTEB and will notify you as soon as it is ready.

izhx

Alibaba-NLP org Dec 25, 2024

All tasks are now available in the mieb branch: https://github.com/embeddings-benchmark/mteb/tree/mieb
For the list of task names, refer to https://github.com/izhx/mteb/blob/mieb/mteb/benchmarks/benchmarks.py#L864
However, the ViDoRe subsets have some inconsistencies, and we are working on fixes: https://github.com/embeddings-benchmark/mteb/pull/1607

izhx

Alibaba-NLP org about 15 hours ago

MIEB has now been merged into the main branch, so you can evaluate directly with the mteb tool. The task list:

```python
umrb_tasks=[
    # Single-modal
    ## Text -to- text
    'ArguAna', 'ClimateFEVER',
    'CQADupstackAndroidRetrieval',
    'CQADupstackEnglishRetrieval',
    'CQADupstackGamingRetrieval',
    'CQADupstackGisRetrieval',
    'CQADupstackMathematicaRetrieval',
    'CQADupstackPhysicsRetrieval',
    'CQADupstackProgrammersRetrieval',
    'CQADupstackStatsRetrieval',
    'CQADupstackTexRetrieval',
    'CQADupstackUnixRetrieval',
    'CQADupstackWebmastersRetrieval',
    'CQADupstackWordpressRetrieval',
    'DBPedia', 'FEVER', 'FiQA2018', 'HotpotQA', 'MSMARCO', 'NFCorpus', 'NQ',
    'QuoraRetrieval', 'SCIDOCS', 'SciFact', 'Touche2020', 'TRECCOVID',
    'WebQAT2TRetrieval',
    ## Image -to- image
    'NIGHTSI2IRetrieval',
    # Cross-modal
    ## Text -to- image
    'VisualNewsT2IRetrieval', 'Fashion200kT2IRetrieval', 'MSCOCOT2IRetrieval', 'Flickr30kT2IRetrieval',
    ## Text -to- visual document
    'VidoreArxivQARetrieval', 'VidoreDocVQARetrieval', 'VidoreInfoVQARetrieval',
    'VidoreTabfquadRetrieval', 'VidoreTatdqaRetrieval', 'VidoreShiftProjectRetrieval',
    'VidoreSyntheticDocQAAIRetrieval', 'VidoreSyntheticDocQAEnergyRetrieval',
    'VidoreSyntheticDocQAGovernmentReportsRetrieval', 'VidoreSyntheticDocQAHealthcareIndustryRetrieval',
    ## Image -to- text
    'VisualNewsI2TRetrieval', 'Fashion200kI2TRetrieval', 'MSCOCOI2TRetrieval', 'Flickr30kI2TRetrieval',
    # Fused-modal
    ## Text -to- image,text
    'WebQAT2ITRetrieval', 'EDIST2ITRetrieval',
    ## Image,text -to- text
    'OVENIT2TRetrieval', 'InfoSeekIT2TRetrieval',
    'ReMuQIT2TRetrieval', 'OKVQAIT2TRetrieval', 'LLaVAIT2TRetrieval',
    ## Image,text -to- image
    'FashionIQIT2IRetrieval', 'CIRRIT2IRetrieval',
    ## Text,image -to- text,image
    'OVENIT2ITRetrieval', 'InfoSeekIT2ITRetrieval', 'EncyclopediaVQAIT2ITRetrieval'
]

## Evaluation script
import mteb

# Define the model name
model_name = "Alibaba-NLP/gme-Qwen2-VL-2B-Instruct"

model = mteb.get_model(model_name)
tasks = mteb.get_tasks(tasks=umrb_tasks)
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
```
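The task list above carries its single-modal / cross-modal / fused-modal structure only in comments. As a minimal sketch, assuming you want to run or report one category at a time, the same 58 task names can be kept in a plain dictionary. `UMRB_GROUPS`, `select` and `group_means` are hypothetical helpers written for illustration; they are not part of mteb, and the averaging is a simple unweighted mean, not necessarily how the GME authors aggregate UMRB scores.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical helper structure (not part of mteb): the UMRB task names from
# the list above, keyed by (modality group, retrieval direction) as in its comments.
CQA = [f"CQADupstack{s}Retrieval" for s in (
    "Android", "English", "Gaming", "Gis", "Mathematica", "Physics",
    "Programmers", "Stats", "Tex", "Unix", "Webmasters", "Wordpress")]

UMRB_GROUPS = {
    ("single-modal", "text->text"): [
        "ArguAna", "ClimateFEVER", *CQA,
        "DBPedia", "FEVER", "FiQA2018", "HotpotQA", "MSMARCO", "NFCorpus", "NQ",
        "QuoraRetrieval", "SCIDOCS", "SciFact", "Touche2020", "TRECCOVID",
        "WebQAT2TRetrieval",
    ],
    ("single-modal", "image->image"): ["NIGHTSI2IRetrieval"],
    ("cross-modal", "text->image"): [
        "VisualNewsT2IRetrieval", "Fashion200kT2IRetrieval",
        "MSCOCOT2IRetrieval", "Flickr30kT2IRetrieval",
    ],
    ("cross-modal", "text->visual document"): [
        "VidoreArxivQARetrieval", "VidoreDocVQARetrieval", "VidoreInfoVQARetrieval",
        "VidoreTabfquadRetrieval", "VidoreTatdqaRetrieval", "VidoreShiftProjectRetrieval",
        "VidoreSyntheticDocQAAIRetrieval", "VidoreSyntheticDocQAEnergyRetrieval",
        "VidoreSyntheticDocQAGovernmentReportsRetrieval",
        "VidoreSyntheticDocQAHealthcareIndustryRetrieval",
    ],
    ("cross-modal", "image->text"): [
        "VisualNewsI2TRetrieval", "Fashion200kI2TRetrieval",
        "MSCOCOI2TRetrieval", "Flickr30kI2TRetrieval",
    ],
    ("fused-modal", "text->image,text"): ["WebQAT2ITRetrieval", "EDIST2ITRetrieval"],
    ("fused-modal", "image,text->text"): [
        "OVENIT2TRetrieval", "InfoSeekIT2TRetrieval", "ReMuQIT2TRetrieval",
        "OKVQAIT2TRetrieval", "LLaVAIT2TRetrieval",
    ],
    ("fused-modal", "image,text->image"): ["FashionIQIT2IRetrieval", "CIRRIT2IRetrieval"],
    ("fused-modal", "text,image->text,image"): [
        "OVENIT2ITRetrieval", "InfoSeekIT2ITRetrieval", "EncyclopediaVQAIT2ITRetrieval",
    ],
}


def select(group: str, direction: str | None = None) -> list[str]:
    """Return the task names in one modality group, optionally one direction only."""
    return [
        name
        for (g, d), names in UMRB_GROUPS.items()
        if g == group and (direction is None or d == direction)
        for name in names
    ]


def group_means(scores: dict[str, float]) -> dict[str, float]:
    """Unweighted mean of per-task scores within each modality group.

    Tasks missing from `scores` are skipped; groups with no scores are omitted.
    """
    out: dict[str, float] = {}
    for group in ("single-modal", "cross-modal", "fused-modal"):
        vals = [scores[t] for t in select(group) if t in scores]
        if vals:
            out[group] = mean(vals)
    return out
```

For example, passing `select("cross-modal", "text->image")` instead of `umrb_tasks` to `mteb.get_tasks(tasks=...)` would restrict the evaluation script to the four text-to-image tasks. Note that `group_means` counts each of the 12 CQADupstack subsets as a separate task; MTEB-style reporting commonly averages them into a single CQADupstackRetrieval score first, which this sketch does not do.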

izhx pinned discussion about 15 hours ago


Release the UMRB