Release the UMRB

#2
by BestWishYsh - opened

Great work!
@zyznull it would be great if you could release the UMR benchmark. Thanks!

Alibaba-NLP org

Thanks for your interest. We are in the process of integrating UMRB into MTEB and will notify you as soon as it is ready.

Alibaba-NLP org

All tasks are now available in the mieb branch: https://github.com/embeddings-benchmark/mteb/tree/mieb
For the list of task names, refer to https://github.com/izhx/mteb/blob/mieb/mteb/benchmarks/benchmarks.py#L864
However, the ViDoRe subsets have some inconsistencies, and we are working on fixes in https://github.com/embeddings-benchmark/mteb/pull/1607

Alibaba-NLP org

MIEB has now been merged into the main branch, so you can run the evaluation directly with the mteb tool. The task list:

```python
umrb_tasks = [
    # Single-modal
    ## Text -to- text
    'ArguAna', 'ClimateFEVER',
    'CQADupstackAndroidRetrieval',
    'CQADupstackEnglishRetrieval',
    'CQADupstackGamingRetrieval',
    'CQADupstackGisRetrieval',
    'CQADupstackMathematicaRetrieval',
    'CQADupstackPhysicsRetrieval',
    'CQADupstackProgrammersRetrieval',
    'CQADupstackStatsRetrieval',
    'CQADupstackTexRetrieval',
    'CQADupstackUnixRetrieval',
    'CQADupstackWebmastersRetrieval',
    'CQADupstackWordpressRetrieval',
    'DBPedia', 'FEVER', 'FiQA2018', 'HotpotQA', 'MSMARCO', 'NFCorpus', 'NQ',
    'QuoraRetrieval', 'SCIDOCS', 'SciFact', 'Touche2020', 'TRECCOVID',
    'WebQAT2TRetrieval',
    ## Image -to- image
    'NIGHTSI2IRetrieval',
    # Cross-modal
    ## Text -to- image
    'VisualNewsT2IRetrieval', 'Fashion200kT2IRetrieval', 'MSCOCOT2IRetrieval', 'Flickr30kT2IRetrieval',
    ## Text -to- visual document
    'VidoreArxivQARetrieval', 'VidoreDocVQARetrieval', 'VidoreInfoVQARetrieval',
    'VidoreTabfquadRetrieval', 'VidoreTatdqaRetrieval', 'VidoreShiftProjectRetrieval',
    'VidoreSyntheticDocQAAIRetrieval', 'VidoreSyntheticDocQAEnergyRetrieval',
    'VidoreSyntheticDocQAGovernmentReportsRetrieval', 'VidoreSyntheticDocQAHealthcareIndustryRetrieval',
    ## Image -to- text
    'VisualNewsI2TRetrieval', 'Fashion200kI2TRetrieval', 'MSCOCOI2TRetrieval', 'Flickr30kI2TRetrieval',
    # Fused-modal
    ## Text -to- image,text
    'WebQAT2ITRetrieval', 'EDIST2ITRetrieval',
    ## Image,text -to- text
    'OVENIT2TRetrieval', 'InfoSeekIT2TRetrieval',
    'ReMuQIT2TRetrieval', 'OKVQAIT2TRetrieval', 'LLaVAIT2TRetrieval',
    ## Image,text -to- image
    'FashionIQIT2IRetrieval', 'CIRRIT2IRetrieval',
    ## Text,image -to- text,image
    'OVENIT2ITRetrieval', 'InfoSeekIT2ITRetrieval', 'EncyclopediaVQAIT2ITRetrieval'
]
```
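Most names in the list appear to encode the query and candidate modalities in a suffix such as `T2I` (text to image) or `IT2T` (image,text to text). As an illustration only, the hypothetical helper below (not part of mteb) parses that suffix so the tasks can be grouped by modality pair, for example to evaluate one group at a time. It assumes the naming convention holds; names without the suffix, such as the BEIR-style text tasks and the ViDoRe tasks, come back as `None`.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical helper, not an mteb API. Assumes task names end in
# "<query>2<candidate>Retrieval", where each side is "I", "T" or "IT".
_MODALITY_SUFFIX = re.compile(r"(IT|I|T)2(IT|I|T)Retrieval$")


def modality_pair(task_name: str) -> Optional[str]:
    """Return e.g. "T2I" for "MSCOCOT2IRetrieval", or None if there is no such suffix."""
    match = _MODALITY_SUFFIX.search(task_name)
    if match is None:
        return None
    return f"{match.group(1)}2{match.group(2)}"


# A few names copied from the UMRB list; pass the full umrb_tasks list in practice.
sample = [
    "ArguAna", "NIGHTSI2IRetrieval", "MSCOCOT2IRetrieval",
    "VidoreDocVQARetrieval", "Flickr30kI2TRetrieval", "WebQAT2ITRetrieval",
    "OVENIT2TRetrieval", "CIRRIT2IRetrieval", "InfoSeekIT2ITRetrieval",
]
# Count tasks per modality pair; names without a suffix are grouped as "unlabeled".
counts = Counter(modality_pair(name) or "unlabeled" for name in sample)
print(counts)
```

Because the suffix alone cannot distinguish the plain text tasks from the ViDoRe text-to-visual-document tasks, the comment headings in the list remain the authoritative grouping.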

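```python
## Evaluation script (uses umrb_tasks from the list above)
import mteb

# Define the model name
model_name = "Alibaba-NLP/gme-Qwen2-VL-2B-Instruct"

# Load the model through mteb's model registry
model = mteb.get_model(model_name)
# Resolve the UMRB task names into mteb task objects
tasks = mteb.get_tasks(tasks=umrb_tasks)
evaluation = mteb.MTEB(tasks=tasks)
# Run the evaluation; per-task JSON results are written under output_folder
results = evaluation.run(model, output_folder=f"results/{model_name}")
```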
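After the run finishes, mteb writes one JSON result file per task under the output folder. The sketch below collects each task's `main_score` from those files. It assumes the per-task layout `{"task_name": ..., "scores": {<split>: [{"main_score": ..., ...}, ...]}}` used by recent mteb versions, and `collect_main_scores` is a hypothetical helper name, not an mteb function. It skips JSON files without `task_name` (such as model metadata) and takes the first preferred split a file contains, since not every task is necessarily evaluated on `test`.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Dict, Sequence


def collect_main_scores(
    results_dir: str, splits: Sequence[str] = ("test", "dev")
) -> Dict[str, float]:
    """Map task name -> main_score averaged over the subsets of the first split found.

    Hypothetical helper; assumes mteb's per-task JSON result layout (see above).
    """
    scores: Dict[str, float] = {}
    for path in sorted(Path(results_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Skip files that are not per-task results (e.g. model metadata).
        if not isinstance(data, dict) or "task_name" not in data:
            continue
        split_scores = data.get("scores", {})
        for split in splits:
            entries = split_scores.get(split)
            if not entries:
                continue
            # One entry per hf_subset; average their main scores.
            values = [e["main_score"] for e in entries if "main_score" in e]
            if values:
                scores[data["task_name"]] = mean(values)
            break  # use only the first preferred split that is present
    return scores
```

For example, `collect_main_scores(f"results/{model_name}")` would return a task-to-score mapping that can then be averaged per UMRB group.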