Release the UMRB
#2, opened by BestWishYsh (pinned)
Great work!
@zyznull, it would be great if you could release the UMR benchmark. Thanks.
Thanks for your attention. We are in the process of integrating the UMRB benchmark into MTEB and will notify you as soon as it is ready.
All tasks are now available in the mieb branch: https://github.com/embeddings-benchmark/mteb/tree/mieb
For the list of task names, refer to https://github.com/izhx/mteb/blob/mieb/mteb/benchmarks/benchmarks.py#L864
However, the ViDoRe subsets still have some inconsistencies, which we are fixing in https://github.com/embeddings-benchmark/mteb/pull/1607
MIEB has now been merged into the main branch, so you can evaluate directly with the mteb tool. The task list:
```python
umrb_tasks = [
    # Single-modal
    ## Text -to- text
    'ArguAna', 'ClimateFEVER',
    'CQADupstackAndroidRetrieval',
    'CQADupstackEnglishRetrieval',
    'CQADupstackGamingRetrieval',
    'CQADupstackGisRetrieval',
    'CQADupstackMathematicaRetrieval',
    'CQADupstackPhysicsRetrieval',
    'CQADupstackProgrammersRetrieval',
    'CQADupstackStatsRetrieval',
    'CQADupstackTexRetrieval',
    'CQADupstackUnixRetrieval',
    'CQADupstackWebmastersRetrieval',
    'CQADupstackWordpressRetrieval',
    'DBPedia', 'FEVER', 'FiQA2018', 'HotpotQA', 'MSMARCO', 'NFCorpus', 'NQ',
    'QuoraRetrieval', 'SCIDOCS', 'SciFact', 'Touche2020', 'TRECCOVID',
    'WebQAT2TRetrieval',
    ## Image -to- image
    'NIGHTSI2IRetrieval',
    # Cross-modal
    ## Text -to- image
    'VisualNewsT2IRetrieval', 'Fashion200kT2IRetrieval', 'MSCOCOT2IRetrieval', 'Flickr30kT2IRetrieval',
    ## Text -to- visual document
    'VidoreArxivQARetrieval', 'VidoreDocVQARetrieval', 'VidoreInfoVQARetrieval',
    'VidoreTabfquadRetrieval', 'VidoreTatdqaRetrieval', 'VidoreShiftProjectRetrieval',
    'VidoreSyntheticDocQAAIRetrieval', 'VidoreSyntheticDocQAEnergyRetrieval',
    'VidoreSyntheticDocQAGovernmentReportsRetrieval', 'VidoreSyntheticDocQAHealthcareIndustryRetrieval',
    ## Image -to- text
    'VisualNewsI2TRetrieval', 'Fashion200kI2TRetrieval', 'MSCOCOI2TRetrieval', 'Flickr30kI2TRetrieval',
    # Fused-modal
    ## Text -to- image,text
    'WebQAT2ITRetrieval', 'EDIST2ITRetrieval',
    ## Image,text -to- text
    'OVENIT2TRetrieval', 'InfoSeekIT2TRetrieval',
    'ReMuQIT2TRetrieval', 'OKVQAIT2TRetrieval', 'LLaVAIT2TRetrieval',
    ## Image,text -to- image
    'FashionIQIT2IRetrieval', 'CIRRIT2IRetrieval',
    ## Text,image -to- text,image
    'OVENIT2ITRetrieval', 'InfoSeekIT2ITRetrieval', 'EncyclopediaVQAIT2ITRetrieval',
]
```
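The multimodal task names above appear to follow a pattern: a code such as `T2I` or `IT2T` just before `Retrieval`, read as query modality to corpus modality (T = text, I = image, IT = image plus text). The sketch below assumes that convention, which is inferred only from this list and not from mteb documentation, and uses it to pick out the fused-modal subset. The helpers `modality_pair` and `is_fused_modal` are hypothetical, not part of mteb.

```python
from __future__ import annotations

import re

# ASSUMPTION (inferred from the UMRB task names, not from mteb docs):
# "<query>2<corpus>Retrieval", where each side is T (text), I (image) or IT (image+text).
_MODALITY = re.compile(r"(IT|T|I)2(IT|T|I)Retrieval$")


def modality_pair(task_name: str) -> tuple[str, str] | None:
    """Return (query, corpus) modality codes, or None if the name carries no code."""
    match = _MODALITY.search(task_name)
    return (match.group(1), match.group(2)) if match else None


def is_fused_modal(task_name: str) -> bool:
    """True when the query or corpus side mixes image and text (an 'IT' code)."""
    pair = modality_pair(task_name)
    return pair is not None and "IT" in pair


if __name__ == "__main__":
    sample = ["ArguAna", "MSCOCOT2IRetrieval", "WebQAT2ITRetrieval", "CIRRIT2IRetrieval"]
    print([t for t in sample if is_fused_modal(t)])
    # → ['WebQAT2ITRetrieval', 'CIRRIT2IRetrieval']
```

Most text-only tasks (for example `ArguAna`) and all ViDoRe tasks carry no such code, so `modality_pair` returns `None` for them and those groups still have to be selected by name. A filtered list such as `[t for t in umrb_tasks if is_fused_modal(t)]` can be passed to `mteb.get_tasks(tasks=...)` in place of the full list.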
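Evaluation script:

```python
import mteb

# umrb_tasks is the task list defined above
# Define the model name
model_name = "Alibaba-NLP/gme-Qwen2-VL-2B-Instruct"
model = mteb.get_model(model_name)
# Resolve the task names into mteb task objects
tasks = mteb.get_tasks(tasks=umrb_tasks)
evaluation = mteb.MTEB(tasks=tasks)
# Run the evaluation; per-task results are written under output_folder
results = evaluation.run(model, output_folder=f"results/{model_name}")
```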
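Once the run finishes, a per-task summary can be assembled from the JSON files written under `output_folder`. A minimal sketch, assuming each task result is stored as `<TaskName>.json` with a `scores` field that maps split names to lists of entries carrying a `main_score`; the helper `collect_main_scores` is hypothetical, and both the schema and the folder layout may differ between mteb versions, so check them against your own output first.

```python
from __future__ import annotations

import json
import statistics
from pathlib import Path


def collect_main_scores(results_dir: str) -> dict[str, float]:
    """Return {task name: mean main_score} for every task result under results_dir.

    ASSUMED schema (verify against your mteb version):
    {"scores": {"<split>": [{"main_score": float, ...}, ...]}, ...}
    """
    per_task: dict[str, float] = {}
    # Search recursively: the nesting below results_dir (model name, revision)
    # may vary between mteb versions.
    for path in sorted(Path(results_dir).rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            continue
        # Pool every split present in the file, since not every task is
        # necessarily scored on a split named "test".
        entries = [e for split in data.get("scores", {}).values() for e in split]
        if entries:  # files without scores (e.g. model metadata) are skipped
            per_task[path.stem] = statistics.mean(e["main_score"] for e in entries)
    return per_task


if __name__ == "__main__":
    scores = collect_main_scores("results/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct")
    for task, score in sorted(scores.items()):
        print(f"{task}: {score:.4f}")
    if scores:
        print(f"Mean over {len(scores)} tasks: {statistics.mean(scores.values()):.4f}")
```

If a task was scored on more than one split or subset, this pools them into a single mean; filter by split name instead if you need to match a specific reported number.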
izhx pinned this discussion