How did you evaluate BEIR?

#3
by freethenation - opened

I am trying to reproduce your BEIR results. Some results match exactly but others are lower than the numbers in your model card. What tool did you use to evaluate BEIR? I am currently using pyserini & the beir package.

opensearch-project org

Hi @freethenation, we use OpenSearch as the evaluation engine, with a maximum input length of 512 tokens. Please note that for some BEIR datasets the queries and documents come from the same space, so a query can retrieve itself. For those datasets we filter the query ID out of the search results before scoring.
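For anyone reproducing this outside OpenSearch, here is a minimal sketch of that filtering step. It is not the evaluation code used for the model card. It assumes the search results are held as a nested dict mapping query ID to `{doc ID: score}`, and the names `drop_self_matches` and `top_k` are hypothetical, chosen only for illustration.

```python
from typing import Dict

# Assumed layout: {query_id: {doc_id: score}}
Results = Dict[str, Dict[str, float]]


def drop_self_matches(results: Results, top_k: int = 10) -> Results:
    """Remove hits whose doc ID equals the query ID, then keep the top_k best."""
    filtered: Results = {}
    for query_id, doc_scores in results.items():
        # Drop the hit where the query retrieved itself.
        kept = {d: s for d, s in doc_scores.items() if d != query_id}
        # Re-rank by score (highest first) and truncate to top_k.
        ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
        filtered[query_id] = dict(ranked[:top_k])
    return filtered


# Toy data: query "q1" also exists as a document and retrieves itself.
example = {
    "q1": {"q1": 9.5, "d7": 7.2, "d3": 6.8},
    "q2": {"d1": 4.0, "d2": 3.1},
}
cleaned = drop_self_matches(example, top_k=2)
```

When filtering like this, it helps to retrieve at least one extra hit per query (for example 11 hits for a top-10 metric), so that removing the self-match does not leave the ranked list one document short.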
