Empty main benchmark scores

#1007
by ymcki - opened

I am getting empty results for the main benchmarks from my two recent submissions:
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18/results_2024-11-04T02-06-36.084992.json
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge/results_2024-11-04T02-06-46.701689.json

My previous submissions from about five days ago were fine. What's going on?

Is it possible to calculate the main benchmark scores from the sub-scores?
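
For illustration, here's a rough sketch of what I mean, assuming the results JSON follows the lm-eval-harness layout (subtask entries under a "results" key, with metrics such as "acc_norm,none"). The leaderboard applies its own normalization before averaging, so a plain mean of sub-scores would only be an approximation, and the task prefix and metric name below are assumptions on my part, not confirmed field names:

```python
import json
from statistics import mean

from huggingface_hub import hf_hub_download

# Download one of the results files linked above (first URL in this post).
path = hf_hub_download(
    repo_id="open-llm-leaderboard/results",
    filename="ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18/results_2024-11-04T02-06-36.084992.json",
    repo_type="dataset",
)

with open(path) as f:
    data = json.load(f)

# Assumption: subtask entries sit under data["results"], keyed by task name,
# each holding harness-style metrics such as "acc_norm,none".
def average_subscores(results: dict, prefix: str, metric: str = "acc_norm,none") -> float:
    """Plain mean of all subtask scores whose task name starts with `prefix`."""
    scores = [
        v[metric]
        for k, v in results.items()
        if k.startswith(prefix) and metric in v
    ]
    if not scores:
        raise ValueError(f"no subtasks matching {prefix!r} with metric {metric!r}")
    return mean(scores)

# Hypothetical prefix; actual task names depend on the harness config.
print(average_subscores(data["results"], "leaderboard_bbh"))
```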

Open LLM Leaderboard org

Hi @ymcki ,

Thank you for reporting!

We've updated the Harness version for our evaluations and encountered this issue. Everything is correct now, so I'll resubmit your models to get the correct results.

Thank you for your patience!

alozowski changed discussion status to closed
