Open LLM Leaderboard results

#3
by SaisExperiments - opened

Why does this model score so differently on the official leaderboard?


Metric                 Model Card Results   Leaderboard Results
Avg.                   29.79                15.4
IFEval (0-Shot)        32.12                31.54
BBH (3-Shot)           42.23                19.53
MATH Lvl 5 (4-Shot)    8.16                 7.63
GPQA (0-shot)          27.10                3.69
MuSR (0-shot)          40.61                9.41
MMLU-PRO (5-shot)      28.49                20.6
Arcee AI org

Hello, this was a mistake on my part when writing the report: I forgot to normalize the scores. I'm adding the updated scores from the leaderboard right now.
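For readers wondering what the normalization changes, here is a minimal sketch. It assumes the leaderboard rescales each raw score between a random-guess baseline and the maximum, and clamps anything below the baseline to 0. The function name and the baseline values (10% for a 10-choice MMLU-PRO question, 25% for a 4-choice GPQA question) are assumptions for illustration and are not taken from this thread.

```python
def normalize_within_range(raw: float, baseline: float, maximum: float = 100.0) -> float:
    """Rescale a raw percentage so that `baseline` maps to 0 and `maximum` to 100.

    Scores below the random-guess baseline are clamped to 0.
    """
    if raw < baseline:
        return 0.0
    return (raw - baseline) / (maximum - baseline) * 100.0


# Assumed baselines (hypothetical for this example):
# MMLU-PRO has 10 answer choices, so random guessing gives about 10%.
# GPQA has 4 answer choices, so random guessing gives about 25%.
mmlu_pro_normalized = normalize_within_range(28.49, baseline=10.0)
gpqa_normalized = normalize_within_range(27.10, baseline=25.0)

print(f"MMLU-PRO: {mmlu_pro_normalized:.2f}")
print(f"GPQA:     {gpqa_normalized:.2f}")
```

Under these assumptions the raw MMLU-PRO score of 28.49 lands close to the leaderboard's 20.6. Benchmarks with no random-guess baseline (IFEval, MATH) stay roughly the same, which matches the table above. BBH and MuSR are made of subtasks with differing numbers of choices, so a single baseline would not reproduce their leaderboard values.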

qnguyen3 changed discussion status to closed
