judgerbench_leaderboard/data/detail_b_corr.csv
linjunyao
Models,AlignBench,Fofo,WildBench,ArenaHard,Average,Class
CJ-1-32B,0.973,0.951,0.954,0.975,0.963,Judge
CJ-1-14B,0.966,0.956,0.965,0.951,0.959,Judge
CJ-1-7B,0.956,0.936,0.97,0.932,0.948,Judge
Qwen2.5-72B-Chat,0.964,0.916,0.958,0.912,0.937,General
Qwen2-72B-Chat,0.937,0.889,0.976,0.936,0.935,General
CJ-1-1.5B,0.928,0.851,0.981,0.858,0.905,Judge
Qwen2.5-7B-Chat,0.916,0.681,0.967,0.931,0.874,General
Selftaught-llama3.1-70B,0.918,0.667,0.95,0.942,0.869,Judge