Model,Timeout,Dataset,ET,NET,MU,NMU,TMU,NTMU
OpenCodeInterpreter-DS-1.3B,10,HumanEval,0.2,0.86,57.24,1,6.63,0.84
OpenCodeInterpreter-DS-6.7B,10,HumanEval,0.21,0.98,58.83,1.06,6.79,0.99
OpenCodeInterpreter-DS-33B,10,HumanEval,0.21,0.95,59.9,1.05,7.05,0.94
deepseek-coder-1.3b-instruct,10,HumanEval,0.23,0.9,62.8,1,7.85,0.87
deepseek-coder-6.7b-instruct,10,HumanEval,0.22,0.76,59.57,1,7.34,0.77
deepseek-coder-33b-instruct,10,HumanEval,0.21,0.95,63.52,0.99,7.18,0.95
CodeLlama-7b-Instruct-hf,10,HumanEval,0.2,0.71,57.39,0.91,7.08,0.7
CodeLlama-13b-Instruct-hf,10,HumanEval,0.23,0.95,58.13,0.96,7.97,0.94
CodeLlama-34b-Instruct-hf,10,HumanEval,0.24,0.95,61.79,1.01,8.45,0.96
CodeLlama-70b-Instruct-hf,10,HumanEval,0.21,0.93,60.19,1.01,6.76,1.01
XwinCoder-13B,10,HumanEval,0.27,1.08,61.14,1.04,9.25,1.09
XwinCoder-34B,10,HumanEval,0.25,1.07,60.75,1.05,8.46,1.08
WizardCoder-Python-7B-V1.0-GPTQ,10,HumanEval,0.21,0.91,58.59,1.01,6.63,0.89
WizardCoder-Python-13B-V1.0-GPTQ,10,HumanEval,0.21,0.81,60.59,1,7.22,0.79
WizardCoder-Python-34B-V1.0-GPTQ,10,HumanEval,0.22,0.79,58.13,1,7.1,0.78
starcoder2-3b,10,HumanEval,0.24,1.02,62.45,1,7.73,0.89
starcoder2-7b,10,HumanEval,0.21,0.89,62.53,1,7.41,0.85