Sharing an MMLU test result: I ran the 2.51-bit quant and compared it with the DeepSeek API and Baidu's DeepSeek. The 2.51-bit quant seems very smart, at least on MMLU
#42, opened by tarjintor
I used the dev part of cais/mmlu, which has only 285 questions, because I don't have time to test a larger part. The 2.51-bit quant got only 21 of them wrong: [1, 22, 32, 69, 86, 88, 104, 160, 169, 176, 198, 199, 210, 217, 225, 238, 243, 246, 258, 262, 275]
I then ran Baidu's DeepSeek and the official DeepSeek API on just these wrong questions. Baidu's got only 2 of them right, and the official API got 3.
In my day-to-day use, I can hardly find a question that the 2.51-bit quant fails but Q4 gets right.
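For anyone who wants to check the arithmetic, here is a minimal sketch that turns the numbers above into rates. The names `WRONG_2_51BIT`, `TOTAL_QUESTIONS`, `RECOVERED` and `summarize` are made up for illustration, and the script only does the bookkeeping; it does not run any model. Note that the two hosted services were only tested on the 21 questions the 2.51-bit quant missed, so their numbers are recovery rates on that subset, not full dev-set accuracy.

```python
# Question numbers the 2.51-bit quant got wrong, copied from the post.
WRONG_2_51BIT = [1, 22, 32, 69, 86, 88, 104, 160, 169, 176, 198, 199,
                 210, 217, 225, 238, 243, 246, 258, 262, 275]

# Size of the cais/mmlu dev part used in the post.
TOTAL_QUESTIONS = 285

# How many of the 21 missed questions each hosted service answered correctly.
RECOVERED = {"baidu_deepseek": 2, "deepseek_api": 3}


def summarize() -> dict:
    """Return the 2.51-bit accuracy and each service's recovery rate."""
    n_wrong = len(WRONG_2_51BIT)
    summary = {
        "n_wrong": n_wrong,
        "n_correct": TOTAL_QUESTIONS - n_wrong,
        "accuracy_2_51bit": (TOTAL_QUESTIONS - n_wrong) / TOTAL_QUESTIONS,
    }
    # Recovery rate: share of the 2.51-bit misses that a service got right.
    for name, n_right in RECOVERED.items():
        summary[f"recovery_{name}"] = n_right / n_wrong
    return summary


if __name__ == "__main__":
    for key, value in summarize().items():
        if isinstance(value, float):
            print(f"{key}: {value:.1%}")
        else:
            print(f"{key}: {value}")
```

This works out to 264 of 285 correct for the 2.51-bit quant, about 92.6% on the dev part. The subset is selected by the 2.51-bit quant's own failures, so it says nothing about questions the quant got right and the hosted services might have missed.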