iansotnek committed on
Commit c5f4b5a · 1 Parent(s): 018a1b4

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -91,17 +91,17 @@ We present the results from various model benchmarks on the EleutherAI LLM Evalu
  Model results are sorted by mean score, ascending, to provide an ordering. These metrics serve to further show that none of the DLite models are
  state of the art, but rather further show that chat-like behaviors in LLMs can be trained almost independent of model size.
 
- | model | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa | boolq |
- |:--------------|-------------:|-----------:|-------------:|------------:|----------------:|---------:|---------:|
- | gpt2 | 0.164 | 0.438131 | 0.51618 | 0.289185 | 0.190273 | 0.628945 | 0.487156 |
- | dlite-v2-124m | 0.174 | 0.44697 | 0.502762 | 0.291974 | 0.192833 | 0.631665 | 0.520183 |
- | dlite-v1-124m | 0.17 | 0.462542 | 0.494081 | 0.293268 | 0.223549 | 0.622416 | 0.502446 |
- | gpt2-medium | 0.186 | 0.490741 | 0.531176 | 0.333101 | 0.215017 | 0.676279 | 0.585933 |
- | dlite-v2-355m | 0.206 | 0.493687 | 0.524073 | 0.334993 | 0.226109 | 0.670838 | 0.582263 |
- | dlite-v1-355m | 0.216 | 0.507576 | 0.496448 | 0.338478 | 0.234642 | 0.664309 | 0.600306 |
- | gpt2-large | 0.194 | 0.531566 | 0.553275 | 0.363971 | 0.216724 | 0.703482 | 0.604893 |
- | dlite-774m-v2 | 0.212 | 0.539562 | 0.5588 | 0.365565 | 0.234642 | 0.700218 | 0.60367 |
- | dlite-774m-v1 | 0.218 | 0.545875 | 0.562747 | 0.375124 | 0.250853 | 0.698041 | 0.614985 |
- | gpt2-xl | 0.224 | 0.582912 | 0.583268 | 0.400418 | 0.25 | 0.708379 | 0.617737 |
- | dlite-v1-1.5b | 0.226 | 0.588384 | 0.584846 | 0.401414 | 0.268771 | 0.708379 | 0.624159 |
- | dlite-v2-1.5b | 0.226 | 0.59596 | 0.581689 | 0.40719 | 0.273891 | 0.705114 | 0.630887 |
+ | Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
+ |:--------------|----------------:|-----------:|---------:|------------:|-------------:|---------:|-------------:|
+ | dlite-v2-124m | 0.199659 | 0.447811 | 0.494801 | 0.291675 | 0.156 | 0.620239 | 0.487766 |
+ | gpt2 | 0.190273 | 0.438131 | 0.487156 | 0.289185 | 0.164 | 0.628945 | 0.51618 |
+ | dlite-v1-124m | 0.223549 | 0.462542 | 0.502446 | 0.293268 | 0.17 | 0.622416 | 0.494081 |
+ | gpt2-medium | 0.215017 | 0.490741 | 0.585933 | 0.333101 | 0.186 | 0.676279 | 0.531176 |
+ | dlite-v2-355m | 0.251706 | 0.486111 | 0.547401 | 0.344354 | 0.216 | 0.671926 | 0.52723 |
+ | dlite-v1-355m | 0.234642 | 0.507576 | 0.600306 | 0.338478 | 0.216 | 0.664309 | 0.496448 |
+ | gpt2-large | 0.216724 | 0.531566 | 0.604893 | 0.363971 | 0.194 | 0.703482 | 0.553275 |
+ | dlite-v1-774m | 0.250853 | 0.545875 | 0.614985 | 0.375124 | 0.218 | 0.698041 | 0.562747 |
+ | dlite-v2-774m | 0.269625 | 0.52904 | 0.613761 | 0.395937 | 0.256 | 0.691513 | 0.566693 |
+ | gpt2-xl | 0.25 | 0.582912 | 0.617737 | 0.400418 | 0.224 | 0.708379 | 0.583268 |
+ | dlite-v1-1_5b | 0.268771 | 0.588384 | 0.624159 | 0.401414 | 0.226 | 0.708379 | 0.584846 |
+ | dlite-v2-1_5b | 0.289249 | 0.565657 | 0.601223 | 0.434077 | 0.272 | 0.703482 | 0.588003 |
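
The README context says the rows are sorted by mean score, ascending. As a rough check of that ordering, the sketch below recomputes it for the three smallest models, using values copied from the updated table. It assumes "mean score" is the unweighted arithmetic mean over the seven task columns, which the README does not state explicitly, and `rank_by_mean` is a hypothetical helper name, not something from the repository.

```python
from statistics import mean

# Three rows copied from the updated table. Column order:
# arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande.
scores = {
    "gpt2": [0.190273, 0.438131, 0.487156, 0.289185, 0.164, 0.628945, 0.51618],
    "dlite-v1-124m": [0.223549, 0.462542, 0.502446, 0.293268, 0.17, 0.622416, 0.494081],
    "dlite-v2-124m": [0.199659, 0.447811, 0.494801, 0.291675, 0.156, 0.620239, 0.487766],
}


def rank_by_mean(table):
    """Return (model, mean score) pairs sorted by mean score, ascending.

    Assumption: the mean is unweighted across all task columns.
    """
    pairs = ((name, mean(values)) for name, values in table.items())
    return sorted(pairs, key=lambda pair: pair[1])


ranking = rank_by_mean(scores)
for name, avg in ranking:
    print(f"{name:<15} {avg:.4f}")
```

Under that assumption, the three rows come out in the same order as in the updated table: dlite-v2-124m, then gpt2, then dlite-v1-124m.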