prince-canuma committed 421f0a0 (1 parent: a139750)

Update eval result table
README.md (changed)

```diff
@@ -159,35 +159,17 @@ For all these evaluations, a higher score is a better score. We chose these benc
 
 Read more [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
-[TODO]
-
 ### Results
 
-
-
-
-
-
-
-
-
-
-"acc": 74.52
-},
-"MMLU": {
-"acc": 56.33
-},
-"truthfulqa": {
-"acc": 46.74,
-},
-"winogrande": {
-"acc": 75.00,
-},
-"gsm8k": {
-"acc": 50.64,
-}
-}
-```
+| Model | AVG | ARC | Hellaswag | MMLU | Truthful QA | Winogrande | GSM8K |
+|-------|--------:|------:|----------:|-----:|----------:|----------:|----------:|
+| [NousResearch/Nous-Puffin-70B](NousResearch/Nous-Puffin-70B) | 64.91 | 67.41 | 87.37 | 69.77 | 46.77 | 83.9 | 34.27 |
+| [TheBloke/Llama-2-70B-fp16](https://huggingface.co/TheBloke/Llama-2-70B-fp16) | 64.52 | 67.32 | 87.33 | 69.83 | 44.92 | 83.74 | 33.97 |
+| [NousResearch/Yarn-Mistral-7B-64k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-64k) | 59.63 | 59.9 | 82.51 | 62.96 | 41.86 | 77.27 | 33.28 |
+| [Qwen1.5-4B-Chat](https://huggingface.co/Qwen/Qwen1.5-4B-Chat) | 46.79 | 43.26 | 69.73 | 55.55 | 44.79 | 64.96 | 2.43 |
+| [Microsoft/phi-2](https://huggingface.co/microsoft/phi-2) | 61.33 | 61.09 | 75.11 | 58.11 | 44.47 | 74.35 | 54.81 |
+| [Damysus-2.7B-Chat](https://huggingface.co/prince-canuma/Damysus-2.7B-Chat) (Ours) | 60.49 | 59.81 | 74.52 | 56.33 | **46.74** | **75.06** | 50.64 |
+
 
 ## Technical Specifications
 
```
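The AVG column in the added table appears to follow the Open LLM Leaderboard convention of an unweighted mean over the six benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K). A minimal Python sketch, assuming that convention, recomputes each row's average from the per-benchmark columns as a consistency check. The scores are copied from the table above; the helper names `recompute_avg` and `check_table` and the `tolerance` value are hypothetical choices for illustration.

```python
# Scores copied from the added README table, in column order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
# Assumption: AVG is the unweighted mean of these six scores.
ROWS = {
    "NousResearch/Nous-Puffin-70B": (64.91, [67.41, 87.37, 69.77, 46.77, 83.9, 34.27]),
    "TheBloke/Llama-2-70B-fp16": (64.52, [67.32, 87.33, 69.83, 44.92, 83.74, 33.97]),
    "NousResearch/Yarn-Mistral-7B-64k": (59.63, [59.9, 82.51, 62.96, 41.86, 77.27, 33.28]),
    "Qwen1.5-4B-Chat": (46.79, [43.26, 69.73, 55.55, 44.79, 64.96, 2.43]),
    "Microsoft/phi-2": (61.33, [61.09, 75.11, 58.11, 44.47, 74.35, 54.81]),
    "Damysus-2.7B-Chat": (60.49, [59.81, 74.52, 56.33, 46.74, 75.06, 50.64]),
}


def recompute_avg(scores):
    """Hypothetical helper: unweighted mean of the benchmark scores."""
    return sum(scores) / len(scores)


def check_table(rows, tolerance=0.01):
    """Return (model, reported, recomputed) for rows whose AVG is off by more than tolerance.

    Each score is rounded to 2 decimals (error <= 0.005), so the mean of six
    scores plus the rounded AVG itself should agree to within about 0.01.
    """
    mismatches = []
    for model, (reported, scores) in rows.items():
        recomputed = recompute_avg(scores)
        if abs(recomputed - reported) > tolerance:
            mismatches.append((model, reported, round(recomputed, 2)))
    return mismatches


if __name__ == "__main__":
    for model, reported, recomputed in check_table(ROWS):
        print(f"{model}: reported {reported}, recomputed {recomputed}")
```

Under that assumption, only the Damysus-2.7B-Chat row fails the check: its six scores average to about 60.52 rather than the reported 60.49, a gap larger than two-decimal rounding can explain. The other five rows agree within rounding, so the Damysus AVG value may be worth re-checking against its leaderboard entry.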