prince-canuma committed on
Commit
421f0a0
1 Parent(s): a139750

Update eval result table

Files changed (1)
  1. README.md +9 -27
README.md CHANGED
@@ -159,35 +159,17 @@ For all these evaluations, a higher score is a better score. We chose these benc
 
  Read more [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
- [TODO]
-
  ### Results
 
- ```json
- {
- "AVG": {
- "acc": 60.49
- },
- "ARC": {
- "acc": 59.81
- },
- "HellaSwag": {
- "acc": 74.52
- },
- "MMLU": {
- "acc": 56.33
- },
- "truthfulqa": {
- "acc": 46.74,
- },
- "winogrande": {
- "acc": 75.00,
- },
- "gsm8k": {
- "acc": 50.64,
- }
- }
- ```
+ | Model | AVG | ARC | Hellaswag | MMLU | Truthful QA | Winogrande | GSM8K |
+ |-------|--------:|------:|----------:|-----:|----------:|----------:|----------:|
+ | [NousResearch/Nous-Puffin-70B](NousResearch/Nous-Puffin-70B) | 64.91 | 67.41 | 87.37 | 69.77 | 46.77 | 83.9 | 34.27 |
+ | [TheBloke/Llama-2-70B-fp16](https://huggingface.co/TheBloke/Llama-2-70B-fp16) | 64.52 | 67.32 | 87.33 | 69.83 | 44.92 | 83.74 | 33.97 |
+ | [NousResearch/Yarn-Mistral-7B-64k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-64k) | 59.63 | 59.9 | 82.51 | 62.96 | 41.86 | 77.27 | 33.28 |
+ | [Qwen1.5-4B-Chat](https://huggingface.co/Qwen/Qwen1.5-4B-Chat) | 46.79 | 43.26 | 69.73 | 55.55 | 44.79 | 64.96 | 2.43 |
+ | [Microsoft/phi-2](https://huggingface.co/microsoft/phi-2) | 61.33 | 61.09 | 75.11 | 58.11 | 44.47 | 74.35 | 54.81 |
+ | [Damysus-2.7B-Chat](https://huggingface.co/prince-canuma/Damysus-2.7B-Chat) (Ours) | 60.49 | 59.81 | 74.52 | 56.33 | **46.74** | **75.06** | 50.64 |
+
 
  ## Technical Specifications
 