Commit
b1b5c4e
1 Parent(s): 94d222b

Adding Evaluation Results (#10)

- Adding Evaluation Results (892024d5d0c2239d0ad8831de55bf1e12e59025b)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -1
README.md CHANGED
@@ -12,7 +12,6 @@ tags:
12
  - llama
13
  - llama-3
14
  base_model: MaziyarPanahi/Llama-3-8B-Instruct-v0.4
15
- model_name: Llama-3-8B-Instruct-v0.8
16
  pipeline_tag: text-generation
17
  license_name: llama3
18
  license_link: LICENSE
@@ -122,6 +121,98 @@ model-index:
122
  source:
123
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
124
  name: Open LLM Leaderboard
125
  ---
126
 
127
  <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -228,3 +319,17 @@ outputs = pipeline(
228
  )
229
  print(outputs[0]["generated_text"][len(prompt):])
230
  ```
 
12
  - llama
13
  - llama-3
14
  base_model: MaziyarPanahi/Llama-3-8B-Instruct-v0.4
 
15
  pipeline_tag: text-generation
16
  license_name: llama3
17
  license_link: LICENSE
 
121
  source:
122
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
123
  name: Open LLM Leaderboard
124
+ - task:
125
+ type: text-generation
126
+ name: Text Generation
127
+ dataset:
128
+ name: IFEval (0-Shot)
129
+ type: HuggingFaceH4/ifeval
130
+ args:
131
+ num_few_shot: 0
132
+ metrics:
133
+ - type: inst_level_strict_acc and prompt_level_strict_acc
134
+ value: 75.12
135
+ name: strict accuracy
136
+ source:
137
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
138
+ name: Open LLM Leaderboard
139
+ - task:
140
+ type: text-generation
141
+ name: Text Generation
142
+ dataset:
143
+ name: BBH (3-Shot)
144
+ type: BBH
145
+ args:
146
+ num_few_shot: 3
147
+ metrics:
148
+ - type: acc_norm
149
+ value: 28.27
150
+ name: normalized accuracy
151
+ source:
152
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
153
+ name: Open LLM Leaderboard
154
+ - task:
155
+ type: text-generation
156
+ name: Text Generation
157
+ dataset:
158
+ name: MATH Lvl 5 (4-Shot)
159
+ type: hendrycks/competition_math
160
+ args:
161
+ num_few_shot: 4
162
+ metrics:
163
+ - type: exact_match
164
+ value: 7.1
165
+ name: exact match
166
+ source:
167
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
168
+ name: Open LLM Leaderboard
169
+ - task:
170
+ type: text-generation
171
+ name: Text Generation
172
+ dataset:
173
+ name: GPQA (0-shot)
174
+ type: Idavidrein/gpqa
175
+ args:
176
+ num_few_shot: 0
177
+ metrics:
178
+ - type: acc_norm
179
+ value: 7.38
180
+ name: acc_norm
181
+ source:
182
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
183
+ name: Open LLM Leaderboard
184
+ - task:
185
+ type: text-generation
186
+ name: Text Generation
187
+ dataset:
188
+ name: MuSR (0-shot)
189
+ type: TAUR-Lab/MuSR
190
+ args:
191
+ num_few_shot: 0
192
+ metrics:
193
+ - type: acc_norm
194
+ value: 10.92
195
+ name: acc_norm
196
+ source:
197
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
198
+ name: Open LLM Leaderboard
199
+ - task:
200
+ type: text-generation
201
+ name: Text Generation
202
+ dataset:
203
+ name: MMLU-PRO (5-shot)
204
+ type: TIGER-Lab/MMLU-Pro
205
+ config: main
206
+ split: test
207
+ args:
208
+ num_few_shot: 5
209
+ metrics:
210
+ - type: acc
211
+ value: 31.68
212
+ name: accuracy
213
+ source:
214
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-v0.8
215
+ name: Open LLM Leaderboard
216
  ---
217
 
218
  <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
319
  )
320
  print(outputs[0]["generated_text"][len(prompt):])
321
  ```
322
+
323
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
324
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Llama-3-8B-Instruct-v0.8)
325
+
326
+ | Metric |Value|
327
+ |-------------------|----:|
328
+ |Avg. |26.75|
329
+ |IFEval (0-Shot) |75.12|
330
+ |BBH (3-Shot) |28.27|
331
+ |MATH Lvl 5 (4-Shot)| 7.10|
332
+ |GPQA (0-shot) | 7.38|
333
+ |MuSR (0-shot) |10.92|
334
+ |MMLU-PRO (5-shot) |31.68|
335
+
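
The `Avg.` row in the table added above looks like the unweighted mean of the six benchmark scores. The commit does not say how the average is derived, so that is an assumption. A minimal sketch to check it, with the scores copied by hand from the table and a hypothetical helper named `mean_score`:

```python
# Scores copied by hand from the results table in this commit (percent values).
scores = {
    "IFEval (0-Shot)": 75.12,
    "BBH (3-Shot)": 28.27,
    "MATH Lvl 5 (4-Shot)": 7.10,
    "GPQA (0-shot)": 7.38,
    "MuSR (0-shot)": 10.92,
    "MMLU-PRO (5-shot)": 31.68,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed, not confirmed by the commit)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(f"Avg. {average:.2f}")
```

The exact mean of the listed values is 26.745, so the printed figure should land within rounding distance of the reported 26.75. The leaderboard presumably averages unrounded scores, which would account for any difference in the last digit.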