leaderboard-pr-bot committed on
Commit 64add20
1 parent: 086bd26

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
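Once merged, the model-index metadata added below can be read back programmatically. As a minimal sketch (not part of this PR), assuming the `huggingface_hub` library is installed, the parsed evaluation results can be pulled out of the model card like this:

```python
# Minimal sketch (not part of this PR): read the model-index metadata that this
# change adds, using huggingface_hub. Assumes the library is installed and the
# PR has been merged into glaiveai/Reflection-Llama-3.1-70B.
from huggingface_hub import ModelCard

card = ModelCard.load("glaiveai/Reflection-Llama-3.1-70B")

# card.data.eval_results is parsed from the model-index block: one entry per
# (task, dataset, metric) combination.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_name} = {result.metric_value}")
```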

Files changed (1)
README.md  +111 -1
README.md CHANGED
@@ -1,3 +1,113 @@
+---
+model-index:
+- name: Reflection-Llama-3.1-70B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 59.91
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=glaiveai/Reflection-Llama-3.1-70B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 37.96
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=glaiveai/Reflection-Llama-3.1-70B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=glaiveai/Reflection-Llama-3.1-70B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.61
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=glaiveai/Reflection-Llama-3.1-70B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 13.72
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=glaiveai/Reflection-Llama-3.1-70B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=glaiveai/Reflection-Llama-3.1-70B
+      name: Open LLM Leaderboard
+---
 This repo is for letting people reproduce Reflection-70B benchmark scores.
 
-[https://glaive.ai/blog/post/reflection-postmortem](https://glaive.ai/blog/post/reflection-postmortem)
+[https://glaive.ai/blog/post/reflection-postmortem](https://glaive.ai/blog/post/reflection-postmortem)
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_glaiveai__Reflection-Llama-3.1-70B)
+
+| Metric             |Value|
+|--------------------|----:|
+|Avg.                |29.92|
+|IFEval (0-Shot)     |59.91|
+|BBH (3-Shot)        |37.96|
+|MATH Lvl 5 (4-Shot) | 0.00|
+|GPQA (0-shot)       | 8.61|
+|MuSR (0-shot)       |13.72|
+|MMLU-PRO (5-shot)   |59.35|
+
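The "Detailed results" link in the added README section points to a per-task details dataset. As a rough sketch (the config and split layout inside that dataset is an assumption, not something this PR defines), it can be explored with the `datasets` library:

```python
# Rough sketch: browse the per-task details dataset linked from the new README
# section. The repo id is taken from the diff above; the config/split layout
# inside that dataset is an assumption and may differ.
from datasets import get_dataset_config_names, load_dataset

repo = "open-llm-leaderboard/details_glaiveai__Reflection-Llama-3.1-70B"

configs = get_dataset_config_names(repo)  # typically one config per evaluated task
print(configs[:5])

details = load_dataset(repo, configs[0])  # DatasetDict with the available splits
print(details)
```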