leaderboard-pr-bot committed
Commit f239769
1 Parent(s): f209799

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1):
1. README.md (+39 -15)
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: apache-2.0
+tags:
+- merge
 model-index:
 - name: MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
   results:
@@ -20,9 +22,11 @@ model-index:
     - type: acc_norm
       value: 64.59
       name: normalized accuracy
+    - type: acc_norm
+      value: 64.59
+      name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -40,9 +44,11 @@ model-index:
     - type: acc_norm
       value: 85.39
       name: normalized accuracy
+    - type: acc_norm
+      value: 85.37
+      name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -61,9 +67,11 @@ model-index:
     - type: acc
       value: 64.27
       name: accuracy
+    - type: acc
+      value: 64.29
+      name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -80,9 +88,10 @@ model-index:
       value: 55.14
     - type: mc2
       value: 55.14
+    - type: mc2
+      value: 55.14
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -101,9 +110,11 @@ model-index:
     - type: acc
       value: 79.64
       name: accuracy
+    - type: acc
+      value: 79.08
+      name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -122,12 +133,12 @@ model-index:
     - type: acc
       value: 71.65
       name: accuracy
+    - type: acc
+      value: 71.04
+      name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PulsarAI/MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
       name: Open LLM Leaderboard
-tags:
-- merge
 ---
 
 # MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp
@@ -155,4 +166,17 @@ parameters:
   - value: 0.5 # fallback for rest of tensors
 dtype: bfloat16
 
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__MetaMath-OpenHermes-2.5-neural-chat-v3-3-Slerp)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |69.92|
+|AI2 Reasoning Challenge (25-Shot)|64.59|
+|HellaSwag (10-Shot) |85.37|
+|MMLU (5-Shot) |64.29|
+|TruthfulQA (0-shot) |55.14|
+|Winogrande (5-shot) |79.08|
+|GSM8k (5-shot) |71.04|
+
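For reference, the "Avg." row in the table this PR appends to the README is simply the arithmetic mean of the six benchmark scores, rounded to two decimals. A minimal sketch checking that (plain Python; the `scores` dict just restates the values from the table):

```python
# Benchmark scores from the table added to the README by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 64.59,
    "HellaSwag (10-Shot)": 85.37,
    "MMLU (5-Shot)": 64.29,
    "TruthfulQA (0-shot)": 55.14,
    "Winogrande (5-shot)": 79.08,
    "GSM8k (5-shot)": 71.04,
}

# The "Avg." row is the arithmetic mean, rounded to two decimal places.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 69.92
```

Note that several of these figures differ slightly from the values previously listed in the model card (e.g. HellaSwag 85.37 vs. 85.39), which is why the PR appends the updated metric entries alongside the old ones rather than leaving the card unchanged.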