leaderboard-pr-bot committed on
Commit a9b62c0 (verified) · 1 parent: b8f4a43

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
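Every `model-index` result entry that this PR adds to the card has the same nested shape: `task`, `dataset`, `metrics`, `source`. The Python sketch below builds two of those entries as plain dicts, using the TruthfulQA and GSM8k entries because the diff shows all of their fields. The helper `make_result` and its signature are hypothetical illustrations, not the bot's actual code. To stay within the standard library, the sketch prints JSON rather than writing YAML into the card.

```python
import json
from typing import Any, Dict, Optional

LEADERBOARD_URL = "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard"


def make_result(
    repo_id: str,
    dataset_name: str,
    dataset_type: str,
    config: str,
    split: str,
    num_few_shot: int,
    metric_type: str,
    value: float,
    metric_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: one model-index result, keys ordered as in the new README."""
    metric: Dict[str, Any] = {"type": metric_type, "value": value}
    if metric_name is not None:
        # The TruthfulQA (mc2) entry in the diff has no human-readable metric name.
        metric["name"] = metric_name
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "config": config,
            "split": split,
            "args": {"num_few_shot": num_few_shot},
        },
        "metrics": [metric],
        # The leaderboard link filters the leaderboard Space by the model's repo id.
        "source": {"url": f"{LEADERBOARD_URL}?query={repo_id}", "name": "Open LLM Leaderboard"},
    }


repo = "Felladrin/Pythia-31M-Chat-v1"
results = [
    make_result(repo, "TruthfulQA (0-shot)", "truthful_qa", "multiple_choice",
                "validation", 0, "mc2", 0.0),
    make_result(repo, "GSM8k (5-shot)", "gsm8k", "main", "test", 5,
                "acc", 0.0, metric_name="accuracy"),
]
model_index = [{"name": "Pythia-31M-Chat-v1", "results": results}]

print(json.dumps(model_index, indent=2))
```

A YAML serializer such as PyYAML could dump `model_index` into the card's front matter. That step is left out here so the sketch runs without third-party packages.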

Files changed (1)
  1. README.md +125 -59
README.md CHANGED
@@ -1,22 +1,70 @@
  ---
- license: apache-2.0
  language:
- - en
- pipeline_tag: text-generation
  base_model: EleutherAI/pythia-31m
  datasets:
- - totally-not-an-llm/EverythingLM-data-V3
- - databricks/databricks-dolly-15k
- - THUDM/webglm-qa
- - starfishmedical/webGPT_x_dolly
- - Amod/mental_health_counseling_conversations
- - sablo/oasst2_curated
- - cognitivecomputations/wizard_vicuna_70k_unfiltered
- - mlabonne/chatml_dpo_pairs
  model-index:
  - name: Pythia-31M-Chat-v1
  results:
- - task:
  type: text-generation
  name: Text Generation
  dataset:
@@ -27,13 +75,13 @@ model-index:
  args:
  num_few_shot: 25
  metrics:
- - type: acc_norm
- name: normalized accuracy
- value: 22.7
  source:
- name: Open LLM Leaderboard
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
- - task:
  type: text-generation
  name: Text Generation
  dataset:
@@ -43,13 +91,13 @@ model-index:
  args:
  num_few_shot: 10
  metrics:
- - type: acc_norm
- name: normalized accuracy
- value: 25.6
  source:
- name: Open LLM Leaderboard
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
- - task:
  type: text-generation
  name: Text Generation
  dataset:
@@ -60,13 +108,13 @@ model-index:
  args:
  num_few_shot: 5
  metrics:
- - type: acc
- name: accuracy
- value: 23.24
  source:
- name: Open LLM Leaderboard
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
- - task:
  type: text-generation
  name: Text Generation
  dataset:
@@ -77,41 +125,45 @@ model-index:
  args:
  num_few_shot: 5
  metrics:
- - type: acc
- name: accuracy
- value: 47.99
  source:
  name: Open LLM Leaderboard
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
- widget:
- - text: |-
- <|im_start|>system
- You are a career counselor. The user will provide you with an individual looking for guidance in their professional life, and your task is to assist them in determining what careers they are most suited for based on their skills, interests, and experience. You should also conduct research into the various options available, explain the job market trends in different industries, and advice on which qualifications would be beneficial for pursuing particular fields.<|im_end|>
- <|im_start|>user
- Heya!<|im_end|>
- <|im_start|>assistant
- Hi! How may I help you?<|im_end|>
- <|im_start|>user
- I am interested in developing a career in software engineering. What would you recommend me to do?<|im_end|>
- <|im_start|>assistant
- - text: |-
- <|im_start|>system
- You are a helpful assistant who answers user's questions with details and curiosity.<|im_end|>
- <|im_start|>user
- What are some potential applications for quantum computing?<|im_end|>
- <|im_start|>assistant
- - text: |-
- <|im_start|>system
- You are a highly knowledgeable assistant. Help the user as much as you can.<|im_end|>
- <|im_start|>user
- What are some steps I can take to become a healthier person?<|im_end|>
- <|im_start|>assistant
- inference:
- parameters:
- max_new_tokens: 250
- penalty_alpha: 0.5
- top_k: 2
- repetition_penalty: 1.0016
  ---

  # A Pythia Chat Model of 31M Parameters
@@ -227,3 +279,17 @@ DPOTrainer(
  ],
  )
  ```
 
  ---
  language:
+ - en
+ license: apache-2.0
  base_model: EleutherAI/pythia-31m
  datasets:
+ - totally-not-an-llm/EverythingLM-data-V3
+ - databricks/databricks-dolly-15k
+ - THUDM/webglm-qa
+ - starfishmedical/webGPT_x_dolly
+ - Amod/mental_health_counseling_conversations
+ - sablo/oasst2_curated
+ - cognitivecomputations/wizard_vicuna_70k_unfiltered
+ - mlabonne/chatml_dpo_pairs
+ pipeline_tag: text-generation
+ widget:
+ - text: '<|im_start|>system
+
+ You are a career counselor. The user will provide you with an individual looking
+ for guidance in their professional life, and your task is to assist them in determining
+ what careers they are most suited for based on their skills, interests, and experience.
+ You should also conduct research into the various options available, explain the
+ job market trends in different industries, and advice on which qualifications
+ would be beneficial for pursuing particular fields.<|im_end|>
+
+ <|im_start|>user
+
+ Heya!<|im_end|>
+
+ <|im_start|>assistant
+
+ Hi! How may I help you?<|im_end|>
+
+ <|im_start|>user
+
+ I am interested in developing a career in software engineering. What would you
+ recommend me to do?<|im_end|>
+
+ <|im_start|>assistant'
+ - text: '<|im_start|>system
+
+ You are a helpful assistant who answers user''s questions with details and curiosity.<|im_end|>
+
+ <|im_start|>user
+
+ What are some potential applications for quantum computing?<|im_end|>
+
+ <|im_start|>assistant'
+ - text: '<|im_start|>system
+
+ You are a highly knowledgeable assistant. Help the user as much as you can.<|im_end|>
+
+ <|im_start|>user
+
+ What are some steps I can take to become a healthier person?<|im_end|>
+
+ <|im_start|>assistant'
+ inference:
+ parameters:
+ max_new_tokens: 250
+ penalty_alpha: 0.5
+ top_k: 2
+ repetition_penalty: 1.0016
  model-index:
  - name: Pythia-31M-Chat-v1
  results:
+ - task:
  type: text-generation
  name: Text Generation
  dataset:
 
  args:
  num_few_shot: 25
  metrics:
+ - type: acc_norm
+ value: 22.7
+ name: normalized accuracy
  source:
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
  type: text-generation
  name: Text Generation
  dataset:
 
  args:
  num_few_shot: 10
  metrics:
+ - type: acc_norm
+ value: 25.6
+ name: normalized accuracy
  source:
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
  type: text-generation
  name: Text Generation
  dataset:
 
  args:
  num_few_shot: 5
  metrics:
+ - type: acc
+ value: 23.24
+ name: accuracy
  source:
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
  type: text-generation
  name: Text Generation
  dataset:
 
  args:
  num_few_shot: 5
  metrics:
+ - type: acc
+ value: 47.99
+ name: accuracy
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
  name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: TruthfulQA (0-shot)
+ type: truthful_qa
+ config: multiple_choice
+ split: validation
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: mc2
+ value: 0.0
+ source:
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GSM8k (5-shot)
+ type: gsm8k
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 0.0
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Pythia-31M-Chat-v1
+ name: Open LLM Leaderboard
  ---

  # A Pythia Chat Model of 31M Parameters
 
  ],
  )
  ```
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Pythia-31M-Chat-v1)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |19.92|
+ |AI2 Reasoning Challenge (25-Shot)|22.70|
+ |HellaSwag (10-Shot) |25.60|
+ |MMLU (5-Shot) |23.24|
+ |TruthfulQA (0-shot) | 0.00|
+ |Winogrande (5-shot) |47.99|
+ |GSM8k (5-shot) | 0.00|
+
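The "Avg." row in the added table matches the unweighted arithmetic mean of the six benchmark scores: (22.70 + 25.60 + 23.24 + 0.00 + 47.99 + 0.00) / 6 = 119.53 / 6 ≈ 19.92. The Python sketch below reproduces that figure from the table's values. Equal weighting is inferred from the numbers; the PR itself does not say how the average is computed.

```python
# Scores copied from the table added to README.md in this diff.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 22.70,
    "HellaSwag (10-Shot)": 25.60,
    "MMLU (5-Shot)": 23.24,
    "TruthfulQA (0-shot)": 0.00,
    "Winogrande (5-shot)": 47.99,
    "GSM8k (5-shot)": 0.00,
}

# Assumption: "Avg." is the plain mean with every benchmark weighted equally.
average = sum(scores.values()) / len(scores)

# Two decimal places, matching the table's formatting.
print(f"Avg. {average:.2f}")
```

The two 0.00 scores (TruthfulQA and GSM8k) pull the average well below the mean of the four non-zero benchmarks.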