leaderboard-pr-bot committed · Commit 1a2b593 · verified · 1 Parent(s): 4cccb98

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1):
  1. README.md (+129 −7)
README.md CHANGED
@@ -1,17 +1,17 @@
  ---
- license: apache-2.0
- datasets:
- - Locutusque/hercules-v1.0
- - Open-Orca/SlimOrca-Dedup
  language:
  - en
- base_model: Locutusque/TinyMistral-248M-v2.5
  tags:
  - chemistry
  - biology
  - not-for-all-audiences
  - merge
  - code
  inference:
    parameters:
      do_sample: true
@@ -25,7 +25,116 @@ inference:
      no_repeat_ngram_size: 5
      epsilon_cutoff: 0.002
  widget:
- - text: "<|im_start|>user\nWrite me a Python program that calculates the factorial of n. <|im_end|>\n<|im_start|>assistant\n"
  ---
  # Model description
  Fine-tuned Locutusque/TinyMistral-248m-v2.5 on SlimOrca-Dedup and Hercules-v1.0. Averaged a loss of 1.5 during training. This model's performance is excellent considering its size.
@@ -34,4 +143,17 @@ This model may output X-rated content. You and you alone are responsible for dow

  You can use the ChatML prompt format for this model.
  # Evaluation
- This model will be submitted to the Open LLM Leaderboard.
  ---
  language:
  - en
+ license: apache-2.0
  tags:
  - chemistry
  - biology
  - not-for-all-audiences
  - merge
  - code
+ datasets:
+ - Locutusque/hercules-v1.0
+ - Open-Orca/SlimOrca-Dedup
+ base_model: Locutusque/TinyMistral-248M-v2.5
  inference:
    parameters:
      do_sample: true

      no_repeat_ngram_size: 5
      epsilon_cutoff: 0.002
  widget:
+ - text: '<|im_start|>user
+
+     Write me a Python program that calculates the factorial of n. <|im_end|>
+
+     <|im_start|>assistant
+
+     '
+ model-index:
+ - name: TinyMistral-248M-v2.5-Instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 22.27
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 27.6
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 23.9
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 44.21
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 48.22
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5-Instruct
+       name: Open LLM Leaderboard
  ---
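The `widget` prompt and `inference.parameters` in the front matter above can be exercised outside the hosted widget as well. Below is a minimal sketch that builds the same ChatML prompt and collects the card's sampling settings as keyword arguments; `build_chatml_prompt` is a hypothetical helper (not part of this repository), and passing `generation_kwargs` to `transformers`' `model.generate()` is the assumed use, not something this PR sets up.

```python
# Sketch only: mirrors the card's widget prompt and inference parameters.
# build_chatml_prompt is an illustrative helper, not part of this repo.

def build_chatml_prompt(user_message: str) -> str:
    """Wrap a user message in the ChatML format this card advertises."""
    return (
        "<|im_start|>user\n"
        f"{user_message} <|im_end|>\n"
        "<|im_start|>assistant\n"
    )

# Sampling settings copied from the `inference.parameters` block above;
# they correspond to generate() keyword arguments of the same names.
generation_kwargs = {
    "do_sample": True,
    "no_repeat_ngram_size": 5,
    "epsilon_cutoff": 0.002,
}

prompt = build_chatml_prompt(
    "Write me a Python program that calculates the factorial of n."
)
print(prompt)
```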
  # Model description
  Fine-tuned Locutusque/TinyMistral-248m-v2.5 on SlimOrca-Dedup and Hercules-v1.0. Averaged a loss of 1.5 during training. This model's performance is excellent considering its size.

  You can use the ChatML prompt format for this model.
  # Evaluation
+ This model will be submitted to the Open LLM Leaderboard.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__TinyMistral-248M-v2.5-Instruct)
+
+ | Metric                            | Value |
+ |-----------------------------------|------:|
+ | Avg.                              | 27.70 |
+ | AI2 Reasoning Challenge (25-Shot) | 22.27 |
+ | HellaSwag (10-Shot)               | 27.60 |
+ | MMLU (5-Shot)                     | 23.90 |
+ | TruthfulQA (0-shot)               | 44.21 |
+ | Winogrande (5-shot)               | 48.22 |
+ | GSM8k (5-shot)                    |  0.00 |
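As a quick sanity check on the table this PR adds: the leaderboard's "Avg." row is simply the unweighted mean of the six benchmark scores.

```python
# Verify that the reported Avg. (27.70) is the plain mean of the six scores above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 22.27,
    "HellaSwag (10-Shot)": 27.60,
    "MMLU (5-Shot)": 23.90,
    "TruthfulQA (0-shot)": 44.21,
    "Winogrande (5-shot)": 48.22,
    "GSM8k (5-shot)": 0.00,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 27.7, matching the leaderboard's reported 27.70
```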