leaderboard-pr-bot commited on
Commit
fb26dd8
1 Parent(s): 7ccd7f1

Adding Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1) hide show
  1. README.md +116 -7
README.md CHANGED
@@ -1,22 +1,117 @@
1
  ---
2
- license: other
3
- license_name: qwen
4
- license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
5
  language:
6
  - en
7
- pipeline_tag: text-generation
 
8
  tags:
9
  - chat
10
  - qwen
11
  - qwen2.5
12
  - finetune
13
  - english
14
- library_name: transformers
 
 
 
 
15
  inference: false
16
  model_creator: MaziyarPanahi
17
  quantized_by: MaziyarPanahi
18
- base_model: MaziyarPanahi/calme-3-selfmerge-qwen2-78b
19
- model_name: calme-3.1-instruct-78b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
20
  ---
21
 
22
  <img src="./calme_3.png" alt="Calme-3 Models" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -77,3 +172,17 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-3.1-instruct-7
77
  # Ethical Considerations
78
 
79
  As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
 
 
2
  language:
3
  - en
4
+ license: other
5
+ library_name: transformers
6
  tags:
7
  - chat
8
  - qwen
9
  - qwen2.5
10
  - finetune
11
  - english
12
+ base_model: MaziyarPanahi/calme-3-selfmerge-qwen2-78b
13
+ model_name: calme-3.1-instruct-78b
14
+ license_name: qwen
15
+ license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
16
+ pipeline_tag: text-generation
17
  inference: false
18
  model_creator: MaziyarPanahi
19
  quantized_by: MaziyarPanahi
20
+ model-index:
21
+ - name: calme-3.1-instruct-78b
22
+ results:
23
+ - task:
24
+ type: text-generation
25
+ name: Text Generation
26
+ dataset:
27
+ name: IFEval (0-Shot)
28
+ type: HuggingFaceH4/ifeval
29
+ args:
30
+ num_few_shot: 0
31
+ metrics:
32
+ - type: inst_level_strict_acc and prompt_level_strict_acc
33
+ value: 81.36
34
+ name: strict accuracy
35
+ source:
36
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-instruct-78b
37
+ name: Open LLM Leaderboard
38
+ - task:
39
+ type: text-generation
40
+ name: Text Generation
41
+ dataset:
42
+ name: BBH (3-Shot)
43
+ type: BBH
44
+ args:
45
+ num_few_shot: 3
46
+ metrics:
47
+ - type: acc_norm
48
+ value: 62.41
49
+ name: normalized accuracy
50
+ source:
51
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-instruct-78b
52
+ name: Open LLM Leaderboard
53
+ - task:
54
+ type: text-generation
55
+ name: Text Generation
56
+ dataset:
57
+ name: MATH Lvl 5 (4-Shot)
58
+ type: hendrycks/competition_math
59
+ args:
60
+ num_few_shot: 4
61
+ metrics:
62
+ - type: exact_match
63
+ value: 38.75
64
+ name: exact match
65
+ source:
66
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-instruct-78b
67
+ name: Open LLM Leaderboard
68
+ - task:
69
+ type: text-generation
70
+ name: Text Generation
71
+ dataset:
72
+ name: GPQA (0-shot)
73
+ type: Idavidrein/gpqa
74
+ args:
75
+ num_few_shot: 0
76
+ metrics:
77
+ - type: acc_norm
78
+ value: 19.46
79
+ name: acc_norm
80
+ source:
81
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-instruct-78b
82
+ name: Open LLM Leaderboard
83
+ - task:
84
+ type: text-generation
85
+ name: Text Generation
86
+ dataset:
87
+ name: MuSR (0-shot)
88
+ type: TAUR-Lab/MuSR
89
+ args:
90
+ num_few_shot: 0
91
+ metrics:
92
+ - type: acc_norm
93
+ value: 36.5
94
+ name: acc_norm
95
+ source:
96
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-instruct-78b
97
+ name: Open LLM Leaderboard
98
+ - task:
99
+ type: text-generation
100
+ name: Text Generation
101
+ dataset:
102
+ name: MMLU-PRO (5-shot)
103
+ type: TIGER-Lab/MMLU-Pro
104
+ config: main
105
+ split: test
106
+ args:
107
+ num_few_shot: 5
108
+ metrics:
109
+ - type: acc
110
+ value: 68.72
111
+ name: accuracy
112
+ source:
113
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.1-instruct-78b
114
+ name: Open LLM Leaderboard
115
  ---
116
 
117
  <img src="./calme_3.png" alt="Calme-3 Models" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
172
  # Ethical Considerations
173
 
174
  As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
175
+
176
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
177
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-3.1-instruct-78b)
178
+
179
+ | Metric |Value|
180
+ |-------------------|----:|
181
+ |Avg. |51.20|
182
+ |IFEval (0-Shot) |81.36|
183
+ |BBH (3-Shot) |62.41|
184
+ |MATH Lvl 5 (4-Shot)|38.75|
185
+ |GPQA (0-shot) |19.46|
186
+ |MuSR (0-shot) |36.50|
187
+ |MMLU-PRO (5-shot) |68.72|
188
+