Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -138,7 +138,18 @@ All GGUF models are available here: [MaziyarPanahi/calme-2.2-qwen2-72b-GGUF](htt
 
 # 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
-Leaderboard 2 coming soon!
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.2-qwen2-72b)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. |43.40|
+|IFEval (0-Shot) |80.08|
+|BBH (3-Shot) |56.80|
+|MATH Lvl 5 (4-Shot)|41.16|
+|GPQA (0-shot) |16.55|
+|MuSR (0-shot) |16.52|
+|MMLU-PRO (5-shot) |49.27|
+
 
 ## TruthfulQA:
 ```
@@ -213,16 +224,3 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-2.2-qwen2-72b"
 # Ethical Considerations
 
 As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.2-qwen2-72b)
-
-| Metric |Value|
-|-------------------|----:|
-|Avg. |43.40|
-|IFEval (0-Shot) |80.08|
-|BBH (3-Shot) |56.80|
-|MATH Lvl 5 (4-Shot)|41.16|
-|GPQA (0-shot) |16.55|
-|MuSR (0-shot) |16.52|
-|MMLU-PRO (5-shot) |49.27|
-