Update README.md
Update with official HF Leaderboard scores.
README.md
CHANGED
@@ -24,7 +24,7 @@ We use [OpenChat](https://huggingface.co/openchat) packing, trained with [Axolotl
 This release is trained on a curated filtered subset of most of our GPT-4 augmented data.
 It is the same subset of our data as was used in our [OpenOrcaxOpenChat-Preview2-13B model](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B).
 
-**HF Leaderboard evals place this model as #
+**HF Leaderboard evals place this model as #1 for all models smaller than 30B at release time, outperforming all other 7B and 13B models!**
 
 This release provides a first: a fully open model with class-breaking performance, capable of running fully accelerated on even moderate consumer GPUs.
 Our thanks to the Mistral team for leading the way here.
@@ -112,20 +112,20 @@ pip install git+https://github.com/huggingface/transformers
 ## HuggingFace Leaderboard Performance
 
 We have evaluated using the methodology and tools for the HuggingFace Leaderboard, and find that we have dramatically improved upon the base model.
-We find **
+We find **106%** of the base model's performance on HF Leaderboard evals, averaging **65.84**.
 
-At release time, this beats all 7B
+At release time, this beats all 7B and 13B models!
 
 ![HF Leaderboard](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/resolve/main/Images/MistralOrca7BHFLeaderboard.png)
 
 
 | Metric                | Value |
 |-----------------------|-------|
-| MMLU (5-shot) |
-| ARC (25-shot) |
-| HellaSwag (10-shot) | 83.
-| TruthfulQA (0-shot) |
-| Avg. | 65.
+| MMLU (5-shot)         | 62.24 |
+| ARC (25-shot)         | 64.08 |
+| HellaSwag (10-shot)   | 83.99 |
+| TruthfulQA (0-shot)   | 53.05 |
+| Avg.                  | 65.84 |
 
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
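For anyone wanting to reproduce the table, a rough sketch of driving the harness from Python follows. The Leaderboard pins a specific harness revision (documented on its About page), and later harness releases renamed the model type and some tasks, so treat this as illustrative rather than exact:

```python
# Sketch: scoring one Leaderboard benchmark (ARC, 25-shot) with the
# EleutherAI lm-evaluation-harness Python API. Assumes a harness version
# contemporary with the Leaderboard; newer releases replaced the
# "hf-causal-experimental" model type and changed some task names.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",           # HF transformers backend
    model_args="pretrained=Open-Orca/Mistral-7B-OpenOrca",
    tasks=["arc_challenge"],                  # ARC; the Leaderboard reports acc_norm
    num_fewshot=25,                           # ARC's 25-shot setting
)
print(results["results"]["arc_challenge"])
```

Each benchmark uses its own few-shot count (5 for MMLU, 25 for ARC, 10 for HellaSwag, 0 for TruthfulQA), so the evaluations are run per task.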
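As a sanity check on the updated numbers, the Leaderboard average of that era is the unweighted mean of the four benchmark scores, which a few lines of Python confirm:

```python
# Sanity check: the Avg. row is the unweighted mean of the four
# benchmark scores from the updated table.
scores = {
    "MMLU (5-shot)": 62.24,
    "ARC (25-shot)": 64.08,
    "HellaSwag (10-shot)": 83.99,
    "TruthfulQA (0-shot)": 53.05,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # -> 65.84, matching the Avg. row
```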
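On the "moderate consumer GPUs" claim: a minimal loading sketch, assuming a transformers build recent enough for Mistral support (hence the `pip install git+https://github.com/huggingface/transformers` line in the hunk context above). In fp16 the 7B weights take roughly 14 GB, so a single 16 GB consumer card can run it; `device_map="auto"` offloads to CPU when memory is tight:

```python
# Minimal inference sketch (illustrative; prompt and generation settings
# are placeholders, not the model's recommended chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open-Orca/Mistral-7B-OpenOrca"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves memory vs. fp32
    device_map="auto",           # place layers on GPU, spill to CPU if needed
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```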