Update README.md
README.md
CHANGED
@@ -39,13 +39,16 @@ This model was built via parameter-efficient finetuning of the [mistralai/Mixtra
 
 ## Evaluation Results
 
-| Metric | Value
-|
-|
-| ARC (25-shot) |
-| HellaSwag (10-shot) |
-|
-|
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 68.87 |
+| ARC (25-shot) | 67.24 |
+| HellaSwag (10-shot) | 86.03 |
+| MMLU (5-shot) | 68.59 |
+| TruthfulQA (0-shot) | 59.54 |
+| Winogrande (5-shot) | 80.43 |
+| GSM8K (5-shot) | 51.4 |
+
 
 We use Eleuther.AI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
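For readers who want to reproduce scores like those in the table, the sketch below shows one way to drive the Evaluation Harness from Python. It is a minimal illustration, not the exact procedure behind this commit: it assumes the current `lm-eval` release (`pip install lm-eval`, v0.4+), whereas the leaderboard pins a specific older harness revision, so numbers may not match exactly. The model id `your-org/your-model`, the dtype, the batch size, and the task names are placeholders or assumptions, not values taken from this model card.

```python
# Hypothetical sketch of a leaderboard-style evaluation with EleutherAI's
# lm-evaluation-harness (v0.4+ Python API). "your-org/your-model" is a
# placeholder for the model id in this repository; task names and settings
# are assumptions and may differ from the leaderboard's pinned revision.
import lm_eval

# (task, few-shot) pairs mirroring the table above.
TASK_SHOTS = [
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("mmlu", 5),
    ("truthfulqa_mc2", 0),
    ("winogrande", 5),
    ("gsm8k", 5),
]

for task, shots in TASK_SHOTS:
    # Run each benchmark separately so every task gets its own shot count.
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=your-org/your-model,dtype=bfloat16",
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, out["results"][task])
```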