Adding Evaluation Results #2
Opened by leaderboard-pr-bot

README.md (changed)
```diff
@@ -21,6 +21,9 @@ model-index:
 - type: acc_norm
 value: 60.41
 name: normalized accuracy
+- type: acc_norm
+value: 61.26
+name: normalized accuracy
 source:
 url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NLUHOPOE/experiment2-cause-qLoRa
 name: Open LLM Leaderboard
@@ -37,6 +40,9 @@ model-index:
 - type: acc_norm
 value: 82.76
 name: normalized accuracy
+- type: acc_norm
+value: 83.4
+name: normalized accuracy
 source:
 url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NLUHOPOE/experiment2-cause-qLoRa
 name: Open LLM Leaderboard
@@ -54,6 +60,9 @@ model-index:
 - type: acc
 value: 62.15
 name: accuracy
+- type: acc
+value: 63.91
+name: accuracy
 source:
 url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NLUHOPOE/experiment2-cause-qLoRa
 name: Open LLM Leaderboard
@@ -70,6 +79,8 @@ model-index:
 metrics:
 - type: mc2
 value: 47.13
+- type: mc2
+value: 48.16
 source:
 url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NLUHOPOE/experiment2-cause-qLoRa
 name: Open LLM Leaderboard
@@ -87,6 +98,9 @@ model-index:
 - type: acc
 value: 78.85
 name: accuracy
+- type: acc
+value: 79.79
+name: accuracy
 source:
 url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NLUHOPOE/experiment2-cause-qLoRa
 name: Open LLM Leaderboard
@@ -104,6 +118,9 @@ model-index:
 - type: acc
 value: 35.03
 name: accuracy
+- type: acc
+value: 39.88
+name: accuracy
 source:
 url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NLUHOPOE/experiment2-cause-qLoRa
 name: Open LLM Leaderboard
@@ -144,3 +161,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |78.85|
 |GSM8k (5-shot) |35.03|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_NLUHOPOE__experiment2-cause)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |62.73|
+|AI2 Reasoning Challenge (25-Shot)|61.26|
+|HellaSwag (10-Shot) |83.40|
+|MMLU (5-Shot) |63.91|
+|TruthfulQA (0-shot) |48.16|
+|Winogrande (5-shot) |79.79|
+|GSM8k (5-shot) |39.88|
+
```
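You can sanity-check the new results table by recomputing it from the values this PR appends to `model-index`. The sketch below assumes that the "Avg." row is the unweighted mean of the six benchmark scores. That assumption is consistent with the table, because the mean of the six new values reproduces 62.73. The script also prints the change between the value already in the README and the newly appended value for each benchmark. The names `SCORES`, `leaderboard_average`, and `score_changes` are hypothetical helpers for illustration. They are not part of the leaderboard tooling.

```python
# Hypothetical helper: scores copied from this diff as (value already in the
# README, value appended by this PR) for each Open LLM Leaderboard benchmark.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": (60.41, 61.26),
    "HellaSwag (10-Shot)": (82.76, 83.40),
    "MMLU (5-Shot)": (62.15, 63.91),
    "TruthfulQA (0-shot)": (47.13, 48.16),
    "Winogrande (5-shot)": (78.85, 79.79),
    "GSM8k (5-shot)": (35.03, 39.88),
}


def leaderboard_average(values):
    """Unweighted mean of the benchmark scores, rounded to two decimals.

    Assumption: this is how the "Avg." row is derived.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


def score_changes(scores):
    """Difference (new minus old) for each benchmark, in points."""
    return {name: round(new - old, 2) for name, (old, new) in scores.items()}


if __name__ == "__main__":
    new_avg = leaderboard_average(new for _, new in SCORES.values())
    print(f"Avg. (new values): {new_avg:.2f}")
    for name, delta in score_changes(SCORES).items():
        print(f"{name:<34} {delta:+.2f}")
```

With these values, every benchmark goes up. The largest change is on GSM8k (+4.85 points).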