Locutusque committed on
Commit 094fbf8 · verified · 1 Parent(s): 31b3358

Update README.md

Files changed (1)
  1. README.md +74 -1
README.md CHANGED
@@ -113,7 +113,80 @@ This model is intended for researchers, developers, and organizations seeking a
  The `Locutusque/Hyperion-3.0-Mistral-7B-DPO` model was fine-tuned using 4,000 examples from a carefully curated dataset of 20,000 preference pairs. These examples were generated by GPT-4 for quality and relevance across various domains, including programming, medical texts, mathematical problems, and reasoning tasks. Fine-tuning used Direct Preference Optimization (DPO) to align the model's outputs with human preferences and improve overall performance.
 
  ## Evaluation Results
- Detailed evaluation results will be provided soon, showcasing the model's performance across various benchmarks and tasks compared to its predecessors and other state-of-the-art models.
+ mmlu flan cot 5-shot
+
+ | Tasks |Version| Filter |n-shot| Metric |Value | |Stderr|
+ |-------------------------------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
+ |mmlu_flan_cot_fewshot |N/A |get-answer| 0|exact_match|0.5833|± |0.0118|
+ | - mmlu_flan_cot_fewshot_humanities |N/A |get-answer| 0|exact_match|0.5039|± |0.0205|
+ | - mmlu_flan_cot_fewshot_formal_logic | 0|get-answer| 0|exact_match|0.2143|± |0.1138|
+ | - mmlu_flan_cot_fewshot_high_school_european_history | 0|get-answer| 0|exact_match|0.6667|± |0.1143|
+ | - mmlu_flan_cot_fewshot_high_school_us_history | 0|get-answer| 0|exact_match|0.7727|± |0.0914|
+ | - mmlu_flan_cot_fewshot_high_school_world_history | 0|get-answer| 0|exact_match|0.5385|± |0.0997|
+ | - mmlu_flan_cot_fewshot_international_law | 0|get-answer| 0|exact_match|0.9231|± |0.0769|
+ | - mmlu_flan_cot_fewshot_jurisprudence | 0|get-answer| 0|exact_match|0.5455|± |0.1575|
+ | - mmlu_flan_cot_fewshot_logical_fallacies | 0|get-answer| 0|exact_match|0.7778|± |0.1008|
+ | - mmlu_flan_cot_fewshot_moral_disputes | 0|get-answer| 0|exact_match|0.5526|± |0.0817|
+ | - mmlu_flan_cot_fewshot_moral_scenarios | 0|get-answer| 0|exact_match|0.4000|± |0.0492|
+ | - mmlu_flan_cot_fewshot_philosophy | 0|get-answer| 0|exact_match|0.7647|± |0.0738|
+ | - mmlu_flan_cot_fewshot_prehistory | 0|get-answer| 0|exact_match|0.6571|± |0.0814|
+ | - mmlu_flan_cot_fewshot_professional_law | 0|get-answer| 0|exact_match|0.3294|± |0.0362|
+ | - mmlu_flan_cot_fewshot_world_religions | 0|get-answer| 0|exact_match|0.8947|± |0.0723|
+ | - mmlu_flan_cot_fewshot_other |N/A |get-answer| 0|exact_match|0.6833|± |0.0244|
+ | - mmlu_flan_cot_fewshot_business_ethics | 0|get-answer| 0|exact_match|0.9091|± |0.0909|
+ | - mmlu_flan_cot_fewshot_clinical_knowledge | 0|get-answer| 0|exact_match|0.5862|± |0.0931|
+ | - mmlu_flan_cot_fewshot_college_medicine | 0|get-answer| 0|exact_match|0.6364|± |0.1050|
+ | - mmlu_flan_cot_fewshot_global_facts | 0|get-answer| 0|exact_match|0.6000|± |0.1633|
+ | - mmlu_flan_cot_fewshot_human_aging | 0|get-answer| 0|exact_match|0.6087|± |0.1041|
+ | - mmlu_flan_cot_fewshot_management | 0|get-answer| 0|exact_match|0.9091|± |0.0909|
+ | - mmlu_flan_cot_fewshot_marketing | 0|get-answer| 0|exact_match|0.8000|± |0.0816|
+ | - mmlu_flan_cot_fewshot_medical_genetics | 0|get-answer| 0|exact_match|1.0000|± |0.0000|
+ | - mmlu_flan_cot_fewshot_miscellaneous | 0|get-answer| 0|exact_match|0.8023|± |0.0432|
+ | - mmlu_flan_cot_fewshot_nutrition | 0|get-answer| 0|exact_match|0.6667|± |0.0833|
+ | - mmlu_flan_cot_fewshot_professional_accounting | 0|get-answer| 0|exact_match|0.4839|± |0.0912|
+ | - mmlu_flan_cot_fewshot_professional_medicine | 0|get-answer| 0|exact_match|0.5806|± |0.0901|
+ | - mmlu_flan_cot_fewshot_virology | 0|get-answer| 0|exact_match|0.3889|± |0.1182|
+ | - mmlu_flan_cot_fewshot_social_sciences |N/A |get-answer| 0|exact_match|0.7003|± |0.0239|
+ | - mmlu_flan_cot_fewshot_econometrics | 0|get-answer| 0|exact_match|0.4167|± |0.1486|
+ | - mmlu_flan_cot_fewshot_high_school_geography | 0|get-answer| 0|exact_match|0.9091|± |0.0627|
+ | - mmlu_flan_cot_fewshot_high_school_government_and_politics| 0|get-answer| 0|exact_match|0.8095|± |0.0878|
+ | - mmlu_flan_cot_fewshot_high_school_macroeconomics | 0|get-answer| 0|exact_match|0.6512|± |0.0735|
+ | - mmlu_flan_cot_fewshot_high_school_microeconomics | 0|get-answer| 0|exact_match|0.5769|± |0.0988|
+ | - mmlu_flan_cot_fewshot_high_school_psychology | 0|get-answer| 0|exact_match|0.9000|± |0.0391|
+ | - mmlu_flan_cot_fewshot_human_sexuality | 0|get-answer| 0|exact_match|0.6667|± |0.1421|
+ | - mmlu_flan_cot_fewshot_professional_psychology | 0|get-answer| 0|exact_match|0.6522|± |0.0578|
+ | - mmlu_flan_cot_fewshot_public_relations | 0|get-answer| 0|exact_match|0.5833|± |0.1486|
+ | - mmlu_flan_cot_fewshot_security_studies | 0|get-answer| 0|exact_match|0.4074|± |0.0964|
+ | - mmlu_flan_cot_fewshot_sociology | 0|get-answer| 0|exact_match|0.8182|± |0.0842|
+ | - mmlu_flan_cot_fewshot_us_foreign_policy | 0|get-answer| 0|exact_match|0.7273|± |0.1408|
+ | - mmlu_flan_cot_fewshot_stem |N/A |get-answer| 0|exact_match|0.4866|± |0.0262|
+ | - mmlu_flan_cot_fewshot_abstract_algebra | 0|get-answer| 0|exact_match|0.0909|± |0.0909|
+ | - mmlu_flan_cot_fewshot_anatomy | 0|get-answer| 0|exact_match|0.4286|± |0.1373|
+ | - mmlu_flan_cot_fewshot_astronomy | 0|get-answer| 0|exact_match|0.5625|± |0.1281|
+ | - mmlu_flan_cot_fewshot_college_biology | 0|get-answer| 0|exact_match|0.5000|± |0.1291|
+ | - mmlu_flan_cot_fewshot_college_chemistry | 0|get-answer| 0|exact_match|0.5000|± |0.1890|
+ | - mmlu_flan_cot_fewshot_college_computer_science | 0|get-answer| 0|exact_match|0.2727|± |0.1408|
+ | - mmlu_flan_cot_fewshot_college_mathematics | 0|get-answer| 0|exact_match|0.3636|± |0.1521|
+ | - mmlu_flan_cot_fewshot_college_physics | 0|get-answer| 0|exact_match|0.3636|± |0.1521|
+ | - mmlu_flan_cot_fewshot_computer_security | 0|get-answer| 0|exact_match|0.7273|± |0.1408|
+ | - mmlu_flan_cot_fewshot_conceptual_physics | 0|get-answer| 0|exact_match|0.6538|± |0.0951|
+ | - mmlu_flan_cot_fewshot_electrical_engineering | 0|get-answer| 0|exact_match|0.7500|± |0.1118|
+ | - mmlu_flan_cot_fewshot_elementary_mathematics | 0|get-answer| 0|exact_match|0.7317|± |0.0701|
+ | - mmlu_flan_cot_fewshot_high_school_biology | 0|get-answer| 0|exact_match|0.5938|± |0.0882|
+ | - mmlu_flan_cot_fewshot_high_school_chemistry | 0|get-answer| 0|exact_match|0.3636|± |0.1050|
+ | - mmlu_flan_cot_fewshot_high_school_computer_science | 0|get-answer| 0|exact_match|0.5556|± |0.1757|
+ | - mmlu_flan_cot_fewshot_high_school_mathematics | 0|get-answer| 0|exact_match|0.3103|± |0.0874|
+ | - mmlu_flan_cot_fewshot_high_school_physics | 0|get-answer| 0|exact_match|0.2353|± |0.1060|
+ | - mmlu_flan_cot_fewshot_high_school_statistics | 0|get-answer| 0|exact_match|0.3043|± |0.0981|
+ | - mmlu_flan_cot_fewshot_machine_learning | 0|get-answer| 0|exact_match|0.4545|± |0.1575|
+
+ | Groups |Version| Filter |n-shot| Metric |Value | |Stderr|
+ |----------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
+ |mmlu_flan_cot_fewshot |N/A |get-answer| 0|exact_match|0.5833|± |0.0118|
+ | - mmlu_flan_cot_fewshot_humanities |N/A |get-answer| 0|exact_match|0.5039|± |0.0205|
+ | - mmlu_flan_cot_fewshot_other |N/A |get-answer| 0|exact_match|0.6833|± |0.0244|
+ | - mmlu_flan_cot_fewshot_social_sciences|N/A |get-answer| 0|exact_match|0.7003|± |0.0239|
+ | - mmlu_flan_cot_fewshot_stem |N/A |get-answer| 0|exact_match|0.4866|± |0.0262|
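
As a reading aid for the Value and Stderr columns in the group table, here is a minimal sketch that turns a reported score and its standard error into an approximate 95% confidence interval. It assumes a normal approximation (plus or minus 1.96 standard errors), which is an assumption of this sketch and not something the evaluation output states. The helper name `confidence_interval` is hypothetical and is not part of any evaluation library.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate confidence interval for a score in [0, 1].

    Uses a normal approximation (an assumption of this sketch) and clips
    the bounds to the valid range of an exact-match score.
    """
    low = max(0.0, value - z * stderr)
    high = min(1.0, value + z * stderr)
    return low, high


# Group-level rows copied from the table: (exact_match, stderr).
groups = {
    "mmlu_flan_cot_fewshot": (0.5833, 0.0118),
    "humanities": (0.5039, 0.0205),
    "other": (0.6833, 0.0244),
    "social_sciences": (0.7003, 0.0239),
    "stem": (0.4866, 0.0262),
}

for name, (value, stderr) in groups.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (95% CI approx. {low:.3f} to {high:.3f})")
```

Under this approximation the overall score of 0.5833 corresponds to an interval of roughly 0.560 to 0.606. The per-subject rows have much larger standard errors, so their intervals are wide, and a row such as `medical_genetics` (1.0000 ± 0.0000) is a case where the approximation is not informative.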
 
  ## How to Use
  ```python