|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.3892|±  |0.0040|
| - humanities                          |N/A    |none  |     0|acc   |0.3766|±  |0.0069|
|  - formal_logic                       |      0|none  |     0|acc   |0.2857|±  |0.0404|
|  - high_school_european_history       |      0|none  |     0|acc   |0.4909|±  |0.0390|
|  - high_school_us_history             |      0|none  |     0|acc   |0.5196|±  |0.0351|
|  - high_school_world_history          |      0|none  |     0|acc   |0.5105|±  |0.0325|
|  - international_law                  |      0|none  |     0|acc   |0.6364|±  |0.0439|
|  - jurisprudence                      |      0|none  |     0|acc   |0.4074|±  |0.0475|
|  - logical_fallacies                  |      0|none  |     0|acc   |0.4356|±  |0.0390|
|  - moral_disputes                     |      0|none  |     0|acc   |0.4104|±  |0.0265|
|  - moral_scenarios                    |      0|none  |     0|acc   |0.2436|±  |0.0144|
|  - philosophy                         |      0|none  |     0|acc   |0.4277|±  |0.0281|
|  - prehistory                         |      0|none  |     0|acc   |0.4444|±  |0.0276|
|  - professional_law                   |      0|none  |     0|acc   |0.3259|±  |0.0120|
|  - world_religions                    |      0|none  |     0|acc   |0.5789|±  |0.0379|
| - other                               |N/A    |none  |     0|acc   |0.4232|±  |0.0088|
|  - business_ethics                    |      0|none  |     0|acc   |0.4000|±  |0.0492|
|  - clinical_knowledge                 |      0|none  |     0|acc   |0.3811|±  |0.0299|
|  - college_medicine                   |      0|none  |     0|acc   |0.3410|±  |0.0361|
|  - global_facts                       |      0|none  |     0|acc   |0.2700|±  |0.0446|
|  - human_aging                        |      0|none  |     0|acc   |0.4081|±  |0.0330|
|  - management                         |      0|none  |     0|acc   |0.4951|±  |0.0495|
|  - marketing                          |      0|none  |     0|acc   |0.5726|±  |0.0324|
|  - medical_genetics                   |      0|none  |     0|acc   |0.3600|±  |0.0482|
|  - miscellaneous                      |      0|none  |     0|acc   |0.5096|±  |0.0179|
|  - nutrition                          |      0|none  |     0|acc   |0.4346|±  |0.0284|
|  - professional_accounting            |      0|none  |     0|acc   |0.2979|±  |0.0273|
|  - professional_medicine              |      0|none  |     0|acc   |0.3456|±  |0.0289|
|  - virology                           |      0|none  |     0|acc   |0.3976|±  |0.0381|
| - social_sciences                     |N/A    |none  |     0|acc   |0.4374|±  |0.0089|
|  - econometrics                       |      0|none  |     0|acc   |0.2456|±  |0.0405|
|  - high_school_geography              |      0|none  |     0|acc   |0.3788|±  |0.0346|
|  - high_school_government_and_politics|      0|none  |     0|acc   |0.5648|±  |0.0358|
|  - high_school_macroeconomics         |      0|none  |     0|acc   |0.4000|±  |0.0248|
|  - high_school_microeconomics         |      0|none  |     0|acc   |0.3487|±  |0.0310|
|  - high_school_psychology             |      0|none  |     0|acc   |0.4771|±  |0.0214|
|  - human_sexuality                    |      0|none  |     0|acc   |0.4656|±  |0.0437|
|  - professional_psychology            |      0|none  |     0|acc   |0.4134|±  |0.0199|
|  - public_relations                   |      0|none  |     0|acc   |0.4455|±  |0.0476|
|  - security_studies                   |      0|none  |     0|acc   |0.4367|±  |0.0318|
|  - sociology                          |      0|none  |     0|acc   |0.5224|±  |0.0353|
|  - us_foreign_policy                  |      0|none  |     0|acc   |0.6000|±  |0.0492|
| - stem                                |N/A    |none  |     0|acc   |0.3273|±  |0.0082|
|  - abstract_algebra                   |      0|none  |     0|acc   |0.2300|±  |0.0423|
|  - anatomy                            |      0|none  |     0|acc   |0.4000|±  |0.0423|
|  - astronomy                          |      0|none  |     0|acc   |0.4605|±  |0.0406|
|  - college_biology                    |      0|none  |     0|acc   |0.3611|±  |0.0402|
|  - college_chemistry                  |      0|none  |     0|acc   |0.2600|±  |0.0441|
|  - college_computer_science           |      0|none  |     0|acc   |0.3700|±  |0.0485|
|  - college_mathematics                |      0|none  |     0|acc   |0.2300|±  |0.0423|
|  - college_physics                    |      0|none  |     0|acc   |0.2157|±  |0.0409|
|  - computer_security                  |      0|none  |     0|acc   |0.5200|±  |0.0502|
|  - conceptual_physics                 |      0|none  |     0|acc   |0.3362|±  |0.0309|
|  - electrical_engineering             |      0|none  |     0|acc   |0.3862|±  |0.0406|
|  - elementary_mathematics             |      0|none  |     0|acc   |0.2884|±  |0.0233|
|  - high_school_biology                |      0|none  |     0|acc   |0.4645|±  |0.0284|
|  - high_school_chemistry              |      0|none  |     0|acc   |0.2709|±  |0.0313|
|  - high_school_computer_science       |      0|none  |     0|acc   |0.3600|±  |0.0482|
|  - high_school_mathematics            |      0|none  |     0|acc   |0.2556|±  |0.0266|
|  - high_school_physics                |      0|none  |     0|acc   |0.2318|±  |0.0345|
|  - high_school_statistics             |      0|none  |     0|acc   |0.2407|±  |0.0292|
|  - machine_learning                   |      0|none  |     0|acc   |0.3393|±  |0.0449|
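
As a reading aid, the Value and Stderr columns can be combined into an approximate 95% confidence interval. The sketch below is not part of the original results. It assumes a normal approximation (Value ± 1.96 × Stderr), and the helper names `parse_row` and `confidence_interval` are hypothetical, introduced only for illustration.

```python
def parse_row(line):
    """Split one pipe-delimited results row into the fields used here.

    Assumes the eight-column layout of the table above:
    Tasks, Version, Filter, n-shot, Metric, Value, (±), Stderr.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    task, _version, _filter, n_shot, metric, value, _pm, stderr = cells
    return {
        "task": task.lstrip("- "),  # drop the "- " nesting prefix
        "n_shot": int(n_shot),
        "metric": metric,
        "value": float(value),
        "stderr": float(stderr),
    }


def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) under a normal approximation (an assumption)."""
    return (value - z * stderr, value + z * stderr)


# Example using the aggregate mmlu row from the table.
row = parse_row(
    "|mmlu                                   |N/A    |none  |     0|acc   |0.3892|±  |0.0040|"
)
low, high = confidence_interval(row["value"], row["stderr"])
print(f"{row['task']}: {low:.4f} to {high:.4f}")  # prints "mmlu: 0.3814 to 0.3970"
```

Per-subject rows have much larger standard errors than the aggregate, so their intervals are correspondingly wider and differences between individual subjects should be read with caution.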