Tristan committed on
Commit 1cfe661 · 1 Parent(s): 71a3160

Update README.md

Files changed (1)
  1. README.md +3 -4
README.md CHANGED
@@ -94,7 +94,6 @@ The model achieves the following results without any fine-tuning (zero-shot):
  |arc_easy |acc/acc_norm|0.4381/0.3948 |**0.4651**/**0.4247** |**0.0082**/**0.0029** |
  |arc_challenge|acc/acc_norm|0.1903/0.2270 |0.1997/0.2329 |0.4132/0.6256 |
 
- To get these results, we used the Eleuther AI evaluation harness [here](https://github.com/EleutherAI/lm-evaluation-harness).\
- We chose these 20 tasks, because they are the tasks that the GPT2 and GPT3 papers report results for.\
- The harness can produce results a little different than those reported in the GPT2 paper.\
- The p-values come from the stderr from the evaluation harness, plus a normal distribution assumption.
+ To get these results, we used the [EleutherAI evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness),
+ which can produce results slightly different from those reported in the GPT2 paper. The p-values are computed from the standard errors reported by the evaluation harness, assuming a normal distribution.
+ We chose these 20 tasks because they are the ones for which the GPT2 and GPT3 papers report results.
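The README says the p-values come from the harness's stderr plus a normal-distribution assumption, but it does not spell out the exact test. The following is a minimal Python sketch of one plausible reading: a two-sided z-test on the accuracy difference. Whether the authors used a single model's stderr or combined both models' stderrs, and whether the test is one- or two-sided, are assumptions not stated in the README. The helper names `two_sided_p_value` and `combined_stderr` are hypothetical.

```python
import math


def combined_stderr(stderr_a: float, stderr_b: float) -> float:
    # Standard error of the difference of two independent estimates:
    # sqrt(se_a**2 + se_b**2). Whether the README combines stderrs is an assumption.
    return math.hypot(stderr_a, stderr_b)


def two_sided_p_value(acc_baseline: float, acc_model: float, stderr: float) -> float:
    # Assumption: under the null hypothesis of no difference,
    # (acc_model - acc_baseline) / stderr follows a standard normal distribution.
    if stderr <= 0:
        raise ValueError("stderr must be positive")
    z = (acc_model - acc_baseline) / stderr
    # Two-sided tail probability P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # arc_easy accuracies from the table above; the stderr values here are
    # hypothetical placeholders, not numbers reported in this README.
    se = combined_stderr(0.0102, 0.0102)
    print(two_sided_p_value(0.4381, 0.4651, se))
```

Using `math.erfc` avoids a SciPy dependency; `2 * (1 - scipy.stats.norm.cdf(abs(z)))` would compute the same two-sided tail.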