Update README.md
README.md
CHANGED
@@ -125,7 +125,7 @@ Model Params | Sequence Length | Batch Size | Number of Steps | Tokens | Tokens
 
 ## Evaluations
 
-We evaluate our models on the PILE validation set comprising 380M tokens.
+We evaluate our models on the PILE validation set comprising 380M tokens. In our paper we also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki (2021). We trained models from smallest to largest and fit a power law as we went along. The power law was helpful for extrapolating the validation loss of the next largest model we trained and provided confidence about whether the training run was going well.
 
 #### 0-shot Evaluation
 | Model | Params | Training FLOPs | PILE test xent | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA | Downstream Average |
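The added README paragraph describes fitting a power law to the models trained so far and using it to extrapolate the validation loss of the next, larger model. The README does not say which functional form or which x-variable was used. The sketch below assumes a pure power law in training compute, loss ≈ a · C^b, fitted by ordinary least squares on log(loss) versus log(C). The names `fit_power_law` and `predict_loss` and the synthetic data points are hypothetical illustrations, not the authors' code or results.

```python
import math


def fit_power_law(flops, losses):
    """Fit loss = a * flops**b by least squares in log-log space.

    Assumption (not stated in the README): a pure power law with no
    irreducible-loss offset, using training FLOPs as the x-variable.
    """
    if len(flops) != len(losses) or len(flops) < 2:
        raise ValueError("need at least two (flops, loss) pairs of equal length")
    xs = [math.log(f) for f in flops]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the power-law exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_loss(a, b, flops):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * flops ** b


if __name__ == "__main__":
    # Hypothetical, synthetic points generated from loss = 20 * C**-0.05;
    # these are NOT measurements from the README or the paper.
    flops = [1e18, 1e19, 1e20, 1e21]
    losses = [20.0 * f ** -0.05 for f in flops]
    a, b = fit_power_law(flops, losses)
    print(f"a={a:.4f}, b={b:.4f}")
    print(f"predicted loss at 1e22 FLOPs: {predict_loss(a, b, 1e22):.4f}")
```

Scaling-law fits often include an irreducible-loss term (loss ≈ a · C^b + c), which requires a nonlinear fit; the log-log linear regression above is only valid for the pure power-law form assumed here.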