Commit 44d0230 (1 parent: 41cc116), committed by rskuzma

flops typo

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -113,13 +113,13 @@ All models were trained to Chinchilla point: 20x more tokens than model paramete
 
 Model Params | Sequence Length | Batch Size | Number of Steps | Tokens | Tokens per Parameter | Flops
 ------------ | -------------- | ---------- | --------------- | ------ | -------------------- | -----
-111M | 2048 | 120 | 9037 | 2.22E+09 | 20 | 2.5E+18
-256M | 2048 | 264 | 9468 | 5.12E+09 | 20 | 1.1E+19
-590M | 2048 | 264 | 21836 | 1.18E+10 | 20 | 5.3E+19
-1.3B | 2048 | 528 | 24334 | 2.63E+10 | 20 | 2.5E+20
-2.7B | 2048 | 528 | 49041 | 5.30E+10 | 20 | 9.8E+20
-6.7B | 2048 | 1040 | 62522 | 1.33E+11 | 20 | 5.9E+21
-13B | 2048 | 720 | 174335 | 2.57E+11 | 20 | 2.1E+22
+111M | 2048 | 120 | 9037 | 2.22E+09 | 20 | 2.6E+18
+256M | 2048 | 264 | 9468 | 5.12E+09 | 20 | 1.3E+19
+590M | 2048 | 264 | 21836 | 1.18E+10 | 20 | 6.1E+19
+1.3B | 2048 | 528 | 24334 | 2.63E+10 | 20 | 2.8E+20
+2.7B | 2048 | 528 | 49041 | 5.30E+10 | 20 | 1.1E+21
+6.7B | 2048 | 1040 | 62522 | 1.33E+11 | 20 | 6.3E+21
+13B | 2048 | 720 | 174335 | 2.57E+11 | 20 | 2.3E+22
 
 <br><br>
 
@@ -133,13 +133,13 @@ We performed upstream (pre-training) evaluations of text prediction cross-entrop
 #### 0-shot Evaluation
 | Model | Params | Training FLOPs | PILE test xent | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA | Downstream Average |
 | ------- | ----- | -------------- | -------------- | ---------- | ----- | ----------- | ------- | ----- | ----- | ---------- | ------------------ |
-| Cerebras-GPT | 111M | 2.5E+18 | 2.566 | 0.268 | 0.594 | 0.488 | 0.194 | 0.380 | 0.166 | 0.118 | 0.315 |
-| Cerebras-GPT | 256M | 1.1E+19 | 2.299 | 0.274 | 0.613 | 0.511 | 0.293 | 0.410 | 0.170 | 0.158 | 0.347 |
-| Cerebras-GPT | 590M | 5.3E+19 | 2.184 | 0.291 | 0.627 | 0.498 | 0.366 | 0.464 | 0.190 | 0.158 | 0.370 |
-| Cerebras-GPT | 1.3B | 2.5E+20 | 1.996 | 0.325 | 0.664 | 0.521 | 0.462 | 0.508 | 0.224 | 0.166 | 0.410 |
-| Cerebras-GPT | 2.7B | 9.8E+20 | 1.834 | 0.386 | 0.701 | 0.559 | 0.567 | 0.571 | 0.246 | 0.206 | 0.462 |
-| Cerebras-GPT | 6.7B | 5.9E+21 | 1.704 | 0.447 | 0.739 | 0.602 | 0.636 | 0.643 | 0.282 | 0.238 | 0.512 |
-| Cerebras-GPT | 13B | 2.1E+22 | 1.575 | 0.513 | 0.766 | 0.646 | 0.696 | 0.714 | 0.367 | 0.286 | 0.570 |
+| Cerebras-GPT | 111M | 2.6E+18 | 2.566 | 0.268 | 0.594 | 0.488 | 0.194 | 0.380 | 0.166 | 0.118 | 0.315 |
+| Cerebras-GPT | 256M | 1.3E+19 | 2.299 | 0.274 | 0.613 | 0.511 | 0.293 | 0.410 | 0.170 | 0.158 | 0.347 |
+| Cerebras-GPT | 590M | 6.1E+19 | 2.184 | 0.291 | 0.627 | 0.498 | 0.366 | 0.464 | 0.190 | 0.158 | 0.370 |
+| Cerebras-GPT | 1.3B | 2.8E+20 | 1.996 | 0.325 | 0.664 | 0.521 | 0.462 | 0.508 | 0.224 | 0.166 | 0.410 |
+| Cerebras-GPT | 2.7B | 1.1E+21 | 1.834 | 0.386 | 0.701 | 0.559 | 0.567 | 0.571 | 0.246 | 0.206 | 0.462 |
+| Cerebras-GPT | 6.7B | 6.3E+21 | 1.704 | 0.447 | 0.739 | 0.602 | 0.636 | 0.643 | 0.282 | 0.238 | 0.512 |
+| Cerebras-GPT | 13B | 2.3E+22 | 1.575 | 0.513 | 0.766 | 0.646 | 0.696 | 0.714 | 0.367 | 0.286 | 0.570 |
 
 #### 5-shot Evaluation
 | Model | Params | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA |
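
Reviewer note: the columns this commit leaves untouched can be sanity-checked with simple arithmetic. The sketch below is not part of the README or the commit. It assumes that Tokens is the product of Sequence Length, Batch Size, and Number of Steps, that the parameter labels (111M, 1.3B, ...) can be read as rounded parameter counts, and that Downstream Average is the unweighted mean of the seven task columns. The function names `check_training_rows` and `check_downstream_average` are hypothetical. The check does not attempt to reproduce the FLOPs column, since the diff does not state how those values were computed.

```python
# Rows copied from the training table in the diff:
# (label, nominal params, sequence length, batch size, steps, reported tokens)
# The nominal parameter counts are an assumption read off the rounded labels.
TRAINING_ROWS = [
    ("111M", 111e6, 2048, 120, 9037, 2.22e9),
    ("256M", 256e6, 2048, 264, 9468, 5.12e9),
    ("590M", 590e6, 2048, 264, 21836, 1.18e10),
    ("1.3B", 1.3e9, 2048, 528, 24334, 2.63e10),
    ("2.7B", 2.7e9, 2048, 528, 49041, 5.30e10),
    ("6.7B", 6.7e9, 2048, 1040, 62522, 1.33e11),
    ("13B", 13e9, 2048, 720, 174335, 2.57e11),
]

# Rows copied from the 0-shot table: the seven task accuracies
# (Hella-Swag, PIQA, Wino-Grande, Lambada, ARC-e, ARC-c, OpenBookQA)
# followed by the reported Downstream Average.
ZERO_SHOT_ROWS = [
    ("111M", [0.268, 0.594, 0.488, 0.194, 0.380, 0.166, 0.118], 0.315),
    ("256M", [0.274, 0.613, 0.511, 0.293, 0.410, 0.170, 0.158], 0.347),
    ("590M", [0.291, 0.627, 0.498, 0.366, 0.464, 0.190, 0.158], 0.370),
    ("1.3B", [0.325, 0.664, 0.521, 0.462, 0.508, 0.224, 0.166], 0.410),
    ("2.7B", [0.386, 0.701, 0.559, 0.567, 0.571, 0.246, 0.206], 0.462),
    ("6.7B", [0.447, 0.739, 0.602, 0.636, 0.643, 0.282, 0.238], 0.512),
    ("13B", [0.513, 0.766, 0.646, 0.696, 0.714, 0.367, 0.286], 0.570),
]


def check_training_rows(rows=TRAINING_ROWS, rel_tol=0.01):
    """Return (label, computed tokens, tokens per parameter, matches) per row."""
    results = []
    for label, params, seq_len, batch, steps, reported in rows:
        tokens = seq_len * batch * steps  # assumed definition of "Tokens"
        ratio = tokens / params           # approximate, params are rounded labels
        matches = abs(tokens - reported) <= rel_tol * reported
        results.append((label, tokens, ratio, matches))
    return results


def check_downstream_average(rows=ZERO_SHOT_ROWS):
    """Return (label, mean of the seven task scores, reported average) per row."""
    return [(label, sum(scores) / len(scores), reported)
            for label, scores, reported in rows]


if __name__ == "__main__":
    for label, tokens, ratio, matches in check_training_rows():
        print(f"{label}: tokens={tokens:.3e} tokens/param={ratio:.2f} matches={matches}")
    for label, mean, reported in check_downstream_average():
        print(f"{label}: mean={mean:.4f} reported={reported:.3f}")
```

Because the table entries are rounded to three significant figures or three decimals, the comparison uses tolerances rather than exact equality.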