flops typo
README.md CHANGED
@@ -113,13 +113,13 @@ All models were trained to Chinchilla point: 20x more tokens than model parameters
 
 Model Params | Sequence Length | Batch Size | Number of Steps | Tokens | Tokens per Parameter | Flops
 ------------ | -------------- | ---------- | --------------- | ------ | -------------------- | -----
-111M | 2048 | 120 | 9037 | 2.22E+09 | 20 | 2.
-256M | 2048 | 264 | 9468 | 5.12E+09 | 20 | 1.
-590M | 2048 | 264 | 21836 | 1.18E+10 | 20 |
-1.3B | 2048 | 528 | 24334 | 2.63E+10 | 20 | 2.
-2.7B | 2048 | 528 | 49041 | 5.30E+10 | 20 |
-6.7B | 2048 | 1040 | 62522 | 1.33E+11 | 20 |
-13B | 2048 | 720 | 174335 | 2.57E+11 | 20 | 2.
+111M | 2048 | 120 | 9037 | 2.22E+09 | 20 | 2.6E+18
+256M | 2048 | 264 | 9468 | 5.12E+09 | 20 | 1.3E+19
+590M | 2048 | 264 | 21836 | 1.18E+10 | 20 | 6.1E+19
+1.3B | 2048 | 528 | 24334 | 2.63E+10 | 20 | 2.8E+20
+2.7B | 2048 | 528 | 49041 | 5.30E+10 | 20 | 1.1E+21
+6.7B | 2048 | 1040 | 62522 | 1.33E+11 | 20 | 6.3E+21
+13B | 2048 | 720 | 174335 | 2.57E+11 | 20 | 2.3E+22
 
 <br><br>
 
@@ -133,13 +133,13 @@ We performed upstream (pre-training) evaluations of text prediction cross-entropy
 #### 0-shot Evaluation
 | Model | Params | Training FLOPs | PILE test xent | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA | Downstream Average |
 | ------- | ----- | -------------- | -------------- | ---------- | ----- | ----------- | ------- | ----- | ----- | ---------- | ------------------ |
-| Cerebras-GPT | 111M | 2.
-| Cerebras-GPT | 256M | 1.
-| Cerebras-GPT | 590M |
-| Cerebras-GPT | 1.3B | 2.
-| Cerebras-GPT | 2.7B |
-| Cerebras-GPT | 6.7B |
-| Cerebras-GPT | 13B | 2.
+| Cerebras-GPT | 111M | 2.6E+18 | 2.566 | 0.268 | 0.594 | 0.488 | 0.194 | 0.380 | 0.166 | 0.118 | 0.315 |
+| Cerebras-GPT | 256M | 1.3E+19 | 2.299 | 0.274 | 0.613 | 0.511 | 0.293 | 0.410 | 0.170 | 0.158 | 0.347 |
+| Cerebras-GPT | 590M | 6.1E+19 | 2.184 | 0.291 | 0.627 | 0.498 | 0.366 | 0.464 | 0.190 | 0.158 | 0.370 |
+| Cerebras-GPT | 1.3B | 2.8E+20 | 1.996 | 0.325 | 0.664 | 0.521 | 0.462 | 0.508 | 0.224 | 0.166 | 0.410 |
+| Cerebras-GPT | 2.7B | 1.1E+21 | 1.834 | 0.386 | 0.701 | 0.559 | 0.567 | 0.571 | 0.246 | 0.206 | 0.462 |
+| Cerebras-GPT | 6.7B | 6.3E+21 | 1.704 | 0.447 | 0.739 | 0.602 | 0.636 | 0.643 | 0.282 | 0.238 | 0.512 |
+| Cerebras-GPT | 13B | 2.3E+22 | 1.575 | 0.513 | 0.766 | 0.646 | 0.696 | 0.714 | 0.367 | 0.286 | 0.570 |
 
 #### 5-shot Evaluation
 | Model | Params | Hella-Swag | PIQA | Wino-Grande | Lambada | ARC-e | ARC-c | OpenBookQA |
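
The corrected schedule table is internally consistent: each Tokens entry is sequence length × batch size × steps, and tokens per parameter sits at the Chinchilla-point 20x named in the hunk context. A minimal sanity-check sketch follows; the 6·N·D FLOPs rule of thumb in it is our assumption, not the README's formula (the reported figures run somewhat higher, presumably counting attention FLOPs), so treat it only as a rough lower-bound cross-check.

```python
# Sanity check of the corrected training-schedule table above.
# Tokens = sequence length * batch size * steps, and tokens/parameter ~= 20.
# ASSUMPTION: 6*N*D is the generic transformer FLOPs estimate, not the
# README's exact accounting; its reported figures run somewhat higher.

rows = [
    # (params, seq_len, batch_size, steps, reported_flops) -- from the + lines
    (111e6, 2048,  120,   9037, 2.6e18),
    (256e6, 2048,  264,   9468, 1.3e19),
    (590e6, 2048,  264,  21836, 6.1e19),
    (1.3e9, 2048,  528,  24334, 2.8e20),
    (2.7e9, 2048,  528,  49041, 1.1e21),
    (6.7e9, 2048, 1040,  62522, 6.3e21),
    (13e9,  2048,  720, 174335, 2.3e22),
]

for params, seq_len, batch, steps, reported in rows:
    tokens = seq_len * batch * steps      # reproduces the Tokens column
    tok_per_param = tokens / params       # ~20 in every row
    approx = 6 * params * tokens          # rough lower-bound FLOPs estimate
    print(f"{params/1e6:>6.0f}M  tokens={tokens:.2E}  "
          f"tok/param={tok_per_param:4.1f}  6ND={approx:.1E}  reported={reported:.1E}")
```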
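
Similarly, in the corrected 0-shot rows the Downstream Average column tracks the plain mean of the seven task accuracies. Equal weighting is an assumption on our part (the hunk does not state the averaging rule), and it matches the published three-decimal values to within rounding:

```python
# Consistency check of the corrected 0-shot table: "Downstream Average"
# appears to be the equal-weight mean of the seven task accuracies.
# ASSUMPTION: equal weighting; agreement is to within rounding of the
# published three-decimal scores.

zero_shot = {
    # model: ([Hella-Swag, PIQA, Wino-Grande, Lambada, ARC-e, ARC-c,
    #          OpenBookQA], reported downstream average)
    "111M": ([0.268, 0.594, 0.488, 0.194, 0.380, 0.166, 0.118], 0.315),
    "256M": ([0.274, 0.613, 0.511, 0.293, 0.410, 0.170, 0.158], 0.347),
    "590M": ([0.291, 0.627, 0.498, 0.366, 0.464, 0.190, 0.158], 0.370),
    "1.3B": ([0.325, 0.664, 0.521, 0.462, 0.508, 0.224, 0.166], 0.410),
    "2.7B": ([0.386, 0.701, 0.559, 0.567, 0.571, 0.246, 0.206], 0.462),
    "6.7B": ([0.447, 0.739, 0.602, 0.636, 0.643, 0.282, 0.238], 0.512),
    "13B":  ([0.513, 0.766, 0.646, 0.696, 0.714, 0.367, 0.286], 0.570),
}

for model, (scores, reported) in zero_shot.items():
    mean = sum(scores) / len(scores)
    assert abs(mean - reported) < 1e-3, (model, mean, reported)
    print(f"{model:>5}: mean={mean:.3f}  reported={reported:.3f}")
```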