Committed by forrest-gradient
Commit 32295b6
1 Parent(s): e77d64c
Update README.md
README.md
CHANGED
@@ -42,13 +42,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From | Llama-3-70B-Instruct | 65K | 262K | 524K |
 | Sequence Length 2^N | 16 | 18 | 19 | 20 |
 | RoPE theta | 15296098 | 207112184 | 1062356830 | 3580165449 |
-| Batch Size |
+| Batch Size | 64 | 16 | 8 | 1 |
 | Gradient Accumulation Steps | 1 | 1 | 2 | 4 |
 | Steps | 20 | 25 | 25 | 8 |
 | Total Tokens | 83886080 | 104857600 | 209715200 | 33554432 |
 | Learning rate | 2.00E-05 | 2.00E-05 | 2.00E-05 | 2.00E-05 |
 | # GPUs | 512 | 512 | 512 | 128 |
-| Ring parallelism | 64 | 16 | 8 | 1 |
 | GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
 | Minutes to Train (Wall)| 100 | 170 | 284 | 516 |
 
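The Batch Size values in the changed row can be cross-checked against the Total Tokens row. The sketch below assumes, since the README does not state the formula, that Total Tokens equals 2^N (sequence length) times batch size times gradient accumulation steps times steps. The names `rows` and `tokens_seen` are illustrative and do not come from the repository.

```python
# Values copied from the table rows in this diff, one tuple per column:
# (sequence length exponent N, batch size, grad accumulation steps, steps, total tokens)
rows = [
    (16, 64, 1, 20, 83886080),
    (18, 16, 1, 25, 104857600),
    (19, 8, 2, 25, 209715200),
    (20, 1, 4, 8, 33554432),
]


def tokens_seen(seq_len_log2, batch_size, grad_accum, steps):
    """Assumed relation: tokens = 2^N * batch size * accumulation steps * steps."""
    return (2 ** seq_len_log2) * batch_size * grad_accum * steps


# Collect any column whose reported total disagrees with the assumed product.
mismatches = [
    (n, reported, tokens_seen(n, bs, ga, st))
    for (n, bs, ga, st, reported) in rows
    if tokens_seen(n, bs, ga, st) != reported
]

print(mismatches)  # [] : every column matches under the assumed relation
```

Under that assumption all four columns agree with the reported totals, which is consistent with 64, 16, 8, 1 being batch sizes rather than a separate ring-parallelism setting.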