gradientai/Llama-3-70B-Instruct-Gradient-1048k

forrest-gradient committed on May 4, 2024

Commit 32295b6 · verified · 1 Parent(s): e77d64c

Update README.md

Files changed (1)

README.md +1 -2

@@ -42,13 +42,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From        | Llama-3-70B-Instruct | 65K         | 262K        | 524K        |
 | Sequence Length 2^N    | 16          | 18          | 19          | 20          |
 | RoPE theta             | 15296098    | 207112184   | 1062356830  | 3580165449  |
-| Batch Size             | 1           | 1           | 1           | 1           |
+| Batch Size             | 64           | 16           | 8           | 1           |
 | Gradient Accumulation Steps | 1           | 1           | 2           | 4           |
 | Steps                  | 20          | 25          | 25          | 8           |
 | Total Tokens           | 83886080    | 104857600   | 209715200   | 33554432    |
 | Learning rate          | 2.00E-05    | 2.00E-05    | 2.00E-05    | 2.00E-05    |
 | # GPUs                 | 512         | 512         | 512         | 128         |
-| Ring parallelism       | 64          | 16          | 8           | 1           |
 | GPU Type               | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
 | Minutes to Train (Wall)| 100         | 170         | 284         | 516         |
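
A quick sanity check on the updated `Batch Size` row: the sketch below recomputes `Total Tokens` for each stage from the other rows. It assumes that every optimizer step consumes `batch size × gradient accumulation steps` full-length sequences of `2^N` tokens. That formula is not stated in the README; it is an assumption that happens to reproduce the table's figures. The dictionary keys and the `tokens_processed` helper are illustrative names, not part of the repository.

```python
# Values copied from the table after this commit, one entry per training stage.
stages = [
    {"init": "Llama-3-70B-Instruct", "seq_len_log2": 16, "batch_size": 64,
     "grad_accum": 1, "steps": 20, "total_tokens": 83886080},
    {"init": "65K", "seq_len_log2": 18, "batch_size": 16,
     "grad_accum": 1, "steps": 25, "total_tokens": 104857600},
    {"init": "262K", "seq_len_log2": 19, "batch_size": 8,
     "grad_accum": 2, "steps": 25, "total_tokens": 209715200},
    {"init": "524K", "seq_len_log2": 20, "batch_size": 1,
     "grad_accum": 4, "steps": 8, "total_tokens": 33554432},
]


def tokens_processed(stage):
    """Tokens seen in one stage.

    Assumption (not from the README): each step processes
    batch_size * grad_accum sequences of 2**seq_len_log2 tokens.
    """
    seq_len = 2 ** stage["seq_len_log2"]
    return seq_len * stage["batch_size"] * stage["grad_accum"] * stage["steps"]


computed = [tokens_processed(s) for s in stages]

for stage, tokens in zip(stages, computed):
    matches = tokens == stage["total_tokens"]
    print(f"{stage['init']:>22}: {tokens:>11} matches table: {matches}")
```

Under this assumption all four stages match the `Total Tokens` row. With the previous `Batch Size` of 1 in every column, only the last stage would match, which is consistent with the correction this commit makes.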