Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ Training:
 - LR scheduler: Cosine
 - Warmup ratio: 0.05
 - Batch size: 1
--
+- 8 A100 (80GB) GPUs
 - Gradient accumulation steps: 32
-- Effective batch size:
+- Effective batch size: 256
 - Max. context length: 8192 tokens
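The updated effective batch size of 256 matches the other values in this hunk if "Batch size: 1" is read as a per-GPU batch size and the 8 GPUs are used in plain data parallelism. Both readings are assumptions, since the README does not state them. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def effective_batch_size(per_device_batch_size: int,
                         num_devices: int,
                         grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step.

    Assumes data parallelism: every device processes its own micro-batch,
    and gradients are accumulated over `grad_accum_steps` micro-batches
    before each optimizer update.
    """
    return per_device_batch_size * num_devices * grad_accum_steps


# Values taken from the README hunk: batch size 1, 8 GPUs, 32 accumulation steps.
print(effective_batch_size(per_device_batch_size=1,
                           num_devices=8,
                           grad_accum_steps=32))  # 256
```

Under the same assumptions, changing the GPU count without adjusting the accumulation steps changes the effective batch size proportionally, which is why the two README lines are updated together.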