Update README.md
README.md
@@ -8,6 +8,9 @@ Hyperparameters
 - 1e-4 -> 1e-5 with cosine lr decay
 - batch size 128
 - max sequence length 2048
+- AdamW(weight decay=0.01, b1=0.9, b2=0.99, grad_clip=1.0)
+- no warmup
+- BF16

 ```
 # Load model directly