rokset3 committed on
Commit d1128de (1 parent: c6ca8d2)

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -24,6 +24,10 @@ The GOAT-70B-Storytelling model has been developed as an integral component with
  - **License:** llama2
  - **Context window length:** 4096 tokens
 
+ ### Training details
+
+ For training, we use a standard recipe: learning rate 1e-5, a per-GPU batch size of 6, and the AdamW optimizer without weight decay. We train the model with ZeRO-3 on a cluster of 64 H100 GPUs.
+
  ### Learn more
 
  - **Blog:** TBA
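
The training details added in this commit can be sketched as a DeepSpeed-style configuration. This is a minimal illustration, not the authors' actual config. It assumes DeepSpeed is the ZeRO-3 implementation, and it assumes a gradient accumulation factor of 1, which the commit does not state. The helper names `build_deepspeed_config` and `global_batch_size` are hypothetical.

```python
import json

# Values stated in the commit.
NUM_GPUS = 64             # 64 x H100
MICRO_BATCH_PER_GPU = 6   # batch size per GPU
LEARNING_RATE = 1e-5
WEIGHT_DECAY = 0.0        # "AdamW without weight decay"

# Assumption: gradient accumulation is not mentioned in the commit.
GRAD_ACCUM_STEPS = 1


def build_deepspeed_config(grad_accum_steps: int = GRAD_ACCUM_STEPS) -> dict:
    """Return a DeepSpeed-style config dict mirroring the stated hyperparameters."""
    return {
        "train_micro_batch_size_per_gpu": MICRO_BATCH_PER_GPU,
        "gradient_accumulation_steps": grad_accum_steps,
        "optimizer": {
            "type": "AdamW",
            "params": {"lr": LEARNING_RATE, "weight_decay": WEIGHT_DECAY},
        },
        # ZeRO stage 3 partitions optimizer state, gradients, and parameters.
        "zero_optimization": {"stage": 3},
    }


def global_batch_size(config: dict, num_gpus: int = NUM_GPUS) -> int:
    """Sequences consumed per optimizer step across all GPUs."""
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )


if __name__ == "__main__":
    config = build_deepspeed_config()
    print(json.dumps(config, indent=2))
    print("sequences per optimizer step:", global_batch_size(config))
```

Under these assumptions, the effective batch is 6 × 64 sequences per optimizer step. A larger accumulation factor would scale that number proportionally.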