Junrulu/Reproduced-tulu2-dpo-13b


Junrulu committed on Mar 29

Commit 2f57a21 (1 parent: b20d809)

Update README.md

Files changed (1)

README.md +1 -0

@@ -36,6 +36,7 @@ For best results, format all inputs in this manner. **Make sure to include a new
 ## Training hyperparameters
 The following hyperparameters were used during DPO training:
+- DPO beta: 0.1
 - learning_rate: 1e-6 * sqrt(Num of Nodes)
 - total_train_batch_size: 128 * Num of Nodes
 - optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
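
As an illustration of how the listed values fit together, here is a minimal sketch in plain Python. It assumes that "Num of Nodes" is a positive integer, and it uses the standard DPO objective for a single preference pair. The function names `scaled_hyperparameters` and `dpo_loss` and the dictionary keys are hypothetical and do not come from the repository's training code.

```python
import math

# Base values taken from the README hyperparameter list.
BASE_LR = 1e-6
BASE_BATCH_SIZE = 128
DPO_BETA = 0.1


def scaled_hyperparameters(num_nodes: int) -> dict:
    """Return the README hyperparameters scaled for `num_nodes` nodes.

    Hypothetical helper: the key names are illustrative only.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be a positive integer")
    return {
        "dpo_beta": DPO_BETA,
        # learning_rate: 1e-6 * sqrt(Num of Nodes)
        "learning_rate": BASE_LR * math.sqrt(num_nodes),
        # total_train_batch_size: 128 * Num of Nodes
        "total_train_batch_size": BASE_BATCH_SIZE * num_nodes,
        # AdamW settings as listed in the README.
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
    }


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = DPO_BETA) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities.

    loss = -log(sigmoid(beta * margin)), where margin is the difference
    between the policy/reference log-ratios of the chosen and rejected
    responses.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    print(scaled_hyperparameters(4))
    print(dpo_loss(-1.0, -5.0, -2.0, -3.0))
```

With beta at 0.1, a log-ratio margin of 3 yields a loss only slightly below log 2, so a small beta keeps the policy close to the reference model unless the preference margin grows large.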