Update README.md
README.md CHANGED
@@ -32,7 +32,7 @@ We use the HelpSteer2 preference binarized into chosen-rejected pairs using the
 
 ## Recipes
 
-
+For finetuning, we used 4 nodes (8 x AMD MI250X) to obtain a global batch size of 128 for SFT and 64 for DPO. We used the [Alignment Handbook](https://github.com/huggingface/alignment-handbook/) codebase.
 
 **SFT**
 