Update README.md
README.md CHANGED

```diff
@@ -65,7 +65,7 @@ The model was trained using a dataset specifically created to detoxify LLMs. DPO
 
 ### Training Procedure
 
-
+DPO was applied on "SungJoo/llama2-7b-sft-detox" with the following hyperparameters:
 
 | **Hyperparameter** | **Value** |
 |--------------------|-----------|
@@ -76,7 +76,6 @@ The model was trained using efficient fine-tuning techniques with the following
 | Max prompt length | 1,024 |
 | Beta | 0.1 |
 
-*Hyperparameters when applying DPO to LLaMA-2
 
 
 ## Objective
```
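The beta value of 0.1 in the table above is the temperature of the DPO objective. As an illustrative sketch only (the function name and signature below are hypothetical, not the model's actual training code, which would use a trainer such as TRL's `DPOTrainer`), the per-pair DPO loss can be computed like this:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the frozen
    reference model. beta=0.1 matches the hyperparameter table.
    """
    # Log-ratios of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy prefers the chosen response more strongly than the reference does, `logits` grows and the loss shrinks toward zero; a small beta like 0.1 keeps the policy from drifting far from the reference model.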