Update README.md
README.md CHANGED

```diff
@@ -65,7 +65,7 @@ The model was trained using a dataset specifically created to detoxify LLMs. DPO
 
 ### Training Procedure
 
-
+DPO was applied on "SungJoo/llama2-7b-sft-detox" with the following hyperparameters:
 
 | **Hyperparameter** | **Value** |
 |--------------------|-----------|
@@ -76,7 +76,6 @@ The model was trained using efficient fine-tuning techniques with the following
 | Max prompt length | 1,024 |
 | Beta | 0.1 |
 
-*Hyperparameters when applying DPO to LLaMA-2
 
 
 ## Objective
```
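The beta value of 0.1 in the table above is the temperature of the DPO objective. As an illustrative sketch only (the function name and signature below are hypothetical, not the model's actual training code, which would use a trainer such as TRL's `DPOTrainer`), the per-pair DPO loss can be computed like this:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the frozen
    reference model. beta=0.1 matches the hyperparameter table.
    """
    # Log-ratios of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy prefers the chosen response more strongly than the reference does, `logits` grows and the loss shrinks toward zero; a small beta like 0.1 keeps the policy from drifting far from the reference model.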