Update README.md
README.md CHANGED
@@ -11,9 +11,7 @@ base_model: allenai/tulu-2-13b

# Model Card for Reproduced Tulu2 DPO 13B

- Therefore, we obey all licenses mentioned in Tulu2's work.
- Check our codes for more details: https://github.com/LuJunru/LLM_Finetune/tree/DPO. The codes are built with [TRL](https://github.com/huggingface/trl/tree/main).
+ This repository provides a reproduction of Tulu2-DPO-13B, finetuned from [Tulu2-13B](https://huggingface.co/allenai/tulu-2-13b) on [Ultrafeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized). We therefore follow all licenses mentioned in Tulu2's work. See our code for more details: https://github.com/LuJunru/LLM_Finetune/tree/DPO; it is built with [TRL](https://github.com/huggingface/trl/tree/main).

## Performance

@@ -44,3 +42,7 @@ The following hyperparameters were used during DPO training:
- lr_scheduler_warmup_ratio: 0.1
- Weight Decay: 0.05
- num_epochs: 3.0
+
+ ## Progressive metrics
+
+ We present
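
For orientation, below is a minimal sketch, assuming TRL's `DPOTrainer`, of how a run with the hyperparameters visible in this diff could be wired up. Only `warmup_ratio` (0.1), `weight_decay` (0.05), and `num_train_epochs` (3.0) come from the README; the learning rate and DPO `beta` are illustrative placeholders, not values confirmed by this commit. The authoritative script is in the linked LLM_Finetune repository.

```python
# Hedged sketch of the DPO setup described in the model card. Hyperparameters
# marked "from README" appear in this diff; everything else is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "allenai/tulu-2-13b"  # SFT base model linked in the card
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# UltraFeedback preferences, binarized into chosen/rejected responses.
train_dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"
)

config = DPOConfig(
    output_dir="tulu2-dpo-13b-repro",
    num_train_epochs=3.0,  # from README
    warmup_ratio=0.1,      # from README (lr_scheduler_warmup_ratio)
    weight_decay=0.05,     # from README
    learning_rate=5e-7,    # placeholder: not shown in this diff
    beta=0.1,              # placeholder DPO temperature: not shown in this diff
)

# With ref_model omitted, TRL clones the policy as the frozen DPO reference.
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```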