Update README.md
README.md CHANGED
@@ -11,9 +11,7 @@ base_model: allenai/tulu-2-13b

# Model Card for Reproduced Tulu2 DPO 13B

- Therefore, we obey all licenses mentioned in Tulu2's work.
- Check our codes for more details: https://github.com/LuJunru/LLM_Finetune/tree/DPO. The codes are built with [TRL](https://github.com/huggingface/trl/tree/main).
+ This repository provides a reproduction of Tulu2-DPO-13B, finetuned from [Tulu2-13B](https://huggingface.co/allenai/tulu-2-13b) on [Ultrafeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized). We therefore follow all licenses mentioned in Tulu2's work. See our code for more details: https://github.com/LuJunru/LLM_Finetune/tree/DPO; it is built with [TRL](https://github.com/huggingface/trl/tree/main).

## Performance

@@ -44,3 +42,7 @@ The following hyperparameters were used during DPO training:
- lr_scheduler_warmup_ratio: 0.1
- Weight Decay: 0.05
- num_epochs: 3.0
+
+ ## Progressive metrics
+
+ We present
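
For orientation, below is a minimal sketch, assuming TRL's `DPOTrainer`, of how a run with the hyperparameters visible in this diff could be wired up. Only `warmup_ratio` (0.1), `weight_decay` (0.05), and `num_train_epochs` (3.0) come from the README; the learning rate and DPO `beta` are illustrative placeholders, not values confirmed by this commit. The authoritative script is in the linked LLM_Finetune repository.

```python
# Hedged sketch of the DPO setup described in the model card. Hyperparameters
# marked "from README" appear in this diff; everything else is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "allenai/tulu-2-13b"  # SFT base model linked in the card
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# UltraFeedback preferences, binarized into chosen/rejected responses.
train_dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"
)

config = DPOConfig(
    output_dir="tulu2-dpo-13b-repro",
    num_train_epochs=3.0,  # from README
    warmup_ratio=0.1,      # from README (lr_scheduler_warmup_ratio)
    weight_decay=0.05,     # from README
    learning_rate=5e-7,    # placeholder: not shown in this diff
    beta=0.1,              # placeholder DPO temperature: not shown in this diff
)

# With ref_model omitted, TRL clones the policy as the frozen DPO reference.
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```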