zhezi12138/llama-3b-iter-3-mixp


zhezi12138 committed on Jan 16

Commit 30e987f · verified · 1 Parent(s): 42b1cf6

Create README.md

Files changed (1)

README.md +8 -0

README.md ADDED

	@@ -0,0 +1,8 @@

+---
+license: mit
+datasets:
+- RLHFlow/iterative-prompt-v1-iter1-20K
+language:
+- en
+---
+This model reproduces the results on the Iterative-Prompt dataset from the paper "The crucial role of samplers in online direct preference optimization". It is the checkpoint from iteration 3 of the DPO-mixp algorithm, trained on top of https://huggingface.co/zhezi12138/llama-3b-iter-2-mixp.