---
license: mit
datasets:
- RLHFlow/iterative-prompt-v1-iter1-20K
language:
- en
---
This model reproduces the results on the Iterative-Prompt dataset from the paper "The crucial role of samplers in online direct preference optimization". It is iteration 3 of the DPO-mixp algorithm, trained from https://huggingface.co/zhezi12138/llama-3b-iter-2-mixp.