RTO-RL/Llama3.2-1B-RewardModel


Last commit: 4fada23 ("Update README.md") by zkshan2002, verified about 1 month ago.


---
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- unsloth/Llama-3.2-1B-Instruct
---
Base model: [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct)

Tokenizer: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)

Preference dataset: [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
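This card does not state the training objective. Reward models fit on pairwise preference data such as ultrafeedback_binarized (which, as we understand it, pairs a `chosen` and a `rejected` conversation per prompt) are commonly trained with the Bradley-Terry loss, which pushes the scalar reward of the chosen response above that of the rejected one. The sketch below assumes that objective. The helpers `bt_loss`, `mean_bt_loss`, and `pairwise_accuracy` are hypothetical illustrations, not part of this repository or of OpenRLHF, and the reward values are made up.

```python
import math
from typing import Sequence

# Assumption: the standard pairwise Bradley-Terry reward-model objective.
# These helpers are hypothetical and not taken from this repository.


def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise Bradley-Terry loss for one preference pair.

    Equals -log(sigmoid(r_chosen - r_rejected)), evaluated as
    softplus(r_rejected - r_chosen) so large margins do not overflow.
    """
    x = r_rejected - r_chosen
    # softplus(x) = log(1 + e^x) = max(x, 0) + log1p(e^-|x|)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _check(chosen: Sequence[float], rejected: Sequence[float]) -> None:
    if len(chosen) != len(rejected) or len(chosen) == 0:
        raise ValueError("expected equal-length, non-empty reward lists")


def mean_bt_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Average Bradley-Terry loss over a batch of (chosen, rejected) rewards."""
    _check(chosen, rejected)
    return sum(bt_loss(c, r) for c, r in zip(chosen, rejected)) / len(chosen)


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response scores strictly higher."""
    _check(chosen, rejected)
    wins = sum(1 for c, r in zip(chosen, rejected) if c > r)
    return wins / len(chosen)


if __name__ == "__main__":
    # Made-up scalar rewards for two pairs, for illustration only.
    chosen_rewards = [1.0, 0.2]
    rejected_rewards = [0.5, 0.9]
    print(round(bt_loss(0.0, 0.0), 4))  # 0.6931 (ln 2 when the rewards tie)
    print(pairwise_accuracy(chosen_rewards, rejected_rewards))  # 0.5
    print(round(mean_bt_loss(chosen_rewards, rejected_rewards), 4))
```

In `pairwise_accuracy`, ties count as misses. In an actual evaluation the two reward lists would come from scoring each pair's full `chosen` and `rejected` conversations with the model; how this checkpoint should be loaded for that is not covered by the card.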