Description

Llama3-Instruct-8B model finetuned by off-polciy WPO. Details in WPO: Enhancing RLHF with Weighted Preference Optimization.

License

This model is licensed under the Zoom software license and is permitted for use only for noncommercial, educational, or academic research purposes.

Downloads last month
11
Safetensors
Model size
8.03B params
Tensor type
F32
·
Inference Providers NEW
Inference Providers available for this model are disabled. Settings

Collection including wzhouad/Llama3-Instruct-8B-WPO-FP