SelectiveDPO Collection
Released models trained with SelectiveDPO.
This model is fine-tuned from the HuggingFaceH4/mistral-7b-sft-beta model with the SelectiveDPO algorithm on the ultrafeedback_binarized dataset.
For the recipe to reproduce this model, please visit our GitHub page.
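As background on the algorithm name: SelectiveDPO builds on the standard Direct Preference Optimization (DPO) objective. The sketch below shows only the per-example DPO loss; the selective data-filtering step is not shown, and the function name and arguments are illustrative assumptions, not the authors' implementation (see the GitHub recipe for that).

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin of log-ratios).

    Arguments are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference (SFT) model.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# At initialization the policy equals the reference, so the margin is 0
# and the loss is log(2) ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

Training lowers this loss by widening the chosen-vs-rejected log-probability margin relative to the reference model; `beta` controls how strongly the policy is penalized for drifting from the reference.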
Base model: mistralai/Mistral-7B-v0.1