Regularized-Preference-Optimization
Collection: the models trained in https://github.com/YSLIU627/Regularized-Preference-Optimization (4 items)
This model is a fine-tuned version of HuggingFaceH4/mistral-7b-sft-beta on the updated and the original datasets.
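A minimal sketch of how such a checkpoint could be loaded with the `transformers` library. The Hub ID of this fine-tuned model is not stated on this page, so the helper below defaults to the SFT starting point named above; `SFT_BASE_ID`, `load_model`, and `generate` are hypothetical names introduced only for illustration, and plain-text prompting is an assumption rather than the authors' documented usage.

```python
# Sketch only: the ID of the fine-tuned checkpoint is not given here, so the
# default below is the SFT model it was fine-tuned from (an assumption for
# illustration). Pass the real repository ID as `model_id` when known.
SFT_BASE_ID = "HuggingFaceH4/mistral-7b-sft-beta"


def load_model(model_id: str = SFT_BASE_ID):
    """Load a causal language model and its tokenizer from the Hugging Face Hub."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer


def generate(prompt: str, model_id: str = SFT_BASE_ID, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a plain-text prompt."""
    model, tokenizer = load_model(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("...")` downloads roughly 7B parameters' worth of weights, so it is best run on a machine with enough memory or a GPU.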
More information needed
The following hyperparameters were used during training:
Base model: mistralai/Mistral-7B-v0.1