|
---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
---
|
# Model Summary |
|
Neuralphi-2 is an experiment in DPO fine-tuning. It was made following Maxime Labonne's excellent [article](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) on fine-tuning Mistral-7B with direct preference optimization.
|
Neuralphi-2 is [phi-2-sft](https://huggingface.co/lxuechen/phi-2-sft) fine-tuned with DPO on the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset.
|
# Prompt Format |
|
```
### Human: {instruction}

### Assistant:
```
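
A minimal sketch of using the model with this template via `transformers`. The `model_id` below is a placeholder (assumption): substitute the actual Hugging Face repo id for this model.

```python
def format_prompt(instruction: str) -> str:
    """Wrap an instruction in the ### Human / ### Assistant template above."""
    return f"### Human: {instruction}\n\n### Assistant:"


def generate_reply(instruction: str, model_id: str = "neuralphi-2") -> str:
    """Generate a reply with the fine-tuned model (model_id is a placeholder)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = format_prompt(instruction)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt from the decoded text to return only the completion.
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return text[len(prompt):].strip()
```

The template is applied verbatim, so the instruction goes after `### Human:` and generation continues from `### Assistant:`.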
|
|
|
|