The notebook Aligning_DPO_phi3.ipynb walks through the process of creating this model step by step.

To create it, we started with the Phi-3-Mini-4K-Instruct model and applied DPO alignment using the distilabel-capybara-dpo-7k-binarized dataset.
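As a rough illustration of what DPO optimizes, here is a minimal sketch of the per-example DPO loss in plain Python. It assumes you already have the summed log-probabilities of the chosen and rejected completions under the model being trained and under a frozen reference copy (for example, the base Phi-3 model). The function names and the `beta=0.1` default are illustrative assumptions, not values taken from the notebook.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed strength of the implicit KL constraint
) -> float:
    # How much more (in log space) the trained model likes each answer than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the model raises the chosen answer relative to the rejected one.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(logits)


# Policy identical to the reference: loss is -log(0.5) = log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favours the chosen answer more than the reference does: loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Working directly with log-probabilities, rather than raw probabilities, is what lets the reference model act as an anchor: only the change relative to the reference is rewarded.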

Phi-3 is a state-of-the-art model with 3.8 billion parameters that has been reported to outperform some models with 7 billion parameters. The DPO alignment process produced good results: it changed the model's responses, making them more similar to the chosen answers in the Capybara dataset.

Example row from the dataset:

{'prompt': '<|user|>\nAssist me in calculating 9319357631 plus 595. Numbers and symbols only, please.<|end|>\n<|assistant|>\n', 'chosen': 'The sum of 9319357631 and 595 is 9319358226.<|end|>\n', 'rejected': 'The result of adding 9319357631 and 595 is 9319363626.<|end|>\n'}
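The `prompt` field already contains the Phi-3 chat markup (`<|user|>`, `<|end|>`, `<|assistant|>`). The short sketch below rebuilds that prompt string with a hypothetical helper, `build_phi3_prompt`, whose format is inferred only from this row, and checks which completion holds the correct sum. In practice you would more likely rely on the tokenizer's chat template than on hand-written markup.

```python
# Hypothetical helper: reproduces the Phi-3 chat markup seen in the row's "prompt" field.
def build_phi3_prompt(user_message: str) -> str:
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"


row = {
    "prompt": build_phi3_prompt(
        "Assist me in calculating 9319357631 plus 595. Numbers and symbols only, please."
    ),
    "chosen": "The sum of 9319357631 and 595 is 9319358226.<|end|>\n",
    "rejected": "The result of adding 9319357631 and 595 is 9319363626.<|end|>\n",
}

# The preferred answer is arithmetically correct; the rejected one is not.
correct_sum = 9319357631 + 595
print(correct_sum)                          # 9319358226
print(str(correct_sum) in row["chosen"])    # True
print(str(correct_sum) in row["rejected"])  # False
```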

Prompt: 3713841893836/4? Limit your response to mathematical expressions and symbols.

Response from the base model:

To find the result of the division, we can simply divide the given number by 4: $$ \frac{3713841893836}{4} = 928460473459 $$

Response from the fine-tuned model:

3713841893836 ÷ 4 = 928460473459
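Both models get the arithmetic right; the difference is whether they respect the "mathematical expressions and symbols only" instruction. The sketch below verifies the quotient and applies a hypothetical format check, `is_math_only`, which assumes that "expressions and symbols" means digits, whitespace and basic operators. It is a rough heuristic of our own, not part of the original evaluation, and it would also reject LaTeX commands such as `\frac`.

```python
import re

# Hypothetical heuristic: digits, whitespace, and a few arithmetic symbols only.
MATH_ONLY = re.compile(r"[0-9\s+\-*/÷×=.()]+")


def is_math_only(response: str) -> bool:
    return MATH_ONLY.fullmatch(response.strip()) is not None


# Opening words of the base model's answer vs. the fine-tuned model's full answer.
base_response = "To find the result of the division, we can simply divide the given number by 4"
tuned_response = "3713841893836 ÷ 4 = 928460473459"

print(3713841893836 % 4 == 0, 3713841893836 // 4)  # True 928460473459
print(is_math_only(base_response))   # False
print(is_math_only(tuned_response))  # True
```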

To see how the model was created, check out the repository where the book's notebooks are kept up to date.

Model size: 3.82B params · Tensor type: BF16 · Format: Safetensors

Dataset used to train oopere/martra-phi-3-mini-dpo: distilabel-capybara-dpo-7k-binarized