oopere
/

martra-phi-3-mini-dpo

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

martra-phi-3-mini-dpo / README.md

oopere's picture

Update README.md

c4a7648 verified 7 months ago

|

history blame contribute delete

2.09 kB

	---
	language:
	- en
	license: mit
	library_name: transformers
	tags:
	- dpo
	- phi-3
	datasets:
	- argilla/distilabel-capybara-dpo-7k-binarized
	pipeline_tag: text-generation
	widget:
	- text: "3713841893836/4? \nLimit your response to mathematical expressions and symbols."
	example_title: 'Return only numbers. '
	- text: A group of 10 people is split into 3 different committees of 3, 4, and 3 people,
	respectively. In how many ways can this be done?
	example_title: Solve Problem
	---

	You can see the process with instructions for creating the model in the notebook: [Aligning_DPO_phi3.ipynb](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/P2-MHF/Aligning_DPO_phi3.ipynb)

	To create it, we started with the [Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model and applied DPO alignment using the [distilabel-capybara-dpo-7k-binarized dataset](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized).

	Phi-3 is a state-of-the-art model with 3.8 billion parameters that has outperformed other models with 7 billion parameters. The DPO alignment process has produced good results, modifying the model's responses and making them more similar to those in the capybara dataset.

	Row in the Dataset:

	*{'prompt': '<\|user\|>\nAssist me in calculating 9319357631 plus 595. Numbers and symbols only, please.<\|end\|>\n<\|assistant\|>\n',
	'chosen': 'The sum of 9319357631 and 595 is 9319358226.<\|end\|>\n',
	'rejected': 'The result of adding 9319357631 and 595 is 9319363626.<\|end\|>\n'}*

	Prompt:
	*3713841893836/4?
	Limit your response to mathematical expressions and symbols.*

	Response from the Base model:

	*To find the result of the division, we can simply divide the given number by 4:
	$$
	\frac{3713841893836}{4} = 928460473459*

	Response from the fine-tuned model:

	3713841893836 ÷ 4 = 928460473459


	If you want to see how the model was created, you can check out the [repository](https://github.com/peremartra/Large-Language-Model-Notebooks-Course) where the book's notebooks are kept up-to-date.