---
language:
- en
library_name: transformers
tags:
- orpo
- Mistral
- Mistral-7B-v0.3
- sft
datasets:
- mlabonne/orpo-dpo-mix-40k
---

# Model description

This model is an ORPO fine-tuned version of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3), trained on a 2.5k-sample subset of the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset. Thanks to [Maxime Labonne](https://huggingface.co/mlabonne) for providing this [amazing guide](https://huggingface.co/blog/mlabonne/orpo-llama-3) on Odds Ratio Preference Optimization (ORPO). ORPO combines the traditional supervised fine-tuning and preference alignment stages into a single training process.

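To make the "single process" idea concrete, here is a rough sketch of the objective as described in the ORPO paper: a supervised (negative log-likelihood) term on the chosen response plus a weighted odds-ratio penalty that favors the chosen response over the rejected one. The function names, the scalar inputs (length-averaged log-probabilities), and the weight `lam=0.1` are illustrative assumptions, not the code or hyperparameters used to train this model.

```python
import math

def log_odds(avg_logp):
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_avg_logp, rejected_avg_logp, lam=0.1):
    """Illustrative ORPO objective for one preference pair (hypothetical helper)."""
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # Preference term: -log(sigmoid(z)) written as log(1 + exp(-z)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term

# The loss is lower when the model already prefers the chosen response.
print(orpo_loss(-0.2, -1.5) < orpo_loss(-1.5, -0.2))
```

In practice the log-probabilities come from the model's token-level outputs for each response; a trainer such as TRL's `ORPOTrainer` handles that part.
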
This model follows the ChatML chat template!

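For reference, the snippet below sketches what a ChatML-formatted prompt generally looks like. The helper `to_chatml` is hypothetical and written only for illustration; the template shipped with this model's tokenizer is authoritative and may differ in details such as special tokens or whitespace, so prefer `tokenizer.apply_chat_template` for real use.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(to_chatml([{"role": "user", "content": "Hello!"}]))
```
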
## How to use

````python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "MuntasirHossain/Orpo-Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in half precision and place it on the available device(s)
llm = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate(input_text):
    # Wrap the input as a single user turn and render it with the chat template
    messages = [{"role": "user", "content": input_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = llm(prompt, max_new_tokens=512)
    # The pipeline returns the prompt followed by the completion; keep only the completion
    return outputs[0]["generated_text"][len(prompt):]

print(generate("Explain quantum tunneling in simple terms."))
````