File size: 3,278 Bytes
180ab05 0ed5882 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 e53699e 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc 180ab05 5a150fc |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 |
---
language:
- en
license: other
library_name: transformers
tags:
- orpo
- llama 3
datasets:
- mlabonne/orpo-dpo-mix-40k
---
# OrpoLlama-3-8B

This is a quick fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).
It's not very good at the moment (it's the sassiest model ever), but I'm currently training a version on the entire dataset.
**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B
## π Evaluation
### Nous
Evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), see the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [π](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [π](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) [π](https://gist.github.com/mlabonne/7a0446c3d30dfce72834ef780491c4b2) | 49.15 | 33.36 | 67.87 | 55.89 | 39.48 |
| [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [π](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82) | **46.76** | **31.56** | **70.19** | **48.11** | **37.17** |
| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [π](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847) | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
## π Training curves

## π» Usage
```python
!pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "mlabonne/OrpoLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
"text-generation",
model=model,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
``` |