---
library_name: transformers
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
---

# Orpo-Llama-3.2-1B-40k

AdamLucek/Orpo-Llama-3.2-1B-40k is an [ORPO](https://arxiv.org/abs/2403.07691) fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B), trained for 1 epoch on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k).

Training took 11 hours on an A100 GPU with [this training script](https://colab.research.google.com/drive/1kax8rsqtBhR7is6XNHgVkvtblz9Pmldi?usp=sharing).

For full model details, refer to the base model page: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B).

## Evaluations

Comparison with [AdamLucek/Orpo-Llama-3.2-1B-15k](https://huggingface.co/AdamLucek/Orpo-Llama-3.2-1B-15k), evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):

| Benchmark      | 15k Accuracy | 15k Normalized | 40k Accuracy | 40k Normalized | Notes                                          |
|----------------|--------------|----------------|--------------|----------------|------------------------------------------------|
| AGIEval        | 22.14%       | 21.01%         | 23.57%       | 23.26%         | 0-shot average across multiple reasoning tasks |
| GPT4ALL        | 51.15%       | 54.38%         | 51.63%       | 55.00%         | 0-shot average across all categories           |
| TruthfulQA     | 42.79%       | N/A            | 42.14%       | N/A            | MC2 accuracy                                   |
| MMLU           | 31.22%       | N/A            | 31.01%       | N/A            | 5-shot average across all categories           |
| Winogrande     | 61.72%       | N/A            | 61.12%       | N/A            | 0-shot evaluation                              |
| ARC Challenge  | 32.94%       | 36.01%         | 33.36%       | 37.63%         | 0-shot evaluation                              |
| ARC Easy       | 64.52%       | 60.40%         | 65.91%       | 60.90%         | 0-shot evaluation                              |
| BoolQ          | 50.24%       | N/A            | 52.29%       | N/A            | 0-shot evaluation                              |
| PIQA           | 75.46%       | 74.37%         | 75.63%       | 75.19%         | 0-shot evaluation                              |
| HellaSwag      | 48.56%       | 64.71%         | 48.46%       | 64.50%         | 0-shot evaluation                              |
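
To make the comparison easier to read, the accuracy columns above can be turned into per-benchmark differences. The snippet below is a minimal sketch that only does arithmetic on the numbers copied from the table. The variable names (`accuracy`, `deltas`, `improved`, `mean_delta`) are illustrative and not part of any evaluation tooling.

```python
# Accuracy (%) copied from the table above: (15k model, 40k model).
accuracy = {
    "AGIEval": (22.14, 23.57),
    "GPT4ALL": (51.15, 51.63),
    "TruthfulQA": (42.79, 42.14),
    "MMLU": (31.22, 31.01),
    "Winogrande": (61.72, 61.12),
    "ARC Challenge": (32.94, 33.36),
    "ARC Easy": (64.52, 65.91),
    "BoolQ": (50.24, 52.29),
    "PIQA": (75.46, 75.63),
    "HellaSwag": (48.56, 48.46),
}

# Change in percentage points (pp) going from the 15k model to the 40k model.
deltas = {name: round(new - old, 2) for name, (old, new) in accuracy.items()}

# Benchmarks where the 40k model scores higher than the 15k model.
improved = [name for name, delta in deltas.items() if delta > 0]

# Unweighted mean change across all ten benchmarks.
mean_delta = sum(deltas.values()) / len(deltas)

# Print the benchmarks from largest gain to largest drop.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14}{delta:+.2f} pp")
print(f"Improved on {len(improved)} of {len(deltas)} benchmarks, mean change {mean_delta:+.2f} pp")
```

On these numbers the 40k model is ahead on 6 of the 10 benchmarks, with the largest gain on BoolQ (+2.05 pp) and an unweighted mean change of about +0.44 pp. The table reports no standard errors, so differences of a fraction of a point should be read with caution.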

## Using this Model

```python
from transformers import AutoTokenizer
import transformers
import torch

# Load Model and Pipeline
model = "AdamLucek/Orpo-Llama-3.2-1B-40k"

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)

# Generate Message
messages = [{"role": "user", "content": "What is a language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=1024, do_sample=True, temperature=0.3, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## Training Statistics

<div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 5px; max-width: 1000px;">
  <div>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ba68a15d2ef0a4b2c892b4/KbFvnZVerJFpDXTu-Ibzf.png" alt="Panel 1" style="width: 100%; height: auto;">
  </div>
  <div>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ba68a15d2ef0a4b2c892b4/iBoUh911iPCnEoi0uGRlO.png" alt="Panel 2" style="width: 100%; height: auto;">
  </div>
  <div>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ba68a15d2ef0a4b2c892b4/cZqnsh-6eINJr7YvoEFq-.png" alt="Panel 3" style="width: 100%; height: auto;">
  </div>
  <div>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65ba68a15d2ef0a4b2c892b4/FXWGnqGAuNZG_MkKpSFVZ.png" alt="Panel 4" style="width: 100%; height: auto;">
  </div>
</div>

## Open LLM Leaderboard Metrics

|                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard                                                |    N/A|      |      |                       |   |      |   |      |
| - leaderboard_bbh                                         |    N/A|      |      |                       |   |0.3290|   |      |
|  - leaderboard_bbh_boolean_expressions                    |      1|none  |     3|acc_norm               |↑  |0.6840|±  |0.0295|
|  - leaderboard_bbh_causal_judgement                       |      1|none  |     3|acc_norm               |↑  |0.5134|±  |0.0366|
|  - leaderboard_bbh_date_understanding                     |      1|none  |     3|acc_norm               |↑  |0.1920|±  |0.0250|
|  - leaderboard_bbh_disambiguation_qa                      |      1|none  |     3|acc_norm               |↑  |0.3880|±  |0.0309|
|  - leaderboard_bbh_formal_fallacies                       |      1|none  |     3|acc_norm               |↑  |0.4680|±  |0.0316|
|  - leaderboard_bbh_geometric_shapes                       |      1|none  |     3|acc_norm               |↑  |0.0000|±  |     0|
|  - leaderboard_bbh_hyperbaton                             |      1|none  |     3|acc_norm               |↑  |0.4840|±  |0.0317|
|  - leaderboard_bbh_logical_deduction_five_objects         |      1|none  |     3|acc_norm               |↑  |0.2000|±  |0.0253|
|  - leaderboard_bbh_logical_deduction_seven_objects        |      1|none  |     3|acc_norm               |↑  |0.1360|±  |0.0217|
|  - leaderboard_bbh_logical_deduction_three_objects        |      1|none  |     3|acc_norm               |↑  |0.3440|±  |0.0301|
|  - leaderboard_bbh_movie_recommendation                   |      1|none  |     3|acc_norm               |↑  |0.2280|±  |0.0266|
|  - leaderboard_bbh_navigate                               |      1|none  |     3|acc_norm               |↑  |0.4200|±  |0.0313|
|  - leaderboard_bbh_object_counting                        |      1|none  |     3|acc_norm               |↑  |0.3880|±  |0.0309|
|  - leaderboard_bbh_penguins_in_a_table                    |      1|none  |     3|acc_norm               |↑  |0.1575|±  |0.0303|
|  - leaderboard_bbh_reasoning_about_colored_objects        |      1|none  |     3|acc_norm               |↑  |0.1280|±  |0.0212|
|  - leaderboard_bbh_ruin_names                             |      1|none  |     3|acc_norm               |↑  |0.2000|±  |0.0253|
|  - leaderboard_bbh_salient_translation_error_detection    |      1|none  |     3|acc_norm               |↑  |0.2280|±  |0.0266|
|  - leaderboard_bbh_snarks                                 |      1|none  |     3|acc_norm               |↑  |0.5393|±  |0.0375|
|  - leaderboard_bbh_sports_understanding                   |      1|none  |     3|acc_norm               |↑  |0.5240|±  |0.0316|
|  - leaderboard_bbh_temporal_sequences                     |      1|none  |     3|acc_norm               |↑  |0.2000|±  |0.0253|
|  - leaderboard_bbh_tracking_shuffled_objects_five_objects |      1|none  |     3|acc_norm               |↑  |0.1640|±  |0.0235|
|  - leaderboard_bbh_tracking_shuffled_objects_seven_objects|      1|none  |     3|acc_norm               |↑  |0.1400|±  |0.0220|
|  - leaderboard_bbh_tracking_shuffled_objects_three_objects|      1|none  |     3|acc_norm               |↑  |0.3520|±  |0.0303|
|  - leaderboard_bbh_web_of_lies                            |      1|none  |     3|acc_norm               |↑  |0.4880|±  |0.0317|
| - leaderboard_gpqa                                        |    N/A|      |      |                       |   |0.2482|   |      |
|  - leaderboard_gpqa_diamond                               |      1|none  |     0|acc_norm               |↑  |0.2576|±  |0.0312|
|  - leaderboard_gpqa_extended                              |      1|none  |     0|acc_norm               |↑  |0.2436|±  |0.0184|
|  - leaderboard_gpqa_main                                  |      1|none  |     0|acc_norm               |↑  |0.2433|±  |0.0203|
| - leaderboard_ifeval                                      |      3|none  |     0|inst_level_loose_acc   |↑  |0.2962|±  |   N/A|
|                                                           |       |none  |     0|inst_level_strict_acc  |↑  |0.2842|±  |   N/A|
|                                                           |       |none  |     0|prompt_level_loose_acc |↑  |0.1516|±  |0.0154|
|                                                           |       |none  |     0|prompt_level_strict_acc|↑  |0.1386|±  |0.0149|
| - leaderboard_math_hard                                   |    N/A|      |      |                       |   |      |   |      |
|  - leaderboard_math_algebra_hard                          |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_counting_and_prob_hard                |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_geometry_hard                         |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_intermediate_algebra_hard             |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_num_theory_hard                       |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_prealgebra_hard                       |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_precalculus_hard                      |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
| - leaderboard_mmlu_pro                                    |    0.1|none  |     5|acc                    |↑  |0.1222|±  |0.0030|
| - leaderboard_musr                                        |    N/A|      |      |avg acc_norm           |   |0.3433|   |      |
|  - leaderboard_musr_murder_mysteries                      |      1|none  |     0|acc_norm               |↑  |0.5120|±  |0.0317|
|  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2500|±  |0.0271|
|  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.2680|±  |0.0281|