---
language:
  - en
license: apache-2.0
base_model:
  - mistralai/Mistral-7B-v0.1
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
pipeline_tag: text-generation
model-index:
  - name: Mistral-ORPO-⍺
    results:
      - task:
          type: text-generation
        dataset:
          name: AlpacaEval 1
          type: AlpacaEval
        metrics:
          - type: AlpacaEval 1.0
            value: 87.92%
            name: Win Rate
          - type: AlpacaEval 2.0
            value: 11.33%
            name: Win Rate
        source:
          url: https://github.com/tatsu-lab/alpaca_eval
          name: self-reported
      - task:
          type: text-generation
        dataset:
          name: MT-Bench
          type: MT-Bench
        metrics:
          - type: MT-Bench
            value: 7.23
            name: Score
        source:
          url: https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/
          name: self-reported
---

# Mistral-ORPO-⍺ (7B)

Mistral-ORPO is a fine-tuned version of mistralai/Mistral-7B-v0.1 trained with odds ratio preference optimization (ORPO). With ORPO, the model learns directly from preference data, without a separate supervised fine-tuning warm-up phase. Mistral-ORPO-⍺ is fine-tuned exclusively on HuggingFaceH4/ultrafeedback_binarized.
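The idea can be sketched with the odds-ratio penalty that ORPO adds to the standard language-modeling loss: it pushes the log-odds of the chosen response above that of the rejected one. This is a minimal, illustrative sketch; the weighting `lam` and the use of average token log-probabilities are assumptions, not the exact training configuration of this model.

```python
import math

def log_odds(logp):
    # log odds of a response from its (average) token log-probability:
    # log(p / (1 - p)), computed stably for logp < 0
    return logp - math.log1p(-math.exp(logp))

def orpo_penalty(logp_chosen, logp_rejected, lam=0.1):
    # relative odds-ratio term: -lam * log sigmoid(log-odds difference);
    # small when the chosen response is already much more likely,
    # large when the rejected response is favored
    diff = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -lam * math.log(1.0 / (1.0 + math.exp(-diff)))

# the penalty shrinks as the model prefers the chosen response
mild = orpo_penalty(logp_chosen=-1.0, logp_rejected=-2.0)
harsh = orpo_penalty(logp_chosen=-2.0, logp_rejected=-1.0)
```

In training, this term is added to the usual negative log-likelihood on the chosen response, which is why no separate SFT stage is needed.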

## Model Performance

| Model Name | Size | Align | MT-Bench | AlpacaEval 1.0 | AlpacaEval 2.0 |
|---|---|---|---|---|---|
| Mistral-ORPO-⍺ | 7B | ORPO | 7.23 | 87.92 | 11.33 |
| Mistral-ORPO | 7B | ORPO | 7.32 | 91.41 | 12.20 |
| Zephyr β | 7B | DPO | 7.34 | 90.60 | 10.99 |
| TULU-2-DPO | 13B | DPO | 7.00 | 89.5 | 10.12 |
| Llama-2-Chat | 7B | RLHF | 6.27 | 71.37 | 4.96 |
| Llama-2-Chat | 13B | RLHF | 6.65 | 81.09 | 7.70 |

## Chat Template

```
<|user|>
Hi! How are you doing?</s>
<|assistant|>
```
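The template above can be applied by hand as a quick sketch. In practice, prefer the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) so the formatting always matches the model's configuration; the helper below simply reproduces the layout shown above and is an assumption about how multi-turn conversations are concatenated.

```python
def format_chat(messages):
    # builds a prompt in the <|user|>/<|assistant|> layout shown above;
    # each turn ends with </s>, and the prompt ends with an open
    # <|assistant|> header for the model to complete
    out = ""
    for m in messages:
        role = "<|user|>" if m["role"] == "user" else "<|assistant|>"
        out += f"{role}\n{m['content']}</s>\n"
    out += "<|assistant|>\n"
    return out

prompt = format_chat([{"role": "user", "content": "Hi! How are you doing?"}])
```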