leaderboard-pr-bot committed on
Commit 1cac3f7
1 Parent(s): 56dd2ca

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +111 -0
README.md CHANGED
@@ -1 +1,112 @@
+ ---
+ model-index:
+ - name: Llama-3-Instruct-8B-DPO
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 67.57
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 28.51
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 3.32
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 2.91
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 3.93
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 29.61
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO
+       name: Open LLM Leaderboard
+ ---
  This is a model released from the preprint: *[SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734)* Please refer to our [repository](https://github.com/princeton-nlp/SimPO) for more details.
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_princeton-nlp__Llama-3-Instruct-8B-DPO)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |22.64|
+ |IFEval (0-Shot)    |67.57|
+ |BBH (3-Shot)       |28.51|
+ |MATH Lvl 5 (4-Shot)| 3.32|
+ |GPQA (0-shot)      | 2.91|
+ |MuSR (0-shot)      | 3.93|
+ |MMLU-PRO (5-shot)  |29.61|
+
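
The `Avg.` row in the table added by this PR can be sanity-checked against the six per-benchmark values. The sketch below assumes the average is a plain unweighted mean of those six scores, rounded to two decimals; the PR does not state how it is computed, and the names `scores`, `reported_avg`, and `unweighted_mean` are illustrative only. Under that assumption the computed value agrees with the reported 22.64.

```python
# Per-benchmark scores copied from the results table in this PR.
scores = {
    "IFEval (0-Shot)": 67.57,
    "BBH (3-Shot)": 28.51,
    "MATH Lvl 5 (4-Shot)": 3.32,
    "GPQA (0-shot)": 2.91,
    "MuSR (0-shot)": 3.93,
    "MMLU-PRO (5-shot)": 29.61,
}

# Value shown in the "Avg." row of the same table.
reported_avg = 22.64


def unweighted_mean(values):
    """Return the arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: the leaderboard average is an unweighted mean rounded to 2 decimals.
computed_avg = round(unweighted_mean(scores.values()), 2)

print(f"computed: {computed_avg}, reported: {reported_avg}")
print("match" if computed_avg == reported_avg else "mismatch")
```

A check like this only confirms that the table is internally consistent; it says nothing about how the individual benchmark scores were produced.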