Commit 7bec0a6 by agentlans (verified) · 1 parent: 0d38d69

Adding Evaluation Results (#1)


- Adding Evaluation Results (3f059dd45a16f6f0d1066a4ec90c1e8541559054)

Files changed (1)
  1. README.md +121 -8
README.md CHANGED
@@ -1,10 +1,109 @@
- ---
- license: llama3.1
- datasets:
- - agentlans/crash-course
- base_model:
- - agentlans/Llama3.1-SuperDeepFuse
- ---
+ ---
+ license: llama3.1
+ datasets:
+ - agentlans/crash-course
+ base_model:
+ - agentlans/Llama3.1-SuperDeepFuse
+ model-index:
+ - name: Llama3.1-SuperDeepFuse-CrashCourse12K
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 71.87
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 31.83
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 17.67
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 8.39
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 8.6
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 29.24
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K
+       name: Open LLM Leaderboard
+ ---
  # Llama3.1-SuperDeepFuse-CrashCourse12K
  
  Llama3.1-SuperDeepFuse-CrashCourse12K is an 8B parameter language model based on [Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse)
@@ -55,4 +154,18 @@ However:
  ## Additional Information
  
  - For the original model, see [agentlans/Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse)
- - For the base Llama 3.1 model, including training data and model architecture, refer to the original [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model card.
+ - For the base Llama 3.1 model, including training data and model architecture, refer to the original [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model card.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/agentlans__Llama3.1-SuperDeepFuse-CrashCourse12K-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ |      Metric       |Value (%)|
+ |-------------------|--------:|
+ |**Average**        |    27.93|
+ |IFEval (0-Shot)    |    71.87|
+ |BBH (3-Shot)       |    31.83|
+ |MATH Lvl 5 (4-Shot)|    17.67|
+ |GPQA (0-shot)      |     8.39|
+ |MuSR (0-shot)      |     8.60|
+ |MMLU-PRO (5-shot)  |    29.24|
+
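The `model-index` entries added to the front matter repeat the six scores shown in the results table, and the table's Average row is their unweighted mean (167.60 / 6 ≈ 27.93). The sketch below recomputes it. It assumes the front matter has already been parsed into plain Python lists and dicts (for example by a YAML loader, omitted here so the snippet stays standard-library only); `MODEL_INDEX`, `scores_by_benchmark`, and `leaderboard_average` are hypothetical names, and only the fields the sketch reads are reproduced.

```python
from statistics import fmean

# Hypothetical: the model-index front matter from this commit, already parsed
# into Python objects. Only dataset names and metric values are kept.
MODEL_INDEX = [
    {
        "name": "Llama3.1-SuperDeepFuse-CrashCourse12K",
        "results": [
            {"dataset": {"name": "IFEval (0-Shot)"}, "metrics": [{"value": 71.87}]},
            {"dataset": {"name": "BBH (3-Shot)"}, "metrics": [{"value": 31.83}]},
            {"dataset": {"name": "MATH Lvl 5 (4-Shot)"}, "metrics": [{"value": 17.67}]},
            {"dataset": {"name": "GPQA (0-shot)"}, "metrics": [{"value": 8.39}]},
            {"dataset": {"name": "MuSR (0-shot)"}, "metrics": [{"value": 8.6}]},
            {"dataset": {"name": "MMLU-PRO (5-shot)"}, "metrics": [{"value": 29.24}]},
        ],
    }
]


def scores_by_benchmark(model_index):
    """Map each dataset name to its first metric value, for the first model entry."""
    results = model_index[0]["results"]
    return {r["dataset"]["name"]: r["metrics"][0]["value"] for r in results}


def leaderboard_average(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals like the table."""
    return round(fmean(scores.values()), 2)


scores = scores_by_benchmark(MODEL_INDEX)
print(leaderboard_average(scores))  # → 27.93
```

Reading `metrics[0]` is enough here because every result in this commit lists exactly one metric; a model card with several metrics per dataset would need to choose one explicitly.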
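The "Summarized results" link carries its search term and sort order in the query string. `%2F` is an encoded `/`, and `%E2%AC%86%EF%B8%8F` is the UTF-8 percent-encoding of the ⬆️ emoji (U+2B06 followed by the variation selector U+FE0F), so the link sorts by a column named "Average ⬆️" in descending order. A standard-library sketch that decodes it, using only `urllib.parse`:

```python
from urllib.parse import parse_qs, urlsplit

# The "Summarized results" URL from this commit, split across lines for readability
# (adjacent string literals are concatenated by Python).
URL = (
    "https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train"
    "?q=agentlans%2FLlama3.1-SuperDeepFuse-CrashCourse12K"
    "&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc"
)

# parse_qs percent-decodes each value (as UTF-8) and returns a list per key.
params = parse_qs(urlsplit(URL).query)
for key, values in params.items():
    print(f"{key} = {values[0]}")
```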