Adding the Open Portuguese LLM Leaderboard Evaluation Results

#8
Files changed (1)
  1. README.md +170 -4
README.md CHANGED
@@ -1,7 +1,5 @@
 ---
 license: other
-license_name: yi-license
-license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
 datasets:
 - ai2_arc
 - unalignment/spicy-3.1
@@ -27,11 +25,160 @@ datasets:
 - Intel/orca_dpo_pairs
 - unalignment/toxic-dpo-v0.1
 - jondurbin/truthy-dpo-v0.1
-- allenai/ultrafeedback_binarized_cleaned
+- allenai/ultrafeedback_binarized_cleaned
 - Squish42/bluemoon-fandom-1-1-rp-cleaned
 - LDJnr/Capybara
 - JULIELab/EmoBank
 - kingbri/PIPPA-shareGPT
+license_name: yi-license
+license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
+model-index:
+- name: bagel-dpo-34b-v0.2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 71.94
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 66.9
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 53.03
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 91.53
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 78.93
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 83.86
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.79
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.41
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 71.99
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # A bagel, with everything
@@ -227,4 +374,23 @@ To help me with the OpenAI/compute costs:
 
 - https://bmc.link/jondurbin
 - ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11
-- BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+- BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/jondurbin/bagel-dpo-34b-v0.2) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**73.71**|
+|ENEM Challenge (No Images)|    71.94|
+|BLUEX (No Images)         |    66.90|
+|OAB Exams                 |    53.03|
+|Assin2 RTE                |    91.53|
+|Assin2 STS                |    78.93|
+|FaQuAD NLI                |    83.86|
+|HateBR Binary             |    72.79|
+|PT Hate Speech Binary     |    72.41|
+|tweetSentBR               |    71.99|
+
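The summary table repeats the per-benchmark scores stored in the `model-index` metadata, and its Average row is consistent with a plain unweighted mean of the nine scores (663.38 / 9 ≈ 73.71). The following is a stdlib-only Python sketch, not the leaderboard bot's actual code: it rebuilds one `results` item per benchmark as a dict with the same keys as the YAML in this PR, then recomputes the table. The helper names `results_entry` and `summary_table` are hypothetical, and treating Average as an unweighted mean is an assumption that merely matches the reported value.

```python
import json
from statistics import mean

# Per-benchmark values copied from this PR's metadata:
# (display name, dataset type, split, few-shot count, metric type, metric name, score)
ROWS = [
    ("ENEM Challenge (No Images)", "eduagarcia/enem_challenge", "train", 3, "acc", "accuracy", 71.94),
    ("BLUEX (No Images)", "eduagarcia-temp/BLUEX_without_images", "train", 3, "acc", "accuracy", 66.90),
    ("OAB Exams", "eduagarcia/oab_exams", "train", 3, "acc", "accuracy", 53.03),
    ("Assin2 RTE", "assin2", "test", 15, "f1_macro", "f1-macro", 91.53),
    ("Assin2 STS", "eduagarcia/portuguese_benchmark", "test", 15, "pearson", "pearson", 78.93),
    ("FaQuAD NLI", "ruanchaves/faquad-nli", "test", 15, "f1_macro", "f1-macro", 83.86),
    ("HateBR Binary", "ruanchaves/hatebr", "test", 25, "f1_macro", "f1-macro", 72.79),
    ("PT Hate Speech Binary", "hate_speech_portuguese", "test", 25, "f1_macro", "f1-macro", 72.41),
    ("tweetSentBR", "eduagarcia/tweetsentbr_fewshot", "test", 25, "f1_macro", "f1-macro", 71.99),
]

SOURCE = {
    "url": "https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.2",
    "name": "Open Portuguese LLM Leaderboard",
}


def results_entry(row):
    """Hypothetical helper: one `results` item with the keys used in the YAML above."""
    name, ds_type, split, shots, metric_type, metric_name, value = row
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": name,
            "type": ds_type,
            "split": split,
            "args": {"num_few_shot": shots},
        },
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": dict(SOURCE),  # copy so entries do not share one dict
    }


def summary_table(rows):
    """Hypothetical helper: unpadded Markdown summary.

    ASSUMPTION: Average is the unweighted mean of all scores, rounded to 2 places.
    """
    avg = round(mean(r[-1] for r in rows), 2)
    lines = ["| Metric | Value |", "|---|---|", f"|Average|**{avg:.2f}**|"]
    lines += [f"|{r[0]}|{r[-1]:.2f}|" for r in rows]
    return avg, "\n".join(lines)


model_index = [
    {"name": "bagel-dpo-34b-v0.2", "results": [results_entry(r) for r in ROWS]}
]
average, table = summary_table(ROWS)

print(json.dumps(model_index[0]["results"][0], indent=2))
print(table)
```

JSON is used only to keep the sketch dependency-free; dumping `model_index` with a YAML library would give front matter of the same nested shape as the `model-index:` block in this PR.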