leaderboard-pt-pr-bot committed on
Commit 012756c · 1 parent: fd18943

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
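Each score this PR adds is stored in the card's YAML `model-index` metadata as one entry under `results`. Each entry has `task`, `dataset` (including `args.num_few_shot`), `metrics` and `source` keys. The minimal sketch below mirrors the ENEM Challenge entry as nested Python dicts and flattens it into rows. The `flatten_result` helper is hypothetical and only illustrates the structure. Parsing the YAML itself is left out to keep the example standard-library only.

```python
from typing import Any

# The ENEM Challenge entry of the `model-index` results list added by this PR,
# written as nested dicts whose keys mirror the YAML keys.
ENEM_RESULT: dict[str, Any] = {
    "task": {"type": "text-generation", "name": "Text Generation"},
    "dataset": {
        "name": "ENEM Challenge (No Images)",
        "type": "eduagarcia/enem_challenge",
        "split": "train",
        "args": {"num_few_shot": 3},
    },
    "metrics": [{"type": "acc", "value": 71.66, "name": "accuracy"}],
    "source": {
        "url": "https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5",
        "name": "Open Portuguese LLM Leaderboard",
    },
}


def flatten_result(result: dict[str, Any]) -> list[tuple[str, str, float, int]]:
    """Hypothetical helper: turn one results entry into
    (dataset name, metric type, value, few-shot count) rows, one per metric."""
    dataset = result["dataset"]
    # Fall back to 0 shots if an entry has no `args.num_few_shot`.
    shots = dataset.get("args", {}).get("num_few_shot", 0)
    return [
        (dataset["name"], metric["type"], metric["value"], shots)
        for metric in result["metrics"]
    ]


print(flatten_result(ENEM_RESULT))
# [('ENEM Challenge (No Images)', 'acc', 71.66, 3)]
```

The other eight entries follow the same shape and differ only in dataset name, type, split, few-shot count (3, 15 or 25) and metric (`acc`, `f1_macro` or `pearson`).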

Files changed (1):
1. README.md (+169 -3)

README.md CHANGED
@@ -1,7 +1,5 @@
 ---
 license: other
-license_name: yi-license
-license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
 base_model: 01-ai/yi-34b-200k
 datasets:
 - ai2_arc
@@ -45,6 +43,155 @@ datasets:
 - WhiteRabbitNeo/WRN-Chapter-1
 - WhiteRabbitNeo/WRN-Chapter-2
 - winogrande
+license_name: yi-license
+license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
+model-index:
+- name: bagel-dpo-34b-v0.5
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 71.66
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 66.76
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 54.17
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 91.16
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 77.46
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 65.39
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 87.27
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 69.82
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.85
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=jondurbin/bagel-dpo-34b-v0.5
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # A bagel, with everything
@@ -803,4 +950,23 @@ For assistance with the VM join the [Massed Compute Discord Server](https://disc
 
 - https://bmc.link/jondurbin
 - ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11
-- BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+- BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/jondurbin/bagel-dpo-34b-v0.5) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric | Value |
+|--------------------------|---------|
+|Average |**72.95**|
+|ENEM Challenge (No Images)| 71.66|
+|BLUEX (No Images) | 66.76|
+|OAB Exams | 54.17|
+|Assin2 RTE | 91.16|
+|Assin2 STS | 77.46|
+|FaQuAD NLI | 65.39|
+|HateBR Binary | 87.27|
+|PT Hate Speech Binary | 69.82|
+|tweetSentBR | 72.85|
+
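The Average row in the results table matches an unweighted mean of the nine task scores: (71.66 + 66.76 + 54.17 + 91.16 + 77.46 + 65.39 + 87.27 + 69.82 + 72.85) / 9 = 656.54 / 9 ≈ 72.95. The sketch below recomputes it. Treating the leaderboard Average as a plain mean is an assumption checked only against these numbers, not a documented leaderboard formula. Note that it averages accuracy, macro-F1 and Pearson scores as if they shared one 0 to 100 scale.

```python
from statistics import fmean

# Per-task scores copied from the results table added by this PR.
SCORES = {
    "ENEM Challenge (No Images)": 71.66,
    "BLUEX (No Images)": 66.76,
    "OAB Exams": 54.17,
    "Assin2 RTE": 91.16,
    "Assin2 STS": 77.46,
    "FaQuAD NLI": 65.39,
    "HateBR Binary": 87.27,
    "PT Hate Speech Binary": 69.82,
    "tweetSentBR": 72.85,
}

# Assumption: the reported Average is the unweighted mean of the nine scores,
# rounded to two decimals like the other table values.
average = round(fmean(SCORES.values()), 2)
print(average)  # 72.95
```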