Adding the Open Portuguese LLM Leaderboard Evaluation Results

#3
Files changed (1)
  1. README.md +170 -4
README.md CHANGED
@@ -1,13 +1,160 @@
  ---
  license: apache-2.0
  datasets:
  - Azure99/blossom-chat-v3
  - Azure99/blossom-math-v4
  - Azure99/blossom-wizard-v3
  - Azure99/blossom-orca-v3
- language:
- - zh
- - en
  ---
  # **BLOSSOM-v5.1-34b**
@@ -41,4 +188,23 @@ A chat between a human and an artificial intelligence bot. The bot gives helpful
  |Bot|:
  ```

- Note: At the end of the Bot's output in the historical conversation, append a `<|endoftext|>`.
 
  ---
+ language:
+ - zh
+ - en
  license: apache-2.0
  datasets:
  - Azure99/blossom-chat-v3
  - Azure99/blossom-math-v4
  - Azure99/blossom-wizard-v3
  - Azure99/blossom-orca-v3
+ model-index:
+ - name: blossom-v5.1-34b
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 72.64
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 67.18
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 54.44
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 90.88
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 82.94
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 81.88
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 85.2
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.09
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.13
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
+       name: Open Portuguese LLM Leaderboard
  ---
  # **BLOSSOM-v5.1-34b**
 
  |Bot|:
  ```

+ Note: At the end of the Bot's output in the historical conversation, append a `<|endoftext|>`.
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Azure99/blossom-v5.1-34b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**75.49**|
+ |ENEM Challenge (No Images)|    72.64|
+ |BLUEX (No Images)         |    67.18|
+ |OAB Exams                 |    54.44|
+ |Assin2 RTE                |    90.88|
+ |Assin2 STS                |    82.94|
+ |FaQuAD NLI                |    81.88|
+ |HateBR Binary             |    85.20|
+ |PT Hate Speech Binary     |    72.09|
+ |tweetSentBR               |    72.13|
+
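Review note: the `Average` row in the added table is consistent with a plain arithmetic mean of the nine task scores. The diff does not state how the leaderboard computes it, so the unweighted mean below is an assumption. The sketch copies the per-task values from the table; the variable names are illustrative only.

```python
from statistics import mean

# Per-task scores copied from the results table added in this PR.
scores = {
    "ENEM Challenge (No Images)": 72.64,
    "BLUEX (No Images)": 67.18,
    "OAB Exams": 54.44,
    "Assin2 RTE": 90.88,
    "Assin2 STS": 82.94,
    "FaQuAD NLI": 81.88,
    "HateBR Binary": 85.20,
    "PT Hate Speech Binary": 72.09,
    "tweetSentBR": 72.13,
}

# Assumption: the leaderboard average is an unweighted mean over all tasks.
average = round(mean(scores.values()), 2)
print(f"Average: {average:.2f}")  # prints "Average: 75.49"
```

The result matches the reported 75.49, so the table and the `model-index` values agree with each other under this assumption.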