Adding the Open Portuguese LLM Leaderboard Evaluation Results

#30
Files changed (1)
  1. README.md +168 -2
README.md CHANGED
@@ -1,8 +1,155 @@
  ---
  license: other
+ base_model: moreh/MoMo-72B-lora-1.8.7-DPO
  license_name: tongyi-qianwen-license-agreement
  license_link: https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT
- base_model: moreh/MoMo-72B-lora-1.8.7-DPO
+ model-index:
+ - name: Smaug-72B-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 77.82
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 68.29
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 58.77
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 92.96
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 81.79
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 77.47
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 85.65
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 65.31
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.38
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-72B-v0.1
+       name: Open Portuguese LLM Leaderboard
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)

@@ -194,4 +341,23 @@ Please cite the paper if you use data, model, or method in this repo.
    journal={arXiv preprint arXiv:2402.13228},
    year={2024}
  }
- ```
+ ```
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/abacusai/Smaug-72B-v0.1) and on the [πŸš€ Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|--------|
+ |Average |**75.6**|
+ |ENEM Challenge (No Images)| 77.82|
+ |BLUEX (No Images) | 68.29|
+ |OAB Exams | 58.77|
+ |Assin2 RTE | 92.96|
+ |Assin2 STS | 81.79|
+ |FaQuAD NLI | 77.47|
+ |HateBR Binary | 85.65|
+ |PT Hate Speech Binary | 65.31|
+ |tweetSentBR | 72.38|
+
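The `model-index` block added above is machine-readable, so the same scores shown in the summary table can be pulled back out of the card metadata once this PR is merged. A minimal sketch (not part of the PR) using the `huggingface_hub` client; it assumes the merged metadata is present on the main revision of `abacusai/Smaug-72B-v0.1`:

```python
# Sketch only: read the leaderboard results back out of the model card metadata.
# Assumes `huggingface_hub` is installed and this PR has been merged; the dict
# layout below mirrors the `model-index` YAML shown in the diff above.
from huggingface_hub import ModelCard

card = ModelCard.load("abacusai/Smaug-72B-v0.1")
model_index = card.data.to_dict().get("model-index", [])

for entry in model_index:
    print(entry["name"])  # e.g. Smaug-72B-v0.1
    for result in entry["results"]:
        dataset_name = result["dataset"]["name"]
        for metric in result["metrics"]:
            print(f"  {dataset_name}: {metric['type']} = {metric['value']}")
```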