leaderboard-pt-pr-bot committed on
Commit
aca6bd3
1 Parent(s): 9709ef7

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +167 -4
README.md CHANGED
@@ -1,4 +1,7 @@
  ---
+ language:
+ - pt
+ license: apache-2.0
  library_name: transformers
  tags:
  - Misral
@@ -6,13 +9,157 @@ tags:
  - 7b
  - chat
  - portugues
- license: apache-2.0
  base_model: mistralai/Mistral-7B-Instruct-v0.2
  datasets:
  - rhaymison/ultrachat-easy-use
- language:
- - pt
  pipeline_tag: text-generation
+ model-index:
+ - name: Mistral-portuguese-luana-7b-chat
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 59.13
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 49.24
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 36.58
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 90.47
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 76.55
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 66.75
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 77.46
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 69.45
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 59.63
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-chat
+       name: Open Portuguese LLM Leaderboard
  ---
  
  # Mistral-portuguese-luana-7b-chat
@@ -136,4 +283,20 @@ email: [email protected]
  </a>
  <a href="https://github.com/rhaymisonbetini" target="_blank">
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
- </a>
+ </a>
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/Mistral-portuguese-luana-7b-chat)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**65.03**|
+ |ENEM Challenge (No Images)|    59.13|
+ |BLUEX (No Images)         |    49.24|
+ |OAB Exams                 |    36.58|
+ |Assin2 RTE                |    90.47|
+ |Assin2 STS                |    76.55|
+ |FaQuAD NLI                |    66.75|
+ |HateBR Binary             |    77.46|
+ |PT Hate Speech Binary     |    69.45|
+ |tweetSentBR               |    59.63|
+
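
The Average row in the table added above can be sanity-checked against the nine per-task scores. The sketch below assumes the leaderboard average is the unweighted mean of those scores rounded to two decimals. The PR does not state this, but the numbers are consistent with it. The function name `leaderboard_average` is hypothetical and is not part of the leaderboard tooling.

```python
from statistics import mean

# Per-task scores copied from the results table added by this PR.
scores = {
    "ENEM Challenge (No Images)": 59.13,
    "BLUEX (No Images)": 49.24,
    "OAB Exams": 36.58,
    "Assin2 RTE": 90.47,
    "Assin2 STS": 76.55,
    "FaQuAD NLI": 66.75,
    "HateBR Binary": 77.46,
    "PT Hate Speech Binary": 69.45,
    "tweetSentBR": 59.63,
}


def leaderboard_average(task_scores):
    """Unweighted mean of the task scores, rounded to two decimals.

    Assumption: the leaderboard aggregates this way; hypothetical helper.
    """
    return round(mean(task_scores.values()), 2)


print(leaderboard_average(scores))  # 65.03, matching the Average row
```

Note that the scores mix different metrics (accuracy, macro F1, Pearson), so the average is a convenience summary rather than a single well-defined metric.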