Update README.md
README.md
@@ -14,7 +14,17 @@ library_name: transformers
|
|
14 |
|
15 |
This model is based on our pretrained [5CD-AI/visobert-14gb-corpus](https://huggingface.co/5CD-AI/visobert-14gb-corpus), which has been continuously trained on a 14GB dataset of Vietnamese social content.
|
16 |
|
17 |
-
Our model is fine-tuned on <b>120K</b> Vietnamese sentiment datasets, including comments and reviews from e-commerce platforms, social media, and forums
|
18 |
|
19 |
The model will give softmax outputs for three labels.
|
20 |
|
@@ -27,7 +37,7 @@ The model will give softmax outputs for three labels.
|
|
27 |
```
|
28 |
|
29 |
## Dataset
|
30 |
-
|
31 |
<table border="2">
|
32 |
<tr align="center">
|
33 |
<th rowspan="2">Dataset</th>
|
@@ -83,7 +93,7 @@ The model will give softmax outputs for three labels.
|
|
83 |
<td>-</td>
|
84 |
</tr>
|
85 |
<tr align="center">
|
86 |
-
<td align="left">UIT-VSMEC</td>
|
87 |
<td>3219</td>
|
88 |
<td>1665</td>
|
89 |
<td>594</td>
|
@@ -108,7 +118,7 @@ The model will give softmax outputs for three labels.
|
|
108 |
<td>-</td>
|
109 |
</tr>
|
110 |
<tr align="center">
|
111 |
-
<td align="left">UIT-ViCTSD</td>
|
112 |
<td>3370</td>
|
113 |
<td>2615</td>
|
114 |
<td>933</td>
|
@@ -156,7 +166,7 @@ The model will give softmax outputs for three labels.
|
|
156 |
<td>-</td>
|
157 |
</tr>
|
158 |
<tr align="center">
|
159 |
-
<td align="left">
|
160 |
<td>20093</td>
|
161 |
<td>6669</td>
|
162 |
<td>4698</td>
|
@@ -168,16 +178,16 @@ The model will give softmax outputs for three labels.
|
|
168 |
<td>-</td>
|
169 |
</tr>
|
170 |
<tr align="center">
|
171 |
-
<td align="left">VOZ-HSD</td>
|
172 |
<td>2676</td>
|
173 |
<td>1213</td>
|
174 |
<td>1071</td>
|
175 |
-
<td
|
176 |
-
<td
|
177 |
-
<td
|
178 |
-
<td
|
179 |
-
<td
|
180 |
-
<td
|
181 |
</tr>
|
182 |
<tr align="center">
|
183 |
<td align="left">Vietnamese-amazon-polarity</td>
|
@@ -200,8 +210,8 @@ The model will give softmax outputs for three labels.
|
|
200 |
<td colspan=4><b>SA-VLSP2016</td>
|
201 |
<td colspan=4><b>AIVIVN-2019</td>
|
202 |
<td colspan=4><b>UIT-VSFC</td>
|
203 |
-
<td colspan=4><b>UIT-VSMEC</td>
|
204 |
-
<td colspan=4><b>UIT-ViCTSD</td>
|
205 |
</tr>
|
206 |
<tr align="center">
|
207 |
<td><b>Acc</td>
|
@@ -281,7 +291,6 @@ The model will give softmax outputs for three labels.
|
|
281 |
<td rowspan=2><b>Model</td>
|
282 |
<td colspan=4><b>UIT-ViOCD</td>
|
283 |
<td colspan=4><b>UIT-ViSFD</td>
|
284 |
-
<td colspan=4><b>VOZ-HSD</td>
|
285 |
<td colspan=4><b>Vi-amazon-polar</td>
|
286 |
</tr>
|
287 |
<tr align="center">
|
@@ -297,14 +306,11 @@ The model will give softmax outputs for three labels.
|
|
297 |
<td><b>Prec</td>
|
298 |
<td><b>Recall</td>
|
299 |
<td><b>WF1</td>
|
300 |
-
<td><b>Acc</td>
|
301 |
-
<td><b>Prec</td>
|
302 |
-
<td><b>Recall</td>
|
303 |
-
<td><b>WF1</td>
|
304 |
</tr>
|
305 |
<tr align="center">
|
306 |
<tr align="center">
|
307 |
<td align="left">wonrax/phobert-base-vietnamese-sentiment</td>
|
|
|
308 |
<td>87.14</td>
|
309 |
<td>74.68</td>
|
310 |
<td>78.13</td>
|
@@ -312,10 +318,6 @@ The model will give softmax outputs for three labels.
|
|
312 |
<td>67.95</td>
|
313 |
<td>67.90</td>
|
314 |
<td>66.98</td>
|
315 |
-
<td>51.89</td>
|
316 |
-
<td>60.18</td>
|
317 |
-
<td>51.89</td>
|
318 |
-
<td>53.61</td>
|
319 |
<td>61.40</td>
|
320 |
<td>76.53</td>
|
321 |
<td>61.40</td>
|
@@ -331,10 +333,6 @@ The model will give softmax outputs for three labels.
|
|
331 |
<td><b>93.20</td>
|
332 |
<td><b>93.26</td>
|
333 |
<td><b>93.21</td>
|
334 |
-
<td><b>67.78</td>
|
335 |
-
<td><b>69.82</td>
|
336 |
-
<td><b>67.78</td>
|
337 |
-
<td><b>68.39</td>
|
338 |
<td><b>89.90</td>
|
339 |
<td><b>90.13</td>
|
340 |
<td><b>89.90</td>
|
|
|
14 |
|
15 |
This model is based on our pretrained [5CD-AI/visobert-14gb-corpus](https://huggingface.co/5CD-AI/visobert-14gb-corpus), which has been continuously trained on a 14GB dataset of Vietnamese social content.
|
16 |
|
17 |
+
Our model is fine-tuned on <b>120K</b> samples from Vietnamese sentiment datasets, including comments and reviews from e-commerce platforms, social media, and forums.
|
18 |
+
|
19 |
+
Our model achieves better performance on the following datasets:
|
20 |
+
- SA-VLSP2016
|
21 |
+
- AIVIVN-2019
|
22 |
+
- UIT-VSFC
|
23 |
+
- UIT-VSMEC
|
24 |
+
- UIT-ViCTSD
|
25 |
+
- UIT-ViOCD
|
26 |
+
- UIT-ViSFD
|
27 |
+
- Vi-amazon-polar
|
28 |
|
29 |
The model will give softmax outputs for three labels.
|
30 |
|
|
|
37 |
```
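
As a rough illustration of the "softmax outputs for three labels" statement above, the sketch below turns three raw scores into probabilities and picks the most likely label. It is not taken from the model card: the label names and their order (`NEG`, `POS`, `NEU`) and the logit values are assumptions made for the example, so check the model's `id2label` config for the real mapping.

```python
import math

# Hypothetical label order, assumed for illustration only;
# the real mapping lives in the model's config (id2label).
LABELS = ["NEG", "POS", "NEU"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most likely label and the per-label probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


# Made-up logits standing in for a classifier's raw output.
label, scores = predict([0.2, 2.5, -1.0])
print(label, scores)
```

The three probabilities always sum to 1, so they can be compared directly or thresholded if a minimum confidence is wanted before accepting a prediction.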
|
38 |
|
39 |
## Dataset
|
40 |
+
Our training datasets. For UIT-VSMEC, UIT-ViCTSD, and VOZ-HSD, we re-labeled the data with the Gemini 1.5 Flash API to follow the three labels.
|
41 |
<table border="2">
|
42 |
<tr align="center">
|
43 |
<th rowspan="2">Dataset</th>
|
|
|
93 |
<td>-</td>
|
94 |
</tr>
|
95 |
<tr align="center">
|
96 |
+
<td align="left">UIT-VSMEC (Gemini-label)</td>
|
97 |
<td>3219</td>
|
98 |
<td>1665</td>
|
99 |
<td>594</td>
|
|
|
118 |
<td>-</td>
|
119 |
</tr>
|
120 |
<tr align="center">
|
121 |
+
<td align="left">UIT-ViCTSD (Gemini-label)</td>
|
122 |
<td>3370</td>
|
123 |
<td>2615</td>
|
124 |
<td>933</td>
|
|
|
166 |
<td>-</td>
|
167 |
</tr>
|
168 |
<tr align="center">
|
169 |
+
<td align="left">Tiki-reviews</td>
|
170 |
<td>20093</td>
|
171 |
<td>6669</td>
|
172 |
<td>4698</td>
|
|
|
178 |
<td>-</td>
|
179 |
</tr>
|
180 |
<tr align="center">
|
181 |
+
<td align="left">VOZ-HSD (Gemini-label)</td>
|
182 |
<td>2676</td>
|
183 |
<td>1213</td>
|
184 |
<td>1071</td>
|
185 |
+
<td>-</td>
|
186 |
+
<td>-</td>
|
187 |
+
<td>-</td>
|
188 |
+
<td>-</td>
|
189 |
+
<td>-</td>
|
190 |
+
<td>-</td>
|
191 |
</tr>
|
192 |
<tr align="center">
|
193 |
<td align="left">Vietnamese-amazon-polarity</td>
|
|
|
210 |
<td colspan=4><b>SA-VLSP2016</td>
|
211 |
<td colspan=4><b>AIVIVN-2019</td>
|
212 |
<td colspan=4><b>UIT-VSFC</td>
|
213 |
+
<td colspan=4><b>UIT-VSMEC (Gemini-label)</td>
|
214 |
+
<td colspan=4><b>UIT-ViCTSD (Gemini-label)</td>
|
215 |
</tr>
|
216 |
<tr align="center">
|
217 |
<td><b>Acc</td>
|
|
|
291 |
<td rowspan=2><b>Model</td>
|
292 |
<td colspan=4><b>UIT-ViOCD</td>
|
293 |
<td colspan=4><b>UIT-ViSFD</td>
|
294 |
<td colspan=4><b>Vi-amazon-polar</td>
|
295 |
</tr>
|
296 |
<tr align="center">
|
|
|
306 |
<td><b>Prec</td>
|
307 |
<td><b>Recall</td>
|
308 |
<td><b>WF1</td>
|
309 |
</tr>
|
310 |
<tr align="center">
|
311 |
<tr align="center">
|
312 |
<td align="left">wonrax/phobert-base-vietnamese-sentiment</td>
|
313 |
+
<td>74.68</td>
|
314 |
<td>87.14</td>
|
315 |
<td>74.68</td>
|
316 |
<td>78.13</td>
|
|
|
318 |
<td>67.95</td>
|
319 |
<td>67.90</td>
|
320 |
<td>66.98</td>
|
321 |
<td>61.40</td>
|
322 |
<td>76.53</td>
|
323 |
<td>61.40</td>
|
|
|
333 |
<td><b>93.20</td>
|
334 |
<td><b>93.26</td>
|
335 |
<td><b>93.21</td>
|
336 |
<td><b>89.90</td>
|
337 |
<td><b>90.13</td>
|
338 |
<td><b>89.90</td>
|