Updated README with note about model update and new performance numbers
README.md
CHANGED
@@ -12,6 +12,9 @@ tags:
This is an [ELECTRA large discriminator](https://huggingface.co/google/electra-large-discriminator) fine-tuned for sentiment analysis of reviews. It has a mean pooling layer and a classifier head (2 layers of 1024 dimension) with SwishGLU activation and dropout (0.3). It classifies text into three sentiment categories: 'negative' (0), 'neutral' (1), and 'positive' (2). It was fine-tuned on the [Sentiment Merged](https://huggingface.co/datasets/jbeno/sentiment_merged) dataset, which is a merge of the Stanford Sentiment Treebank (SST-3) and DynaSent Rounds 1 and 2.

+## Updates
+
+- **2025-Mar-25**: Uploaded a better-performing model fine-tuned with a different random seed (123 vs. 42) and from an earlier training checkpoint (epoch 10 vs. 13).
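
Since the card describes a plain three-way sequence classifier, a minimal usage sketch for the updated checkpoint follows. Both the repo id and the `trust_remote_code=True` loading path (plausible because of the custom pooling/classifier head) are assumptions for illustration, not something this diff confirms; defer to the model card's own usage section.

```python
# Hypothetical usage sketch -- the repo id and loading path are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "jbeno/electra-large-classifier-sentiment"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,  # custom mean-pooling head assumed to live in the repo
)
model.eval()

labels = ["negative", "neutral", "positive"]  # ids 0, 1, 2 per the Labels section
inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(dim=-1).item()])
```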
## Labels
@@ -86,17 +89,17 @@ The research paper can be found here: [ELECTRA and GPT-4o: Cost-Effective Partne
### Performance Summary

- **Merged Dataset**
-  - Macro Average F1: **82.36**
-  - Accuracy: **82.96**
+  - Macro Average F1: **83.16** (was 82.36)
+  - Accuracy: **83.71** (was 82.96)
- **DynaSent R1**
-  - Macro Average F1: **85.91**
-  - Accuracy: **85.83**
+  - Macro Average F1: **86.53** (was 85.91)
+  - Accuracy: **86.44** (was 85.83)
- **DynaSent R2**
-  - Macro Average F1: **76.29**
-  - Accuracy: **76.53**
+  - Macro Average F1: **78.36** (was 76.29)
+  - Accuracy: **78.61** (was 76.53)
- **SST-3**
-  - Macro Average F1: **70.90**
-  - Accuracy: **80.36**
+  - Macro Average F1: **72.63** (was 70.90)
+  - Accuracy: **80.91** (was 80.36)
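
The updated summary is consistent with the detailed reports added further down: macro-average F1 is the unweighted mean of the three per-class F1 scores. A quick check against the new Merged Dataset report:

```python
# Macro-average F1 = unweighted mean of the per-class F1 scores.
# Per-class values taken from the updated Merged Dataset report below.
per_class_f1 = {"negative": 0.860781, "neutral": 0.756032, "positive": 0.878007}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.4f}")  # 0.8316 -> the 83.16 in the summary
```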

## Model Architecture
@@ -254,7 +257,113 @@ The model's configuration (config.json) includes custom parameters:
- `dropout_rate`: Dropout rate used in the classifier.
- `pooling`: Pooling strategy used ('mean').
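
The head these parameters configure is described at the top of the card: mean pooling over token embeddings, then a classifier of 2 layers at 1024 dimension with SwishGLU activation and dropout 0.3. As a rough illustration of that description only (the class names are invented, and the SiLU-gated formulation of SwishGLU shown here is one common variant, not necessarily the repo's actual module):

```python
# Illustrative sketch of the described classifier head -- not the repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwishGLU(nn.Module):
    """One common SwishGLU variant: SiLU(x W) * (x V)."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, 2 * dim_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.proj(x).chunk(2, dim=-1)
        return F.silu(gate) * value

class ClassifierHead(nn.Module):
    """Mean pooling, then two 1024-dim SwishGLU layers with dropout 0.3."""
    def __init__(self, hidden: int = 1024, num_labels: int = 3, dropout_rate: float = 0.3):
        super().__init__()
        self.glu1 = SwishGLU(hidden, hidden)
        self.glu2 = SwishGLU(hidden, hidden)
        self.dropout = nn.Dropout(dropout_rate)
        self.out = nn.Linear(hidden, num_labels)

    def forward(self, token_embeddings: torch.Tensor, attention_mask: torch.Tensor):
        # Mask-aware mean pooling over the token dimension ('mean' strategy).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        x = self.dropout(self.glu1(pooled))
        x = self.dropout(self.glu2(x))
        return self.out(x)  # logits for negative / neutral / positive
```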

-## Performance by Dataset
+## Updated Performance by Dataset

### Merged Dataset

```
Merged Dataset Classification Report

              precision    recall  f1-score   support

    negative   0.874178  0.847789  0.860781      2352
     neutral   0.741715  0.770913  0.756032      1829
    positive   0.878194  0.877820  0.878007      2349

    accuracy                       0.837060      6530
   macro avg   0.831362  0.832174  0.831607      6530
weighted avg   0.838521  0.837060  0.837639      6530

ROC AUC: 0.947808

Predicted  negative  neutral  positive
Actual
negative       1994      268        90
neutral         223     1410       196
positive         64      223      2062

Macro F1 Score: 0.83
```
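
The report layout above matches scikit-learn's `classification_report` with `digits=6`, a crosstab-style confusion matrix, and a one-vs-rest ROC AUC. A sketch of how a comparable report could be regenerated; the random `y_true`/`y_prob` below are stand-ins for a real evaluation loop:

```python
# Sketch: regenerate a report in this format (synthetic stand-in data).
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, f1_score, roc_auc_score

labels = ["negative", "neutral", "positive"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=500)          # replace with gold labels
y_prob = rng.dirichlet(np.ones(3), size=500)   # replace with model probabilities
y_pred = y_prob.argmax(axis=1)

print(classification_report(y_true, y_pred, target_names=labels, digits=6))
print(f"ROC AUC: {roc_auc_score(y_true, y_prob, multi_class='ovr'):.6f}\n")
print(pd.crosstab(
    pd.Series([labels[i] for i in y_true], name="Actual"),
    pd.Series([labels[i] for i in y_pred], name="Predicted"),
))
print(f"\nMacro F1 Score: {f1_score(y_true, y_pred, average='macro'):.2f}")
```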

### DynaSent Round 1

```
DynaSent Round 1 Classification Report

              precision    recall  f1-score   support

    negative   0.925512  0.828333  0.874230      1200
     neutral   0.781536  0.924167  0.846888      1200
    positive   0.911472  0.840833  0.874729      1200

    accuracy                       0.864444      3600
   macro avg   0.872840  0.864444  0.865283      3600
weighted avg   0.872840  0.864444  0.865283      3600

ROC AUC: 0.962647

Predicted  negative  neutral  positive
Actual
negative        994      159        47
neutral          40     1109        51
positive         40      151      1009

Macro F1 Score: 0.87
```
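
One detail worth noting in the Round 1 report: the macro and weighted averages are identical. That is expected whenever the class supports are equal (1,200 per class here), since the support weights collapse to a uniform 1/3:

```python
# With equal class supports, weighted average == macro average.
f1 = [0.874230, 0.846888, 0.874729]   # per-class F1, DynaSent Round 1
support = [1200, 1200, 1200]

macro = sum(f1) / len(f1)
weighted = sum(f * s for f, s in zip(f1, support)) / sum(support)
# Both print 0.865282; the report's 0.865283 reflects fuller upstream precision.
print(f"macro={macro:.6f}  weighted={weighted:.6f}")
```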

### DynaSent Round 2

```
DynaSent Round 2 Classification Report

              precision    recall  f1-score   support

    negative   0.791339  0.837500  0.813765       240
     neutral   0.803030  0.662500  0.726027       240
    positive   0.768657  0.858333  0.811024       240

    accuracy                       0.786111       720
   macro avg   0.787675  0.786111  0.783605       720
weighted avg   0.787675  0.786111  0.783605       720

ROC AUC: 0.932089

Predicted  negative  neutral  positive
Actual
negative        201       18        21
neutral          40      159        41
positive         13       21       206

Macro F1 Score: 0.78
```
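
The headline accuracy in each report can be read straight off its confusion matrix: correct predictions sit on the diagonal. Checking the Round 2 table:

```python
# Accuracy = diagonal of the confusion matrix over the total count.
# Rows = actual class, columns = predicted (DynaSent Round 2 table above).
confusion = [
    [201, 18, 21],    # actual negative
    [40, 159, 41],    # actual neutral
    [13, 21, 206],    # actual positive
]
correct = sum(confusion[i][i] for i in range(3))
total = sum(sum(row) for row in confusion)
print(f"{correct}/{total} = {correct / total:.6f}")  # 566/720 = 0.786111
```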

### Stanford Sentiment Treebank (SST-3)

```
SST-3 Classification Report

              precision    recall  f1-score   support

    negative   0.838405  0.876096  0.856836       912
     neutral   0.500000  0.365039  0.421991       389
    positive   0.870504  0.931793  0.900106       909

    accuracy                       0.809050      2210
   macro avg   0.736303  0.724309  0.726311      2210
weighted avg   0.792042  0.809050  0.798093      2210

ROC AUC: 0.905255

Predicted  negative  neutral  positive
Actual
negative        799       91        22
neutral         143      142       104
positive         11       51       847

Macro F1 Score: 0.73
```

## Old Performance by Dataset

### Merged Dataset