jbeno committed
Commit bc80c18 · verified · 1 Parent(s): b383877

Updated README with note about model update and new performance numbers

Files changed (1)
  1. README.md +118 -9
README.md CHANGED
@@ -12,6 +12,9 @@ tags:
 
 This is an [ELECTRA large discriminator](https://huggingface.co/google/electra-large-discriminator) fine-tuned for sentiment analysis of reviews. It has a mean pooling layer and a classifier head (two layers of 1024 dimensions) with SwishGLU activation and dropout (0.3). It classifies text into three sentiment categories: 'negative' (0), 'neutral' (1), and 'positive' (2). It was fine-tuned on the [Sentiment Merged](https://huggingface.co/datasets/jbeno/sentiment_merged) dataset, which is a merge of the Stanford Sentiment Treebank (SST-3) and DynaSent Rounds 1 and 2.
 
+## Updates
+
+- **2025-Mar-25**: Uploaded a better-performing model fine-tuned with a different random seed (123 vs. 42) and from an earlier training checkpoint (epoch 10 vs. 13).
 
 ## Labels
 
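The description in this hunk pins the head down fairly precisely: mean pooling over ELECTRA's final hidden states, two 1024-dimension layers with SwishGLU activation, dropout of 0.3, and a three-way output. Below is a minimal PyTorch sketch of that design. The module names are illustrative, not the repository's actual code, and `SwishGLU` is assumed to follow the usual GLU-variant pattern of a SiLU-gated linear projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwishGLU(nn.Module):
    """GLU variant with a Swish (SiLU) gate: SiLU(x W) * (x V)."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, 2 * dim_out)  # W and V fused in one projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.proj(x).chunk(2, dim=-1)
        return F.silu(gate) * value


class SentimentHead(nn.Module):
    """Mean pooling, then a two-layer 1024-dim SwishGLU classifier with dropout 0.3."""

    def __init__(self, hidden_size: int = 1024, num_labels: int = 3, dropout: float = 0.3):
        super().__init__()
        self.classifier = nn.Sequential(
            SwishGLU(hidden_size, 1024), nn.Dropout(dropout),
            SwishGLU(1024, 1024), nn.Dropout(dropout),
            nn.Linear(1024, num_labels),  # logits for negative (0), neutral (1), positive (2)
        )

    def forward(self, last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Mean pooling: average token embeddings, ignoring padded positions.
        mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
        pooled = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)
```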
@@ -86,17 +89,17 @@ The research paper can be found here: [ELECTRA and GPT-4o: Cost-Effective Partne
 ### Performance Summary
 
 - **Merged Dataset**
-  - Macro Average F1: **82.36**
-  - Accuracy: **82.96**
+  - Macro Average F1: **83.16** (was 82.36)
+  - Accuracy: **83.71** (was 82.96)
 - **DynaSent R1**
-  - Macro Average F1: **85.91**
-  - Accuracy: **85.83**
+  - Macro Average F1: **86.53** (was 85.91)
+  - Accuracy: **86.44** (was 85.83)
 - **DynaSent R2**
-  - Macro Average F1: **76.29**
-  - Accuracy: **76.53**
+  - Macro Average F1: **78.36** (was 76.29)
+  - Accuracy: **78.61** (was 76.53)
 - **SST-3**
-  - Macro Average F1: **70.90**
-  - Accuracy: **80.36**
+  - Macro Average F1: **72.63** (was 70.90)
+  - Accuracy: **80.91** (was 80.36)
 
 ## Model Architecture
 
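A note on reading these summaries: macro average F1 is the unweighted mean of the three per-class F1 scores, so the smaller 'neutral' class counts as much as 'negative' and 'positive'. That is why it can sit well below accuracy on SST-3, where neutral is hard. The updated merged-dataset figure checks out against the per-class scores in the report further down:

```python
# Per-class F1 scores (negative, neutral, positive) taken from the updated
# Merged Dataset classification report later in this diff.
per_class_f1 = [0.860781, 0.756032, 0.878007]

# Macro average F1 is the plain mean of the per-class scores.
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(f"{macro_f1:.6f}")  # 0.831607, summarized above as 83.16
```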
@@ -254,7 +257,113 @@ The model's configuration (config.json) includes custom parameters:
 - `dropout_rate`: Dropout rate used in the classifier.
 - `pooling`: Pooling strategy used ('mean').
 
-## Performance by Dataset
+## Updated Performance by Dataset
+
+### Merged Dataset
+
+```
+Merged Dataset Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.874178  0.847789  0.860781      2352
+     neutral   0.741715  0.770913  0.756032      1829
+    positive   0.878194  0.877820  0.878007      2349
+
+    accuracy                       0.837060      6530
+   macro avg   0.831362  0.832174  0.831607      6530
+weighted avg   0.838521  0.837060  0.837639      6530
+
+ROC AUC: 0.947808
+
+Predicted  negative  neutral  positive
+Actual
+negative       1994      268        90
+neutral         223     1410       196
+positive         64      223      2062
+
+Macro F1 Score: 0.83
+```
+
+### DynaSent Round 1
+
+```
+DynaSent Round 1 Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.925512  0.828333  0.874230      1200
+     neutral   0.781536  0.924167  0.846888      1200
+    positive   0.911472  0.840833  0.874729      1200
+
+    accuracy                       0.864444      3600
+   macro avg   0.872840  0.864444  0.865283      3600
+weighted avg   0.872840  0.864444  0.865283      3600
+
+ROC AUC: 0.962647
+
+Predicted  negative  neutral  positive
+Actual
+negative        994      159        47
+neutral          40     1109        51
+positive         40      151      1009
+
+Macro F1 Score: 0.87
+```
+
+### DynaSent Round 2
+
+```
+DynaSent Round 2 Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.791339  0.837500  0.813765       240
+     neutral   0.803030  0.662500  0.726027       240
+    positive   0.768657  0.858333  0.811024       240
+
+    accuracy                       0.786111       720
+   macro avg   0.787675  0.786111  0.783605       720
+weighted avg   0.787675  0.786111  0.783605       720
+
+ROC AUC: 0.932089
+
+Predicted  negative  neutral  positive
+Actual
+negative        201       18        21
+neutral          40      159        41
+positive         13       21       206
+
+Macro F1 Score: 0.78
+```
+
+### Stanford Sentiment Treebank (SST-3)
+
+```
+SST-3 Classification Report
+
+              precision    recall  f1-score   support
+
+    negative   0.838405  0.876096  0.856836       912
+     neutral   0.500000  0.365039  0.421991       389
+    positive   0.870504  0.931793  0.900106       909
+
+    accuracy                       0.809050      2210
+   macro avg   0.736303  0.724309  0.726311      2210
+weighted avg   0.792042  0.809050  0.798093      2210
+
+ROC AUC: 0.905255
+
+Predicted  negative  neutral  positive
+Actual
+negative        799       91        22
+neutral         143      142       104
+positive         11       51       847
+
+Macro F1 Score: 0.73
+```
+
+## Old Performance by Dataset
 
 ### Merged Dataset
 
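The layout of the reports above matches scikit-learn's `classification_report` with `digits=6`, followed by a pandas crosstab of actual versus predicted labels; that provenance is an inference, not something this commit states. A self-contained sketch that reproduces the format, using random stand-in predictions in place of real model outputs:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, f1_score, roc_auc_score

labels = ["negative", "neutral", "positive"]

# Stand-in data for illustration only; in practice y_true and y_prob come
# from running the fine-tuned model over a held-out test split.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=300)
y_prob = rng.dirichlet(np.ones(3), size=300)  # rows sum to 1
y_pred = y_prob.argmax(axis=1)

print(classification_report(y_true, y_pred, target_names=labels, digits=6))
print(f"ROC AUC: {roc_auc_score(y_true, y_prob, multi_class='ovr'):.6f}")

# Confusion matrix in the same Actual-by-Predicted orientation as above.
cm = pd.crosstab(
    pd.Series([labels[i] for i in y_true], name="Actual"),
    pd.Series([labels[i] for i in y_pred], name="Predicted"),
)
print(cm)
print(f"Macro F1 Score: {f1_score(y_true, y_pred, average='macro'):.2f}")
```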