mshenoda/roberta-spam

Text Classification

mshenoda committed on Feb 2

Commit bf36083 (1 parent: 3b92a10)

Update README.md

Files changed (1)

README.md +6 -0

README.md CHANGED

@@ -46,6 +46,12 @@ The dataset is composed of messages labeled by ham or spam, merged from three da
 The preparation script for the Enron data is available at https://github.com/mshenoda/roberta-spam/tree/main/data/enron.
 The data is split into 80% training, 10% validation, and 10% test sets; the scripts used to split and merge the three data sources are available at https://github.com/mshenoda/roberta-spam/tree/main/data/utils.
+### Dataset Class Distribution
+Training 80% | Validation 10% | Testing 10%
+:-------------------------:|:-------------------------:|:-------------------------:
+![Training set class distribution](plots/train_set_distribution.jpg "Class Distribution") Class Distribution | ![Validation set class distribution](plots/val_set_distribution.jpg "Class Distribution") Class Distribution | ![Test set class distribution](plots/test_set_distribution.jpg "Class Distribution") Class Distribution
 ## Architecture
 The model is a fine-tuned RoBERTa.
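The split scripts under `data/utils` are not part of this commit. The following minimal stdlib Python sketch shows one way to produce an 80/10/10 split of ham/spam messages and the per-class counts that the new distribution plots visualize. It assumes the split is stratified by label, which the README does not state. `stratified_split` and `class_distribution` are hypothetical names, not the repository's API.

```python
import random
from collections import Counter

# Hypothetical sketch (not the repository's data/utils script):
# a stratified 80/10/10 split of (text, label) pairs, where labels are
# assumed to be the strings "ham" and "spam".


def stratified_split(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split samples into train/val/test, keeping each class's proportions."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)

    # Group samples by label so each class is split separately.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))

    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n = len(items)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # remainder goes to test

    # Shuffle each split so the classes are interleaved.
    for split in (train, val, test):
        rng.shuffle(split)
    return train, val, test


def class_distribution(split):
    """Count how many messages of each label a split contains."""
    return Counter(label for _, label in split)


if __name__ == "__main__":
    # Synthetic toy data for illustration only: 100 ham and 20 spam messages.
    data = [(f"ham message {i}", "ham") for i in range(100)]
    data += [(f"spam message {i}", "spam") for i in range(20)]
    train, val, test = stratified_split(data)
    for name, split in (("train", train), ("val", val), ("test", test)):
        print(name, len(split), dict(class_distribution(split)))
```

Stratifying per class keeps the ham/spam ratio roughly equal across the three splits, so the three distribution plots should look alike. A plain random split of the merged data would only approximate this.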