Update README.md
README.md
CHANGED
@@ -46,6 +46,12 @@ The dataset is composed of messages labeled by ham or spam, merged from three da
The prepare script for enron is available at https://github.com/mshenoda/roberta-spam/tree/main/data/enron.
The data is split into 80% train, 10% validation, and 10% test sets; the scripts used to split and merge the three data sources are available at https://github.com/mshenoda/roberta-spam/tree/main/data/utils.

+### Dataset Class Distribution
+
+Training 80% | Validation 10% | Testing 10%
+:-------------------------:|:-------------------------:|:-------------------------:
+![](plots/train_set_distribution.jpg "Class Distribution") Class Distribution | ![](plots/val_set_distribution.jpg "Class Distribution") Class Distribution | ![](plots/test_set_distribution.jpg "Class Distribution") Class Distribution
+

## Architecture
The model is a fine-tuned RoBERTa
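
For reference, the 80/10/10 split and the per-split class counts that this README change documents could be sketched roughly as follows. This is a minimal illustration, not the repository's actual script under data/utils: the function names `split_80_10_10` and `class_distribution`, the fixed seed, and the toy `(text, label)` data are all assumptions made for the example.

```python
import random
from collections import Counter


def split_80_10_10(rows, seed=42):
    """Shuffle rows, then cut them into 80% train, 10% validation, 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_train = len(rows) * 8 // 10
    n_val = len(rows) // 10
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


def class_distribution(rows):
    """Count how many rows carry each label (e.g. 'ham' or 'spam')."""
    return Counter(label for _, label in rows)


# Toy data (hypothetical): 100 (text, label) pairs, one in four labeled spam.
data = [(f"message {i}", "spam" if i % 4 == 0 else "ham") for i in range(100)]

train, val, test = split_80_10_10(data)
for name, part in (("train", train), ("validation", val), ("test", test)):
    print(name, len(part), dict(class_distribution(part)))
```

A plain shuffle like this does not guarantee the same ham/spam ratio in every split, which is one reason to plot the class distribution of each set as the added README section does.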