HamSpamBERT

This model is a fine-tuned version of bert-base-uncased on a Spam-Ham dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0072
  • Accuracy: 0.9991
  • Precision: 1.0
  • Recall: 0.9933
  • F1: 0.9966

from transformers import pipeline, BertTokenizer, BertForSequenceClassification

# Load the fine-tuned tokenizer and classification model from the Hub
tokenizer = BertTokenizer.from_pretrained("udit-k/HamSpamBERT")
model = BertForSequenceClassification.from_pretrained("udit-k/HamSpamBERT")

# "sentiment-analysis" is an alias for the "text-classification" pipeline task
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

print(classifier("Call this number to win FREE IPL FINAL tickets!!!"))
# [{'label': 'LABEL_1', 'score': 0.9999189376831055}]
print(classifier("Call me when you reach home :)"))
# [{'label': 'LABEL_0', 'score': 0.9999370574951172}]

Model description

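This model is a fine-tuned version of the BERT model (bert-base-uncased) on a Spam-Ham dataset. Although the example above loads it through the sentiment-analysis pipeline, the task is binary text classification for spam detection. The model outputs two labels: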

  • LABEL_0 = Ham (Not spam)
  • LABEL_1 = Spam
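Because the pipeline returns the generic names LABEL_0 and LABEL_1, a small helper can translate them into readable labels. The sketch below is illustrative only: `LABEL_NAMES` and `readable` are hypothetical names, and the sample input simply mirrors the pipeline output format shown earlier in this card, so no model download is needed to try it.

```python
# Hypothetical helper: maps the generic pipeline labels to the names given in this card.
LABEL_NAMES = {"LABEL_0": "ham", "LABEL_1": "spam"}


def readable(predictions):
    """Return a copy of each prediction dict with a human-readable label."""
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Sample input in the format returned by the classification pipeline
sample = [{"label": "LABEL_1", "score": 0.9999189376831055}]
print(readable(sample))
```

Unknown label names are passed through unchanged, so the helper does not hide unexpected output.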

Intended uses & limitations

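This model can be used to classify texts as spam or ham. Its primary limitation is the small dataset: it was trained on a corpus of about 4700 rows and evaluated on around 1200 rows, so results on text that differs from this corpus may not match the metrics reported here.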

Training and evaluation data

  • Training corpus = 80%
  • Evaluation corpus = 20%
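The card does not say how the split was produced. As a minimal sketch, assuming a shuffled 80/20 split with a fixed seed, it could look like the following. `split_rows` and the toy rows are hypothetical stand-ins for the real dataset, and reusing seed 42 for the split is an assumption.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into train and evaluation lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data standing in for (text, label) rows; not the real corpus
rows = [(f"message {i}", i % 2) for i in range(10)]
train, evaluation = split_rows(rows)
print(len(train), len(evaluation))  # 8 2
```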

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 7
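To illustrate what a linear scheduler does with these settings, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup steps (none are listed above) and takes the total of 1953 steps, 279 per epoch, from the training results in this card. `linear_lr` is a hypothetical helper, not a Transformers API.

```python
def linear_lr(step, base_lr=5e-05, total_steps=1953, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (assumed 279 steps per epoch)
for epoch in range(1, 8):
    print(epoch, linear_lr(epoch * 279))
```

Under these assumptions the rate falls by one seventh of its initial value per epoch and reaches zero at the final step.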

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy  Precision  Recall  F1
No log         1.0    279   0.0492           0.9901    1.0        0.9262  0.9617
0.0635         2.0    558   0.0117           0.9982    1.0        0.9866  0.9932
0.0635         3.0    837   0.0120           0.9982    0.9933     0.9933  0.9933
0.0138         4.0    1116  0.0072           0.9991    1.0        0.9933  0.9966
0.0138         5.0    1395  0.0086           0.9982    0.9933     0.9933  0.9933
0.0007         6.0    1674  0.0090           0.9982    0.9933     0.9933  0.9933
0.0007         7.0    1953  0.0091           0.9982    0.9933     0.9933  0.9933
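F1 is the harmonic mean of precision and recall, so the reported values can be cross-checked. The sketch below recomputes F1 for epochs 1 and 4 from the results above. `f1_score` here is a local helper, not a library call, and because the listed precision and recall are themselves rounded, a difference in the last digit would not necessarily indicate an error.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from epochs 1 and 4 of the results
rows = [(1.0, 0.9262, 0.9617), (1.0, 0.9933, 0.9966)]
for precision, recall, reported in rows:
    print(round(f1_score(precision, recall), 4), reported)
```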

Framework versions

  • Transformers 4.30.0
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.13.3