---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: HamSpamBERT
  results: []
widget:
- text: "Ok i am on the way to home bye"
  example_title: "Ham"
- text: "PRIVATE! Your 2004 Account Statement for 07742676969 shows 786 unredeemed Bonus Points. To claim call 08719180248 Identifier Code: 45239 Expires"
  example_title: "Spam"
---

# HamSpamBERT

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [Spam-Ham](https://huggingface.co/datasets/SalehAhmad/Spam-Ham) dataset. It achieves the following results on the evaluation set:

- Loss: 0.0072
- Accuracy: 0.9991
- Precision: 1.0
- Recall: 0.9933
- F1: 0.9966

These figures match the epoch 4 row of the training results table, which has the lowest validation loss of the seven epochs.

The model can be loaded and queried through the `transformers` pipeline API:

```python
from transformers import pipeline, BertTokenizer, BertForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("udit-k/HamSpamBERT")
model = BertForSequenceClassification.from_pretrained("udit-k/HamSpamBERT")

# "sentiment-analysis" is the pipeline alias for "text-classification".
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

print(classifier("Call this number to win FREE IPL FINAL tickets!!!"))
print(classifier("Call me when you reach home :)"))
```

Output:

```
[{'label': 'LABEL_1', 'score': 0.9999189376831055}]
[{'label': 'LABEL_0', 'score': 0.9999370574951172}]
```

## Model description

This model is a fine-tuned version of the [BERT](https://huggingface.co/bert-base-uncased) model on the [Spam-Ham](https://huggingface.co/datasets/SalehAhmad/Spam-Ham) dataset, trained to improve text-classification performance on spam detection tasks. It predicts one of two labels:

- LABEL_0 = Ham (Not spam)
- LABEL_1 = Spam

## Intended uses & limitations

This model can be used to detect spam texts. Its primary limitation is the size of the data: it was trained on a corpus of about 4700 rows and evaluated on around 1200 rows.
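
As a small illustration of the label mapping given under Model description, the raw `LABEL_0` / `LABEL_1` names returned by the pipeline can be translated into readable labels after the fact. This is only a sketch: the names `ID2LABEL` and `readable` are hypothetical helpers introduced here, not part of the model or of `transformers`, and the sample predictions are copied from the example output earlier in this card.

```python
from typing import Dict, List

# Label mapping as stated in the model description.
# ID2LABEL is a hypothetical helper name, not something shipped with the model.
ID2LABEL: Dict[str, str] = {"LABEL_0": "ham", "LABEL_1": "spam"}


def readable(predictions: List[dict]) -> List[dict]:
    """Replace raw LABEL_n names with ham/spam, keeping the score unchanged.

    Unknown labels are passed through untouched.
    """
    return [
        {"label": ID2LABEL.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Sample pipeline outputs copied from the usage example in this card.
spam_pred = [{"label": "LABEL_1", "score": 0.9999189376831055}]
ham_pred = [{"label": "LABEL_0", "score": 0.9999370574951172}]

print(readable(spam_pred))
print(readable(ham_pred))
```

In practice the same function could be applied directly to the return value of `classifier(...)`, since the pipeline returns a list of dictionaries with `label` and `score` keys.
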
## Training and evaluation data

- Training corpus = 80%
- Evaluation corpus = 20%

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 7

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| No log        | 1.0   | 279  | 0.0492          | 0.9901   | 1.0       | 0.9262 | 0.9617 |
| 0.0635        | 2.0   | 558  | 0.0117          | 0.9982   | 1.0       | 0.9866 | 0.9932 |
| 0.0635        | 3.0   | 837  | 0.0120          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
| 0.0138        | 4.0   | 1116 | 0.0072          | 0.9991   | 1.0       | 0.9933 | 0.9966 |
| 0.0138        | 5.0   | 1395 | 0.0086          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
| 0.0007        | 6.0   | 1674 | 0.0090          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
| 0.0007        | 7.0   | 1953 | 0.0091          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |

### Framework versions

- Transformers 4.30.0
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.13.3
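
To make the training results table easier to read, the sketch below copies its rows into plain Python, picks the epoch with the lowest validation loss, and recomputes F1 from the reported precision and recall as their harmonic mean. The names `RESULTS` and `f1_from` are hypothetical and exist only for this illustration. Because the table values are rounded to four decimals, the recomputed F1 is only expected to agree with the reported one to about that precision.

```python
# Rows copied from the training results table:
# (epoch, validation_loss, accuracy, precision, recall, f1)
RESULTS = [
    (1, 0.0492, 0.9901, 1.0, 0.9262, 0.9617),
    (2, 0.0117, 0.9982, 1.0, 0.9866, 0.9932),
    (3, 0.0120, 0.9982, 0.9933, 0.9933, 0.9933),
    (4, 0.0072, 0.9991, 1.0, 0.9933, 0.9966),
    (5, 0.0086, 0.9982, 0.9933, 0.9933, 0.9933),
    (6, 0.0090, 0.9982, 0.9933, 0.9933, 0.9933),
    (7, 0.0091, 0.9982, 0.9933, 0.9933, 0.9933),
]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Select the row with the lowest validation loss (index 1 of each tuple).
best = min(RESULTS, key=lambda row: row[1])
best_epoch, best_loss, _, best_p, best_r, reported_f1 = best

# Recompute F1 from the rounded precision and recall of that row.
recomputed_f1 = f1_from(best_p, best_r)

print(f"best epoch: {best_epoch} (validation loss {best_loss})")
print(f"reported F1: {reported_f1}, recomputed F1: {recomputed_f1:.4f}")
```

The selected row is epoch 4, whose metrics are the ones quoted at the top of this card. Validation loss rises slightly in epochs 5 to 7 while training loss keeps falling.
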