ayushsinha committed · verified · Commit c4a28d1 · 1 Parent(s): 7b88180

Create README.md

Files changed (1): README.md +101 -0
README.md ADDED

# RoBERTa Base Quantized Model for Spam Detection

This repository hosts a quantized version of the **roberta-base** model, fine-tuned for **spam detection**. The model has been optimized for efficient deployment while maintaining high accuracy, making it suitable for resource-constrained environments.

## Model Details

- **Model Architecture:** RoBERTa Base
- **Task:** Spam Detection
- **Dataset:** Hugging Face's `sms_spam`, `spam_mail`, and `mail_spam_ham_dataset`
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

```sh
pip install transformers torch
```

### Loading the Model

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "AventIQ-AI/roberta-spam-detection"
model = RobertaForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = RobertaTokenizer.from_pretrained(model_name)


def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Move input tensors to the same device as the model
    inputs = {key: value.to(device) for key, value in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    predicted_class = torch.argmax(logits).item()

    return "Spam" if predicted_class == 1 else "Ham"

# Sample test message
input_text = "Congratulations! You have won a free iPhone. Click here to claim your prize."
print(f"Prediction: {predict(input_text)}")  # Expected output: Spam
```
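
To keep the memory savings of the Float16 export at inference time, the checkpoint can also be loaded directly in half precision. This is a minimal sketch, assuming the published weights are stored in float16; on CPU it falls back to float32, where half-precision inference is typically slow or unsupported.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device.type == "cuda" else torch.float32

# torch_dtype controls the precision the weights are loaded in
model = RobertaForSequenceClassification.from_pretrained(
    "AventIQ-AI/roberta-spam-detection", torch_dtype=dtype
).to(device)
tokenizer = RobertaTokenizer.from_pretrained("AventIQ-AI/roberta-spam-detection")
```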

## 📊 Classification Report (Quantized Model - bfloat16)

| Metric        | Class 0 (Non-Spam) | Class 1 (Spam) | Macro Avg | Weighted Avg |
|---------------|--------------------|----------------|-----------|--------------|
| **Precision** | 1.00               | 0.98           | 0.99      | 0.99         |
| **Recall**    | 0.99               | 0.99           | 0.99      | 0.99         |
| **F1-Score**  | 0.99               | 0.99           | 0.99      | 0.99         |
| **Accuracy**  | **99%**            | **99%**        | **99%**   | **99%**      |

### 🔍 Observations

- ✅ **Precision:** High (1.00 for non-spam, 0.98 for spam) → **few false positives**
- ✅ **Recall:** High (0.99 for both classes) → **few false negatives**
- ✅ **F1-Score:** **Near-perfect balance** between precision and recall
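
For reference, a report in this format can be generated with scikit-learn's `classification_report`; the sketch below uses placeholder labels rather than the actual evaluation split for this model, and `scikit-learn` would need to be installed separately.

```python
from sklearn.metrics import classification_report

# Placeholder gold labels and model predictions (0 = ham, 1 = spam);
# in practice these would come from running predict() over a held-out test set.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

print(classification_report(y_true, y_pred, target_names=["Non-Spam", "Spam"]))
```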

## Fine-Tuning Details

### Dataset

The Hugging Face `sms_spam`, `spam_mail`, and `mail_spam_ham_dataset` datasets were used, containing both spam and ham (non-spam) examples.
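
As a rough sketch of how such data can be pulled in, the `datasets` library can load these corpora from the Hub. The ID below assumes the public `sms_spam` dataset and is only illustrative; it is not the exact preprocessing pipeline used for fine-tuning.

```python
from datasets import load_dataset

# Load one of the spam corpora; adjust the ID if the dataset lives under a namespace
sms = load_dataset("sms_spam", split="train")
print(sms[0])  # e.g. {'sms': '...', 'label': 0}
```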

### Training

- Number of epochs: 3
- Batch size: 8
- Evaluation strategy: epoch
- Learning rate: 3e-5
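
Expressed with the Transformers `Trainer` API, these hyperparameters might look roughly like the sketch below. The output directory is a placeholder, and recent `transformers` versions also require the `accelerate` package for `TrainingArguments`.

```python
from transformers import TrainingArguments

# Hyperparameters from the list above as TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",          # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=8,
    eval_strategy="epoch",           # "evaluation_strategy" in older transformers releases
    learning_rate=3e-5,
)
```

These arguments would then be passed to a `Trainer` together with the model and the tokenized train/validation splits.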

### Quantization

Post-training quantization was applied using PyTorch's built-in quantization framework to reduce the model size and improve inference efficiency.
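
As an illustrative sketch only (not necessarily the exact export script used for this repository), a Float16 copy of a fine-tuned checkpoint can be produced by casting the weights and saving them; the source path below is a placeholder.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

source = "path/to/full-precision-finetuned-checkpoint"  # placeholder

model = RobertaForSequenceClassification.from_pretrained(source)
model = model.half()  # cast all weights to float16

model.save_pretrained("roberta-spam-detection-fp16")
RobertaTokenizer.from_pretrained(source).save_pretrained("roberta-spam-detection-fp16")
```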
82
+
83
+ ## Repository Structure
84
+
85
+ ```
86
+ .
87
+ β”œβ”€β”€ model/ # Contains the quantized model files
88
+ β”œβ”€β”€ tokenizer_config/ # Tokenizer configuration and vocabulary files
89
+ β”œβ”€β”€ model.safetensors/ # Fine Tuned Model
90
+ β”œβ”€β”€ README.md # Model documentation
91
+ ```

## Limitations

- The model may not generalize well to domains outside the fine-tuning dataset.
- Quantization may result in minor accuracy degradation compared to full-precision models.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.