---
license: apache-2.0
tags:
- transformers
- text-classification
- spam-detection
---

# SPAM Mail Classifier

This model is fine-tuned from `microsoft/Multilingual-MiniLM-L12-H384` to classify email subjects as SPAM or NOSPAM.

## Model Details

- **Base model**: `microsoft/Multilingual-MiniLM-L12-H384`
- **Fine-tuned for**: Text classification
- **Number of classes**: 2 (SPAM, NOSPAM)
- **Languages**: Multilingual
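
The two class indices map to the labels above. As a quick sanity check, you can inspect the label mapping stored in the checkpoint config (a minimal sketch, assuming `id2label` was saved with the fine-tuned model; otherwise generic `LABEL_0`/`LABEL_1` names may appear):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Goodmotion/spam-mail-classifier")
print(config.num_labels)  # expected: 2
print(config.id2label)    # e.g. {0: "NOSPAM", 1: "SPAM"} if the mapping was saved
```
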
## Usage

The snippet below loads the model and classifies a single email subject:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Félicitations ! Vous avez gagné un iPhone."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```
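
The model returns raw logits. Continuing from the snippet above, a minimal sketch for turning them into a label (assuming index 0 corresponds to NOSPAM and index 1 to SPAM, as in the batch example below):

```python
import torch

# Convert logits to probabilities and pick the most likely class
probabilities = torch.softmax(outputs.logits, dim=1)
labels = ["NOSPAM", "SPAM"]  # assumed index-to-label mapping
prediction = labels[torch.argmax(probabilities, dim=1).item()]
print(prediction, probabilities.max().item())
```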

### Example with a list of texts

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    'Join us for a webinar on AI innovations',
    'Urgent: Verify your account immediately.',
    'Meeting rescheduled to 3 PM',
    'Happy Birthday!',
    'Limited time offer: Act now!',
    'Join us for a webinar on AI innovations',
    'Claim your free prize now!',
    'You have unclaimed rewards waiting!',
    'Weekly newsletter from Tech World',
    'Update on the project status',
    'Lunch tomorrow at 12:30?',
    'Get rich quick with this amazing opportunity!',
    'Invoice for your recent purchase',
    'Don\'t forget: Gym session at 6 AM',
    'Join us for a webinar on AI innovations',
    'bonjour comment allez vous ?',
    'Documents suite à notre rendez-vous',
    'Valentin Dupond mentioned you in a comment',
    'Bolt x Supabase = 🤯',
    'Modification site web de la société',
    'Image de mise en avant sur les articles',
    'Bring new visitors to your site',
    'Le Cloud Éthique sans bullshit',
    'Remix Newsletter #25: React Router v7',
    'Votre essai auprès de X va bientôt prendre fin',
    'Introducing a Google Docs integration, styles and more in Claude.ai',
    'Carte de crédit sur le point d’expirer sur Cloudflare'
]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
outputs = model(**inputs)

# Convert the logits to probabilities with softmax
logits = outputs.logits
probabilities = torch.softmax(logits, dim=1)

# Decode the predicted class for each text
labels = ["NOSPAM", "SPAM"]  # index-to-label mapping
results = [
    {"text": text, "label": labels[torch.argmax(prob).item()], "confidence": prob.max().item()}
    for text, prob in zip(texts, probabilities)
]

# Print the results
for result in results:
    print(f"Text: {result['text']}")
    print(f"Result: {result['label']} (Confidence: {result['confidence']:.2%})\n")
```
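
For quick experiments, the same checkpoint can also be used through the `pipeline` API (a minimal sketch; the displayed label names depend on the `id2label` mapping stored in the model config and may appear as `LABEL_0`/`LABEL_1` if no custom mapping was saved):

```python
from transformers import pipeline

# Text-classification pipeline handles tokenization, inference and softmax internally
classifier = pipeline("text-classification", model="Goodmotion/spam-mail-classifier")
print(classifier("Claim your free prize now!"))
```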