---

license: apache-2.0
tags:
- transformers
- text-classification
- spam-detection
---

# SPAM Mail Classifier

This model is fine-tuned from `microsoft/Multilingual-MiniLM-L12-H384` to classify email subjects as SPAM or NOSPAM.

## Model Details

- **Base model**: `microsoft/Multilingual-MiniLM-L12-H384`
- **Fine-tuned for**: Text classification
- **Number of classes**: 2 (SPAM, NOSPAM)
- **Languages**: Multilingual

## Usage

Load the tokenizer and model with the `transformers` library, then pass an email subject through the model to obtain the raw class logits:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example subject line (French for "Congratulations! You have won an iPhone.")
text = "Félicitations ! Vous avez gagné un iPhone."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Raw, unnormalized scores for each class
print(outputs.logits)
```
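The snippet above prints raw logits rather than a label. As a minimal sketch of how two such scores could be turned into a label and a confidence value, the helper below applies a softmax in plain Python. The function name `logits_to_label` is hypothetical, and the label order (index 0 = NOSPAM, index 1 = SPAM) is an assumption taken from the list example in this card, not read from the model's configuration.

```python
import math

# Assumed label order, mirroring the list example in this card.
LABELS = ("NOSPAM", "SPAM")


def logits_to_label(logits, labels=LABELS):
    """Return (label, confidence) for one row of logits (hypothetical helper)."""
    # Subtract the max before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability; ties resolve to the lowest index.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for illustration; with the real model, something like
# outputs.logits[0].tolist() could be passed instead.
label, confidence = logits_to_label([-1.2, 2.3])
print(f"{label} ({confidence:.2%})")
```

Keeping this step separate from the model call makes it easy to swap in a different decision rule later, such as a custom threshold on the SPAM probability.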

### Example for a list

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Sample subject lines in English and French
texts = [
    'Join us for a webinar on AI innovations',
    'Urgent: Verify your account immediately.',
    'Meeting rescheduled to 3 PM',
    'Happy Birthday!',
    'Limited time offer: Act now!',
    'Join us for a webinar on AI innovations',
    'Claim your free prize now!',
    'You have unclaimed rewards waiting!',
    'Weekly newsletter from Tech World',
    'Update on the project status',
    'Lunch tomorrow at 12:30?',
    'Get rich quick with this amazing opportunity!',
    'Invoice for your recent purchase',
    "Don't forget: Gym session at 6 AM",
    'Join us for a webinar on AI innovations',
    'bonjour comment allez vous ?',
    'Documents suite à notre rendez-vous',
    'Valentin Dupond mentioned you in a comment',
    'Bolt x Supabase = 🤯',
    'Modification site web de la société',
    'Image de mise en avant sur les articles',
    'Bring new visitors to your site',
    'Le Cloud Éthique sans bullshit',
    'Remix Newsletter #25: React Router v7',
    'Votre essai auprès de X va bientôt prendre fin',
    'Introducing a Google Docs integration, styles and more in Claude.ai',
    'Carte de crédit sur le point d’expirer sur Cloudflare'
]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

# Convert the logits to probabilities with softmax
logits = outputs.logits
probabilities = torch.softmax(logits, dim=1)

# Decode the predicted class for each text
labels = ["NOSPAM", "SPAM"]  # Maps class indices to labels
results = [
    {"text": text, "label": labels[torch.argmax(prob).item()], "confidence": prob.max().item()}
    for text, prob in zip(texts, probabilities)
]

# Print the results
for result in results:
    print(f"Text: {result['text']}")
    print(f"Result: {result['label']} (Confidence: {result['confidence']:.2%})\n")
```
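The `results` list built above pairs each subject with a label and a confidence. One possible way to act on it, sketched here under assumptions, is to accept only confident predictions and set the uncertain ones aside for manual review. The function `triage`, the `REVIEW` bucket, and the 0.90 threshold are all hypothetical and not part of the model; the sample entries are made-up values, not real model output.

```python
def triage(results, threshold=0.90):
    """Group subjects by label, diverting low-confidence ones (hypothetical helper)."""
    buckets = {"SPAM": [], "NOSPAM": [], "REVIEW": []}
    for result in results:
        # Predictions below the threshold go to REVIEW regardless of label.
        key = result["label"] if result["confidence"] >= threshold else "REVIEW"
        buckets[key].append(result["text"])
    return buckets


# Made-up entries in the same shape as `results` from the example above.
sample = [
    {"text": "Claim your free prize now!", "label": "SPAM", "confidence": 0.98},
    {"text": "Meeting rescheduled to 3 PM", "label": "NOSPAM", "confidence": 0.95},
    {"text": "Bring new visitors to your site", "label": "SPAM", "confidence": 0.61},
]

for bucket, subjects in triage(sample).items():
    print(bucket, subjects)
```

A suitable threshold depends on how costly a false positive is for the mailbox in question, so it would need to be tuned on labelled data.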