eliasalbouzidi committed on
Commit
e91d765
·
verified ·
1 Parent(s): af058f9

Create README.md

Files changed (1)
  1. README.md +105 -0
README.md ADDED
@@ -0,0 +1,105 @@
---
widget:
- text: A family hiking in the mountains
  example_title: Safe
- text: A child playing with a puppy
  example_title: Safe
- text: A couple kissing passionately in bed
  example_title: Nsfw
- text: A woman naked
  example_title: Nsfw
- text: A man killing people
  example_title: Nsfw
- text: A mass shooting
  example_title: Nsfw
license: mit
language:
- en
metrics:
- f1
pipeline_tag: text-classification
tags:
- bert
- Transformers
- PyTorch
---

# Model Card for distilbert-nsfw-text-classifier

This model classifies text into two classes, "Safe" or "Nsfw", which makes it suitable for content moderation and filtering applications.

The model was trained on a dataset of 100,000 labeled text samples distributed across the two classes, "Safe" and "Nsfw".

The model is based on the DistilBERT base model.

The model achieves an F1 score of 0.92 and an accuracy of 0.92.

### Model Description

The model can be used directly to classify text into one of the two classes. It takes a string of text as input and outputs a probability distribution over the two classes. The class with the highest probability is selected as the predicted class.

- **Developed by:** Centrale Supélec students
- **Model type:** DistilBERT-based text classifier (80M parameters)
- **Language(s) (NLP):** English
- **License:** MIT

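As a rough illustration of the description above, the sketch below turns two raw scores (logits) into a probability distribution and picks the most probable class. The logit values and the label order `["Safe", "Nsfw"]` are assumptions made for this example; the actual order is defined by the `id2label` mapping in the model's configuration.

```python
import math

# Assumed label order, for illustration only; check the model config for the real mapping.
LABELS = ["Safe", "Nsfw"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable label together with the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = predict([2.0, -1.0])  # made-up logits
print(label, [round(p, 3) for p in probs])
```

Keeping the full distribution, rather than only the top label, lets downstream code apply its own confidence threshold.
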
### Training Procedure

The model was trained with the Hugging Face Transformers library using transfer learning: the original layers of the DistilBERT base model were frozen, and only the classification layers were fine-tuned on the labeled dataset. This selective fine-tuning lets the model reuse the pretrained knowledge of DistilBERT while adapting to the classification task. Training used fp16 mixed precision to reduce memory usage and speed up training. Further details regarding the training procedure can be found in the Technical Specifications section.

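The freezing step described above might look roughly like the following sketch. It is not the authors' training script: to stay runnable without downloading weights it builds a tiny, randomly initialized DistilBERT (the sizes are illustrative assumptions), and it assumes that "classification layers" refers to the `pre_classifier` and `classifier` modules of `DistilBertForSequenceClassification`.

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Tiny, randomly initialized configuration so the sketch runs without downloading
# weights; these sizes are illustrative assumptions, not the model's real ones.
config = DistilBertConfig(
    vocab_size=1000, dim=32, hidden_dim=64, n_layers=2, n_heads=2, num_labels=2
)
model = DistilBertForSequenceClassification(config)

# Freeze the DistilBERT encoder so its weights are not updated during training.
for param in model.distilbert.parameters():
    param.requires_grad = False

# Only the classification head (pre_classifier and classifier) stays trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

In an actual run the pretrained weights would be loaded with `from_pretrained` instead of a fresh configuration. With the Transformers `Trainer`, fp16 mixed precision is usually requested through `TrainingArguments(fp16=True)`, which needs a compatible GPU.
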
### Training Data

The training data consists of a large corpus of text labeled with one of the two classes, "Safe" or "Nsfw". The dataset contains a total of 100,000 examples, distributed as follows:

- 60,000 examples labeled as "Safe"
- 40,000 examples labeled as "Nsfw"

The data was preprocessed to remove stop words and punctuation and to convert all text to lowercase.

More information about the training data can be found in the Dataset Card (available soon).

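The preprocessing steps mentioned above (lowercasing, removing punctuation, removing stop words) could be sketched as follows. The stop-word list and the function name `preprocess` are hypothetical; the list actually used to prepare the training data is not specified in this card.

```python
import string

# Hypothetical stop-word list for illustration; the list used for training is not specified.
STOP_WORDS = {"a", "an", "the", "in", "on", "with", "and", "of", "is"}


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in no_punct.split() if w not in STOP_WORDS)


print(preprocess("A family hiking in the mountains!"))  # family hiking mountains
```

Applying comparable steps at inference time may keep inputs closer to what the model saw during training, although this card does not state whether that is required.
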
## Uses

The model can be integrated into larger systems for content moderation or filtering.

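One way such an integration might look is a small gate that blocks confident "Nsfw" predictions, allows confident "Safe" ones, and sends everything else to human review. The names `moderate` and `fake_classifier` and the 0.8 threshold are hypothetical; the stand-in classifier only exists to make the sketch runnable without the real model.

```python
def moderate(text, classify, threshold=0.8):
    """Return "allow", "block", or "review" for a piece of text.

    `classify` is any callable returning (label, score); low-confidence
    predictions are routed to human review instead of being trusted.
    """
    label, score = classify(text)
    if score < threshold:
        return "review"
    return "block" if label == "Nsfw" else "allow"


def fake_classifier(text):
    """Stand-in for the real model, used only to make this sketch runnable."""
    if "naked" in text.lower():
        return "Nsfw", 0.97
    if "kissing" in text.lower():
        return "Nsfw", 0.6
    return "Safe", 0.95


for sample in ["A family hiking in the mountains", "A woman naked", "A couple kissing"]:
    print(sample, "->", moderate(sample, fake_classifier))
```

Passing the classifier in as a callable keeps the gate independent of how the model is loaded, so the same logic can wrap a Transformers pipeline or a remote endpoint.
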
### Out-of-Scope Use

The model should not be used for any illegal activities.

## Bias, Risks, and Limitations

The model may exhibit biases inherited from its training data. It may not perform well on text written in languages other than English, and it may struggle with sarcasm, irony, or other figurative language. It may also produce false positives or false negatives, leading to incorrect categorization of text.

### Recommendations

Users should be aware of the model's limitations and biases and use it accordingly. They should also be prepared to handle false positives and false negatives. It is recommended to fine-tune the model for specific downstream tasks and to evaluate its performance on relevant datasets.

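To evaluate the model on a relevant dataset, as recommended above, accuracy and F1 can be computed from true and predicted labels. The sketch below is a minimal pure-Python version using made-up labels. It computes binary F1 for the "Nsfw" class, which is an assumption, since this card does not say how the reported F1 was averaged.

```python
def evaluate(y_true, y_pred, positive="Nsfw"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions, for illustration only.
y_true = ["Safe", "Safe", "Nsfw", "Nsfw", "Safe", "Nsfw"]
y_pred = ["Safe", "Nsfw", "Nsfw", "Nsfw", "Safe", "Safe"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

False positives lower precision and false negatives lower recall, so reporting both alongside F1 shows which kind of error dominates on a given dataset.
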
### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")

model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
```
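### Use a pipeline as a high-level helper
```python
from transformers import pipeline

# Build a text-classification pipeline around the model.
pipe = pipeline("text-classification", model="eliasalbouzidi/distilbert-nsfw-text-classifier")

# Classify one of the widget examples; the result is a list of dicts with "label" and "score".
print(pipe("A family hiking in the mountains"))
```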

## Contact
Please reach out to [email protected] if you have any questions or feedback.