---
license: mit
datasets:
- Silly-Machine/TuPyE-Dataset
language:
- pt
pipeline_tag: text-classification
base_model: neuralmind/bert-base-portuguese-cased
widget:
- text: 'Bom dia, flor do dia!!'
model-index:
- name: TuPy-Bert-Base-Binary-Classifier
  results:
  - task:
      type: text-classification
    dataset:
      name: TuPyE-Dataset
      type: Silly-Machine/TuPyE-Dataset
    metrics:
    - type: accuracy
      value: 0.901
      name: Accuracy
      verified: true
    - type: f1
      value: 0.899
      name: F1-score
      verified: true
    - type: precision
      value: 0.897
      name: Precision
      verified: true
    - type: recall
      value: 0.901
      name: Recall
      verified: true
---

## Introduction

TuPy-BERT-Base-Binary is a fine-tuned BERT model designed for binary classification of hate speech in Portuguese.
Derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased),
TuPy-BERT-Base-Binary is a refined solution for binary hate speech detection (hate or not hate).
For more details or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The performance of language models can vary notably when there is a domain shift between training and test data.
To create a specialized Portuguese language model tailored for hate speech classification,
the original BERTimbau model was fine-tuned on
the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), which was sourced from diverse social networks.

## Available models

| Model                                              | Arch.      | #Layers | #Params |
| -------------------------------------------------- | ---------- | ------- | ------- |
| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`   | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier`  | BERT-Large | 24      | 334M    |
| `Silly-Machine/TuPy-Bert-Base-Multilabel`          | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Multilabel`         | BERT-Large | 24      | 334M    |
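
Any of the checkpoints above can be loaded by its Hub id. The snippet below is a minimal sketch using the high-level `pipeline` API with the base binary classifier; the label names it prints come from the `id2label` mapping stored in the model config. A more detailed, lower-level example follows in the next section.

```python
from transformers import pipeline

# Load the binary hate speech classifier through the text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="Silly-Machine/TuPy-Bert-Base-Binary-Classifier",
)

# top_k=None returns the score for every label (on recent transformers versions)
print(classifier("Bom dia, flor do dia!!", top_k=None))
```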

## Example usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax

def classify_hate_speech(model_name, text):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize input text and prepare model input
    model_input = tokenizer(text, padding=True, return_tensors="pt")

    # Get model output scores
    with torch.no_grad():
        output = model(**model_input)
        scores = softmax(output.logits.numpy(), axis=1)
        ranking = np.argsort(scores[0])[::-1]

    # Print the results
    for i, rank in enumerate(ranking):
        label = config.id2label[rank]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

# Example usage
model_name = "Silly-Machine/TuPy-Bert-Base-Binary-Classifier"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
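
In this example, a softmax is applied to the output logits, so the printed scores sum to one; the labels are ranked from most to least likely, and `config.id2label` maps the class indices back to their human-readable names.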