import streamlit as st

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section h2 {
            font-size: 22px;
            color: #4A90E2;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
    </style>
""", unsafe_allow_html=True)

# Introduction
st.markdown('<div class="main-title">Cyberbullying Detection in Tweets with Spark NLP</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>Welcome to the Spark NLP Cyberbullying Detection Demo App! Detecting cyberbullying in social media posts is crucial to creating a safer online environment. This app demonstrates how to use a pretrained Spark NLP pipeline to identify and classify cyberbullying in tweets.</p>
</div>
""", unsafe_allow_html=True)

st.write("")
st.image('images/Cyberbullying.jpeg', use_column_width='auto')

# About Cyberbullying Detection
st.markdown('<div class="sub-title">About Cyberbullying Detection</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>Cyberbullying detection involves analyzing text to identify instances of harmful, threatening, or abusive language. Cyberbullying can have severe psychological effects on victims, so it is essential to identify and address it promptly. With Spark NLP, we can apply a pretrained model to detect and classify cyberbullying in social media posts, helping to mitigate the negative impact of online harassment.</p>
</div>
""", unsafe_allow_html=True)

# Using Cyberbullying Detection Model in Spark NLP
st.markdown('<div class="sub-title">Using Cyberbullying Detection Model in Spark NLP</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The following pipeline uses the Universal Sentence Encoder and a pretrained ClassifierDL model to detect cyberbullying in tweets. The model assigns each tweet one of three labels: neutral, racism, or sexism.</p>
</div>
""", unsafe_allow_html=True)

st.markdown('<h2 class="sub-title">Example Usage in Python</h2>', unsafe_allow_html=True)

# Setup Instructions
st.markdown('<div class="sub-title">Setup</div>', unsafe_allow_html=True)
st.markdown('<p>To install Spark NLP in Python, use your favorite package manager (conda, pip, etc.). For example:</p>', unsafe_allow_html=True)
st.code("""
pip install spark-nlp
pip install pyspark
""", language="bash")

st.markdown("<p>Then, import Spark NLP and start a Spark session:</p>", unsafe_allow_html=True)
st.code("""
import sparknlp

# Start Spark Session
spark = sparknlp.start()
""", language='python')

# Cyberbullying Detection Example
st.markdown('<div class="sub-title">Example Usage: Cyberbullying Detection with Spark NLP</div>', unsafe_allow_html=True)
st.code('''
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel
from pyspark.ml import Pipeline

# Step 1: Transform raw text into a document annotation
document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("document")

# Step 2: Universal Sentence Encoder
use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en") \\
    .setInputCols(["document"]) \\
    .setOutputCol("sentence_embeddings")

# Step 3: ClassifierDLModel for Cyberbullying Detection
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_cyberbullying', 'en') \\
    .setInputCols(["sentence_embeddings"]) \\
    .setOutputCol("class")

# Define the pipeline
nlp_pipeline = Pipeline(stages=[document_assembler, use, document_classifier])

# Create a light pipeline for prediction (fit on an empty DataFrame, since all stages are pretrained)
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

# Predict cyberbullying in a tweet
annotations = light_pipeline.fullAnnotate('@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked')
print(annotations[0]['class'][0])
''', language='python')

st.text("""
Output:
Annotation(category, 0, 81, racism, {'sentence': '0', 'sexism': '2.4904006E-7', 'neutral': '9.4820876E-5', 'racism': '0.9999049'}, [])
""")

st.markdown("""
<p>The annotation classifies the text as "racism" with a probability score of 0.9999049, indicating very high confidence, while also providing low probability scores for "sexism" and "neutral."</p>
""", unsafe_allow_html=True)
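st.markdown("""
As a minimal sketch of how these scores could be read in code, the snippet below picks the highest-scoring label from the annotation metadata. The helper name `top_label` is hypothetical (it is not part of Spark NLP), and the dictionary is copied from the output above. With a live pipeline, the assumption is that `annotations[0]['class'][0].metadata` holds the same keys.

```python
# Metadata copied from the example annotation; scores are stored as strings.
metadata = {
    "sentence": "0",
    "sexism": "2.4904006E-7",
    "neutral": "9.4820876E-5",
    "racism": "0.9999049",
}


def top_label(metadata):
    # Hypothetical helper: keep only class scores ("sentence" is an index, not a label)
    scores = {k: float(v) for k, v in metadata.items() if k != "sentence"}
    # Pick the label whose score is largest
    label = max(scores, key=scores.get)
    return label, scores[label]


label, score = top_label(metadata)
print(label, score)
```

Converting the strings with `float` also handles the scientific notation used for very small scores.
""")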

# Benchmarking Section
st.markdown('<div class="sub-title">Benchmarking</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The following table summarizes the performance of the Cyberbullying Detection model in terms of precision, recall, and F1-score:</p>
    <pre>
              precision    recall  f1-score   support

     neutral       0.72      0.76      0.74       700
      racism       0.89      0.94      0.92       773
      sexism       0.82      0.71      0.76       622

    accuracy                           0.81      2095
   macro avg       0.81      0.80      0.80      2095
weighted avg       0.81      0.81      0.81      2095
    </pre>
</div>
""", unsafe_allow_html=True)
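st.markdown("""
As a rough check on how the two average rows relate to the per-class rows, the sketch below recomputes the macro and weighted averages from the rounded values in the table. Because the inputs are already rounded to two decimals, the results can differ from the reported averages by about 0.01. The names `rows`, `macro`, and `weighted` are illustrative only.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support)
rows = {
    "neutral": (0.72, 0.76, 0.74, 700),
    "racism": (0.89, 0.94, 0.92, 773),
    "sexism": (0.82, 0.71, 0.76, 622),
}

# Total number of test examples across all classes
total = sum(r[3] for r in rows.values())


def macro(i):
    # Unweighted mean of metric i over the classes
    return sum(r[i] for r in rows.values()) / len(rows)


def weighted(i):
    # Mean of metric i weighted by each class's support
    return sum(r[i] * r[3] for r in rows.values()) / total


for name, i in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(name, round(macro(i), 3), round(weighted(i), 3))
```

The macro average treats every class equally, while the weighted average gives larger classes more influence.
""")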

# Conclusion
st.markdown("""
<div class="section">
    <h2>Conclusion</h2>
    <p>In this app, we demonstrated how to combine Spark NLP's Universal Sentence Encoder with a pretrained ClassifierDL model to detect cyberbullying in tweets. Because the pipeline runs on Spark, the same stages can be applied to large datasets to identify harmful content. Integrating these annotators into your NLP pipelines can support content moderation and online safety measures.</p>
</div>
""", unsafe_allow_html=True)

# References and Additional Information
st.markdown('<div class="sub-title">For additional information, please check the following references.</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li>Documentation: <a href="https://nlp.johnsnowlabs.com/docs/en/transformers#classifierdl" target="_blank" rel="noopener">ClassifierDLModel</a></li>
        <li>Python Docs: <a href="https://nlp.johnsnowlabs.com/api/python/reference/autosummary/sparknlp/annotator/classifierdl/index.html#sparknlp.annotator.classifierdl.ClassifierDLModel" target="_blank" rel="noopener">ClassifierDLModel</a></li>
        <li>Model Used: <a href="https://sparknlp.org/2021/01/09/classifierdl_use_cyberbullying_en.html" target="_blank" rel="noopener">classifierdl_use_cyberbullying</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
    </ul>
</div>
""", unsafe_allow_html=True)