import streamlit as st

# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)

# Introduction
st.title("Cyberbullying Detection in Tweets with Spark NLP")

st.markdown("""
Welcome to the Spark NLP Cyberbullying Detection Demo App! Detecting cyberbullying in social media posts is crucial to creating a safer online environment. This app demonstrates how to use Spark NLP's powerful tools to identify and classify cyberbullying in tweets.
""")

st.write("")
st.image('images/Cyberbullying.jpeg', use_column_width='auto')

# About Cyberbullying Detection
st.subheader("About Cyberbullying Detection")
st.markdown("""
Cyberbullying detection involves analyzing text to identify instances of harmful, threatening, or abusive language. Cyberbullying can have severe psychological effects on victims, making it essential to identify and address it promptly. Using Spark NLP, we can build a model to detect and classify cyberbullying in social media posts, helping to mitigate the negative impacts of online harassment.
""")

# Using the Cyberbullying Detection Model in Spark NLP
st.subheader("Using the Cyberbullying Detection Model in Spark NLP")
st.markdown("""
The following pipeline uses the Universal Sentence Encoder and a pre-trained ClassifierDL model to detect cyberbullying in tweets. This model can identify various forms of cyberbullying, such as racism and sexism.
""")

st.markdown("### Example Usage in Python")

# Setup Instructions
st.subheader("Setup")
st.markdown("""
To install Spark NLP in Python, use your favorite package manager (conda, pip, etc.). For example:
""")

st.code("""
pip install spark-nlp
pip install pyspark
""", language="bash")
st.markdown("""
Then, import Spark NLP and start a Spark session:
""")

st.code("""
import sparknlp

# Start a Spark session with Spark NLP
spark = sparknlp.start()
""", language='python')
# Cyberbullying Detection Example
st.subheader("Example Usage: Cyberbullying Detection with Spark NLP")

st.code('''
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel
from pyspark.ml import Pipeline

# Step 1: Transform raw text into a document annotation
document_assembler = DocumentAssembler()\\
    .setInputCol("text")\\
    .setOutputCol("document")

# Step 2: Universal Sentence Encoder for sentence embeddings
use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en")\\
    .setInputCols(["document"])\\
    .setOutputCol("sentence_embeddings")

# Step 3: Pre-trained ClassifierDL model for cyberbullying detection
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_cyberbullying', 'en')\\
    .setInputCols(["sentence_embeddings"])\\
    .setOutputCol("class")

# Define the pipeline
nlp_pipeline = Pipeline(stages=[document_assembler, use, document_classifier])

# Fit on an empty DataFrame and wrap in a LightPipeline for fast prediction
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

# Predict cyberbullying in a tweet
annotations = light_pipeline.fullAnnotate('@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked')
print(annotations[0]['class'][0])
''', language='python')

st.text("""
Output:

Annotation(category, 0, 81, racism, {'sentence': '0', 'sexism': '2.4904006E-7', 'neutral': '9.4820876E-5', 'racism': '0.9999049'}, [])
""")
st.markdown("""
The annotation classifies the text as "racism" with a probability of 0.9999049, indicating very high confidence, while assigning near-zero probabilities to "sexism" and "neutral."
""")
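st.markdown("""
As a minimal sketch (assuming the `annotations` object from the example above), the predicted label and the per-class scores can be read from the returned annotation's `result` and `metadata` fields:
""")

st.code('''
# `annotations` comes from light_pipeline.fullAnnotate(...) above
prediction = annotations[0]['class'][0]

print(prediction.result)    # predicted label, e.g. 'racism'
print(prediction.metadata)  # per-class scores, e.g. {'racism': '0.9999049', ...}
''', language='python')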
""", unsafe_allow_html=True) # Benchmarking Section st.markdown('
Benchmarking
', unsafe_allow_html=True) st.markdown("""

st.markdown("""
The following table summarizes the performance of the Cyberbullying Detection model in terms of precision, recall, and f1-score:
""")

st.text("""
              precision    recall  f1-score   support

     neutral       0.72      0.76      0.74       700
      racism       0.89      0.94      0.92       773
      sexism       0.82      0.71      0.76       622

    accuracy                           0.81      2095
   macro avg       0.81      0.80      0.80      2095
weighted avg       0.81      0.81      0.81      2095
""")
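st.markdown("""
As an illustrative sketch, a report in this format is what scikit-learn's `classification_report` produces once the pipeline's predictions on a labeled test set have been collected (the `eval_df` DataFrame and its column names here are hypothetical):
""")

st.code('''
from sklearn.metrics import classification_report

# Hypothetical pandas DataFrame holding gold labels and model predictions,
# e.g. collected via nlp_pipeline.fit(test_df).transform(test_df).toPandas()
print(classification_report(eval_df["label"], eval_df["prediction"]))
''', language='python')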
""", unsafe_allow_html=True) # Conclusion st.markdown("""

Conclusion

In this app, we demonstrated how to use Spark NLP's ClassifierDL model to perform cyberbullying detection on tweet data. These powerful tools enable users to efficiently process large datasets and identify harmful content, providing deeper insights for various applications. By integrating these annotators into your NLP pipelines, you can enhance text understanding, information extraction, and online safety measures.

""", unsafe_allow_html=True) # References and Additional Information st.markdown('
# References and Additional Information
st.markdown("For additional information, please check the following references.")
st.markdown("""
""", unsafe_allow_html=True)

st.subheader("Community & Support")
st.markdown("""
""", unsafe_allow_html=True)