XquanL committed commit 22e5788 (unverified) · 1 parent: 826732e

Update README.md

Files changed (1): README.md (+20 -3)
README.md CHANGED
@@ -31,8 +31,12 @@ Reddit is a place where people come together to have a variety of conversations
In this project, we created a text classifier Hugging Face Spaces app and Gradio interface that classifies not safe for work (NSFW) content, specifically text that is considered inappropriate and unprofessional. We used a pre-trained DistilBERT transformer model for sentiment analysis. The model was fine-tuned on Reddit posts and predicts two classes: NSFW and safe for work (SFW).

## Workflow
+ <p align="center">
+ <img width="750" height="450" src="https://user-images.githubusercontent.com/112578003/207698683-233c228e-c2d0-441f-bbba-139dd24a98d3.png" />
+ </p>

### Get Reddit data
+
* Data pulled in notebook `reddit_data/reddit_new.ipynb`

### Verify GPU works

@@ -54,11 +58,24 @@ In this project, we created a text classifier Hugging Face Spaces app and Gradio
* Check out the fine-tuned model [here](https://huggingface.co/michellejieli/inappropriate_text_classifier)
* Check out the spaces app [Spaces APP](https://huggingface.co/spaces/yjzhu0225/reddit_text_classification_app)

+ **WARNING Reddit URL**
+ <p align="center">
+ <img width="700" height="300" src="https://user-images.githubusercontent.com/112578003/207698979-f3751140-fc91-4613-9892-c22f2e5b7dfa.png">
+ </p>
+
+ **SAFE Reddit URL**
+ <p align="center">
+ <img width="700" height="300" src="https://user-images.githubusercontent.com/112578003/207699308-8847e2f3-be76-47e4-8a0b-ba4406f5a693.png">
+ </p>
+
### Gradio interface
* In terminal, run `python3 app.py`
* Open the browser
* Put a Reddit URL in *input_url* and get the output
- <p align="center">
- <img width="700" height="450" src="https://user-images.githubusercontent.com/112578003/207481683-9a38c9e9-fd8f-48d9-be59-27f1583f96b6.jpeg">
- </p>

+ ### References
+ [1] “CADD_dataset,” GitHub, Sep. 26, 2022. https://github.com/nlpcl-lab/cadd_dataset
+
+ [2] H. Song, S. H. Ryu, H. Lee, and J. Park, “A Large-scale Comprehensive Abusiveness Detection Dataset with Multifaceted Labels from Reddit,” in Proc. CoNLL 2021, Nov. 2021. https://aclanthology.org/2021.conll-1.43/
+
+
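The updated README keeps a "Verify GPU works" step whose details are outside this diff. A minimal sketch of such a check, assuming fine-tuning runs on PyTorch (the usual backend for DistilBERT in Transformers), could look like this:

```python
import torch

# Report whether CUDA is visible to PyTorch before starting fine-tuning.
if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; training would fall back to CPU.")
```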
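The fine-tuned checkpoint linked above (`michellejieli/inappropriate_text_classifier`) can be queried with the Transformers pipeline API. This is only a minimal sketch; the exact label strings returned (NSFW/SFW, per the README) should be confirmed against the model card:

```python
from transformers import pipeline

# Load the fine-tuned DistilBERT checkpoint referenced in the README.
classifier = pipeline(
    "text-classification",
    model="michellejieli/inappropriate_text_classifier",
)

# The README describes two classes, NSFW and SFW; each prediction comes back
# as a dict containing a label and a confidence score.
print(classifier("This is a friendly question about houseplants."))
```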
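The `app.py` behind the Spaces app is not part of this commit, so the following is only a rough sketch of a Gradio interface with an *input_url* textbox. Fetching the post through Reddit's public `.json` endpoint and the helper name `classify_reddit_url` are assumptions for illustration, not the repository's actual implementation.

```python
import gradio as gr
import requests
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="michellejieli/inappropriate_text_classifier",
)

def classify_reddit_url(input_url: str) -> str:
    # Assumption: pull the post via Reddit's public JSON endpoint; the real
    # app.py may retrieve the text differently (e.g. with PRAW).
    resp = requests.get(
        input_url.rstrip("/") + ".json",
        headers={"User-Agent": "reddit-text-classification-demo"},
        timeout=10,
    )
    post = resp.json()[0]["data"]["children"][0]["data"]
    text = (post.get("title", "") + " " + post.get("selftext", "")).strip()
    result = classifier(text[:512])[0]  # rough truncation for long posts
    return f"{result['label']} ({result['score']:.2f})"

demo = gr.Interface(
    fn=classify_reddit_url,
    inputs=gr.Textbox(label="input_url"),
    outputs=gr.Textbox(label="prediction"),
    title="Reddit text classification",
)

if __name__ == "__main__":
    demo.launch()
```

Running `python3 app.py` with a sketch like this and opening the local URL it prints mirrors the steps listed under "Gradio interface" in the README.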