Commit by XquanL: Update README.md

README.md CHANGED
@@ -31,8 +31,12 @@ Reddit is a place where people come together to have a variety of conversations
 In this project, we created a text classifier Hugging Face Spaces app and Gradio interface that classifies not safe for work (NSFW) content, specifically text that is considered inappropriate and unprofessional. We used a pre-trained DistilBERT transformer model for the sentiment analysis. The model was fine-tuned on Reddit posts and predicts 2 classes - which are NSFW and safe for work (SFW).
 
 ## Workflow
+<p align="center">
+<img width="750" height="450" src="https://user-images.githubusercontent.com/112578003/207698683-233c228e-c2d0-441f-bbba-139dd24a98d3.png" />
+</p>
 
 ### Get Reddit data
+
 * Data pulled in notebook `reddit_data/reddit_new.ipynb`
 
 ### Verify GPU works
@@ -54,11 +58,24 @@ In this project, we created a text classifier Hugging Face Spaces app and Gradio
 * Check out the fine-tuned model [here](https://huggingface.co/michellejieli/inappropriate_text_classifier)
 * Check out the spaces app [Spaces APP](https://huggingface.co/spaces/yjzhu0225/reddit_text_classification_app)
 
+**WARNING Reddit URL**
+<p align="center">
+<img width="700" height="300" src="https://user-images.githubusercontent.com/112578003/207698979-f3751140-fc91-4613-9892-c22f2e5b7dfa.png">
+</p>
+
+**SAFE Reddit URL**
+<p align="center">
+<img width="700" height="300" src="https://user-images.githubusercontent.com/112578003/207699308-8847e2f3-be76-47e4-8a0b-ba4406f5a693.png">
+</p>
+
 ### Gradio interface
 * In terminal, run `python3 app.py`
 * Open the browser
 * Put reddit URL in *input_url* and get output
-<p align="center">
-<img width="700" height="450" src="https://user-images.githubusercontent.com/112578003/207481683-9a38c9e9-fd8f-48d9-be59-27f1583f96b6.jpeg">
-</p>
 
+### Reference
+[1] “CADD_dataset,” GitHub, Sep. 26, 2022. https://github.com/nlpcl-lab/cadd_dataset
+
+[2] H. Song, S. H. Ryu, H. Lee, and J. Park, “A Large-scale Comprehensive Abusiveness Detection Dataset with Multifaceted Labels from Reddit,” ACLWeb, Nov. 01, 2021. https://aclanthology.org/2021.conll-1.43/
+
+
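The Gradio steps in the README (run `python3 app.py`, paste a Reddit URL into *input_url*, read the output) could be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the contents of the project's `app.py`, which the diff does not show. The model ID comes from the README link. The label string `"NSFW"`, the 0.5 threshold, the helper names `is_reddit_url`, `to_verdict`, and `build_demo`, and the `fetch_text` callback are all hypothetical and introduced only for illustration.

```python
from urllib.parse import urlparse

# Model ID taken from the link in the README; everything else is an assumption.
MODEL_ID = "michellejieli/inappropriate_text_classifier"


def is_reddit_url(url: str) -> bool:
    """Return True if `url` is an http(s) link on reddit.com or a subdomain."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and (
        host == "reddit.com" or host.endswith(".reddit.com")
    )


def to_verdict(prediction: dict, threshold: float = 0.5) -> str:
    """Map one result such as {"label": "NSFW", "score": 0.97} to the
    WARNING / SAFE wording used in the README screenshots.

    The label string "NSFW" and the threshold are assumptions.
    """
    if prediction["label"].upper() == "NSFW" and prediction["score"] >= threshold:
        return "WARNING"
    return "SAFE"


def build_demo(fetch_text):
    """Build a Gradio app that classifies the text behind a Reddit URL.

    `fetch_text(url) -> str` is a hypothetical helper that returns the post
    text; the README does not show how the real app retrieves it.
    """
    # Imported lazily so the helpers above work without these packages.
    import gradio as gr
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)

    def classify_url(input_url: str) -> str:
        if not is_reddit_url(input_url):
            return "Please enter a Reddit URL"
        # truncation=True keeps long posts within the model's input limit.
        result = classifier(fetch_text(input_url), truncation=True)[0]
        return to_verdict(result)

    return gr.Interface(
        fn=classify_url,
        inputs=gr.Textbox(label="input_url"),
        outputs=gr.Textbox(label="output"),
    )
```

With `gradio` and `transformers` installed, calling `build_demo(my_fetcher).launch()` starts a local server, where `my_fetcher` stands for whatever function supplies the post text. The URL check and the verdict mapping are kept as plain functions so they can be tested without downloading the model.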