Yuanjing Zhu committed on
Commit
ad0c90c
·
unverified ·
1 Parent(s): 1e78e17

Update README.md

  1. README.md +47 -1
README.md CHANGED
@@ -1 +1,47 @@
- # reddit_scrapper
+ # reddit_text_classification
+
+ Ideas - text classification
+
+ Expose the classifier as a microservice behind an API
+
+ Document the API with Swagger
+
+ Filter words
+ Query on it
+ Text classification - sentiment analysis
+
+ Filter: spam, sensitive content, cuss words
+
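The word-filter idea above could start as something like the following minimal sketch. The word list and the function names (`BLOCKED_WORDS`, `find_blocked_words`, `is_flagged`) are placeholders invented for illustration; they are not part of the project.

```python
import re

# Hypothetical placeholder list; a real project would load a curated word list.
BLOCKED_WORDS = {"badword", "spamlink"}


def find_blocked_words(text, blocked=BLOCKED_WORDS):
    """Return the blocked words that occur in text (case-insensitive), sorted."""
    # Split the lowercased text into simple word tokens.
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sorted(tokens & set(blocked))


def is_flagged(text, blocked=BLOCKED_WORDS):
    """Return True if the text contains at least one blocked word."""
    return bool(find_blocked_words(text, blocked))
```

Exact-token matching like this misses misspellings and obfuscated words, which is one reason the notes also consider a trained classifier.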
+ https://new.pythonforengineers.com/blog/build-a-reddit-bot-part-1/
+
+ Look for explicit content; research further which subreddit to use
+
+ Find labeled data that resembles how people type on Reddit
+
+ Figure out which model to use
+
+ If it does
+
+ Next steps:
+ EOD Wednesday
+
+ Meet next Monday 12:30 PM
+
+ Plan: break the project into pieces that are as small as possible.
+ - Get a static local copy of some known matches and some known non-matches: some examples of posts that are good and some examples of posts that are bad.
+ - Get everything working locally first; once that works, try to hook it up to the Reddit API (if we can get access).
+ - Real-time processing is the cherry on the sundae (nice to have); without a working system it is better not to attempt it at all.
+ - Download posts first. Take half an hour to check whether the API lets us grab a post, then use the API to grab a few posts.
+ - Inject some fake bad words into the original posts to create test cases.
+ - Use a Hugging Face model to detect toxic content (that by itself is good enough).
+ - Once we have some data, build a Spaces app or command-line tool. Put 99% of the effort there; real time only if we have time.
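The "inject some fake bad words" step could be sketched as below. This is only an illustration under stated assumptions: the function names, the `"clean"`/`"toxic"` labels, and the placeholder word are all hypothetical, and the posts are plain strings already downloaded locally.

```python
import random


def inject_words(post, words, count=1, seed=None):
    """Insert `count` words chosen from `words` at random positions in `post`."""
    rng = random.Random(seed)  # seeded so the generated examples are reproducible
    tokens = post.split()
    for _ in range(count):
        position = rng.randint(0, len(tokens))  # inclusive; len(tokens) appends
        tokens.insert(position, rng.choice(words))
    return " ".join(tokens)


def build_examples(clean_posts, words, seed=0):
    """Pair each clean post with an injected copy, labelled for two classes."""
    examples = []
    for i, post in enumerate(clean_posts):
        examples.append((post, "clean"))
        examples.append((inject_words(post, words, seed=seed + i), "toxic"))
    return examples
```

For example, `build_examples(["hello there friend"], ["FAKEBADWORD"])` yields the original post plus one copy with the placeholder word inserted somewhere in it.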
+
+ Things to do:
+ - (Yuanjing and Xiaoquan) Find examples of Reddit posts for both classes we are trying to classify and convert them to CSV (2 columns: text, class; 2 classes) (Thursday, December 8, 2022)
+ - (Michelle) Find a Hugging Face model to use to classify posts
+ - (Michelle) Fine-tune the model on Reddit posts and upload it to Hugging Face (create API)
+ - (Susanna) Create a CLI or Spaces app on Hugging Face
+ - Connect to real time (optional)
+ - (Xiaoquan and Yuanjing) Make demo video
+
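The two-column CSV described in the first task (text, class) could be written and read back with Python's standard `csv` module, roughly as follows. The helper names and the `"good"`/`"bad"` label strings are assumptions made for this sketch, not a format the team has fixed.

```python
import csv
import io


def write_examples(examples, handle):
    """Write (text, class) pairs to an open text handle with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["text", "class"])
    writer.writerows(examples)


def read_examples(handle):
    """Read (text, class) pairs back from a CSV that has a text,class header."""
    return [(row["text"], row["class"]) for row in csv.DictReader(handle)]


# In-memory round trip; a real file would be opened with newline="".
buffer = io.StringIO()
write_examples([("nice post, thanks", "good"), ("some rude post", "bad")], buffer)
buffer.seek(0)
rows = read_examples(buffer)
```

Using `csv.writer` rather than joining strings by hand keeps commas and quotes inside post text from breaking the columns.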
+
+ Due date: December 16, 2022
+
+ Demo - split up the workload so that it uses everybody’s best talents. Not everyone has to present; break the problem up so that the final outcome is the best. If one person is really good at editing, they can be the editor; if one person is good at voiceover, they do the voiceover; if one person is good at documentation, they do the documentation; if one person is good at coding, they do the coding.