awacke1 committed
Commit 7d9e94d · 1 Parent(s): d6e6796

Update app.py

Files changed (1)
  1. app.py +20 -14
app.py CHANGED
@@ -72,20 +72,26 @@ Here is an outline of some of the most exciting recent developments in AI:
  9. [xP3](https://paperswithcode.com/dataset/xp3)
  10. [DiaBLa](https://paperswithcode.com/dataset/diabla)

- ## Deep RL ML Strategy 🧠
-
- 1. Language Model Preparation, Human Augmented with Supervised Fine Tuning
- 2. Reward Model Training with Prompts Dataset Multi-Model Generate Data to Rank
- 3. Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score
- 4. Proximal Policy Optimization Fine Tuning
-
- ## Variations - Preference Model Pretraining 🤔
- 1. Use Ranking Datasets Sentiment - Thumbs Up/Down, Distribution
- 2. Online Version Getting Feedback
- 3. OpenAI - InstructGPT - Humans generate LM Training Text
- 4. DeepMind - Advantage Actor Critic Sparrow, GopherCite
- 5. Reward Model Human Prefence Feedback
-
+ # Deep RL ML Strategy 🧠
+
+ The AI strategies are:
+ - Language Model Preparation using Human-Augmented Supervised Fine Tuning 🤖
+ - Reward Model Training with Prompts Dataset - Multi-Model Generated Data to Rank 🎁
+ - Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score 🎯
+ - Proximal Policy Optimization Fine Tuning 🤝
+ - Variations - Preference Model Pretraining 🤔
+ - Use Ranking Datasets - Sentiment Thumbs Up/Down, Distribution 📊
+ - Online Version for Getting Feedback 💬
+ - OpenAI - InstructGPT - Humans generate LM Training Text 🔍
+ - DeepMind - Advantage Actor Critic Sparrow, GopherCite 🦜
+ - Reward Model Human Preference Feedback 🏆
+
+ For more information on specific techniques and implementations, check out the following resources:
+ - OpenAI's paper on [GPT-3](https://arxiv.org/abs/2005.14165), which describes the large-scale pretraining of the base language model
+ - The [Soft Actor-Critic](https://arxiv.org/abs/1801.01290) paper (Haarnoja et al., UC Berkeley), which describes an off-policy actor-critic algorithm
+ - OpenAI's paper on [Reward Learning](https://arxiv.org/abs/1810.06580), which explains their approach to training Reward Models
+ - OpenAI's blog post on [GPT-3's fine-tuning process](https://openai.com/blog/fine-tuning-gpt-3/)
+
  """)

  demo.launch()
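The "Reward Model Training with Prompts Dataset - Multi-Model Generated Data to Rank" step added in this diff is the pairwise preference-ranking stage of RLHF. Below is a minimal sketch of that ranking objective. It is not part of this app: PyTorch is an assumption, and the linear reward head, random "embeddings", and all dimensions are hypothetical stand-ins for a real encoder's outputs.

```python
# Minimal sketch of the pairwise ranking loss used to train a reward model
# from ranked (chosen vs. rejected) responses. The linear "reward head" and
# the random embeddings are placeholders for a real encoder's outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

embed_dim = 16
reward_head = nn.Linear(embed_dim, 1)   # maps a response embedding to a scalar reward
optimizer = torch.optim.Adam(reward_head.parameters(), lr=1e-3)

# Toy batch: embeddings of the human-preferred and the rejected response for each prompt.
chosen_emb = torch.randn(8, embed_dim)
rejected_emb = torch.randn(8, embed_dim)

for step in range(100):
    r_chosen = reward_head(chosen_emb).squeeze(-1)      # rewards for preferred responses
    r_rejected = reward_head(rejected_emb).squeeze(-1)  # rewards for rejected responses
    # Bradley-Terry style objective: push r_chosen above r_rejected.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final ranking loss: {loss.item():.4f}")
```

In a full pipeline the embeddings would come from the language model itself, and each prompt's candidate responses are generated by multiple models before humans rank them, matching the "Multi-Model Generated Data to Rank" wording above.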
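The "Fine Tuning with Reinforcement Reward" and "Proximal Policy Optimization Fine Tuning" steps combine a reward-model score, a penalty for drifting from the frozen reference (SFT) model, and PPO's clipped surrogate loss. The toy sketch below only evaluates those two quantities on made-up numbers; it is not the InstructGPT implementation, and every tensor and coefficient is a placeholder.

```python
# Toy illustration of the RL fine-tuning ingredients named above:
# (1) a reward penalized by divergence from the frozen reference (SFT) model, and
# (2) PPO's clipped surrogate objective. Every tensor below is a made-up stand-in.
import torch

torch.manual_seed(0)

# Per-token log-probs of one sampled response under three models.
logp_policy = torch.randn(10)                    # policy being fine-tuned
logp_ref = torch.randn(10)                       # frozen reference (SFT) model
logp_old = logp_policy + 0.05 * torch.randn(10)  # snapshot that generated the rollout

reward_model_score = torch.tensor(1.3)   # scalar score from the trained reward model
beta = 0.1                               # KL-penalty coefficient

# (1) KL-penalized reward: discourage drifting far from the reference model.
approx_kl = (logp_policy - logp_ref).sum()
shaped_reward = reward_model_score - beta * approx_kl

# (2) PPO clipped surrogate, treating the shaped reward as a crude advantage estimate.
advantage = shaped_reward
ratio = torch.exp(logp_policy - logp_old)
eps = 0.2
surrogate = torch.minimum(ratio * advantage,
                          torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage)
policy_loss = -surrogate.mean()

print(f"shaped reward: {shaped_reward.item():.3f}  PPO policy loss: {policy_loss.item():.3f}")
```

Libraries such as Hugging Face's TRL wrap this loop end to end; the sketch only shows where the reward-model score, the KL term, and the PPO clipping interact.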