Update app.py
app.py
CHANGED
@@ -72,20 +72,26 @@ Here is an outline of some of the most exciting recent developments in AI:
 9. [xP3](https://paperswithcode.com/dataset/xp3)
 10. [DiaBLa](https://paperswithcode.com/dataset/diabla)
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+# Deep RL ML Strategy
+
+The AI strategies are:
+- Language Model Preparation using Human-Augmented Data with Supervised Fine Tuning
+- Reward Model Training with a Prompts Dataset - Multiple Models Generate Data to Rank
+- Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score
+- Proximal Policy Optimization Fine Tuning
+- Variations - Preference Model Pretraining
+- Use Ranking Datasets with Sentiment - Thumbs Up/Down, Distribution
+- Online Version Collecting Feedback
+- OpenAI - InstructGPT - Humans Generate LM Training Text
+- DeepMind - Advantage Actor Critic: Sparrow, GopherCite
+- Reward Model from Human Preference Feedback
+
+For more information on specific techniques and implementations, check out the following resources:
+- OpenAI's paper on [GPT-3](https://arxiv.org/abs/2005.14165), which details their Language Model Preparation approach
+- The [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) paper by Haarnoja et al., which describes an off-policy actor-critic algorithm
+- OpenAI's paper on [Reward Learning](https://arxiv.org/abs/1810.06580), which explains their approach to training Reward Models
+- OpenAI's blog post on [GPT-3's fine-tuning process](https://openai.com/blog/fine-tuning-gpt-3/)
+
 """)
 
 demo.launch()
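The bullets added in this change outline the standard RLHF recipe: supervised fine-tuning on human demonstrations, reward-model training on ranked outputs, and reinforcement-learning fine-tuning (PPO in InstructGPT, advantage actor-critic in Sparrow). As a rough, self-contained illustration of how those three stages connect, here is a toy Python sketch that fits a categorical "policy" over four canned responses. The response strings, preference pairs, update rules, and hyperparameters are all illustrative assumptions, and the final stage is a simplified KL-regularized policy-gradient step rather than full PPO with ratio clipping.

```python
# Toy sketch of the three RLHF stages: SFT -> reward model -> RL fine-tuning.
# The responses, preference data, and update rules are illustrative only;
# real systems train transformer policies, not a categorical distribution.
import math

RESPONSES = ["helpful answer", "short answer", "off-topic answer", "rude answer"]


def softmax(logits):
    """Normalize a dict of logits into a probability distribution."""
    m = max(logits.values())
    exps = {r: math.exp(v - m) for r, v in logits.items()}
    total = sum(exps.values())
    return {r: e / total for r, e in exps.items()}


def supervised_fine_tune(demonstrations):
    """Stage 1: fit the policy to human demonstrations (here, by counting)."""
    counts = {r: 1.0 for r in RESPONSES}  # add-one smoothing
    for demo in demonstrations:
        counts[demo] += 1.0
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}


def train_reward_model(preference_pairs, epochs=200, lr=0.1):
    """Stage 2: learn scalar rewards from ranked pairs (Bradley-Terry updates)."""
    scores = {r: 0.0 for r in RESPONSES}
    for _ in range(epochs):
        for winner, loser in preference_pairs:
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grad = 1.0 - p_win  # gradient of the pairwise log-likelihood
            scores[winner] += lr * grad
            scores[loser] -= lr * grad
    return scores


def rl_fine_tune(sft_policy, reward, steps=500, lr=0.05, kl_coef=0.2):
    """Stage 3: push probability toward high-reward responses while penalizing
    divergence from the SFT policy (a simplified stand-in for PPO)."""
    logits = {r: math.log(p) for r, p in sft_policy.items()}
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(probs[r] * reward[r] for r in RESPONSES)
        for r in RESPONSES:
            advantage = reward[r] - baseline
            kl_term = math.log(probs[r] / sft_policy[r])
            logits[r] += lr * probs[r] * (advantage - kl_coef * kl_term)
    return softmax(logits)


if __name__ == "__main__":
    demos = ["helpful answer"] * 6 + ["short answer"] * 3 + ["off-topic answer"]
    prefs = [
        ("helpful answer", "rude answer"),
        ("helpful answer", "off-topic answer"),
        ("short answer", "rude answer"),
        ("helpful answer", "short answer"),
    ]
    sft = supervised_fine_tune(demos)
    rm = train_reward_model(prefs)
    tuned = rl_fine_tune(sft, rm)
    for r in RESPONSES:
        print(f"{r:20s} sft={sft[r]:.2f} reward={rm[r]:+.2f} tuned={tuned[r]:.2f}")
```

Running the sketch prints how the tuned distribution shifts probability toward the responses the reward model ranks highly while staying close to the SFT policy, which is the trade-off the PPO fine-tuning bullet refers to.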