awacke1 committed
Commit 7d9e94d · 1 Parent(s): d6e6796

Update app.py

Files changed (1)
  1. app.py +20 -14
app.py CHANGED
@@ -72,20 +72,26 @@ Here is an outline of some of the most exciting recent developments in AI:
  9. [xP3](https://paperswithcode.com/dataset/xp3)
  10. [DiaBLa](https://paperswithcode.com/dataset/diabla)

- ## Deep RL ML Strategy 🧠
-
- 1. Language Model Preparation, Human Augmented with Supervised Fine Tuning
- 2. Reward Model Training with Prompts Dataset Multi-Model Generate Data to Rank
- 3. Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score
- 4. Proximal Policy Optimization Fine Tuning
-
- ## Variations - Preference Model Pretraining 🤔
- 1. Use Ranking Datasets Sentiment - Thumbs Up/Down, Distribution
- 2. Online Version Getting Feedback
- 3. OpenAI - InstructGPT - Humans generate LM Training Text
- 4. DeepMind - Advantage Actor Critic Sparrow, GopherCite
- 5. Reward Model Human Prefence Feedback
-
+ # Deep RL ML Strategy 🧠
+
+ The AI strategies are:
+ - Language Model Preparation using Human-Augmented Supervised Fine Tuning 🤖
+ - Reward Model Training with Prompts Dataset - Multi-Model Generated Data to Rank 🎁
+ - Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score 🎯
+ - Proximal Policy Optimization Fine Tuning 🤝
+ - Variations - Preference Model Pretraining 🤔
+ - Use Ranking Datasets - Sentiment Thumbs Up/Down, Distribution 📊
+ - Online Version for Getting Feedback 💬
+ - OpenAI - InstructGPT - Humans generate LM Training Text 🔍
+ - DeepMind - Advantage Actor Critic Sparrow, GopherCite 🦜
+ - Reward Model Human Preference Feedback 🏆
+
+ For more information on specific techniques and implementations, check out the following resources:
+ - OpenAI's paper on [GPT-3](https://arxiv.org/abs/2005.14165), which describes the large-scale pretraining of the base language model
+ - The [Soft Actor-Critic](https://arxiv.org/abs/1801.01290) paper (Haarnoja et al., UC Berkeley), which describes an off-policy actor-critic algorithm
+ - OpenAI's paper on [Reward Learning](https://arxiv.org/abs/1810.06580), which explains their approach to training Reward Models
+ - OpenAI's blog post on [GPT-3's fine-tuning process](https://openai.com/blog/fine-tuning-gpt-3/)
+
  """)

  demo.launch()
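The "Reward Model Training with Prompts Dataset - Multi-Model Generated Data to Rank" step added in this diff is the pairwise preference-ranking stage of RLHF. Below is a minimal sketch of that ranking objective. It is not part of this app: PyTorch is an assumption, and the linear reward head, random "embeddings", and all dimensions are hypothetical stand-ins for a real encoder's outputs.

```python
# Minimal sketch of the pairwise ranking loss used to train a reward model
# from ranked (chosen vs. rejected) responses. The linear "reward head" and
# the random embeddings are placeholders for a real encoder's outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

embed_dim = 16
reward_head = nn.Linear(embed_dim, 1)   # maps a response embedding to a scalar reward
optimizer = torch.optim.Adam(reward_head.parameters(), lr=1e-3)

# Toy batch: embeddings of the human-preferred and the rejected response for each prompt.
chosen_emb = torch.randn(8, embed_dim)
rejected_emb = torch.randn(8, embed_dim)

for step in range(100):
    r_chosen = reward_head(chosen_emb).squeeze(-1)      # rewards for preferred responses
    r_rejected = reward_head(rejected_emb).squeeze(-1)  # rewards for rejected responses
    # Bradley-Terry style objective: push r_chosen above r_rejected.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final ranking loss: {loss.item():.4f}")
```

In a full pipeline the embeddings would come from the language model itself, and each prompt's candidate responses are generated by multiple models before humans rank them, matching the "Multi-Model Generated Data to Rank" wording above.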
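The "Fine Tuning with Reinforcement Reward" and "Proximal Policy Optimization Fine Tuning" steps combine a reward-model score, a penalty for drifting from the frozen reference (SFT) model, and PPO's clipped surrogate loss. The toy sketch below only evaluates those two quantities on made-up numbers; it is not the InstructGPT implementation, and every tensor and coefficient is a placeholder.

```python
# Toy illustration of the RL fine-tuning ingredients named above:
# (1) a reward penalized by divergence from the frozen reference (SFT) model, and
# (2) PPO's clipped surrogate objective. Every tensor below is a made-up stand-in.
import torch

torch.manual_seed(0)

# Per-token log-probs of one sampled response under three models.
logp_policy = torch.randn(10)                    # policy being fine-tuned
logp_ref = torch.randn(10)                       # frozen reference (SFT) model
logp_old = logp_policy + 0.05 * torch.randn(10)  # snapshot that generated the rollout

reward_model_score = torch.tensor(1.3)   # scalar score from the trained reward model
beta = 0.1                               # KL-penalty coefficient

# (1) KL-penalized reward: discourage drifting far from the reference model.
approx_kl = (logp_policy - logp_ref).sum()
shaped_reward = reward_model_score - beta * approx_kl

# (2) PPO clipped surrogate, treating the shaped reward as a crude advantage estimate.
advantage = shaped_reward
ratio = torch.exp(logp_policy - logp_old)
eps = 0.2
surrogate = torch.minimum(ratio * advantage,
                          torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage)
policy_loss = -surrogate.mean()

print(f"shaped reward: {shaped_reward.item():.3f}  PPO policy loss: {policy_loss.item():.3f}")
```

Libraries such as Hugging Face's TRL wrap this loop end to end; the sketch only shows where the reward-model score, the KL term, and the PPO clipping interact.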