Rsr2425 committed on
Commit 8f1ccd9 · 1 parent: 2f47bd0

[HW3] Finished answering questions

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -345,13 +345,12 @@ Does this application pass your vibe check? Are there any immediate pitfalls you
 
  #### ✅ Discussion Answer #1:
  1. What is RL and how does it help reasoning?
- Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward.
 
- In the context of reasoning, RL helps improve the reasoning capabilities of language models by allowing them to learn from their interactions and the feedback they receive, rather than relying solely on supervised data. By using RL, models, like DeepSeek-R1-Zero, can self-evolve their reasoning capabilities through a process of trial and error. This approach has demonstrated significant effectiveness, leading to improved reasoning performance on various benchmarks. Specifically, RL encourages the development of powerful reasoning behaviors without the need for extensive datasets that require manual labeling, making the learning process more efficient.
  2. What is the difference between DeepSeek-R1 and DeepSeek-R1-Zero?
  The main difference between DeepSeek-R1 and DeepSeek-R1-Zero lies in their approaches to reinforcement learning and the incorporation of data. DeepSeek-R1-Zero demonstrates strong reasoning capabilities and exhibits self-evolution through reinforcement learning, but it faces issues like poor readability and language mixing. In contrast, DeepSeek-R1 builds upon the strengths of DeepSeek-R1-Zero by exploring the use of a small amount of high-quality data as a cold start, aiming to improve reasoning performance and create a user-friendly model. Additionally, DeepSeek-R1 is designed to make its reasoning processes more readable and accessible to the community, addressing some of the drawbacks identified in DeepSeek-R1-Zero.
- 3. What is this paper about? I don't know the answer.
 
 
  Does this application pass your vibe check? Are there any immediate pitfalls you're noticing? The app did okay overall, but there are clear gaps: it particularly struggled with broad, high-level questions, such as what the paper as a whole is about. This issue is likely the result of suboptimal chunking.
 
 
 
  #### ✅ Discussion Answer #1:
  1. What is RL and how does it help reasoning?
+ Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward. In the context of reasoning, RL helps improve the reasoning capabilities of language models by allowing them to learn from their interactions and the feedback they receive, rather than relying solely on supervised data. By using RL, models, like DeepSeek-R1-Zero, can self-evolve their reasoning capabilities through a process of trial and error. This approach has demonstrated significant effectiveness, leading to improved reasoning performance on various benchmarks. Specifically, RL encourages the development of powerful reasoning behaviors without the need for extensive datasets that require manual labeling, making the learning process more efficient.
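To make "cumulative reward" concrete, the sketch below computes the discounted return G = r_0 + γ·r_1 + γ²·r_2 + ... for a single episode. It is a minimal sketch assuming the standard textbook discounted-return formulation; the function name `discounted_return`, the default γ = 0.99, and the sparse-reward example are illustrative assumptions, not details taken from the DeepSeek-R1 paper or from this app.

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Compute G = r_0 + gamma*r_1 + gamma**2*r_2 + ... for one episode (illustrative helper)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    g = 0.0
    # Walk backwards using the recursion G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Hypothetical sparse reward: only the final step (e.g. a correct final answer) is rewarded
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
```

With that sparse reward, `discounted_return([0.0, 0.0, 1.0], gamma=0.5)` evaluates to 0.25: the agent still gets credit for earlier steps, but it shrinks the further a step is from the reward.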
 
  2. What is the difference between DeepSeek-R1 and DeepSeek-R1-Zero?
  The main difference between DeepSeek-R1 and DeepSeek-R1-Zero lies in their approaches to reinforcement learning and the incorporation of data. DeepSeek-R1-Zero demonstrates strong reasoning capabilities and exhibits self-evolution through reinforcement learning, but it faces issues like poor readability and language mixing. In contrast, DeepSeek-R1 builds upon the strengths of DeepSeek-R1-Zero by exploring the use of a small amount of high-quality data as a cold start, aiming to improve reasoning performance and create a user-friendly model. Additionally, DeepSeek-R1 is designed to make its reasoning processes more readable and accessible to the community, addressing some of the drawbacks identified in DeepSeek-R1-Zero.
 
+ 3. What is this paper about? I don't know the answer.
 
  Does this application pass your vibe check? Are there any immediate pitfalls you're noticing? The app did okay overall, but there are clear gaps: it particularly struggled with broad, high-level questions, such as what the paper as a whole is about. This issue is likely the result of suboptimal chunking.
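To illustrate the chunking pitfall, here is a minimal sketch of fixed-size character chunking with overlap, one common baseline strategy. It assumes the app splits raw text into fixed-width windows; `chunk_text`, `chunk_size`, and `chunk_overlap` are hypothetical names and defaults, not the app's actual configuration. With small windows, each retrieved chunk covers only a narrow slice of the paper, which is consistent with the app struggling on whole-paper questions.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail chunk is fully contained in the previous one
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)` returns `['abcd', 'cdef', 'efgh', 'ghij']`. Larger chunks give each retrieved passage more context at the cost of retrieval precision; which trade-off suits this app would need testing.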