Article: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge. By NormalUhr • Feb 7 • 70
Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B • Updated Jan 27 • 250k • 4.87k • 92
Collection: 🧠 Reasoning datasets. Datasets with reasoning traces for math and code released by the community • 14 items • Updated 2 days ago • 100