On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper • 2508.05629 • Published 9 days ago • 144
cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese • Text Generation • 15B • Updated Jan 27 • 1.37k • 90