I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy
ByteDance drops OmniHuman🔥 This is peak SOTA: flawless natural gestures with perfect lip sync and facial expressions. It's the second time they've released SOTA-level talking heads, only this time with hands and full-body motion. Project: https://omnihuman-lab.github.io/
The beauty of GRPO is that it doesn't care whether the rewards are rule-based or learned. The hack: let the data self-normalize. Trajectories sampled for the same prompt compete against their group's mean, so there's no value model and no extra params, just clean, efficient RL that cuts memory usage by ~50% while maintaining SOTA performance. Btw it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
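The group-normalization trick is tiny in code. A minimal sketch of the advantage computation (following the mean/std formula in the DeepSeekMath paper linked above; the function name and example rewards are mine):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each trajectory's reward
    against the mean and std of its own group. No learned value
    model, no extra parameters -- the batch is its own baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: all-equal rewards
    return [(r - mean) / std for r in rewards]

# rewards for 4 sampled completions of the same prompt
print(grpo_advantages([1.0, 0.0, 0.5, 0.5]))
# → [1.414..., -1.414..., 0.0, 0.0]
```

These advantages then weight the clipped policy-gradient objective, exactly where PPO would plug in value-model estimates.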