SWE-bench Collection SWE-bench is a benchmark for evaluating Language Models and AI Systems on their ability to resolve real-world GitHub Issues. • 4 items • Updated Mar 8
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Paper • 2407.17438 • Published Jul 24, 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8, 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7, 2024