Article Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios By pratikbhavsar and 1 other • 17 days ago • 15
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Paper • 2502.14282 • Published 9 days ago • 17
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 9 days ago • 167
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published 16 days ago • 38