TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 7 days ago • 43
Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs Paper • 2403.05020 • Published Mar 8 • 2
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 23