HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions • Paper • 2409.16427 • Published Sep 24, 2024
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • Paper • 2412.14161 • Published Dec 18, 2024
Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs • Paper • 2403.05020 • Published Mar 8, 2024