arXiv:2503.12349

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Published on Mar 16 · Submitted by yyyyyyjjjjzzz on Mar 18
Abstract

Reasoning and strategic behavior in social interactions are hallmarks of intelligence. This form of reasoning is significantly more sophisticated than isolated planning or reasoning in static settings (e.g., math problem solving). In this paper, we present Strategic Planning, Interaction, and Negotiation (SPIN-Bench), a new multi-domain evaluation designed to measure intelligence in strategic planning and social reasoning. While many existing benchmarks focus on narrow planning or single-agent reasoning, SPIN-Bench combines classical PDDL tasks, competitive board games, cooperative card games, and multi-agent negotiation scenarios in one unified framework. The framework includes both a benchmark and an arena that simulates a variety of social settings for evaluating the reasoning and strategic behavior of AI agents. We construct SPIN-Bench by systematically varying action spaces, state complexity, and the number of interacting agents, so that success depends not only on methodical, step-wise decision making but also on conceptual inference about other (adversarial or cooperative) participants. Our experiments reveal that while contemporary LLMs handle basic fact retrieval and short-range planning reasonably well, they encounter significant performance bottlenecks in tasks requiring deep multi-hop reasoning over large state spaces and socially adept coordination under uncertainty. We envision SPIN-Bench as a catalyst for future research on robust multi-agent planning, social reasoning, and human-AI teaming.
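
To make the arena idea concrete, below is a minimal sketch of a turn-based multi-agent evaluation loop. All names here (ToyState, ScriptedAgent, run_episode) and the toy action space are illustrative assumptions, not the actual SPIN-Bench API; an LLM-backed agent would replace the random policy with a prompted model call whose reply is parsed into a legal action.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ToyState:
    """Minimal shared game state: a counter plus a log of who did what."""
    value: int = 0
    history: list = field(default_factory=list)

class ScriptedAgent:
    """Stand-in for an LLM-backed agent: given an observation, return a legal action.
    In a real evaluation, this is where the model would be prompted with the game
    state, interaction history, and legal moves, and its reply parsed into an action."""
    def __init__(self, name: str):
        self.name = name

    def act(self, observation: dict) -> int:
        return random.choice(observation["legal_actions"])

def run_episode(agents, max_turns: int = 12) -> ToyState:
    """Round-robin arena loop: observe -> act -> apply -> log, one agent per turn."""
    state = ToyState()
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        observation = {
            "value": state.value,
            "history": list(state.history),
            "legal_actions": [-1, 0, +1],   # toy action space
        }
        action = agent.act(observation)
        state.value += action
        state.history.append((agent.name, action))
    return state

if __name__ == "__main__":
    final = run_episode([ScriptedAgent("A"), ScriptedAgent("B")])
    print("final value:", final.value)
    print("first moves:", final.history[:4])
```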

Community


TL;DR Summary
SPIN-Bench reveals that while LLMs excel in short-term planning and factual recall, they falter in deep strategic reasoning and social interactions. They struggle with tasks like Chess due to high branching factors, and their simplistic negotiation styles expose a critical gap in theory-of-mind and dynamic multi-agent coordination.
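
As a back-of-the-envelope illustration of the branching-factor point, the snippet below shows how quickly the number of possible lines of play grows with lookahead depth. The per-move branching factors are commonly cited approximations and are not taken from the paper.

```python
# Approximate average branching factors (moves per ply); illustrative only.
games = {"Tic-tac-toe": 4, "Chess": 35, "Go": 250}

for name, b in games.items():
    for depth in (2, 4, 8):
        # b ** depth = rough count of distinct move sequences to that depth
        print(f"{name:11s} ~{b:>3} moves/ply, depth {depth}: ~{b ** depth:.2e} lines of play")
```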

Visit our website at https://spinbench.github.io/ for more details. You can also read our paper on arXiv at https://arxiv.org/abs/2503.12349. Be sure to check out our demo, which features engaging visuals and detailed LLM reasoning steps.

