Goal Alignment in LLM-Based User Simulators for Conversational AI
Abstract
User Goal State Tracking (UGST), a novel framework for tracking user goal progression, is introduced to improve goal alignment in LLM-based user simulators for conversational AI, with significant improvements demonstrated across benchmarks.
User simulators are essential to conversational AI, enabling scalable agent development and evaluation through simulated interactions. While current Large Language Models (LLMs) have advanced user simulation capabilities, we reveal that they struggle to consistently demonstrate goal-oriented behavior across multi-turn conversations, a critical limitation that compromises their reliability in downstream applications. We introduce User Goal State Tracking (UGST), a novel framework that tracks user goal progression throughout conversations. Leveraging UGST, we present a three-stage methodology for developing user simulators that can autonomously track goal progression and reason to generate goal-aligned responses. Moreover, we establish comprehensive evaluation metrics for measuring goal alignment in user simulators, and demonstrate that our approach yields substantial improvements across two benchmarks (MultiWOZ 2.4 and τ-Bench). Our contributions address a critical gap in conversational AI and establish UGST as an essential framework for developing goal-aligned user simulators.
Community
User simulators are essential for creating environments that allow conversational agents to learn through interaction. In this work, we introduce User Goal State Tracking (UGST) to develop simulators that stay aligned with their intended behaviors throughout conversations.
We find that existing LLM-based user simulators fail to align with up to 40% of their user goals over the course of multi-turn conversations. This unreliable behavior can lead to inaccurate evaluations and compromises the effectiveness of reinforcement learning (RL) for conversational agents.
UGST decomposes a user goal into modular sub-components (e.g., “find a restaurant”, “be polite”) and dynamically tracks their progress throughout the conversation. We leverage UGST to develop goal-aligned user simulators in three stages: (1) inference-time steering with the latest goal states, (2) supervised fine-tuning (SFT) on responses with reasoning traces, and (3) GRPO with UGST-derived rewards.
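To make the tracking step concrete, here is a minimal Python sketch of how a UGST-style goal state could be represented and updated after each turn. The names (`SubGoal`, `GoalState`, `update_goal_state`) and the keyword-based update heuristic are illustrative assumptions, not the paper's implementation; a real system would prompt an LLM with the dialogue history to re-label each sub-goal.

```python
# Illustrative sketch of UGST-style goal state tracking (hypothetical names,
# not the paper's code). A user goal is decomposed into sub-goals whose status
# is re-estimated after every conversation turn.
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"          # not yet addressed in the conversation
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass
class SubGoal:
    description: str             # e.g. "find a cheap restaurant in the centre"
    status: Status = Status.PENDING


@dataclass
class GoalState:
    sub_goals: list[SubGoal] = field(default_factory=list)

    def completion_rate(self) -> float:
        """Fraction of sub-goals completed so far (0.0 to 1.0)."""
        if not self.sub_goals:
            return 1.0
        done = sum(g.status is Status.COMPLETED for g in self.sub_goals)
        return done / len(self.sub_goals)


def update_goal_state(state: GoalState, user_turn: str, agent_turn: str) -> GoalState:
    """Re-label sub-goals after the latest exchange.

    A real system would ask an LLM to judge each sub-goal from the transcript;
    the keyword overlap below is a trivial stand-in purely for illustration.
    """
    for sub_goal in state.sub_goals:
        if sub_goal.status is Status.COMPLETED:
            continue
        keywords = [w for w in sub_goal.description.lower().split() if len(w) > 3]
        if any(word in agent_turn.lower() for word in keywords):
            sub_goal.status = Status.IN_PROGRESS
    return state


# Example: decompose a MultiWOZ-style goal and track it turn by turn; the
# latest goal state can then be prepended to the simulator's prompt (stage 1).
goal = GoalState(sub_goals=[
    SubGoal("find a cheap restaurant in the centre"),
    SubGoal("book a table for 4 people at 18:00"),
    SubGoal("be polite"),
])
goal = update_goal_state(
    goal,
    "I need a cheap restaurant in the centre.",
    "Okay, searching for cheap restaurants in the centre now.",
)
print(goal.completion_rate())  # 0.0: the first sub-goal is in progress, none completed
```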
This gives us simulators that autonomously track their goal state and reason about how to progress towards their goals. We observe up to 14% improvement in goal alignment across the MultiWOZ and τ-bench benchmarks. Notably, our enhanced Llama-3.1-8B and Qwen-2.5-7B now match or exceed their 70B+ counterparts, while maintaining naturalness and coherence.
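For the GRPO stage and for scoring goal alignment at evaluation time, one plausible recipe is to reward a rollout by the fraction of sub-goals a judge marks as satisfied. The sketch below is an assumption for illustration, not the paper's exact reward or metric; `judge_subgoal` is a keyword-based stand-in for an LLM judge.

```python
# Hypothetical UGST-derived reward: the share of sub-goals judged satisfied at
# the end of a simulated dialogue. The keyword check is only a placeholder for
# an LLM judge, not the paper's formulation.

def judge_subgoal(sub_goal: str, transcript: str) -> bool:
    """Stand-in judge: in practice, prompt an LLM with the sub-goal and the
    full transcript; here we just check for overlapping keywords."""
    keywords = [w for w in sub_goal.lower().split() if len(w) > 3]
    return all(word in transcript.lower() for word in keywords)


def goal_alignment_reward(sub_goals: list[str], transcript: str) -> float:
    """Scalar reward in [0, 1]: fraction of sub-goals judged as satisfied."""
    if not sub_goals:
        return 1.0
    satisfied = sum(judge_subgoal(g, transcript) for g in sub_goals)
    return satisfied / len(sub_goals)


# Example: reward a rollout where only the booking sub-goal was addressed.
reward = goal_alignment_reward(
    ["book a table for four people", "ask about vegetarian options"],
    "User: I'd like to book a table for four people tonight. Agent: Done!",
)
print(round(reward, 2))  # 0.5 under this toy keyword judge
```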
Our work addresses a critical limitation in existing user simulators and lays the groundwork for future advances in user simulation and conversational AI.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants (2025)
- Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent (2025)
- ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch (2025)
- The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs (2025)
- τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment (2025)
- DialogueForge: LLM Simulation of Human-Chatbot Dialogue (2025)
- RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing (2025)