arxiv:2507.20152

Goal Alignment in LLM-Based User Simulators for Conversational AI

Published on Jul 27 · Submitted by shuhaibmehri on Jul 29
Abstract

User Goal State Tracking (UGST), a novel framework for tracking user goal progression, is introduced to improve goal-aligned behavior in user simulators for conversational AI, yielding significant gains in goal alignment across benchmarks.

AI-generated summary

User simulators are essential to conversational AI, enabling scalable agent development and evaluation through simulated interactions. While current Large Language Models (LLMs) have advanced user simulation capabilities, we reveal that they struggle to consistently demonstrate goal-oriented behavior across multi-turn conversations--a critical limitation that compromises their reliability in downstream applications. We introduce User Goal State Tracking (UGST), a novel framework that tracks user goal progression throughout conversations. Leveraging UGST, we present a three-stage methodology for developing user simulators that can autonomously track goal progression and reason to generate goal-aligned responses. Moreover, we establish comprehensive evaluation metrics for measuring goal alignment in user simulators, and demonstrate that our approach yields substantial improvements across two benchmarks (MultiWOZ 2.4 and τ-Bench). Our contributions address a critical gap in conversational AI and establish UGST as an essential framework for developing goal-aligned user simulators.

Community

Paper submitter

User simulators are essential for creating environments that allow conversational agents to learn through interaction. In this work, we introduce User Goal State Tracking (UGST) to develop simulators that stay aligned with their intended behaviors throughout conversations.

We find that existing LLM-based user simulators fail to stay aligned with up to 40% of their user goals over multi-turn conversations. This unreliable behavior can lead to inaccurate evaluations and undermine the effectiveness of RL for conversational agents.

UGST decomposes a user goal into modular sub-components (e.g. “find a restaurant”, “be polite”) and dynamically tracks their progress throughout a conversation. We leverage UGST to develop goal-aligned user simulators: (1) Inference-time steering with the latest goal states, (2) SFT on responses with reasoning traces, and (3) GRPO using UGST-derived rewards.
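To make the tracking concrete, here is a minimal Python sketch of how a UGST-style goal state could be represented and injected into the simulator's prompt for stage (1). The schema, status labels, and the steering_prompt helper are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of UGST-style goal decomposition and state tracking.
# The schema, status labels, and prompt template below are illustrative
# assumptions, not the paper's exact implementation.
from dataclasses import dataclass, field
from enum import Enum


class SubGoalStatus(Enum):
    PENDING = "pending"          # not yet raised in the conversation
    IN_PROGRESS = "in_progress"  # raised but not yet satisfied by the agent
    FULFILLED = "fulfilled"      # the agent has satisfied this sub-goal


@dataclass
class SubGoal:
    description: str  # e.g. "find a cheap Italian restaurant" or "be polite"
    status: SubGoalStatus = SubGoalStatus.PENDING


@dataclass
class UserGoalState:
    sub_goals: list[SubGoal] = field(default_factory=list)

    def unfinished(self) -> list[SubGoal]:
        return [g for g in self.sub_goals if g.status is not SubGoalStatus.FULFILLED]


def steering_prompt(state: UserGoalState, history: list[str]) -> str:
    """Stage (1): inject the latest goal state into the user simulator's prompt
    so that its next turn stays aligned with the unfinished sub-goals."""
    remaining = "\n".join(f"- {g.description} ({g.status.value})" for g in state.unfinished())
    return (
        "You are simulating a user. Your remaining sub-goals:\n"
        f"{remaining}\n\n"
        "Conversation so far:\n" + "\n".join(history) + "\n\n"
        "Write the next user turn, making progress on one unfinished sub-goal."
    )
```

In such a loop, each sub-goal's status would be updated after every agent turn (e.g. by prompting an LLM judge to decide which sub-goals the latest turn satisfies), and the refreshed prompt would steer the next simulated user utterance.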

This yields simulators that autonomously track their goal state and reason about how to make progress toward their goals. We observe up to a 14% improvement in goal alignment across the MultiWOZ and τ-bench benchmarks. Notably, our enhanced Llama-3.1-8B and Qwen-2.5-7B match or exceed their 70B+ counterparts--while maintaining naturalness and coherence.
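As an illustration of how goal alignment could be scored, here is a short sketch (reusing the hypothetical UserGoalState above) that computes the fraction of sub-goals fulfilled by the end of a conversation; a scalar like this could also serve as a UGST-derived reward for GRPO. The paper's actual metrics and reward shaping may differ.

```python
def goal_alignment(state: UserGoalState) -> float:
    """Fraction of sub-goals fulfilled by the end of the dialogue.
    Illustrative only: usable as an evaluation metric or as a reward signal."""
    if not state.sub_goals:
        return 1.0  # trivially aligned when there is nothing to achieve
    fulfilled = sum(g.status is SubGoalStatus.FULFILLED for g in state.sub_goals)
    return fulfilled / len(state.sub_goals)
```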

Our work addresses a critical limitation in existing user simulators and lays the groundwork for future advances in user simulation and conversational AI.

