AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Paper • 2504.08942
McGill-NLP/agent-reward-bench
Agent Reward Bench Demo
💻 Visualize agent interactions with WebArena tasks
Agent Reward Bench Leaderboard
🥇 Leaderboard for AgentRewardBench
Xing Han Lù
xhluca
AI & ML interests
None yet
Recent Activity
Upvoted an article 19 days ago: How to Train Your LLM Web Agent: A Statistical Diagnosis
Upvoted a paper 21 days ago: How to Train Your LLM Web Agent: A Statistical Diagnosis