Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 20 days ago
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents Paper • 2401.10935 • Published Jan 17, 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Paper • 2410.05243 • Published Oct 7, 2024
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Paper • 2410.18967 • Published Oct 24, 2024
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage Paper • 2409.11295 • Published Sep 17, 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15, 2024
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents Paper • 2407.01511 • Published Jul 1, 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents Paper • 2405.14573 • Published May 23, 2024
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale Paper • 2409.15637 • Published Sep 24, 2024
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12, 2024
NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator Paper • 2410.02907 • Published Oct 3, 2024
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study Paper • 2403.03186 • Published Mar 5, 2024
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning Paper • 2410.18963 • Published Oct 24, 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13, 2024
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10, 2024
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published 13 days ago