Awesome Computer Use Agents - a ranpox Collection

updated Dec 18, 2024

https://github.com/ranpox/awesome-computer-use

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024
Tree Search for Language Model Agents

Paper • 2407.01476 • Published Jul 1, 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Paper • 2401.10935 • Published Jan 17, 2024
OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1, 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Paper • 2410.05243 • Published Oct 7, 2024
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Paper • 2410.18967 • Published Oct 24, 2024
Adversarial Attacks on Multimodal Agents

Paper • 2406.12814 • Published Jun 18, 2024
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Paper • 2409.11295 • Published Sep 17, 2024
Attacking Vision-Language Computer Agents via Pop-ups

Paper • 2411.02391 • Published Nov 4, 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Paper • 2407.10956 • Published Jul 15, 2024
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Paper • 2407.01511 • Published Jul 1, 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Paper • 2405.14573 • Published May 23, 2024
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

Paper • 2409.15637 • Published Sep 24, 2024
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

Paper • 2406.10819 • Published Jun 16, 2024
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Paper • 2402.07456 • Published Feb 12, 2024
NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator

Paper • 2410.02907 • Published Oct 3, 2024
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

Paper • 2403.03186 • Published Mar 5, 2024
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24, 2024
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Paper • 2410.18963 • Published Oct 24, 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Paper • 2408.07199 • Published Aug 13, 2024
Agent S: An Open Agentic Framework that Uses Computers Like a Human

Paper • 2410.08164 • Published Oct 10, 2024
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Paper • 2412.09605 • Published Dec 12, 2024