- Open-World Skill Discovery from Unsegmented Demonstrations (arXiv:2503.10684, Mar 2025)
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective (arXiv:2507.01925, Jul 2025)
- GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents (arXiv:2412.10410, Dec 2024)
- Rethinking Graph Neural Architecture Search from Message-passing (arXiv:2103.14282, Mar 2021)
- ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment (arXiv:2503.02505, Mar 2025)
- ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting (arXiv:2410.17856, Oct 2024)
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents (arXiv:2302.01560, Feb 2023)
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (arXiv:2310.08235, Oct 2023)
- Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction (arXiv:2301.10034, Jan 2023)
- JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models (arXiv:2311.05997, Nov 2023)
- OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents (arXiv:2407.00114, Jun 2024)