- Open-World Skill Discovery from Unsegmented Demonstrations (arXiv:2503.10684, Mar 2025)
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective (arXiv:2507.01925, Jul 2025)
- GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents (arXiv:2412.10410, Dec 2024)
- Rethinking Graph Neural Architecture Search from Message-passing (arXiv:2103.14282, Mar 2021)
- ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment (arXiv:2503.02505, Mar 2025)
- ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting (arXiv:2410.17856, Oct 2024)
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents (arXiv:2302.01560, Feb 2023)
- GROOT: Learning to Follow Instructions by Watching Gameplay Videos (arXiv:2310.08235, Oct 2023)
- Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction (arXiv:2301.10034, Jan 2023)
- JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models (arXiv:2311.05997, Nov 2023)
- OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents (arXiv:2407.00114, Jun 2024)