Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 116
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • Updated Dec 4, 2024 • 1.42M • 1.38k
Towards A Unified Agent with Foundation Models Paper • 2307.09668 • Published Jul 18, 2023 • 13
Android in the Wild: A Large-Scale Dataset for Android Device Control Paper • 2307.10088 • Published Jul 19, 2023 • 11
Planting a SEED of Vision in Large Language Model Paper • 2307.08041 • Published Jul 16, 2023 • 11
OpenMask3D: Open-Vocabulary 3D Instance Segmentation Paper • 2306.13631 • Published Jun 23, 2023 • 9