LLaRA Model Card

This model is released with the paper "LLaRA: Supercharging Robot Learning Data for Vision-Language Policy".

Xiang Li¹, Cristina Mata¹, Jongwoo Park¹, Kumara Kahatapitiya¹, Yoo Sung Jang¹, Jinghuan Shang¹, Kanchana Ranasinghe¹, Ryan Burgert¹, Mu Cai², Yong Jae Lee², and Michael S. Ryoo¹

¹Stony Brook University ²University of Wisconsin-Madison

Model details

Model type: D-RT2-Style is one of the baselines in our LLaRA paper and follows the style of RT2. It is an open-source visuomotor policy trained by fine-tuning LLaVA-7b-v1.5 on the instruction-following dataset D-RT2-Style, which was converted from VIMA-Data. For the conversion code, please refer to convert_vima.ipynb.
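As a rough illustration of what "RT2 style" usually means, the sketch below bins continuous action values into integer tokens and renders them as text, which is the general convention RT-2 uses for actions. It is only a sketch under stated assumptions: the 256-bin count follows RT-2, while the [0, 1] value range, the space-separated string format, and the helper names (discretize, undiscretize, to_action_string) are hypothetical and not taken from this model card. The actual D-RT2-Style format is defined in convert_vima.ipynb and may differ.

```python
from typing import List, Sequence

NUM_BINS = 256  # bin count used by RT-2; assumed here, not confirmed for D-RT2-Style


def discretize(values: Sequence[float], low: float = 0.0, high: float = 1.0,
               num_bins: int = NUM_BINS) -> List[int]:
    """Map continuous values in [low, high] to integer bin indices."""
    bins = []
    for v in values:
        clipped = min(max(v, low), high)  # keep out-of-range values inside the range
        idx = int((clipped - low) / (high - low) * num_bins)
        bins.append(min(idx, num_bins - 1))  # the upper bound falls in the last bin
    return bins


def undiscretize(bins: Sequence[int], low: float = 0.0, high: float = 1.0,
                 num_bins: int = NUM_BINS) -> List[float]:
    """Map bin indices back to the center value of each bin."""
    width = (high - low) / num_bins
    return [low + (b + 0.5) * width for b in bins]


def to_action_string(bins: Sequence[int]) -> str:
    """Render bin indices as a space-separated string (hypothetical text format)."""
    return " ".join(str(b) for b in bins)
```

For example, discretize([0.0, 0.5, 1.0]) gives [0, 128, 255], and mapping those bins back with undiscretize recovers each value to within half a bin width (1/512 for the default range).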

Model date: llava-1.5-7b-llara-D-RT2-Style-VIMA-80k was trained in June 2024.

Paper or resources for more information: https://github.com/LostXine/LLaRA

Where to send questions or comments about the model: https://github.com/LostXine/LLaRA/issues

Intended use

Primary intended uses: The primary use of LLaRA is research on large multimodal models for robotics.

Primary intended users: The primary intended users of the model are researchers and hobbyists in robotics, computer vision, natural language processing, machine learning, and artificial intelligence.
