Model details

Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Model date: LLaVA-flint-v0.5-1B was trained in Nov 2023.

This model is an implementation of LLaVA that uses TinyLlama 1.1B as the frozen LLM.

It is designed to run in low-resource environments. We plan to release further versions designed for specific tasks, so stay tuned.

Paper or resources for more information on the original LLaVA: https://llava-vl.github.io/

License: Apache 2 (TinyLlama)

Where to send questions or comments about the model: ask me here on Hugging Face :)

Intended use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 450K academic-task-oriented VQA data mixture.
- 40K ShareGPT data.

Evaluation dataset

A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.
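For a rough sense of scale, the training data sizes listed in this card can be tallied as follows. This is only an illustrative sketch: it assumes the four listed sources do not overlap and that each is counted once, which the card does not state. The names `TRAINING_MIXTURE_K` and `mixture_shares` are hypothetical and are not part of any LLaVA tooling.

```python
# Training data sizes as listed in this card, in thousands of samples.
# Assumption: the four sources are disjoint and each is counted once.
TRAINING_MIXTURE_K = {
    "LAION/CC/SBU image-text pairs (BLIP captions)": 558,
    "GPT-generated multimodal instruction data": 158,
    "Academic-task-oriented VQA mixture": 450,
    "ShareGPT": 40,
}


def mixture_shares(mixture):
    """Return (total, {name: fraction of total}) for a name -> size mapping."""
    total = sum(mixture.values())
    return total, {name: size / total for name, size in mixture.items()}


total_k, shares = mixture_shares(TRAINING_MIXTURE_K)
print(f"total: {total_k}K samples")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under those assumptions, the image-text pairs are the largest single source, followed by the VQA mixture.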
