---
inference: false
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
  - Zery/BS-Objaverse
  - Lin-Chen/ShareGPT4V
language:
  - en
---


# MV-LLaVA-7B Model Card

## Model details

**Model type:** MV-LLaVA-7B is an open-source chatbot for 3D multi-view images, trained by fine-tuning a CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted BS-Objaverse data and ShareGPT4V data.

**Model date:** MV-LLaVA-7B was trained in April 2024.

**Paper or resources for more information:** [Project] [Paper] [Code]

## Usage

You can use this model directly as described in our [repository].
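For orientation, here is a minimal usage sketch. Note that the checkpoint was trained with the ShareGPT4V ("share4v") codebase rather than a stock Transformers pipeline (the card sets `inference: false`), so the loading path below — the repo id `Zery/MV-LLaVA-7B` and the generic LLaVA classes — is an assumption; prefer the scripts in the [repository]. The Vicuna-style prompt format is the one commonly used by LLaVA-family models.

```python
def build_vicuna_prompt(user_message: str) -> str:
    """Assemble a Vicuna-v1-style prompt with an <image> placeholder,
    the conversation format commonly used by LLaVA-family models
    (assumed to apply to this checkpoint as well)."""
    system = (
        "A chat between a curious human and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the human's questions."
    )
    return f"{system} USER: <image>\n{user_message} ASSISTANT:"


def load_model(model_id: str = "Zery/MV-LLaVA-7B"):
    """Load weights via generic Transformers LLaVA classes.
    ASSUMPTION: this repo id and class choice may not match the
    share4v checkpoint layout; requires network access and a GPU,
    so it is not executed here."""
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    return processor, model


# Build a prompt for a multi-view 3D description query.
prompt = build_vicuna_prompt("Describe this 3D object from its multi-view renders.")
print(prompt)
```

If the standard LLaVA classes do not accept this checkpoint, fall back to the inference entry points shipped in the project repository.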

## License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

## Intended use

**Primary intended uses:** The primary use of MV-LLaVA-7B is research on large multimodal models and chatbots for 3D content.

**Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset

- 1.2M ShareGPT4V-PT data
- 30K GPT4-Vision-generated multi-view image-text pairs
- LLaVA instruction-tuning data