---
license: apache-2.0
library_name: sglang
tags:
- llava
inference: false
pipeline_tag: image-text-to-text
---

## Inference Preparation

This is a fork of [liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b), modified to be fully
compatible with [SGLang](https://github.com/sgl-project/sglang/) for inference.
No other changes were made.


<br>
<br>

# LLaVA Model Card

## Model details

**Model type:**
LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data.
It is an auto-regressive language model based on the transformer architecture.
Base LLM: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

**Model date:**
LLaVA-v1.6-Mistral-7B was trained in December 2023.

**Paper or resources for more information:**
https://llava-vl.github.io/

## License
[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) license.

**Where to send questions or comments about the model:**
https://github.com/haotian-liu/LLaVA/issues

## Intended use
**Primary intended uses:**
The primary use of LLaVA is research on large multimodal models and chatbots.

**Primary intended users:**
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 500K academic-task-oriented VQA data mixture.
- 50K GPT-4V data mixture.
- 40K ShareGPT data.

## Evaluation dataset
A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.