Image-Text-to-Text
Transformers
Safetensors
English
Chinese
llava
vision-language
llm
lmm
conversational
Inference Endpoints
bczhou committed
Commit b83932c
1 Parent(s): 858c854

Update README.md

Files changed (1)
  1. README.md +11 -2
README.md CHANGED
@@ -8,11 +8,20 @@ language:
  library_name: transformers
  ---
 
- **Model type:**
+ ## Model type
  TinyLLaVA, a tiny model (1.4B) trained using the exact recipe of [LLaVA-1.5](https://github.com/haotian-liu/LLaVA).
  We trained our TinyLLaVA using [TinyLlama](https://huggingface.co/PY007/TinyLlama-1.1B-Chat-v0.3) as our LLM backbone, and [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336) as our vision backbone.
 
- **Model use:**
+ ## Model Performance
+ We have evaluated TinyLLaVA on [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html), [VizWiz](https://www.vizwiz.com/), [VQAv2](https://visualqa.org/), [TextVQA](https://textvqa.org/) and [SQA](https://github.com/lupantech/ScienceQA).
+
+ | Model        | VQAv2 | GQA   | SQA   | TextVQA | VizWiz |
+ | ------------ | :---: | :---: | :---: | :-----: | :----: |
+ | TinyLLaVA-v1 | 73.41 | 57.54 | 59.40 |  46.37  | 49.56  |
+
+ More evaluations are ongoing.
+
+ ## Model use
  The weights have been converted to hf format.
 
  ## How to use the model
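
For context on the "Model use" and "How to use the model" sections touched by this hunk, below is a minimal sketch of loading an HF-format, LLaVA-style checkpoint with the transformers LLaVA integration. The repo id, prompt template, and generation settings are illustrative assumptions, not taken from this commit.

```python
# Minimal sketch: loading an HF-format LLaVA-style checkpoint with transformers.
# Assumptions: the repo id below is a placeholder, and the converted weights are
# compatible with LlavaForConditionalGeneration / AutoProcessor.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "bczhou/tiny-llava-v1-hf"  # placeholder repo id (assumption)

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any RGB image works; this COCO sample is used only for illustration.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5-style prompt format (assumed, since the model follows the
# LLaVA-1.5 training recipe).
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

If the conversion targets the standard LLaVA classes, the same checkpoint can typically also be run through `pipeline("image-to-text", model=model_id)` for a shorter path to inference.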