Image-Text-to-Text
Transformers
Safetensors
English
Chinese
llava
vision-language
llm
lmm
conversational
Inference Endpoints
bczhou committed
Commit b83932c
1 Parent(s): 858c854

Update README.md

Files changed (1)
  1. README.md +11 -2
README.md CHANGED
@@ -8,11 +8,20 @@ language:
  library_name: transformers
  ---
 
- **Model type:**
+ ## Model type
  TinyLLaVA, a tiny model (1.4B) trained using the exact recipe of [LLaVA-1.5](https://github.com/haotian-liu/LLaVA).
  We trained our TinyLLaVA using [TinyLlama](https://huggingface.co/PY007/TinyLlama-1.1B-Chat-v0.3) as our LLM backbone, and [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336) as our vision backbone.
 
- **Model use:**
+ ## Model Performance
+ We have evaluated TinyLLaVA on [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html), [VizWiz](https://www.vizwiz.com/), [VQAv2](https://visualqa.org/), [TextVQA](https://textvqa.org/) and [SQA](https://github.com/lupantech/ScienceQA).
+
+ | Model        | VQAv2 | GQA   | SQA   | TextVQA | VizWiz |
+ | ------------ | :---: | :---: | :---: | :-----: | :----: |
+ | TinyLLaVA-v1 | 73.41 | 57.54 | 59.40 |  46.37  | 49.56  |
+
+ More evaluations are ongoing.
+
+ ## Model use
  The weights have been converted to hf format.
 
  ## How to use the model
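
For context on the "Model use" and "How to use the model" sections touched by this hunk, below is a minimal sketch of loading an HF-format, LLaVA-style checkpoint with the transformers LLaVA integration. The repo id, prompt template, and generation settings are illustrative assumptions, not taken from this commit.

```python
# Minimal sketch: loading an HF-format LLaVA-style checkpoint with transformers.
# Assumptions: the repo id below is a placeholder, and the converted weights are
# compatible with LlavaForConditionalGeneration / AutoProcessor.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "bczhou/tiny-llava-v1-hf"  # placeholder repo id (assumption)

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any RGB image works; this COCO sample is used only for illustration.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5-style prompt format (assumed, since the model follows the
# LLaVA-1.5 training recipe).
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

If the conversion targets the standard LLaVA classes, the same checkpoint can typically also be run through `pipeline("image-to-text", model=model_id)` for a shorter path to inference.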