---
library_name: transformers
---
# WORK IN PROGRESS

We present TinyLLaVA, a small (1.4B-parameter) vision-language chatbot that achieves performance comparable to contemporary vision-language models on common benchmarks while using fewer parameters.

TinyLLaVA was trained by fine-tuning [TinyLlama](https://huggingface.co/PY007/TinyLlama-1.1B-Chat-v0.3) on the [LLaVA-1.5](https://github.com/haotian-liu/LLaVA) dataset, following the training recipe of [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). For more details, please refer to the [LLaVA-1.5 paper](https://arxiv.org/abs/2310.03744).

We have evaluated TinyLLaVA on [GQA](https://cs.stanford.edu/people/dorarad/gqa/) and other common benchmarks. More evaluations are ongoing.

## Model Preparations
### Transformers Version

Make sure you have `transformers >= 4.35.3` installed.
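
An illustrative way to verify the installed version at runtime (a quick sanity check, not a required setup step):

```python
# Illustrative version check; `packaging` ships as a dependency of transformers.
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse("4.35.3"):
    raise RuntimeError(
        f"Found transformers {transformers.__version__}; please upgrade, "
        "e.g. `pip install -U 'transformers>=4.35.3'`."
    )
```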
### Prompt Template

The model supports multi-image and multi-prompt generation. When using the model, make sure to follow the correct prompt template (`USER: <image>xxx\nASSISTANT:`), where the `<image>` token is a special placeholder token for the image embeddings.
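
For illustration, prompts that follow this template might look like the sketch below (the questions and the earlier assistant reply are placeholders):

```python
# Single image, single question: one "<image>" placeholder marks where the
# image embeddings are inserted.
prompt = "USER: <image>\nWhat is shown in this image?\nASSISTANT:"

# Multi-image / multi-turn prompt: repeat the template with one "<image>"
# token per image, in the same order as the images you pass to the model.
multi_prompt = (
    "USER: <image>\nDescribe the first image.\nASSISTANT: <previous answer>\n"
    "USER: <image>\nHow does the second image differ?\nASSISTANT:"
)
```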

## Model Inference with `pipeline` and `transformers`
### Using `pipeline`:

Below we use the [`"bczhou/tiny-llava-v1-hf"`](https://huggingface.co/bczhou/tiny-llava-v1-hf) checkpoint.
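
The sketch below assumes the standard `image-to-text` pipeline usage; the example image URL, question, and `max_new_tokens` value are placeholders, while the final `print(outputs[0])` follows the original script:

```python
# Sketch of image-to-text generation through the high-level pipeline API.
import requests
from PIL import Image
from transformers import pipeline

model_id = "bczhou/tiny-llava-v1-hf"
pipe = pipeline("image-to-text", model=model_id)

# Placeholder example image; any RGB image works.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The prompt must follow the template described above.
prompt = "USER: <image>\nWhat are these?\nASSISTANT:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs[0])
```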
### Using pure `transformers`:
Below is an example script to run generation in `float16` precision on a GPU device:
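
The sketch below assumes the checkpoint loads as `LlavaForConditionalGeneration`; the example image URL, question, and generation settings are placeholders, while the final decode line follows the original script:

```python
# Sketch of float16 generation on a GPU with the lower-level transformers API.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "bczhou/tiny-llava-v1-hf"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)  # move the model to GPU 0
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder example image and question.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat are these?\nASSISTANT:"

inputs = processor(images=raw_image, text=prompt, return_tensors="pt").to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
```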
## Contact

This model was trained by [Baichuan Zhou](https://baichuanzhou.github.io/) from Beihang University, under the supervision of [Prof. Lei Huang](https://huangleibuaa.github.io/).