Commit
•
f7be668
1
Parent(s):
cec0338
added colab
Browse files
README.md
CHANGED
@@ -16,6 +16,8 @@ arxiv: 2408.03326
|
|
16 |
|
17 |
![image/png](llava_onevision_arch.png)
|
18 |
|
|
|
|
|
19 |
Below is the model card of 0.5B LLaVA-Onevision model which is copied from the original LLaVA-Onevision model card that you can find [here](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-si).
|
20 |
|
21 |
|
@@ -53,12 +55,13 @@ The model supports multi-image and multi-prompt generation. Meaning that you can
|
|
53 |
Below we used [`"llava-hf/llava-onevision-qwen2-0.5b-si-hf"`](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-si-hf) checkpoint.
|
54 |
|
55 |
```python
|
56 |
-
from transformers import pipeline
|
57 |
from PIL import Image
|
58 |
import requests
|
59 |
|
60 |
model_id = "llava-hf/llava-onevision-qwen2-0.5b-si-hf"
|
61 |
pipe = pipeline("image-to-text", model=model_id)
|
|
|
62 |
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
|
63 |
image = Image.open(requests.get(url, stream=True).raw)
|
64 |
|
@@ -74,7 +77,7 @@ conversation = [
|
|
74 |
],
|
75 |
},
|
76 |
]
|
77 |
-
prompt =
|
78 |
|
79 |
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
|
80 |
print(outputs)
|
|
|
16 |
|
17 |
![image/png](llava_onevision_arch.png)
|
18 |
|
19 |
+
Check out also the Google Colab demo to run Llava on a free-tier Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-4AtYjR8UMtCALV0AswU1kiNkWCLTALT?usp=sharing)
|
20 |
+
|
21 |
Below is the model card of 0.5B LLaVA-Onevision model which is copied from the original LLaVA-Onevision model card that you can find [here](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-si).
|
22 |
|
23 |
|
|
|
55 |
Below we used [`"llava-hf/llava-onevision-qwen2-0.5b-si-hf"`](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-si-hf) checkpoint.
|
56 |
|
57 |
```python
|
58 |
+
from transformers import pipeline, AutoProcessor
|
59 |
from PIL import Image
|
60 |
import requests
|
61 |
|
62 |
model_id = "llava-hf/llava-onevision-qwen2-0.5b-si-hf"
|
63 |
pipe = pipeline("image-to-text", model=model_id)
|
64 |
+
processor = AutoProcessor.from_pretrained(model_id)
|
65 |
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
|
66 |
image = Image.open(requests.get(url, stream=True).raw)
|
67 |
|
|
|
77 |
],
|
78 |
},
|
79 |
]
|
80 |
+
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
|
81 |
|
82 |
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
|
83 |
print(outputs)
|