llava-hf/llava-onevision-qwen2-0.5b-si-hf

Image-Text-to-Text

llava_onevision


RaushanTurganbay (HF staff) committed on Aug 20

Commit f7be668 • 1 parent: cec0338

added colab

Files changed (1)

README.md +5 -2


@@ -16,6 +16,8 @@ arxiv: 2408.03326
 ![image/png](llava_onevision_arch.png)
 Below is the model card of 0.5B LLaVA-Onevision model which is copied from the original LLaVA-Onevision model card that you can find [here](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-si).
@@ -53,12 +55,13 @@ The model supports multi-image and multi-prompt generation. Meaning that you can
 Below we used [`"llava-hf/llava-onevision-qwen2-0.5b-si-hf"`](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-si-hf) checkpoint.
 ```python
-from transformers import pipeline
 from PIL import Image
 import requests
 model_id = "llava-hf/llava-onevision-qwen2-0.5b-si-hf"
 pipe = pipeline("image-to-text", model=model_id)
 url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
 image = Image.open(requests.get(url, stream=True).raw)
@@ -74,7 +77,7 @@ conversation = [
         ],
     },
 ]
-prompt = pipe.processor.apply_chat_template(conversation, add_generation_prompt=True)
 outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
 print(outputs)

 ![image/png](llava_onevision_arch.png)
+Check out also the Google Colab demo to run Llava on a free-tier Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-4AtYjR8UMtCALV0AswU1kiNkWCLTALT?usp=sharing)
 Below is the model card of 0.5B LLaVA-Onevision model which is copied from the original LLaVA-Onevision model card that you can find [here](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-si).
 Below we used [`"llava-hf/llava-onevision-qwen2-0.5b-si-hf"`](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-si-hf) checkpoint.
 ```python
+from transformers import pipeline, AutoProcessor
 from PIL import Image
 import requests
 model_id = "llava-hf/llava-onevision-qwen2-0.5b-si-hf"
 pipe = pipeline("image-to-text", model=model_id)
+processor = AutoProcessor.from_pretrained(model_id)
 url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
 image = Image.open(requests.get(url, stream=True).raw)
         ],
     },
 ]
+prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
 outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
 print(outputs)