kimyoungjune committed on
Commit 1fb8042 · verified · 1 Parent(s): 17a675d

Update README.md

Files changed (1):
  1. README.md +4 −0
README.md CHANGED
@@ -22,12 +22,14 @@ library_name: transformers
 
 - **Developed by:** NC Research, Multimodal Generation Team
 - **Technical Report:** [Coming Soon]()
+- **Demo Page:** [Coming Soon]()
 - **Languages:** Korean, English
 - **License:** CC BY-NC 4.0
 - **Architecture:** VARCO-VISION-14B follows the architecture of [LLaVA-OneVision](https://arxiv.org/abs/2408.03326).
 - **Base Model:**
   - **Language Model:** [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
   - **Vision Encoder:** [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384)
+- **Huggingface Version Model:** [NCSOFT/VARCO-VISION-14B-HF](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)
 
 
 
@@ -49,6 +51,7 @@ After installing **LLaVA-NeXT**, you can load VARCO-VISION-14B using the following code:
 import torch
 from transformers import AutoTokenizer
 from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM
+from llava.conversation import apply_chat_template
 from llava.mm_utils import tokenizer_image_token, process_images
 
 model_name = "NCSOFT/VARCO-VISION-14B"
@@ -179,6 +182,7 @@ To perform Optical Character Recognition (OCR), use the `<ocr>` token.
 
 ```python
 image_file = "./assets/ocr_1.png"
+raw_image = Image.open(image_file)
 
 conversation = [
     {
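
The OCR hunk above slots into the README's conversation-based prompt format. As a rough illustration, here is a minimal sketch of how such an OCR request might be assembled. Only the `<ocr>` token and the image path come from the diff itself; the `<image>` placeholder and the role/content field names are assumptions based on common LLaVA-style usage, not confirmed by this commit:

```python
# Sketch of an OCR request in a LLaVA-style conversation format.
# Assumed: the "<image>" placeholder and the role/content field names.
# From the diff: the "<ocr>" task token and the image path.
image_file = "./assets/ocr_1.png"

conversation = [
    {
        "role": "user",
        "content": "<image>\n<ocr>",
    }
]

# A chat template would render this into a single prompt string; here we
# simply join the turns to show where the special tokens end up.
prompt = "\n".join(turn["content"] for turn in conversation)
print(prompt)
```

In the actual README snippet, the image opened via `Image.open` would be preprocessed with `process_images` and the prompt tokenized with `tokenizer_image_token` before generation.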