weizhiwang/Open-Qwen2VL

Image-Text-to-Text


weizhiwang committed on Apr 3

Commit a2a09fb (verified), 1 parent: c70690d

Update README.md

Files changed (1)

README.md +9 -10

README.md CHANGED

@@ -16,13 +16,10 @@ library_name: transformers
 Open-Qwen2VL is a multimodal model that takes images and text as input and produces text as output.  This model is described in the paper [Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources](https://huggingface.co/papers/2504.00595).  The code is available at [https://github.com/Victorwz/Open-Qwen2VL](https://github.com/Victorwz/Open-Qwen2VL).
-<!-- Please follow my reproduced implementation [LLaVA-Unified](https://github.com/Victorwz/LLaVA-Unified) for more details on fine-tuning LLaVA model with Llama-3 as the foundatiaon LLM. -->
 ## Updates
-<!-- - [5/14/2024] The codebase has been upgraded to llava-next (llava-v1.6). Now it supports the latest llama-3, phi-3, mistral-v0.1-7b models. -->
-## Model Details
-<!-- Follows LLavA-1.5 pre-train and supervised fine-tuning pipeline. You do not need to change the LLaVA codebase to accommodate Llama-3.  -->
+- [4/1/2025] The codebase, model, data, and paper are released.
+<!-- ## Model Details -->
 ## How to Use
@@ -66,11 +63,13 @@ The image caption results look like:
 The image depicts a blue and orange bus parked on the side of a street. ...
 ```
-<!-- # Fine-Tune LLaVA-Llama-3 on Your Visual Instruction Data ... -->
 ## Citation
-<!--
 ```bibtex
-@misc{wang2024llavallama3,
-...
-``` -->
+@article{Open-Qwen2VL,
+    title={Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources},
+    author={Wang, Weizhi and Tian, Yu and Yang, Linjie and Wang, Heng and Yan, Xifeng},
+    journal={arXiv preprint arXiv:2504.00595},
+    year={2025}
+  }
+...
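The hunk headers in this diff follow the unified-diff convention `@@ -old_start,old_len +new_start,new_len @@`. As a minimal sketch (not part of the commit), the Python below parses the two headers `@@ -16,13 +16,10 @@` and `@@ -66,11 +63,13 @@` and checks that they agree with each other and with the `+9 -10` summary: the first hunk shrinks the file by 3 lines, which is why the second hunk's new start is 63 rather than 66, and the net change across both hunks is -1. The names `parse_hunk` and `HUNK_RE` are illustrative, not from any library.

```python
import re

# Unified-diff hunk header: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk(header: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) for a hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = m.groups()
    # An omitted length means the hunk spans exactly one line.
    return (int(old_start), int(old_len or 1), int(new_start), int(new_len or 1))


headers = [
    "@@ -16,13 +16,10 @@ library_name: transformers",
    "@@ -66,11 +63,13 @@ The image caption results look like:",
]
hunks = [parse_hunk(h) for h in headers]

# Net line change per hunk: new length minus old length.
deltas = [new_len - old_len for _, old_len, _, new_len in hunks]
print(deltas)  # [-3, 2]

# Lines removed by the first hunk shift where the second hunk starts in the new file.
print(hunks[1][0] + deltas[0] == hunks[1][2])  # True (66 - 3 == 63)

# The summed net change should equal additions minus removals (+9 -10).
print(sum(deltas))  # -1
```

The check only uses the header counts, so it works even though the hunk bodies shown here are partially elided with `...`.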