Commit a2a09fb by weizhiwang · verified · 1 parent: c70690d

Update README.md

  1. README.md +9 -10
README.md CHANGED
@@ -16,13 +16,10 @@ library_name: transformers

  Open-Qwen2VL is a multimodal model that takes images and text as input and produces text as output. This model is described in the paper [Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources](https://huggingface.co/papers/2504.00595). The code is available at [https://github.com/Victorwz/Open-Qwen2VL](https://github.com/Victorwz/Open-Qwen2VL).

- <!-- Please follow my reproduced implementation [LLaVA-Unified](https://github.com/Victorwz/LLaVA-Unified) for more details on fine-tuning LLaVA model with Llama-3 as the foundatiaon LLM. -->
-
  ## Updates
- <!-- - [5/14/2024] The codebase has been upgraded to llava-next (llava-v1.6). Now it supports the latest llama-3, phi-3, mistral-v0.1-7b models. -->
+ - [4/1/2025] The codebase, model, data, and paper are released.

- ## Model Details
- <!-- Follows LLavA-1.5 pre-train and supervised fine-tuning pipeline. You do not need to change the LLaVA codebase to accommodate Llama-3. -->
+ <!-- ## Model Details -->

  ## How to Use

@@ -66,11 +63,13 @@ The image caption results look like:
  The image depicts a blue and orange bus parked on the side of a street. ...
  ```

- <!-- # Fine-Tune LLaVA-Llama-3 on Your Visual Instruction Data ... -->

  ## Citation
- <!--
  ```bibtex
- @misc{wang2024llavallama3,
- ...
- ``` -->
+ @article{Open-Qwen2VL,
+ title={Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources},
+ author={Wang, Weizhi and Tian, Yu and Yang, Linjie and Wang, Heng and Yan, Xifeng},
+ journal={arXiv preprint arXiv:2504.00595},
+ year={2025}
+ }
+ ...