nielsr (HF staff) committed
Commit b295998 · verified · Parent: addb4a8

Improve model card for Virgo-72B


This PR improves the model card for Virgo-72B by adding essential metadata (`pipeline_tag`, `library_name`, `license`), a more detailed model description based on the GitHub README, and clarified usage instructions. The license is assumed to be MIT; please verify and update if necessary. Additional tags have been added to improve discoverability.

Files changed (1)
README.md  +35 -17
README.md CHANGED
@@ -1,27 +1,30 @@
---
library_name: transformers
- tags: []
+ pipeline_tag: image-text-to-text
+ license: mit
+ tags:
+ - multimodal
+ - vision-language
+ - reasoning
+ - qwen2
---

- # Model Card for Vigor-72B
-
- <!-- Provide a quick summary of what the model is/does. -->
-
+ # Model Card for Virgo-72B

+ Virgo is a multi-modal slow-thinking reasoning model based on Qwen2-VL-72B-Instruct. It excels in image-text-to-text tasks, demonstrating strong performance on various multimodal benchmarks. Virgo leverages a long-form thought process for enhanced reasoning capabilities, effectively integrating visual information into its responses.

## Model Details

### Model Sources

- <!-- Provide the basic links for the model. -->
-
- **Repository:** https://github.com/RUCAIBox/Virgo
- **Paper:** https://arxiv.org/pdf/2501.01904

## Quick Start

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- ```
+ This example demonstrates how to use Virgo-72B with the `vllm` library for text generation given an image and text input. Ensure you have `vllm` and `Pillow` installed (`pip install vllm Pillow`) and a suitable image file (`case/2246_image_1.jpg` in this example).
+
+ ```python
from vllm import LLM, SamplingParams
from PIL import Image

@@ -30,19 +33,23 @@ placeholder = "<|image_pad|>"
llm = LLM(
    model=model_name,
    trust_remote_code=True,
-     tensor_parallel_size=8,
+     tensor_parallel_size=8,  # Adjust based on your hardware
)
question = "Please first think deeply about the question, and then put the final answer in \\boxed{}.\nIn the diagram, $\\angle E A D=90^{\\circ}, \\angle A C D=90^{\\circ}$, and $\\angle A B C=90^{\\circ}$. Also, $E D=13, E A=12$, $D C=4$, and $C B=2$. Determine the length of $A B$."
prompt = ("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
          f"<|im_start|>user\n<|vision_start|>{placeholder}<|vision_end|>"
          f"{question}<|im_end|>\n"
          "<|im_start|>assistant\n")
- stop_token_ids = None
sampling_params = SamplingParams(
    temperature=0.0,
    top_k=1,
    top_p=1.0,
-     stop_token_ids=stop_token_ids,
    repetition_penalty=1.05,
    max_tokens=8192
)
@@ -55,4 +62,15 @@ inputs = {
}
outputs = llm.generate(inputs, sampling_params)
print(outputs[0].outputs[0].text)
```
+
+ ## Citation
+
+ ```
+ @article{du2025virgo,
+   title={Virgo: A Preliminary Exploration on Reproducing o1-like MLLM},
+   author={Yifan Du and Zikang Liu and Yifan Li and Wayne Xin Zhao and Yuqi Huo and Bingning Wang and Weipeng Chen and Zheng Liu and Zhongyuan Wang and Ji-Rong Wen},
+   journal={arXiv preprint arXiv:2501.01904},
+   year={2025}
+ }
+ ```
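
Note: the diff above hides the unchanged middle of the Quick Start snippet, where `model_name`, the image, and the `inputs` dict are defined. Below is a minimal sketch of how those elided pieces typically fit together with vLLM's multimodal API; the repo id `RUC-AIBOX/Virgo-72B`, the example question, the image path, and `tensor_parallel_size=8` are assumptions for illustration, not values taken from this commit.

```python
# Minimal end-to-end sketch of the Quick Start flow (assumed values marked below).
from vllm import LLM, SamplingParams
from PIL import Image

model_name = "RUC-AIBOX/Virgo-72B"   # assumed Hugging Face repo id
placeholder = "<|image_pad|>"        # Qwen2-VL image placeholder token (from the diff's hunk context)

# Load the model; tensor_parallel_size should match the number of GPUs available.
llm = LLM(model=model_name, trust_remote_code=True, tensor_parallel_size=8)

question = "Determine the length of AB in the diagram."  # any image-grounded question
prompt = ("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
          f"<|im_start|>user\n<|vision_start|>{placeholder}<|vision_end|>"
          f"{question}<|im_end|>\n"
          "<|im_start|>assistant\n")

sampling_params = SamplingParams(temperature=0.0, top_k=1, top_p=1.0,
                                 repetition_penalty=1.05, max_tokens=8192)

# vLLM passes the raw image alongside the prompt via "multi_modal_data";
# this mirrors the `inputs = {...}` block elided from the diff above.
image = Image.open("case/2246_image_1.jpg").convert("RGB")  # path from the README example
inputs = {
    "prompt": prompt,
    "multi_modal_data": {"image": image},
}

outputs = llm.generate(inputs, sampling_params)
print(outputs[0].outputs[0].text)
```

At 72B parameters the model will not fit on a single typical GPU, so `tensor_parallel_size` should generally be set to the number of devices you have available.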