nielsr (HF staff) committed
Commit b295998 · verified · Parent: addb4a8

Improve model card for Virgo-72B


This PR improves the model card for Virgo-72B by adding essential metadata (`pipeline_tag`, `library_name`, `license`), a more detailed model description based on the GitHub README, and clarified usage instructions. The license is assumed to be MIT; please verify and update if necessary. Additional tags have been added to improve discoverability.

Files changed (1)
README.md  +35 -17
README.md CHANGED
@@ -1,27 +1,30 @@
---
library_name: transformers
- tags: []
+ pipeline_tag: image-text-to-text
+ license: mit
+ tags:
+ - multimodal
+ - vision-language
+ - reasoning
+ - qwen2
---

- # Model Card for Vigor-72B
-
- <!-- Provide a quick summary of what the model is/does. -->
-
+ # Model Card for Virgo-72B

+ Virgo is a multi-modal slow-thinking reasoning model based on Qwen2-VL-72B-Instruct. It excels in image-text-to-text tasks, demonstrating strong performance on various multimodal benchmarks. Virgo leverages a long-form thought process for enhanced reasoning capabilities, effectively integrating visual information into its responses.

## Model Details

### Model Sources

- <!-- Provide the basic links for the model. -->
-
- **Repository:** https://github.com/RUCAIBox/Virgo
- **Paper:** https://arxiv.org/pdf/2501.01904

## Quick Start

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- ```
+ This example demonstrates how to use Virgo-72B with the `vllm` library for text generation given an image and text input. Ensure you have `vllm` and `Pillow` installed (`pip install vllm Pillow`) and a suitable image file (`case/2246_image_1.jpg` in this example).
+
+ ```python
from vllm import LLM, SamplingParams
from PIL import Image

@@ -30,19 +33,23 @@ placeholder = "<|image_pad|>"
llm = LLM(
    model=model_name,
    trust_remote_code=True,
-     tensor_parallel_size=8,
+     tensor_parallel_size=8,  # Adjust based on your hardware
)
question = "Please first think deeply about the question, and then put the final answer in \\boxed{}.\nIn the diagram, $\\angle E A D=90^{\\circ}, \\angle A C D=90^{\\circ}$, and $\\angle A B C=90^{\\circ}$. Also, $E D=13, E A=12$, $D C=4$, and $C B=2$. Determine the length of $A B$."
prompt = ("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
          f"<|im_start|>user\n<|vision_start|>{placeholder}<|vision_end|>"
          f"{question}<|im_end|>\n"
          "<|im_start|>assistant\n")
- stop_token_ids = None
sampling_params = SamplingParams(
    temperature=0.0,
    top_k=1,
    top_p=1.0,
-     stop_token_ids=stop_token_ids,
    repetition_penalty=1.05,
    max_tokens=8192
)
@@ -55,4 +62,15 @@ inputs = {
}
outputs = llm.generate(inputs, sampling_params)
print(outputs[0].outputs[0].text)
```
+
+ ## Citation
+
+ ```
+ @article{du2025virgo,
+   title={Virgo: A Preliminary Exploration on Reproducing o1-like MLLM},
+   author={Yifan Du and Zikang Liu and Yifan Li and Wayne Xin Zhao and Yuqi Huo and Bingning Wang and Weipeng Chen and Zheng Liu and Zhongyuan Wang and Ji-Rong Wen},
+   journal={arXiv preprint arXiv:2501.01904},
+   year={2025}
+ }
+ ```
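
Note: the diff above hides the unchanged middle of the Quick Start snippet, where `model_name`, the image, and the `inputs` dict are defined. Below is a minimal sketch of how those elided pieces typically fit together with vLLM's multimodal API; the repo id `RUC-AIBOX/Virgo-72B`, the example question, the image path, and `tensor_parallel_size=8` are assumptions for illustration, not values taken from this commit.

```python
# Minimal end-to-end sketch of the Quick Start flow (assumed values marked below).
from vllm import LLM, SamplingParams
from PIL import Image

model_name = "RUC-AIBOX/Virgo-72B"   # assumed Hugging Face repo id
placeholder = "<|image_pad|>"        # Qwen2-VL image placeholder token (from the diff's hunk context)

# Load the model; tensor_parallel_size should match the number of GPUs available.
llm = LLM(model=model_name, trust_remote_code=True, tensor_parallel_size=8)

question = "Determine the length of AB in the diagram."  # any image-grounded question
prompt = ("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
          f"<|im_start|>user\n<|vision_start|>{placeholder}<|vision_end|>"
          f"{question}<|im_end|>\n"
          "<|im_start|>assistant\n")

sampling_params = SamplingParams(temperature=0.0, top_k=1, top_p=1.0,
                                 repetition_penalty=1.05, max_tokens=8192)

# vLLM passes the raw image alongside the prompt via "multi_modal_data";
# this mirrors the `inputs = {...}` block elided from the diff above.
image = Image.open("case/2246_image_1.jpg").convert("RGB")  # path from the README example
inputs = {
    "prompt": prompt,
    "multi_modal_data": {"image": image},
}

outputs = llm.generate(inputs, sampling_params)
print(outputs[0].outputs[0].text)
```

At 72B parameters the model will not fit on a single typical GPU, so `tensor_parallel_size` should generally be set to the number of devices you have available.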