---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2-VL-7B-Base
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
---

# MM-Coder-7B (from Qwen2-VL-7B)

## Introduction

MM-Coder-7B is a multimodal model that processes both text and images and excels at generating code from UML diagrams and flowcharts. It is based on Qwen2-VL-7B and has been fine-tuned on the MMc-Instruct-Stage1 dataset (coming soon) and the [MMc-Instruct-Stage2](https://huggingface.co/datasets/Multilingual-Multimodal-NLP/MMc-Instruct-Stage2) dataset.

## Requirements

Verified on:
- vllm==0.9.1
- transformers==4.49.0
- qwen-vl-utils==0.0.11
- accelerate==1.9.0

(Note: newer versions of transformers may cause errors (see https://github.com/vllm-project/vllm/issues/15614); please use the versions pinned above.)
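
For a fresh environment, a minimal install matching these pins might look like the following (assuming pip and a CUDA-capable machine):

```bash
pip install vllm==0.9.1 transformers==4.49.0 qwen-vl-utils==0.0.11 accelerate==1.9.0
```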

## Quickstart

Below, we provide a simple example showing inference with MM-Coder-7B using transformers. Our model is fully compatible with Qwen2-VL-7B-Instruct usage; for more usage details, refer to [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct).

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model with automatic dtype selection and device placement
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Multilingual-Multimodal-NLP/MM-Coder-7B", torch_dtype="auto", device_map="auto"
)

# Default processor
processor = AutoProcessor.from_pretrained("Multilingual-Multimodal-NLP/MM-Coder-7B")

# The default range for the number of visual tokens per image is 4-16384.
# You can set min_pixels and max_pixels according to your needs, such as a
# token count range of 256-1280, to balance speed and memory usage.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained("Multilingual-Multimodal-NLP/MM-Coder-7B", min_pixels=min_pixels, max_pixels=max_pixels)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "[IMAGE_PATH]",
            },
            {"type": "text", "text": "Use Python to complete the task as described in the diagram:\nDesign a Crop class in a virtual farm management system."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)

inputs = inputs.to("cuda")

# Inference: generate the output and strip the prompt tokens
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

# [OUTPUT]
# Here is a comprehensive solution for the Crop class based on the provided diagram:

# ```python
# class Crop:
#     def __init__(self, name, plant_date):
#         self.name = name
#         self.plant_date = plant_date
#         self.status = "Planted"

#     def grow(self):
#         if self.status == "Planted":
#             self.status = "Growing"
#         elif self.status == "Growing":
#             self.status = "Harvested"

#     def get_crop_infos(self):
#         return f"Crop(name={self.name}, status={self.status})"

# ...
# ```
```

## Citation

If you find our work helpful, feel free to cite us.

```
@misc{mmcoder,
      title={Multilingual Multimodal Software Developer for Code Generation},
      author={Linzheng Chai and Jian Yang and Shukai Liu and Wei Zhang and Liran Wang and Ke Jin and Tao Sun and Congnan Liu and Chenchen Zhang and Hualei Zhu and Jiaheng Liu and Xianjie Wu and Ge Zhang and Tianyu Liu and Zhoujun Li},
      year={2025},
      eprint={2507.08719},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.08719},
}
```