Commit 363f094 by LinhIcey · 1 Parent(s): 6ff9810

Update README.md

  1. README.md +163 -0
README.md CHANGED
@@ -1,3 +1,166 @@
  ---
  license: gpl-3.0
+ language:
+ - zh
+ - en
+ pipeline_tag: visual-question-answering
+ tags:
+ - ziya
+ - fengshenbang
+ - LVLM
+ - visual question answering
  ---
+
+ # Ziya-Visual-14B-Chat
+
+ - Main Page: [Fengshenbang](https://fengshenbang-lm.com/)
+ - GitHub: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
+
+ # Ziya Series Models
+
+ - [Ziya-LLaMA-13B-v1.1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1.1)
+ - [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1)
+ - [Ziya-LLaMA-7B-Reward](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-7B-Reward)
+ - [Ziya-LLaMA-13B-Pretrain-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1)
+
+ ## Software Dependencies
+ ```
+ pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/transformers
+ ```
+
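
The pins in the install command can be checked before loading the model. The sketch below is only an illustration: the `PINNED` dictionary and the `find_mismatches` helper are hypothetical names, not part of this repository, and `transformers` is left out because the command installs it from source without a version pin.

```python
from importlib import metadata

# Pins copied from the pip command; transformers is installed from source
# there, so it has no pinned version to compare against.
PINNED = {"torch": "1.12.1", "tokenizers": "0.13.3"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # A local build tag such as "1.12.1+cu113" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when both pinned packages are installed at the expected versions.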
+ ## Model Taxonomy
+
+ | Demand | Task | Series | Model | Parameters | Extra |
+ | :----: | :----: | :----: | :----: | :----: | :----: |
+ | Multi-Modal | General | Ziya-Visual | InstructBLIP LLaMA | 14B | English & Chinese |
+
+ ## Usage
+
+ ```python
+ import os
+
+ import torch
+ from PIL import Image
+ from torchvision.transforms import (
+     Compose,
+     InterpolationMode,
+     Normalize,
+     RandomHorizontalFlip,
+     RandomResizedCrop,
+     ToTensor,
+ )
+ from transformers import BertTokenizer, GenerationConfig, LlamaTokenizer
+
+ from fengshen.models.instruct_ditto.modeling_instruct_ditto import InstructDittoLMForConditionalGeneration
+
+ # CLIP (OpenAI) image normalization statistics.
+ OPENAI_DATASET_MEAN = (0.48145466, 0.4578275, 0.40821073)
+ OPENAI_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)
+
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+ _MODEL_PATH = "your model path"
+
+ # Note: the crop and the flip are random, so the same image can produce
+ # different model inputs from one run to the next.
+ transforms = Compose([
+     RandomResizedCrop(
+         224,
+         scale=(0.5, 1.0),
+         interpolation=InterpolationMode.BICUBIC,
+     ),
+     RandomHorizontalFlip(),
+     ToTensor(),
+     Normalize(mean=OPENAI_DATASET_MEAN, std=OPENAI_DATASET_STD),
+ ])
+
+ model = InstructDittoLMForConditionalGeneration.from_pretrained(_MODEL_PATH).to(device).eval()
+ # The Q-Former instruction is tokenized with a BERT tokenizer,
+ # the language-model prompt with the LLaMA tokenizer.
+ instruct_tokenizer = BertTokenizer.from_pretrained(os.path.join(_MODEL_PATH, "qformer_tokenizer"))
+ tokenizer = LlamaTokenizer.from_pretrained(_MODEL_PATH, use_fast=False)
+
+ qformer_prompt = "{prompt}"
+ qformer_prompt_list = []
+ prompt_prefix = ''
+ llm_prompt = "<human>: {prompt}\n<bot>:"
+ llm_prompt_list = []
+
+ # One prompt per image: prompt[i] is asked about image_url[i].
+ prompt = ["your prompt"]
+
+ for p in prompt:
+     qformer_prompt_list.append(qformer_prompt.format_map({"prompt": p}))
+     llm_prompt_list.append(llm_prompt.format_map({"prompt": p}))
+
+ # Local image paths, opened with Image.open below.
+ image_url = ["your image"]
+
+ imgs = []
+ for img_url in image_url:
+     imgs.append(transforms(Image.open(img_url).convert('RGB')))
+
+ config = GenerationConfig(
+     # do_sample=True, #False
+     # num_beams=3, # 3
+     # min_length=4,
+     max_new_tokens=128,
+     repetition_penalty=1.18,
+     # length_penalty=1,
+     # temperature and top_p only take effect when do_sample=True.
+     temperature=0.7,
+     top_p=0.1,
+     bos_token_id=1,
+     eos_token_id=2,
+     pad_token_id=39410,
+ )
+
+ imgs = torch.stack(imgs)
+
+ instruct_tokenizer.padding_side = 'right'
+ tokenizer.padding_side = 'left'
+
+ for i in range(imgs.shape[0]):
+     prompt_prefix_ids = tokenizer(prompt_prefix, return_tensors="pt").input_ids
+     qformer_instruct_ids = instruct_tokenizer(qformer_prompt_list[i], return_tensors="pt").input_ids
+     llm_instruct_ids = tokenizer(llm_prompt_list[i], return_tensors="pt", add_special_tokens=False).input_ids
+     captions = model.generate(
+         imgs[i].unsqueeze(0).to(device),
+         qformer_instruct_ids=qformer_instruct_ids.to(device),
+         prompt_prefix_ids=prompt_prefix_ids.to(device),
+         llm_instruct_ids=llm_instruct_ids.to(device),
+         generation_config=config,
+     )
+     caption = tokenizer.decode(captions[0])
+     print("Q: " + prompt[i] + "\n" + "A: " + caption)
+ ```
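
The usage snippet builds two prompt strings per question: the raw question for the Q-Former and a `<human>: ...\n<bot>:` wrapper for the language model. It then indexes prompts and images by the same position. A minimal sketch of that step as reusable helpers follows; `build_prompts` and `pair_inputs` are hypothetical names introduced here for illustration, while the two templates are copied from the snippet.

```python
# Templates copied from the usage snippet.
QFORMER_TEMPLATE = "{prompt}"
LLM_TEMPLATE = "<human>: {prompt}\n<bot>:"


def build_prompts(prompts):
    """Return (qformer_prompts, llm_prompts) for a list of user questions."""
    qformer_prompts = [QFORMER_TEMPLATE.format(prompt=p) for p in prompts]
    llm_prompts = [LLM_TEMPLATE.format(prompt=p) for p in prompts]
    return qformer_prompts, llm_prompts


def pair_inputs(prompts, image_paths):
    """Pair each prompt with its image path, failing early on a length mismatch."""
    if len(prompts) != len(image_paths):
        raise ValueError(
            f"got {len(prompts)} prompts but {len(image_paths)} images"
        )
    return list(zip(prompts, image_paths))
```

Checking the lengths up front avoids an `IndexError` partway through generation when the two lists drift apart.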
+
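
The usage snippet prints `tokenizer.decode(captions[0])` as-is, so markers such as `</s>` may appear in the answer. The sketch below shows one way to tidy the decoded string. It rests on assumptions not confirmed by this README: that the decoded text holds only the generated answer, that the special tokens are the LLaMA-style `<s>` and `</s>`, and that a new `<human>:` turn marks the end of the answer. `clean_caption` is a hypothetical helper.

```python
# Assumed LLaMA-style BOS/EOS markers; adjust if the tokenizer differs.
SPECIAL_TOKENS = ("<s>", "</s>")


def clean_caption(decoded, stop_marker="<human>:"):
    """Strip special-token text and anything after a new user turn."""
    text = decoded
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    # Cut the answer if the model starts writing another user turn.
    cut = text.find(stop_marker)
    if cut != -1:
        text = text[:cut]
    return text.strip()
```

Passing `skip_special_tokens=True` to `tokenizer.decode` is the standard `transformers` way to drop special tokens; the helper above additionally trims a trailing user turn.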
+ ## Citation
+
+ If you use this model in your work, please cite our papers ([paper](https://arxiv.org/abs/2210.08590), [paper](https://arxiv.org/abs/2310.08166)):
+
+ ```text
+ @article{fengshenbang,
+   author    = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
+   title     = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
+   journal   = {CoRR},
+   volume    = {abs/2209.02970},
+   year      = {2022}
+ }
+ ```
+
+ ```text
+ @article{lu2023ziya,
+   title={Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning},
+   author={Lu, Junyu and Zhang, Dixiang and Wu, Xiaojun and Gao, Xinyu and Gan, Ruyi and Zhang, Jiaxing and Song, Yan and Zhang, Pingjian},
+   journal={arXiv preprint arXiv:2310.08166},
+   year={2023}
+ }
+ ```
+
+ You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
+
+ ```text
+ @misc{Fengshenbang-LM,
+   title={Fengshenbang-LM},
+   author={IDEA-CCNL},
+   year={2021},
+   howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
+ }
+ ```