---
license: other
license_name: deepseek
license_link: https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL
---

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V2" />
</div>
<hr>
<div align="center" style="line-height: 1;">
  <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
    <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V2-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
    <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-CODE" style="margin: 2px;">
    <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL" style="margin: 2px;">
    <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<p align="center">
  <a href="https://arxiv.org/abs/2405.04434"><b>Paper Link</b>👁️</a>
</p>

# DeepSeek-V2.5

## 1. Introduction

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two previous versions. For model details, please visit the [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2).

DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:

- ArenaHard winrate increased from 68.3% to 76.3%
- AlpacaEval 2.0 LC winrate increased from 46.61% to 50.52%
- MT-Bench score increased from 8.84 to 9.02
- AlignBench score increased from 7.88 to 8.04

DeepSeek-V2.5 further enhances code generation, with optimizations for common programming scenarios, achieving the following benchmark results:

- HumanEval: 89%
- LiveCodeBench (January - September): 41%

## 2. How to run locally

**To run DeepSeek-V2.5 in BF16 for inference, 8 GPUs with 80GB of memory each are required.**

### Inference with Hugging Face's Transformers
You can directly employ [Hugging Face's Transformers](https://github.com/huggingface/transformers) for model inference.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="sequential", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```

The complete chat template can be found in `tokenizer_config.json` in the Hugging Face model repository.

**Note: The chat template has been updated compared to the previous DeepSeek-V2-Chat version.**

An example of the chat template is shown below:

```bash
<|begin▁of▁sentence|><|User|>{user_message_1}<|Assistant|>{assistant_message_1}<|end▁of▁sentence|><|User|>{user_message_2}<|Assistant|>
```

You can also add an optional system message:

```bash
<|begin▁of▁sentence|>{system_message}<|User|>{user_message_1}<|Assistant|>{assistant_message_1}<|end▁of▁sentence|><|User|>{user_message_2}<|Assistant|>
```

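Rather than building these strings by hand, you can let the tokenizer render the template and inspect the exact prompt it produces. A minimal sketch using Transformers' standard `apply_chat_template` (the message contents are illustrative placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V2.5", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # optional system message
    {"role": "user", "content": "Hello!"},
]
# `tokenize=False` returns the rendered prompt string instead of token ids
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```
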
### Inference with vLLM (recommended)
To utilize [vLLM](https://github.com/vllm-project/vllm) for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

### Function calling

Function calling allows the model to call external tools to enhance its capabilities.

Here is an example:

```python
import torch
from transformers import GenerationConfig

# Assume that `model` and `tokenizer` are loaded
model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

tool_system_prompt = """You are a helpful Assistant.

## Tools

### Function

You have the following functions available:

- `get_current_weather`:
```json
{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
                "type": "string",
                "enum": [
                    "celsius",
                    "fahrenheit"
                ]
            }
        },
        "required": [
            "location"
        ]
    }
}
```"""

tool_call_messages = [{"role": "system", "content": tool_system_prompt}, {"role": "user", "content": "What's the weather like in Tokyo and Paris?"}]
tool_call_inputs = tokenizer.apply_chat_template(tool_call_messages, add_generation_prompt=True, return_tensors="pt")
tool_call_outputs = model.generate(tool_call_inputs.to(model.device))
# Generated text: '<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>get_current_weather\n```json\n{"location": "Tokyo"}\n```<|tool▁call▁end|>\n<|tool▁call▁begin|>function<|tool▁sep|>get_current_weather\n```json\n{"location": "Paris"}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>'

# Mock response of calling `get_current_weather`
tool_messages = [{"role": "tool", "content": '{"location": "Tokyo", "temperature": "10", "unit": null}'}, {"role": "tool", "content": '{"location": "Paris", "temperature": "22", "unit": null}'}]
tool_inputs = tokenizer.apply_chat_template(tool_messages, add_generation_prompt=False, return_tensors="pt")[:, 1:]
tool_inputs = torch.cat([tool_call_outputs, tool_inputs.to(model.device)], dim=1)
tool_outputs = model.generate(tool_inputs)
# Generated text: The current weather in Tokyo is 10 degrees, and in Paris, it is 22 degrees.<|end▁of▁sentence|>
```

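The tool-call output follows a fixed layout (function name after `<|tool▁sep|>`, arguments in a fenced JSON block), so it can be extracted mechanically. Below is a hypothetical helper written against only the example output shown above; adjust the pattern if your format differs:

```python
import json
import re

def parse_tool_calls(text):
    """Hypothetical parser: extract (function_name, arguments) pairs from a tool-calls string."""
    pattern = r"<\|tool▁call▁begin\|>function<\|tool▁sep\|>(\w+)\n```json\n(.*?)\n```"
    return [(name, json.loads(args)) for name, args in re.findall(pattern, text, re.DOTALL)]

# Decode without skipping special tokens so the tool-call markers survive
generated = tokenizer.decode(tool_call_outputs[0][tool_call_inputs.shape[1]:])
for name, args in parse_tool_calls(generated):
    print(name, args)  # e.g. get_current_weather {'location': 'Tokyo'}
```
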
### JSON output

You can use JSON Output Mode to ensure the model generates a valid JSON object. To activate this mode, append a special instruction to your system prompt.

```python
from transformers import GenerationConfig

# Assume that `model` and `tokenizer` are loaded
model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

user_system_prompt = 'The user will provide some exam text. Please parse the "question" and "answer" and output them in JSON format.'
json_system_prompt = f"""{user_system_prompt}

## Response Format

Reply with JSON object ONLY."""

json_messages = [{"role": "system", "content": json_system_prompt}, {"role": "user", "content": "Which is the highest mountain in the world? Mount Everest."}]
json_inputs = tokenizer.apply_chat_template(json_messages, add_generation_prompt=True, return_tensors="pt")
json_outputs = model.generate(json_inputs.to(model.device))
# Generated text: '```json\n{\n "question": "Which is the highest mountain in the world?",\n "answer": "Mount Everest."\n}\n```<|end▁of▁sentence|>'
```

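As the generated text above shows, the model wraps the object in a fenced `json` code block rather than emitting bare JSON. A small post-processing sketch, assuming that output shape and the variables from the block above:

```python
import json

# Decode only the newly generated tokens
raw = tokenizer.decode(json_outputs[0][json_inputs.shape[1]:], skip_special_tokens=True).strip()
# Strip the surrounding code fence, if present, before parsing
if raw.startswith("```"):
    raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
parsed = json.loads(raw)
print(parsed["question"], "->", parsed["answer"])
```
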
### FIM completion

In FIM (Fill In the Middle) completion, you can provide a prefix and an optional suffix, and the model will complete the content in between.

```python
from transformers import GenerationConfig

# Assume that `model` and `tokenizer` are loaded
model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

prefix = """def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
"""

suffix = """
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)"""

fim_prompt = f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"
fim_inputs = tokenizer(fim_prompt, add_special_tokens=True, return_tensors="pt").input_ids
fim_outputs = model.generate(fim_inputs.to(model.device))
# Generated text: "    for i in range(1, len(arr)):<|end▁of▁sentence|>"
```

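To recover the complete function, the decoded middle is simply concatenated between the prefix and suffix. A minimal follow-up sketch using the variables from the block above:

```python
# Decode only the generated middle span and drop the end-of-sentence token
middle = tokenizer.decode(fim_outputs[0][fim_inputs.shape[1]:], skip_special_tokens=True)
# Stitch the pieces back into a complete function
print(prefix + middle + suffix)
```
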
## 3. License
This code repository is licensed under the MIT License. The use of the DeepSeek-V2 Base/Chat models is subject to [the Model License](LICENSE). The DeepSeek-V2 series (including Base and Chat) supports commercial use.

## 4. Citation

```bibtex
@misc{deepseekv2,
    title={DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
    author={DeepSeek-AI},
    year={2024},
    eprint={2405.04434},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

## 5. Contact
If you have any questions, please raise an issue or contact us at [email protected].