Commit dd5954a (verified) by zqh11 · 1 parent: b3333a6

Update README.md

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -3,19 +3,19 @@
  <!-- markdownlint-disable no-duplicate-header -->
 
  <div align="center">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg" width="60%" alt="DeepSeek LLM" />
+ <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek LLM" />
  </div>
  <hr>
  <div align="center">
 
  <a href="https://www.deepseek.com/" target="_blank">
- <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg" />
+ <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" />
  </a>
  <a href="https://chat.deepseek.com/" target="_blank">
- <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20LLM-536af5?color=536af5&logoColor=white" />
+ <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20LLM-536af5?color=536af5&logoColor=white?raw=true" />
  </a>
  <a href="https://huggingface.co/deepseek-ai" target="_blank">
- <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" />
+ <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white?raw=true" />
  </a>
 
  </div>
@@ -23,13 +23,13 @@
  <div align="center">
 
  <a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
- <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" />
+ <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da?raw=true" />
  </a>
  <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg" target="_blank">
- <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" />
+ <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white?raw=true" />
  </a>
  <a href="https://twitter.com/deepseek_ai" target="_blank">
- <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" />
+ <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white?raw=true" />
  </a>
 
  </div>
@@ -37,10 +37,10 @@
  <div align="center">
 
  <a href="LICENSE-CODE">
- <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53">
+ <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53?raw=true">
  </a>
  <a href="LICENSE-MODEL">
- <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53">
+ <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53?raw=true">
  </a>
  </div>
 
@@ -66,8 +66,8 @@ Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) langua
  <p align="center">
 
  <div style="display: flex; justify-content: center;">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png" style="height:300px; width:auto; margin-right:10px">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png" style="height:300px; width:auto; margin-left:10px">
+ <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png?raw=true" style="height:300px; width:auto; margin-right:10px">
+ <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png?raw=true" style="height:300px; width:auto; margin-left:10px">
  </div>
  </p>
  We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
@@ -107,7 +107,7 @@ For more evaluation details, such as few-shot settings and prompts, please check
 
  #### Context Window
  <p align="center">
- <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png">
+ <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png?raw=true">
  </p>
 
  Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to **128K**.
@@ -133,7 +133,7 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 pe
  #### English Open Ended Generation Evaluation
  We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
  <p align="center">
- <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png" />
+ <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png?raw=true" />
  </p>
 
  #### Chinese Open Ended Generation Evaluation
@@ -160,7 +160,7 @@ We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive per
  We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
 
  <p align="center">
- <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png">
+ <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png?raw=true">
  </p>
 
  ## 4. Model Architecture
@@ -169,7 +169,7 @@ DeepSeek-V2 adopts innovative architectures to guarantee economical training and
  - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 
  <p align="center">
- <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png" />
+ <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png?raw=true" />
  </p>
 
  ## 5. Chat Website
@@ -180,7 +180,7 @@ We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.c
 
 
  <p align="center">
- <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png">
+ <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png?raw=true">
  </p>
 
 
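
The change above appends `?raw=true` to every image `src`, which makes GitHub `/blob/` links resolve to the image file instead of the HTML file viewer. As a rough sketch only (the helper name `with_raw_param` is hypothetical and not part of the repository), the same rewrite could be scripted so that it touches only `github.com` `/blob/` links and merges with any existing query string. That would leave the `img.shields.io` badge URLs, which already carry a query string, without a second `?`:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_raw_param(url: str) -> str:
    """Return url with raw=true set if it is a github.com /blob/ link.

    Any other URL is returned unchanged. Hypothetical helper for illustration.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com" or "/blob/" not in parts.path:
        # Not a GitHub file-viewer link (e.g. a badge service): leave as-is.
        return url
    # Merge with existing parameters instead of concatenating "?raw=true".
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["raw"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_raw_param(
        "https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg"
    ))
```

Applying the helper twice yields the same URL as applying it once, so it should be safe to re-run over a README that has already been updated.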