deepseek-ai
/

DeepSeek-V2

@@ -3,19 +3,19 @@
 <!-- markdownlint-disable no-duplicate-header -->
 <div align="center">
-  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg" width="60%" alt="DeepSeek LLM" />
 </div>
 <hr>
 <div align="center">
   <a href="https://www.deepseek.com/" target="_blank">
-    <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg" />
   </a>
   <a href="https://chat.deepseek.com/" target="_blank">
-    <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20LLM-536af5?color=536af5&logoColor=white" />
   </a>
   <a href="https://huggingface.co/deepseek-ai" target="_blank">
-    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" />
   </a>
 </div>
@@ -23,13 +23,13 @@
 <div align="center">
   <a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
-    <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" />
   </a>
   <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg" target="_blank">
-    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" />
   </a>
   <a href="https://twitter.com/deepseek_ai" target="_blank">
-    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" />
   </a>
 </div>
@@ -37,10 +37,10 @@
 <div align="center">
   <a href="LICENSE-CODE">
-    <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53">
   </a>
   <a href="LICENSE-MODEL">
-    <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53">
   </a>
 </div>
@@ -66,8 +66,8 @@ Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) langua
 <p align="center">
 <div style="display: flex; justify-content: center;">
-    <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png" style="height:300px; width:auto; margin-right:10px">
-    <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png" style="height:300px; width:auto; margin-left:10px">
 </div>
 </p>
 We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
@@ -107,7 +107,7 @@ For more evaluation details, such as few-shot settings and prompts, please check
 #### Context Window
 <p align="center">
-  <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png">
 </p>
 Evaluation results on the ``Needle In A Haystack`` (NIAH) tests.  DeepSeek-V2 performs well across all context window lengths up to **128K**.
@@ -133,7 +133,7 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests.  DeepSeek-V2 pe
 #### English Open Ended Generation Evaluation
 We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
 <p align="center">
-  <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png" />
 </p>
 #### Chinese Open Ended Generation Evaluation
@@ -160,7 +160,7 @@ We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive per
 We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
 <p align="center">
-  <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png">
 </p>
 ## 4. Model Architecture
@@ -169,7 +169,7 @@ DeepSeek-V2 adopts innovative architectures to guarantee economical training and
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 <p align="center">
-  <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png" />
 </p>
 ## 5. Chat Website
@@ -180,7 +180,7 @@ We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.c
 <p align="center">
-  <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png">
 </p>

 <!-- markdownlint-disable no-duplicate-header -->
 <div align="center">
+  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek LLM" />
 </div>
 <hr>
 <div align="center">
   <a href="https://www.deepseek.com/" target="_blank">
+    <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" />
   </a>
   <a href="https://chat.deepseek.com/" target="_blank">
+    <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20LLM-536af5?color=536af5&logoColor=white?raw=true" />
   </a>
   <a href="https://huggingface.co/deepseek-ai" target="_blank">
+    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white?raw=true" />
   </a>
 </div>
 <div align="center">
   <a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
+    <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da?raw=true" />
   </a>
   <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg" target="_blank">
+    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white?raw=true" />
   </a>
   <a href="https://twitter.com/deepseek_ai" target="_blank">
+    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white?raw=true" />
   </a>
 </div>
 <div align="center">
   <a href="LICENSE-CODE">
+    <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53?raw=true">
   </a>
   <a href="LICENSE-MODEL">
+    <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53?raw=true">
   </a>
 </div>
 <p align="center">
 <div style="display: flex; justify-content: center;">
+    <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png?raw=true" style="height:300px; width:auto; margin-right:10px">
+    <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png?raw=true" style="height:300px; width:auto; margin-left:10px">
 </div>
 </p>
 We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
 #### Context Window
 <p align="center">
+  <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png?raw=true">
 </p>
 Evaluation results on the ``Needle In A Haystack`` (NIAH) tests.  DeepSeek-V2 performs well across all context window lengths up to **128K**.
 #### English Open Ended Generation Evaluation
 We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
 <p align="center">
+  <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png?raw=true" />
 </p>
 #### Chinese Open Ended Generation Evaluation
 We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
 <p align="center">
+  <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png?raw=true">
 </p>
 ## 4. Model Architecture
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 <p align="center">
+  <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png?raw=true" />
 </p>
 ## 5. Chat Website
 <p align="center">
+  <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png?raw=true">
 </p>