<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="description" content="DeepSeek: Advancing Open-Source Language Models">
  <meta name="keywords" content="DeepSeek, LLM, AI">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>DeepSeek: Advancing Open-Source Language Models</title>
  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/favicon.svg">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
  <div class="hero-body">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column has-text-centered">
          <h1 class="title is-1 publication-title">DeepSeek Papers</h1>
          <div class="is-size-5 publication-authors">
            Advancing Open-Source Language Models
          </div>
        </div>
      </div>
    </div>
  </div>
</section>
<section class="section">
  <div class="container is-max-desktop">
    <!-- Abstract. -->
    <div class="columns is-centered has-text-centered">
      <div class="column is-four-fifths">
        <h2 class="title is-3">DeepSeek Research Contributions</h2>
        <div class="content has-text-justified">
          <p>
            Below are DeepSeek's key papers on advancing large language models (LLMs), ordered roughly
            from the most recent release to the oldest. Each entry gives a brief description of the paper's
            contribution; a deep dive into each one is coming soon.
          </p>
        </div>
      </div>
    </div>
    <!--/ Abstract. -->
    <!-- Paper Collection -->
    <div class="columns is-centered">
      <div class="column is-four-fifths">
        <div class="content">
          <div class="publication-list">
            <!-- Papers, roughly newest first -->
            <div class="publication-item">
              <div class="publication-title">
                <a href="#">DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning</a>
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> January 20, 2025
              </div>
              <div class="publication-description">
                The R1 model enhances reasoning capabilities through large-scale reinforcement learning,
                competing directly with leading models like OpenAI's o1.
              </div>
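              <div class="publication-description">
                <p>As a rough sketch of the rule-based rewards this style of RL training relies on (the
                <code>&lt;think&gt;</code>/<code>&lt;answer&gt;</code> tag convention and the exact scores below are
                illustrative assumptions, not the paper's reference implementation):</p>
                <pre><code>import re

FORMAT_RE = re.compile(r"&lt;think&gt;.*&lt;/think&gt;\s*&lt;answer&gt;.*&lt;/answer&gt;", re.DOTALL)
ANSWER_RE = re.compile(r"&lt;answer&gt;(.*?)&lt;/answer&gt;", re.DOTALL)

def rule_reward(completion, reference_answer):
    """Illustrative rule-based reward: a small bonus for following the
    expected output format plus a large reward for a verifiably
    correct final answer. No learned reward model is involved."""
    reward = 0.0
    if FORMAT_RE.search(completion):
        reward += 0.1                    # format reward
    m = ANSWER_RE.search(completion)
    if m and m.group(1).strip() == reference_answer.strip():
        reward += 1.0                    # accuracy reward
    return reward

print(rule_reward("&lt;think&gt;2 + 2&lt;/think&gt;&lt;answer&gt;4&lt;/answer&gt;", "4"))  # 1.1
</code></pre>
              </div>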
            </div>
            <div class="publication-item">
              <div class="publication-title">
                <a href="#">DeepSeek-V3 Technical Report</a>
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> December 2024
              </div>
              <div class="publication-description">
                This report discusses scaling a sparse Mixture-of-Experts network to 671 billion total
                parameters, using FP8 mixed-precision training and HPC co-design strategies.
              </div>
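              <div class="publication-description">
                <p>A minimal NumPy sketch of the idea behind low-precision training with per-block scaling
                (float16 stands in for FP8, since NumPy has no float8 dtype; this illustrates the concept,
                not DeepSeek's FP8 kernels):</p>
                <pre><code>import numpy as np

def quantize_block_sim(x, block=4, eps=1e-12):
    """Simulate per-block scaled low-precision quantization: scale each
    block so its largest magnitude fits the narrow format, cast down,
    then rescale on the way out."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) + eps
    q = (x / scale).astype(np.float16).astype(np.float64) * scale
    return q.reshape(-1)

x = np.linspace(-1.0, 1.0, 8)
print(np.abs(quantize_block_sim(x) - x).max())  # small quantization error
</code></pre>
              </div>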
            </div>
            <div class="publication-item">
              <div class="publication-title">
                <a href="#">DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model</a>
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> May 2024
              </div>
              <div class="publication-description">
                This paper introduces an economical Mixture-of-Experts (MoE) architecture that improves
                performance while cutting training costs by 42.5% relative to DeepSeek 67B.
              </div>
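              <div class="publication-description">
                <p>To see where the cost saving comes from, here is a toy top-k MoE layer in NumPy (the
                sizes and the plain softmax router are simplifications for illustration; the real model adds
                shared experts, fine-grained segmentation, and load balancing):</p>
                <pre><code>import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2
W_gate = rng.normal(size=(d, n_experts))               # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_layer(x):
    """Per token, run only the top_k highest-scoring experts, so the
    compute per token stays small even as total parameters grow."""
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]                  # chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.normal(size=d)).shape)             # (16,)
</code></pre>
              </div>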
            </div>
            <div class="publication-item">
              <div class="publication-title">
                <a href="#">DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models</a>
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> February 2024
              </div>
              <div class="publication-description">
                This paper presents methods to improve mathematical reasoning in LLMs, introducing the
                Group Relative Policy Optimization (GRPO) algorithm.
              </div>
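              <div class="publication-description">
                <p>A minimal sketch of GRPO's central idea, the group-relative advantage, which replaces a
                learned critic with normalization over a group of sampled answers (the function name and the
                toy rewards are illustrative):</p>
                <pre><code>import numpy as np

def grpo_advantages(rewards):
    """GRPO-style advantages: normalize each completion's reward
    against its own group's mean and standard deviation, so no
    learned value function (critic) is needed."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# One prompt, four sampled solutions, rule-scored 1/0 rewards.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # ~[ 1. -1. -1.  1.]
</code></pre>
              </div>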
            </div>
            <div class="publication-item">
              <div class="publication-title">
<a href="#">DeepSeekLLM: Scaling Open-Source Language Models with Longer-termism</a> | |
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> November 29, 2023
              </div>
              <div class="publication-description">
                This foundational paper explores scaling laws and the trade-offs between data and model size,
                establishing the groundwork for subsequent models.
              </div>
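              <div class="publication-description">
                <p>A worked example of the model-size/data trade-off such scaling-law studies address, using
                the common C ≈ 6·N·D approximation for training compute (the 50/50 allocation exponents are
                placeholders, not the paper's fitted coefficients):</p>
                <pre><code># Split a FLOP budget C between parameters N and training tokens D
# under the common C ~ 6*N*D approximation. Exponents a and b are
# illustrative placeholders, not DeepSeek's fitted values.
def optimal_allocation(C, a=0.5, b=0.5):
    N = (C / 6.0) ** a    # model parameters
    D = (C / 6.0) ** b    # training tokens
    return N, D

N, D = optimal_allocation(1e23)
print(f"N ~ {N:.2e} params, D ~ {D:.2e} tokens")  # ~1.3e11 each
</code></pre>
              </div>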
            </div>
            <div class="publication-item">
              <div class="publication-title">
                <a href="#">DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data</a>
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> May 2024
              </div>
<div class="publication-description"> | |
Focuses on enhancing theorem proving capabilities in language models using synthetic data for training. | |
</div> | |
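              <div class="publication-description">
                <p>A toy sketch of the generate-and-verify loop behind this kind of synthetic-data pipeline
                (the sampler and verifier below are trivial stand-ins; in the paper they would be an LLM and
                a formal proof checker such as Lean's):</p>
                <pre><code>import random

def sample_proof(stmt):                   # stand-in for an LLM prover
    return f"proof_{random.randint(0, 3)}_of_{stmt}"

def verify(stmt, proof):                  # stand-in for a formal checker
    return proof.startswith("proof_0")

def synthesize(statements, n_samples=8):
    """One expert-iteration round: sample candidate proofs and keep
    only verifier-approved ones as new supervised training data."""
    data = []
    for stmt in statements:
        for _ in range(n_samples):
            p = sample_proof(stmt)
            if verify(stmt, p):
                data.append((stmt, p))
                break
    return data

print(synthesize(["thm_a", "thm_b"]))
</code></pre>
              </div>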
            </div>
            <div class="publication-item">
              <div class="publication-title">
                <a href="#">DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence</a>
                <span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
              </div>
              <div class="publication-info">
                <strong>Release Date:</strong> June 2024
              </div>
<div class="publication-description"> | |
This paper details advancements in code-related tasks with an emphasis on open-source methodologies, | |
improving upon earlier coding models. | |
</div> | |
</div> | |
<div class="publication-item"> | |
<div class="publication-title"> | |
<a href="#">DeepSeekMoE</a> | |
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span> | |
</div> | |
<div class="publication-description"> | |
Discusses the integration and benefits of the Mixture-of-Experts approach within the DeepSeek framework. | |
</div> | |
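              <div class="publication-description">
                <p>Two of the paper's signature ideas are fine-grained routed experts plus always-active
                shared experts; a toy NumPy forward pass (all sizes and names are illustrative):</p>
                <pre><code>import numpy as np

rng = np.random.default_rng(1)
d, n_shared, n_routed, top_k = 16, 1, 8, 2
shared = [rng.normal(size=(d, d)) for _ in range(n_shared)]  # always active
routed = [rng.normal(size=(d, d)) for _ in range(n_routed)]  # gated
W_gate = rng.normal(size=(d, n_routed))

def deepseekmoe_layer(x):
    """Shared experts capture common knowledge for every token, while
    routed experts specialize, selected per token by the gate."""
    out = sum(x @ e for e in shared)
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return out + sum(g * (x @ routed[i]) for g, i in zip(gates, top))

print(deepseekmoe_layer(rng.normal(size=d)).shape)  # (16,)
</code></pre>
              </div>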
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</section>
<footer class="footer">
  <div class="container">
    <div class="content has-text-centered">
      <p>
        This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
        Commons Attribution-ShareAlike 4.0 International License</a>.
      </p>
    </div>
  </div>
</footer>
</body>
</html>