<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="DeepSeek: Advancing Open-Source Language Models">
<meta name="keywords" content="DeepSeek, LLM, AI">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>DeepSeek: Advancing Open-Source Language Models</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">DeepSeek Papers</h1>
<div class="is-size-5 publication-authors">
Advancing Open-Source Language Models
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">DeepSeek Research Contributions</h2>
<div class="content has-text-justified">
<p>
Below is a list of significant DeepSeek papers on large language models (LLMs), ordered by release date
from most recent to oldest. Each entry includes a brief description, and in-depth deep dives into
individual papers are coming soon.
</p>
</div>
</div>
</div>
<!--/ Abstract. -->
<!-- Paper Collection -->
<div class="columns is-centered">
<div class="column is-four-fifths">
<div class="content">
<div class="publication-list">
<!-- Papers in reverse chronological order (newest first) -->
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-info">
<strong>Release Date:</strong> January 20, 2025
</div>
<div class="publication-description">
The R1 model enhances reasoning capabilities through large-scale reinforcement learning, competing
directly with leading models like OpenAI's o1.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeek-V3 Technical Report</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-info">
<strong>Release Date:</strong> December 2024
</div>
<div class="publication-description">
This report discusses the scaling of sparse MoE networks to 671 billion parameters, utilizing mixed
precision training and HPC co-design strategies.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-info">
<strong>Release Date:</strong> May 2024
</div>
<div class="publication-description">
This paper introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
training costs by 42%.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-info">
<strong>Release Date:</strong> April 2024
</div>
<div class="publication-description">
This paper presents methods to improve mathematical reasoning in LLMs, introducing the Group
Relative Policy Optimization (GRPO) algorithm.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeekLLM: Scaling Open-Source Language Models with Longer-termism</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-info">
<strong>Release Date:</strong> November 29, 2023
</div>
<div class="publication-description">
This foundational paper explores scaling laws and the trade-offs between data and model size,
establishing the groundwork for subsequent models.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-description">
This paper focuses on enhancing theorem-proving capabilities in language models by training on large-scale synthetic proof data.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-description">
This paper details an open-source code model that narrows the gap with closed-source systems on code-related
tasks, improving on earlier DeepSeek coding models.
</div>
</div>
<div class="publication-item">
<div class="publication-title">
<a href="#">DeepSeekMoE</a>
<span class="tag is-info is-light">[Deep Dive Coming Soon]</span>
</div>
<div class="publication-description">
This paper introduces the fine-grained Mixture-of-Experts architecture adopted by later DeepSeek models, aiming for greater expert specialization and efficiency.
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<p>
This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike 4.0 International License</a>.
</p>
</div>
</div>
</footer>
</body>
</html>