<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="Atla Selene Mini: A General Purpose Evaluation Model">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Atla Selene Mini: A General Purpose Evaluation Model</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
<style>
@keyframes rainbow-shimmer {
0% { background-position: 0% 50%; }
50% { background-position: 100% 50%; }
100% { background-position: 0% 50%; }
}
.rainbow-button {
background: linear-gradient(45deg, #ff0000, #ff7f00, #ffff00, #00ff00, #0000ff, #8b00ff);
background-size: 300% 300%;
color: white !important;
font-weight: bold;
animation: rainbow-shimmer 5s ease infinite;
transition: all 0.3s ease;
}
.rainbow-button:hover {
transform: scale(1.05);
box-shadow: 0 0 10px rgba(0,0,0,0.2);
}
</style>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Atla Selene Mini:<br>A General Purpose Evaluation Model</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://huggingface.co/inwaves" target="_blank">Andrei Alexandru</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/NinaCalvi" target="_blank">Antonia Calvi</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/HennersBro98" target="_blank">Henry Broomfield</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/jacksongolden" target="_blank">Jackson Golden</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/kaikaidai" target="_blank">Kyle Dai</a><sup>1</sup>,</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://huggingface.co/mathias-atla" target="_blank">Mathias Leys</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/MauriceBurg" target="_blank">Maurice Burger</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/mbartolo" target="_blank">Max Bartolo</a><sup>2,3</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/RomanEngeler1805" target="_blank">Roman Engeler</a><sup>1</sup>,</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://huggingface.co/spisupat" target="_blank">Sashank Pisupati</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/tobydrane" target="_blank">Toby Drane</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/youngsunpark" target="_blank">Young Sun Park</a><sup>1</sup></span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>atla,</span>
<span class="author-block"><sup>2</sup>University College London,</span>
<span class="author-block"><sup>3</sup>Cohere</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2501.17195v1" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- HuggingFace Link -->
<span class="link-block">
<a href="https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
🤗
</span>
<span>HuggingFace</span>
</a>
</span>
<!-- Github Link -->
<span class="link-block">
<a href="https://github.com/atla-ai/selene-mini" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Cookbooks</span>
</a>
</span>
<!-- Ollama Link -->
<span class="link-block">
<a href="https://ollama.com/atla/selene-mini" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-code"></i>
</span>
<span>Ollama</span>
</a>
</span>
</div>
<!-- New API Sign-up Button -->
<div class="publication-links" style="margin-top: 1rem;">
<span class="link-block">
<a href="https://www.atla-ai.com/sign-up-waitlist?utm_source=huggingface&utm_medium=community&utm_campaign=WL_HF_all_communitypost_sel1minilaunch" target="_blank"
class="external-link button is-normal is-rounded rainbow-button">
<span>Sign up for the API</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section" style="padding-top: 0;">
<div class="container is-max-desktop">
<!-- Logo -->
<div class="columns is-centered has-text-centered">
<div class="column is-2">
<div style="max-width: 200px; margin: 0 auto;">
<img src="figs/atla-logo.png" alt="Atla Logo" style="width: 100%; height: auto;">
</div>
</div>
</div>
<!-- Abstract -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and GPT-4o-mini on overall performance across 11 out-of-distribution benchmarks, spanning absolute scoring, classification, and pairwise preference tasks. It is the highest-scoring 8B generative model on RewardBench, surpassing strong baselines like GPT-4o and specialized judges.
</p>
<p>
To achieve this, we develop a principled data curation strategy that augments public datasets with synthetically generated critiques and ensures high quality through filtering and dataset ablations. We train our model on a combined direct preference optimization (DPO) and supervised fine-tuning (SFT) loss, and produce a highly promptable evaluator that excels in real-world scenarios.
</p>
<p>
Selene Mini shows dramatically improved zero-shot agreement with human expert evaluations on financial and medical industry datasets. It is also robust to variations in prompt format. Preliminary results indicate that Selene Mini is the top-ranking evaluator in a live, community-driven <a href="https://huggingface.co/blog/arena-atla" target="_blank">Judge Arena</a>. We release the model weights on <a href="https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B" target="_blank">HuggingFace</a> and <a href="https://ollama.com/atla/selene-mini" target="_blank">Ollama</a> to encourage widespread community adoption.
</p>
</div>
</div>
</div>
<!-- Demo Video -->
<div class="columns is-centered">
<div class="column is-four-fifths">
<div class="content has-text-centered">
<video controls width="800" autoplay loop muted>
<source src="figs/demo.mp4" type="video/mp4">
Your browser does not support the video element.
</video>
<p class="subtitle">
Demo of Atla Selene Mini on our <a href="https://huggingface.co/spaces/AtlaAI/selene" target="_blank">playground</a>
</p>
</div>
</div>
</div>
<!-- Key Results -->
<div class="columns is-centered">
<div class="column is-four-fifths">
<h2 class="title is-3 has-text-centered">Key Results</h2>
<div class="content has-text-justified">
<div class="columns is-centered has-text-centered">
Read the full <a href="https://arxiv.org/pdf/2501.17195v1" target="_blank">technical report</a>.
</div>
<figure class="image">
<img src="figs/Fig1.png" alt="Performance comparison">
<figcaption>
<b>Figure 1:</b> Atla Selene Mini outperforms current state-of-the-art SLMJs: a) Overall task-average performance, comparing Atla Selene Mini (black) with the best and most widely used SLMJs. b) Breakdown of performance by task type and benchmark.
</figcaption>
</figure>
<figure class="image">
<img src="figs/Fig2.png" alt="Data curation strategy">
<figcaption>
<b>Figure 2:</b> Data curation strategy: The process of transforming a candidate dataset (left) into the final training mix (right). Yellow boxes indicate filtering steps; purple represents synthetic generation of chosen and rejected pairs for preference optimization.
</figcaption>
</figure>
<figure class="image">
<img src="figs/Fig3.png" alt="Real-world evaluation">
<figcaption>
<b>Figure 3:</b> Real-world evaluation: a) Performance on domain-specific industry benchmarks. b) Performance on RewardBench with different prompt formats. c) Performance measured by Elo scores in Judge Arena.
</figcaption>
</figure>
<div class="columns is-centered has-text-centered">
Our larger model from the Selene family will be released soon. Sign up for our <a href="https://www.atla-ai.com/sign-up-waitlist" target="_blank">waitlist</a> to get first access.
</div>
</div>
</div>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<p>
© 2025 Atla AI
</p>
</div>
</div>
</footer>
</body>
</html>