wzxii committed on
Commit
7d88398
•
1 Parent(s): 3f0883c

Upload 2 files

Files changed (2)
  1. index.html +11 -11
  2. style.css +4 -3
index.html CHANGED
@@ -137,16 +137,16 @@
137
  <div>
138
  <p>The growing number of code models released by the community necessitates a comprehensive evaluation to
139
  reliably benchmark their capabilities.
140
- Similar to the 🤗 Open LLM Leaderboard, we selected two common benchmarks for evaluating Code LLMs on
141
- multiple programming languages:</p>
142
  <ul>
143
- <li>HumanEval - benchmark for measuring functional correctness for synthesizing programs from
144
- docstrings. It consists of 164 Python programming problems.</li>
145
- <li>MultiPL-E - Translation of HumanEval to 18 programming languages.</li>
146
- <li>Throughput Measurement - In addition to these benchmarks, we also measure model throughput on a
147
- batch size of 1 and 50 to compare their inference speed.</li>
148
  </ul>
149
- <h3>Benchmark & Prompts</h3>
150
  <ul>
151
  <li>HumanEval-Python reports the pass@1 on HumanEval; the rest is from MultiPL-E benchmark.</li>
152
  <li>For all languages, we use the original benchamrk prompts for all models except HumanEval-Python,
@@ -159,8 +159,8 @@
159
  </ul>
160
  <p>Figure below shows the example of OctoCoder vs Base HumanEval prompt, you can find the other prompts
161
  here.</p>
162
- </div>
163
- <div>
164
  <p>- An exception to this is the Phind models. They seem to follow to base prompts better than the
165
  instruction versions.
166
  Therefore, following the authors' recommendation we use base HumanEval prompts without stripping them of
@@ -189,7 +189,7 @@
189
  <li>#Languages column represents the number of programming languages included during the pretraining.
190
  UNK means the number of languages is unknown.</li>
191
  </ul>
192
- </div>
193
  </section>
194
 
195
  <section class="section_submit" id="sec_submit">
 
137
  <div>
138
  <p>The growing number of code models released by the community necessitates a comprehensive evaluation to
139
  reliably benchmark their capabilities.
140
+ Similar to the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard" target="_blank">🤗 Open LLM Leaderboard</a>,
141
+ we selected two common benchmarks for evaluating Code LLMs on multiple programming languages:</p>
142
  <ul>
143
+ <li><a href="https://huggingface.co/datasets/openai_humaneval" target="_blank">HumanEval</a>
144
+ - benchmark for measuring functional correctness for synthesizing programs from docstrings.
145
+ It consists of 164 Python programming problems.</li>
146
+ <li><a href="https://github.com/YihongDong/CodeGenEvaluation" target="_blank">HumanEval-ET</a>.</li>
147
+ <li>MGI - In addition to these benchmarks, we also measure Memorization-Generalization Index.</li>
148
  </ul>
149
+ <!-- <h3>Benchmark & Prompts</h3>
150
  <ul>
151
  <li>HumanEval-Python reports the pass@1 on HumanEval; the rest is from MultiPL-E benchmark.</li>
152
  <li>For all languages, we use the original benchamrk prompts for all models except HumanEval-Python,
 
159
  </ul>
160
  <p>Figure below shows the example of OctoCoder vs Base HumanEval prompt, you can find the other prompts
161
  here.</p>
162
+ </div> -->
163
+ <!-- <div>
164
  <p>- An exception to this is the Phind models. They seem to follow to base prompts better than the
165
  instruction versions.
166
  Therefore, following the authors' recommendation we use base HumanEval prompts without stripping them of
 
189
  <li>#Languages column represents the number of programming languages included during the pretraining.
190
  UNK means the number of languages is unknown.</li>
191
  </ul>
192
+ </div> -->
193
  </section>
194
 
195
  <section class="section_submit" id="sec_submit">
style.css CHANGED
@@ -251,9 +251,10 @@
251
  .section_about h3 {
252
  font-size: 18px;
253
  }
254
- .section_about img {
255
- margin-top: 20px;
256
- width:900px;
 
257
  }
258
  .section_about div {
259
  margin-top: 10px;
 
251
  .section_about h3 {
252
  font-size: 18px;
253
  }
254
+ .section_about a {
255
+ color: #386df4;
256
+ text-decoration-color: #0909f8;
257
+ text-decoration-style: dashed;
258
  }
259
  .section_about div {
260
  margin-top: 10px;