Add Reports Based on "Llemma: An Open Language Model For Mathematics"
## What are you reporting:
- [x] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [ ] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
**Evaluation dataset(s)**:
- `hendrycks/competition_math`
- `gsm8k`
**Contaminated model(s)**:
- `EleutherAI/llemma_7b`
- `EleutherAI/llemma_34b`
**Contaminated corpora**:
- `EleutherAI/proof-pile-2`
**Contaminated split(s)**:
- `hendrycks/competition_math`: 7.72 (%) of `test` split
- `gsm8k`: 0.15 (%) of `test` split
## Briefly describe your method to detect data contamination
- [x] Data-based approach
- [ ] Model-based approach
Description of your method, 3-4 sentences. Evidence of data contamination (Read below):
#### Data-based approaches
According to Section 3.5 of [Azerbayev et al. (2024)](https://arxiv.org/abs/2310.10631), the authors check whether any 30-gram in a test sequence (either an input problem or an output solution) occurs in any document of the pre-training corpus `Proof-Pile-2`, which is used to train the `LLEMMA` models. Based on the exact numbers reported in the *left* part of Table 6, we can estimate a worst case, assuming that the hits for input problems and output solutions fall on non-overlapping test instances: the contaminated share of the `MATH` test split would be (348 + 34 + 3 + 1) / 5000 = 386 / 5000 = 7.72 (%), and the contaminated share of the `GSM8k` test split would be (2 + 0 + 0 + 0) / 1319 = 2 / 1319 ≈ 0.15 (%).
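The check and the worst-case estimate described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the whitespace tokenization and the helper names `ngrams`, `has_overlap`, and `worst_case_percent` are assumptions introduced here, and only the hit counts and split sizes come from the description above.

```python
def ngrams(tokens, n=30):
    """Return the set of all contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def has_overlap(test_text, corpus_doc, n=30):
    """True if any n-gram of the test sequence also occurs in the document.

    Assumption: plain whitespace tokenization; the paper may tokenize differently.
    """
    test_grams = ngrams(test_text.split(), n)
    return bool(test_grams & ngrams(corpus_doc.split(), n))


def worst_case_percent(hit_counts, split_size):
    """Upper bound: treat every reported hit as a distinct test example."""
    return round(100 * sum(hit_counts) / split_size, 2)


# Hit counts cited from the left part of Table 6, and the test split sizes.
math_pct = worst_case_percent([348, 34, 3, 1], 5000)
gsm8k_pct = worst_case_percent([2, 0, 0, 0], 1319)
print(math_pct, gsm8k_pct)
```

Summing the counts gives an upper bound because a problem and its solution that both match would be counted twice; the true share of affected test examples can only be equal or lower.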
## Citation
URL:
```
https://openreview.net/pdf?id=4WnqRR915j
```
Citation:
```
@inproceedings{
azerbayev2024llemma,
title={Llemma: An Open Language Model for Mathematics},
author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen Marcus McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=4WnqRR915j}
}
```
*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
1.
- Full name: Wei-Lin Chen
- Institution: National Taiwan University, University of Virginia
- Email: [email protected]
2.
- Full name: Yu-Min Tseng
- Institution: National Taiwan University
- Email: [email protected]
- contamination_report.csv +7 -0
@@ -163,9 +163,16 @@ gigaword;;togethercomputer/RedPajama-Data-V2;;corpus;;;2.82;data-based;https://a
 
 gsm8k;;BAAI/Aquila2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/Aquila2-34B/blob/main/README.md;21
 gsm8k;;BAAI/AquilaChat2-34B;;model;;;100.0;model-based;https://huggingface.co/BAAI/AquilaChat2-34B/blob/main/README.md;21
+gsm8k;;EleutherAI/llemma_7b;;model;;;0.15;data-based;https://openreview.net/pdf?id=4WnqRR915j;
+gsm8k;;EleutherAI/llemma_34b;;model;;;0.15;data-based;https://openreview.net/pdf?id=4WnqRR915j;
+gsm8k;;EleutherAI/proof-pile-2;;corpus;;;0.15;data-based;https://openreview.net/pdf?id=4WnqRR915j;
 gsm8k;;GPT-4;;model;100.0;;1.0;data-based;https://arxiv.org/abs/2303.08774;11
 gsm8k;;GPT-4;;model;79.00;;;model-based;https://arxiv.org/abs/2311.06233;8
 
+hendrycks/competition_math;;EleutherAI/llemma_7b;;model;;;7.72;data-based;https://openreview.net/pdf?id=4WnqRR915j;
+hendrycks/competition_math;;EleutherAI/llemma_34b;;model;;;7.72;data-based;https://openreview.net/pdf?id=4WnqRR915j;
+hendrycks/competition_math;;EleutherAI/proof-pile-2;;corpus;;;7.72;data-based;https://openreview.net/pdf?id=4WnqRR915j;
+
 head_qa;en;EleutherAI/pile;;corpus;;;5.11;data-based;https://arxiv.org/abs/2310.20707;2
 head_qa;en;allenai/c4;;corpus;;;5.22;data-based;https://arxiv.org/abs/2310.20707;2
 head_qa;en;oscar-corpus/OSCAR-2301;;corpus;;;5.29;data-based;https://arxiv.org/abs/2310.20707;2