lvwerra (HF staff) and terryyz committed
Commit f55aa1d (1 parent: f5e08bb)

Fix format & Add BigCodeBench (#10)

- Fix format & Add BigCodeBench (a1e04bec5d16ba9ad8e87d0f4f29fd72067deac9)
- Update README.md (7d69a4ab7102ef1affb896d091e6961a283f0e80)


Co-authored-by: Terry Yue Zhuo <[email protected]>

Files changed (1)
  1. README.md +22 -1
README.md CHANGED
@@ -41,6 +41,7 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [StarCoder2 Search](https://huggingface.co/spaces/bigcode/search-v2): Full-text search code in the pretraining dataset.
  - [StarCoder2 Membership Test](https://stack-v2.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
  </details>
+
  ---
  <details>
  <summary>
@@ -52,6 +53,7 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
  - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
  </details>
+
  ---
  <details>
  <summary>
@@ -82,17 +84,34 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
  - [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
  </details>
+
  ---
  <details>
  <summary>
  <b><font size="+1">📑The Stack</font></b>
  </summary>
  The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
-
+
  - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
  - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
  - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
  </details>
+
+ ---
+ <details>
+ <summary>
+ <b><font size="+1">🌸BigCodeBench</font></b>
+ </summary>
+ BigCodeBench is the next generation of HumanEval, benchmarking code generation with diverse function calls and complex instructions.
+
+ - [Github](https://github.com/bigcode-project/bigcodebench): Evaluation tool designed for BigCodeBench.
+ - [HF Leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard): BigCodeBench leaderboard hosted on Hugging Face.
+ - [GP Leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard): BigCodeBench leaderboard hosted on GitHub Pages.
+ - [Dataset](https://huggingface.co/datasets/bigcode/bigcodebench): BigCodeBench dataset.
+ - [Data Viewer](https://huggingface.co/spaces/bigcode/bigcodebench-viewer): Explore BigCodeBench data in an interactive demo.
+ - [Paper](https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/paper.pdf): Research paper with details about BigCodeBench.
+ </details>
+
  ---
  <details>
  <summary>
@@ -111,6 +130,7 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [OctoCoder Demo](https://huggingface.co/spaces/bigcode/OctoCoder-Demo): Play with OctoCoder.
  - [OctoGeeX](https://huggingface.co/bigcode/octogeex): Instruction tuned model of CodeGeeX2 by training on CommitPackFT.
  </details>
+
  ---
  <details>
  <summary>
@@ -126,6 +146,7 @@ BigCode is an open scientific collaboration working on responsible training of l
  - [Astraios-7B](https://huggingface.co/collections/bigcode/astraios-7b-65788b509c5c26f96c08d576): Collection of StarCoderBase-7B models instruction tuned on CommitPackFT + OASST with 7 method.
  - [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 method.
  </details>
+
  ---
  <details>
  <summary>