Add OctoPack
#4, opened by Muennighoff
README.md CHANGED
@@ -47,6 +47,20 @@ StarCoder is a 15.5B parameters language model for code trained for 1T tokens on
 - [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
 - [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.

+---
+
+## 🐙OctoPack
+OctoPack consists of data, evals & models relating to Code LLMs that follow human instructions.
+
+- [Paper](https://arxiv.org/abs/2308.07124): Research paper with details about all components of OctoPack.
+- [GitHub](https://github.com/bigcode-project/octopack): All code used for the creation of OctoPack.
+- [CommitPack](https://huggingface.co/datasets/bigcode/commitpack): 4TB of Git commits.
+- [Am I in the CommitPack](https://huggingface.co/spaces/bigcode/in-the-commitpack): Check if your code is in the CommitPack.
+- [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft): 2GB of high-quality Git commits that resemble instructions.
+- [HumanEvalPack](https://huggingface.co/datasets/bigcode/humanevalpack): Benchmark for Code Fixing/Explaining/Synthesizing across Python/JavaScript/Java/Go/C++/Rust.
+- [OctoCoder](https://huggingface.co/bigcode/octocoder): Instruction tuned model of StarCoder by training on CommitPackFT.
+- [OctoCoder Demo](https://huggingface.co/spaces/bigcode/OctoCoder-Demo): Play with OctoCoder.
+- [OctoGeeX](https://huggingface.co/bigcode/octogeex): Instruction tuned model of CodeGeeX2 by training on CommitPackFT.

 ---