Update README.md
README.md CHANGED
@@ -97,7 +97,6 @@ model-index:
 | Replit Code V1.5 | 3B | 23.0% | 25.9% | 26.2% | 23.6% | 23.2% | 21.5% |
 | Deci Coder | 1B | 19.1% | 6.8% | 18.4% | 16.7% | 2.1% | 1.7% |
 
-
 **Key Features**
 * Fill in Middle Capability (FIM)
 * Supports Long Context, trained with Sequences up to 16,384
@@ -207,6 +206,26 @@ The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2
 
 The dataset comprises a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), along with [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) and [GitHub Issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues) (BigCode, 2023), and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)). We further supplement our training with data from mathematical domains ([Azerbayev, Zhangir, et al., 2023](https://arxiv.org/abs/2310.10631) and [Yu, Longhui, et al., 2023](https://arxiv.org/abs/2309.12284)).
 
+Top 18 programming languages trained on:
+- C
+- CPP
+- Java
+- JavaScript
+- CSS
+- Go
+- HTML
+- Ruby
+- Rust
+- Markdown
+- Shell
+- Php
+- Sql
+- R
+- Typescript
+- Python
+- Jupyter-Clean
+- RestructuredText
+
 ### Training Procedure
 
 The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW.
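
The FIM capability listed under Key Features above is exercised with an infilling-style prompt. Below is a minimal sketch using `transformers`; the sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) and the placeholder model ID are assumptions in the style of other FIM-trained code models, not details confirmed by this diff, so check the model card for the exact special tokens before use.

```python
# Sketch only: fill-in-the-middle prompting with Hugging Face transformers.
# The sentinel tokens and model ID below are ASSUMPTIONS (StarCoder-style FIM
# formatting), not confirmed by this README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-code-model"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Code before and after the hole we want the model to fill.
prefix = "def mean(values):\n    "
suffix = "\n    return total / len(values)\n"

# Assemble the FIM prompt: prefix, then suffix, then ask for the middle.
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)

# Decode only the newly generated tokens, i.e. the proposed middle span.
middle = tokenizer.decode(
    outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(middle)
```

Slicing the output at the prompt length keeps the returned string to just the generated middle, rather than echoing the prefix and suffix back.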
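
The training-procedure line above (pre-training in `bfloat16`, optimized with AdamW) maps onto a standard PyTorch setup. The sketch below is illustrative only: the tiny stand-in model, the loss, and every hyperparameter are placeholders, not values taken from this README.

```python
# Minimal sketch of "bfloat16 precision, optimized with AdamW": a bf16 autocast
# forward pass driven by torch.optim.AdamW. All numbers here are placeholders.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the real decoder-only transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

def training_step(batch: torch.Tensor, target: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass under bfloat16 autocast.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        output = model(batch)
    loss = torch.nn.functional.mse_loss(output.float(), target)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy call with random data, just to show the step runs end to end.
x, y = torch.randn(8, 512), torch.randn(8, 512)
print(training_step(x, y))
```

Because bfloat16 keeps the same exponent range as float32, no loss scaling (GradScaler) is shown here, unlike a typical float16 setup.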