---
pipeline_tag: text-generation
license: apache-2.0
tags:
- text generation
- Deci AI
- DeciCoder
programming_language:
- Java
- JavaScript
- Python
metrics:
- code_eval
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
model-index:
- name: DeciCoder-1b
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.191
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.184
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.166
      verified: false
datasets:
- bigcode/starcoderdata
---
# Model Card for DeciCoder 1B
DeciCoder 1B is a 1-billion-parameter, decoder-only code completion model
trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).
The model uses Grouped Query Attention and has a context window of 2048
tokens. It was trained with a Fill-in-the-Middle training objective. The model's
architecture was generated by AutoNAC, Deci's proprietary Neural Architecture
Search-based technology.
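Fill-in-the-Middle training lets a model fill a gap between a known prefix and suffix instead of only continuing text left to right. The sketch below assembles a prefix-suffix-middle (PSM) prompt. The sentinel strings `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` are an assumption borrowed from StarCoder-style tokenizers; this card does not document DeciCoder's FIM tokens, so verify them against the tokenizer's special tokens before relying on this format. `build_fim_prompt` is a hypothetical helper.

```python
# Hypothetical helper. The sentinel tokens are ASSUMED StarCoder-style FIM
# tokens; this model card does not confirm them for DeciCoder.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange code around a gap in prefix-suffix-middle (PSM) order.

    The model is expected to generate the missing middle section
    after the FIM_MIDDLE sentinel.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


prefix = "def absolute_value(x):\n    "
suffix = "\n    return result\n"
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

If the tokens match, the resulting string would be tokenized and passed to `model.generate` in the same way as the left-to-right prompt in the How to Use section.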
## Model Details
- **Developed by:** Deci
- **Model type:** DeciCoder is an auto-regressive language model based on the transformer decoder architecture, using Grouped Query Attention.
- **Language(s):** Python, Java, JavaScript
- **License:** Model checkpoints are licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.
## Model Architecture
| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads | Hidden Size |
|:----------|:----------|:----------|:----------|:----------|:----------|
| 1.1B | 20 | 32 | 2048 | 4 | 2048 |
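The table's `GQA num_key_value_heads` value means the 32 query heads share 4 key/value heads, so each key/value head serves a group of 8 query heads. The arithmetic below estimates what that saves in the key/value cache during generation. It assumes a head dimension of hidden size / heads = 64 and 2-byte (bfloat16) cache entries; neither is stated explicitly in the table, and actual memory use also depends on the inference runtime.

```python
# Architecture values from the table above.
num_layers = 20
num_query_heads = 32
num_kv_heads = 4
hidden_size = 2048
seq_len = 2048

# Assumptions: head_dim = hidden_size / num_query_heads, bf16 cache (2 bytes).
head_dim = hidden_size // num_query_heads      # 64
bytes_per_value = 2

group_size = num_query_heads // num_kv_heads   # query heads per KV head


def kv_cache_bytes(kv_heads: int, tokens: int) -> int:
    """Bytes needed to cache keys and values across all layers."""
    # Factor 2: one cached tensor for keys, one for values.
    return 2 * num_layers * kv_heads * head_dim * bytes_per_value * tokens


gqa = kv_cache_bytes(num_kv_heads, seq_len)
mha = kv_cache_bytes(num_query_heads, seq_len)  # if every head had its own K/V

# Under GQA, query head h reads key/value head h // group_size.
kv_head_for_query = [h // group_size for h in range(num_query_heads)]

print(f"group size: {group_size}")
print(f"GQA cache per sequence: {gqa / 2**20:.0f} MiB")
print(f"MHA cache per sequence: {mha / 2**20:.0f} MiB")
```

Under these assumptions, a full 2048-token sequence needs about 40 MiB of cache instead of 320 MiB, an 8x reduction that matters most at the large batch sizes used in the runtime benchmarks.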
- **Decoder layer:** Grouped Query Attention [Ainslie et al., 2023](https://arxiv.org/abs/2305.13245)
- **Position Embeddings:** Rotary Position Embeddings [Su et al., 2021](https://arxiv.org/abs/2104.09864)
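Rotary position embeddings encode a token's position by rotating pairs of query/key dimensions through position-dependent angles, so the dot product between a rotated query and a rotated key depends only on their relative offset. The pure-Python sketch below follows the pairwise formulation of Su et al.; the base of 10000 is the paper's default and an assumption here, and DeciCoder's own implementation may pair dimensions differently (for example, rotating the two halves of the vector rather than adjacent elements).

```python
import math


def apply_rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate adjacent pairs (vec[2j], vec[2j+1]) by angle pos * base**(-2j/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i == 2j, so this is base**(-2j/d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5, -0.1, 0.9]
k = [1.0, 0.4, -0.7, 0.2, 0.6, -0.3]

# The same offset (2) at different absolute positions gives the same score.
score_a = dot(apply_rope(q, 5), apply_rope(k, 3))
score_b = dot(apply_rope(q, 105), apply_rope(k, 103))
print(score_a, score_b)
```

Because each step is a pure rotation, vector norms are unchanged; only the relative angle between query and key carries position information.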
## Uses
The model is intended for single-line and multiline code completion from a
context window of up to 2048 tokens. It is *not* an instruction model,
and commands like "Write a function that computes the absolute value of
an integer" won't yield the desired results. A more effective approach
is to frame instructions in the style of source code comments (e.g.
`# this function calculates the absolute value of an integer`) or to present
a function signature and docstring, enabling the model to complete the
function's body.
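Because the model completes code rather than following instructions, a request works best when it is rewritten as the beginning of a source file. The hypothetical helpers below turn a plain-English task into the two prompt styles described above: a leading comment, or a function signature followed by a docstring. The function names and exact layout are illustrative choices, not a format the model requires.

```python
def comment_prompt(task: str) -> str:
    """Frame a task as a Python comment, followed by a newline for the model to continue."""
    return f"# {task}\n"


def docstring_prompt(signature: str, description: str) -> str:
    """Frame a task as a signature plus docstring; the model writes the body."""
    return f'{signature}\n    """{description}"""\n'


print(comment_prompt("this function calculates the absolute value of an integer"))
print(docstring_prompt("def absolute_value(n: int) -> int:",
                       "Return the absolute value of n."))
```

Either string can be passed to `tokenizer.encode` in place of `"def print_hello_world():"` in the How to Use example.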
### How to Use
```python
# pip install -q transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "Deci/DeciCoder-1b"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
### Attribution
DeciCoder was trained on the StarCoder Training Dataset, filtered for
Python, Java, and JavaScript code. For additional information, please
refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata).
### Limitations
The model was trained on Python, Java, and JavaScript source code.
Although the primary natural language in the source is English, other
languages also appear. The model can produce code snippets given some
context, but there is no assurance that the resulting code will function
as expected: it might be suboptimal, contain bugs, or even contain
exploits.
## Training Details
### Training Data
DeciCoder was trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).
### Training Procedure
- **Warm-Up Steps**: 9000
- **Total Training Steps**: 284k
- **Total Tokens**: 446B
- **Global Batch Size**: 768
- **Optimizer**: AdamW
- **Optimizer Parameters**: beta1=0.9, beta2=0.95
- **Weight Decay**: 0.1
- **Learning Rate**: 4e-4
- **Learning Rate Schedule**: cosine
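These hyperparameters are consistent with the 2048-token context: 284k steps × 768 sequences × 2048 tokens comes to about 446.7B tokens, matching the reported 446B. The sketch below checks that arithmetic and evaluates a warm-up plus cosine-decay learning-rate schedule. The card only states a "cosine" schedule with 9000 warm-up steps; the linear warm-up shape and decay to a minimum of 0 are assumptions.

```python
import math

peak_lr = 4e-4
warmup_steps = 9_000
total_steps = 284_000
global_batch = 768
seq_len = 2_048

tokens_seen = total_steps * global_batch * seq_len
print(f"tokens: {tokens_seen / 1e9:.1f}B")  # compare with the reported 446B


def lr_at(step: int, min_lr: float = 0.0) -> float:
    """ASSUMED schedule: linear warm-up to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


for step in (0, 4_500, 9_000, 146_500, 284_000):
    print(step, f"{lr_at(step):.2e}")
```

In PyTorch, the optimizer settings above correspond to `torch.optim.AdamW(params, lr=4e-4, betas=(0.9, 0.95), weight_decay=0.1)`.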
## Evaluation
The table below lists DeciCoder's pass@1 scores on MultiPL-HumanEval:
| Python | JavaScript | Java |
|:----------|:----------|:----------|
| 19.1% | 18.4% | 16.6% |
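pass@1 is the fraction of problems for which a generated completion passes the benchmark's unit tests. When several completions are sampled per problem, it is commonly computed with the unbiased pass@k estimator introduced alongside HumanEval, sketched below. The card does not say how many samples per problem or which sampling temperature produced the scores above, so the per-problem results in the example are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: completions sampled, c: completions that pass the tests, k: budget.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem results: (samples drawn, samples passing).
results = [(20, 0), (20, 5), (20, 20), (20, 1)]

score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.1%}")
```

For k = 1 the estimator reduces to c / n per problem, averaged over all problems.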
### Runtime Benchmarks
|Inference Tool/Hardware | A10 (tokens/sec) |A100 (tokens/sec) |
|:----------|:----------|:----------|
| PyTorch | 1,364.2 | 3,244.4 |
| Infery LLM | 3,889.3 | 11,676.8 |
- Throughput (tokens/sec) was measured with the optimal batch size for each hardware platform: batch size 128 on A10 and 512 on A100.
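Dividing the rows of the table gives the relative speedup of Infery LLM over the PyTorch baseline on each GPU. These ratios are derived only from the numbers above, not separately reported figures.

```python
# Throughput in tokens/sec, copied from the table above.
throughput = {
    "A10": {"PyTorch": 1_364.2, "Infery LLM": 3_889.3},
    "A100": {"PyTorch": 3_244.4, "Infery LLM": 11_676.8},
}

speedups = {
    gpu: runs["Infery LLM"] / runs["PyTorch"] for gpu, runs in throughput.items()
}
for gpu, ratio in speedups.items():
    print(f"{gpu}: {ratio:.2f}x")
```

This gives roughly 2.85x on A10 and 3.60x on A100. Since the batch sizes differ per GPU, each ratio compares the two runtimes on the same hardware, not the GPUs with each other.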
## Documentation
- [Notebook](https://colab.research.google.com/drive/1JCxvBsWCZKHfIcHSMVf7GZCs3ClMQPjs)
- Blog post: [Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation](https://deci.ai/blog/decicoder-efficient-and-accurate-code-generation-llm/)
- Questions: feel free to contact us via our [Discord community](https://discord.com/invite/p9ecgRhDR8/).
## How to Cite
Please cite this model using the following format:
```bibtex
@misc{DeciFoundationModels,
title = {DeciCoder},
author = {DeciAI Research Team},
year = {2023},
url = {https://huggingface.co/deci/decicoder-1b},
}
```