---
license: [llama2, other]
datasets:
- cerebras/SlimPajama-627B
language:
- en
pipeline_tag: text-generation
tags:
- Deci AI
- DeciLM
model-index:
- name: DeciLM 6B
  results:
  - task:
      type: text-generation
    dataset:
      type: ai2/arc
      name: ai2_arc
    metrics:
    - name: ARC Challenge
      type: ARC Challenge
      value: 42.06
      verified: false
  - task:
      type: text-generation
    dataset:
      type: ai2/arc
      name: ai2_arc
    metrics:
    - name: ARC Easy
      type: ARC Easy
      value: 70.02
      verified: false
  - task:
      type: text-generation
    dataset:
      type: boolq
      name: boolq
    metrics:
    - name: BoolQ
      type: BoolQ
      value: 71.01
      verified: false
  - task:
      type: text-generation
    dataset:
      type: hellaswag
      name: hellaswag
    metrics:
    - name: HellaSwag
      type: HellaSwag
      value: 74.58
      verified: false
  - task:
      type: text-generation
    dataset:
      type: LAMBADA
      name: OpenAI LAMBADA
    metrics:
    - name: LAMBADA
      type: LAMBADA
      value: 69.78
      verified: false
  - task:
      type: text-generation
    dataset:
      type: OpenBookQA
      name: openbookqa
    metrics:
    - name: OpenBookQA
      type: OpenBookQA
      value: 34
      verified: false
  - task:
      type: text-generation
    dataset:
      type: PIQA
      name: piqa
    metrics:
    - name: PIQA
      type: PIQA
      value: 77.09
      verified: false
  - task:
      type: text-generation
    dataset:
      type: truthful_qa
      name: truthful_qa
    metrics:
    - name: TruthfulQA
      type: TruthfulQA
      value: 36.19
      verified: false
  - task:
      type: text-generation
    dataset:
      type: winogrande
      name: winogrande
    metrics:
    - name: Winogrande
      type: Winogrande
      value: 68.03
      verified: false
---
# DeciLM 6B
DeciLM 6B is a 5.7 billion parameter decoder-only text generation model. With a context window of 4096 tokens, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency. The model's architecture was generated using Deci's proprietary Neural Architecture Search-based technology, AutoNAC.
## Model Details
### Model Description
Deci developed and publicly released the DeciLM 6B large language model, a pretrained, high-efficiency generative text model with 5.7 billion parameters. DeciLM 6B outpaces pretrained models in its class, with a throughput up to 15 times that of Llama 2 7B. DeciLM 6B was further fine-tuned using [LoRA](https://arxiv.org/pdf/2106.09685.pdf) for instruction following on a subset of the OpenOrca dataset, creating [DeciLM 6B-Instruct](https://huggingface.co/Deci/DeciLM-6b-instruct).
- **Developed by:** Deci
- **Model type:** DeciLM is an auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention.
- **Language(s) (NLP):** English
- **License:** [Llama 2 Community License Agreement](https://huggingface.co/Deci/DeciLM-6b/blob/main/LICENSE.md) with an extension from Deci regarding hosting service providers.
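As a concrete illustration of the LoRA fine-tuning mentioned above, here is a minimal sketch using the Hugging Face `peft` library. The hyperparameters and target module names are assumptions for illustration, not Deci's actual instruction-tuning recipe.
```python
# Minimal LoRA setup sketch (assumed hyperparameters, not Deci's recipe).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Deci/DeciLM-6b", torch_dtype=torch.bfloat16, trust_remote_code=True
)
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank updates
    lora_alpha=32,                         # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```
The wrapped model can then be trained as usual, e.g. with the `transformers` `Trainer`, on instruction-following data such as OpenOrca subsets.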
## Model Architecture
| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads* | Hidden Size |
|:----------|:----------|:----------|:----------|:----------|:----------|
| 5.7B | 32 | 32 | 4096 | Variable | 4096 |
*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each layer of the model.
- **Decoder layer:** Variable Grouped-Query Attention (a minimal sketch follows this list). Grouped-Query Attention (GQA) was introduced in [Ainslie et al., 2023](https://arxiv.org/abs/2305.13245)
- **Position Embeddings:** Dynamic NTK Scaling Rotary Position Embeddings [Su et al., 2021](https://arxiv.org/abs/2104.09864)
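To make the "variable GQA" idea concrete, here is a minimal sketch of grouped-query attention in PyTorch, where each group of query heads shares one key/value head and the number of key/value heads can differ per layer. The shapes, head counts, and function name are illustrative assumptions, not DeciLM's actual implementation.
```python
# Minimal grouped-query attention sketch (illustrative, not DeciLM's code).
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v, num_heads, num_kv_heads):
    # q: (batch, seq, num_heads * head_dim)
    # k, v: (batch, seq, num_kv_heads * head_dim)
    b, s, _ = q.shape
    head_dim = q.shape[-1] // num_heads
    q = q.view(b, s, num_heads, head_dim).transpose(1, 2)
    k = k.view(b, s, num_kv_heads, head_dim).transpose(1, 2)
    v = v.view(b, s, num_kv_heads, head_dim).transpose(1, 2)
    # Each group of num_heads // num_kv_heads query heads shares one KV head.
    group_size = num_heads // num_kv_heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, s, num_heads * head_dim)

# "Variable" GQA: a NAS pass like AutoNAC can pick a different num_kv_heads
# per layer, from multi-query (1) up to full multi-head attention (32).
x = torch.randn(1, 8, 32 * 128)   # 32 query heads, head_dim 128
kv = torch.randn(1, 8, 4 * 128)   # this layer happens to use 4 KV heads
out = gqa_attention(x, kv, kv, num_heads=32, num_kv_heads=4)
```
Fewer KV heads shrink the KV cache and memory traffic at inference time, which is where much of the throughput gain comes from.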
### Model Sources
- **Paper:** [DeciLM Technical Blog](https://deci.ai/blog/decilm-15-times-faster-than-llama2-nas-generated-llm-with-variable-gqa/?utm_campaign=repos&utm_source=hugging-face&utm_medium=model-card&utm_content=decilm-6b)
- **Demo:** [DeciLM 6B Instruct Demo](https://huggingface.co/spaces/Deci/DeciLM-6b-instruct)
- **Notebook:** [DeciLM 6B Notebook](https://colab.research.google.com/drive/1LugJCifOv0L426ukRHjOblBRWwUImAit)
## Uses
The model is intended for commercial and research use in English and can be fine-tuned for use in other languages.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
# pip install -q transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "Deci/DeciLM-6b"
device = "cuda" # for GPU usage or "cpu" for CPU usage
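# Load the tokenizer and model; trust_remote_code is required because
# DeciLM ships custom modeling code with the checkpoint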
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)
inputs = tokenizer.encode("In a shocking finding, scientists discovered a herd of unicorns living in", return_tensors="pt").to(device)
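# Sample up to 100 new tokens with nucleus sampling (top_p=0.95)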
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0]))
```
## Training Details
DeciLM 6B was trained on a subset of the SlimPajama dataset, leveraging advanced proprietary methodologies that allow for fast training.
## Evaluation
Below are DeciLM 6B's evaluation results.
| Average | ARC Challenge* | ARC Easy* | BoolQ | HellaSwag* | LAMBADA OpenAI | OpenBookQA | PIQA | TruthfulQA | Winogrande |
|:----------|:----------|:----------|:----------|:----------|:----------|:----------|:----------|:----------|:----------|
| 60.33 | 42.06 | 70.02 | 71.01 | 74.58 | 69.78 | 34 | 77.09 | 36.19 | 68.03 |
*Accuracy-norm score
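The asterisked scores are length-normalized accuracies. As a rough sketch of how such a score is computed for a multiple-choice task, the snippet below ranks answer choices by their log-likelihood normalized by answer length; the example item is made up, and this is not the exact harness used to produce the table above.
```python
# Sketch of length-normalized multiple-choice scoring (acc_norm style).
# The question is made up; not the exact evaluation harness used above.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-6b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

def normalized_logprob(context, choice):
    ctx_len = tokenizer.encode(context, return_tensors="pt").shape[1]
    full_ids = tokenizer.encode(context + choice, return_tensors="pt")
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i predicts token i + 1, so shift the logits by one.
    logprobs = F.log_softmax(logits[0, :-1].float(), dim=-1)
    cont = full_ids[0, ctx_len:]
    token_lp = logprobs[ctx_len - 1 :].gather(1, cont.unsqueeze(1))
    # Normalize by answer length so longer choices aren't penalized.
    return token_lp.sum().item() / len(choice)

question = "Question: What gas do plants absorb for photosynthesis?\nAnswer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " helium"]
scores = [normalized_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])
```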
### Runtime Benchmarks
|Inference Tool/Hardware | A10 (tokens/sec) |
|:----------|:----------|
| PyTorch | 652.49 |
| Infery LLM | 2,029.6 |
- Throughput (tokens/sec) measured at the optimal batch size for each tool: batch size 64 for PyTorch and batch size 128 for Infery LLM (a rough measurement sketch follows below).
- To replicate the PyTorch benchmark results, use this [code example](https://huggingface.co/Deci/DeciLM-6b/blob/main/hf_benchmark_example.py).
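For intuition, here is a rough sketch of how tokens/sec can be measured with plain PyTorch and `generate`; the prompt, batch size, and token count are placeholders, and the linked `hf_benchmark_example.py` remains the authoritative script for reproducing the table above.
```python
# Rough throughput measurement sketch; see hf_benchmark_example.py for
# the script used to produce the reported numbers.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-6b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

batch_size, new_tokens = 64, 128  # BS 64 matches the PyTorch row above
prompts = ["The quick brown fox"] * batch_size
inputs = tokenizer(prompts, return_tensors="pt").to("cuda")

torch.cuda.synchronize()
start = time.perf_counter()
_ = model.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"{batch_size * new_tokens / elapsed:.1f} generated tokens/sec")
```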
## How to Cite
Please cite this model using this format.
```bibtex
@misc{DeciFoundationModels,
  title = {DeciLM 6B},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-6b},
}
```