README

Model Summary

This is an instruction-tuned version of the Starcoder2-3B model. It was trained using the same repository and dataset as Starcoder2-15B, and it uses the same prompt-generation technique as the Starcoder2-15B model, so it can be used as a drop-in replacement by changing only the model path.

Intended Use

Running code language models locally. This model can comfortably run on the following hardware (a quantized-loading sketch follows the list):

  • 8 GB and 10 GB VRAM machines with FP16
  • 6 GB VRAM machines with INT8
  • 4 GB VRAM machines with INT4
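
For the INT8 and INT4 cases, here is a minimal sketch using bitsandbytes quantization through transformers. The BitsAndBytesConfig settings are illustrative assumptions rather than a configuration taken from this card, and the bitsandbytes package must be installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "outputs_starcoder3b_4e"  # local checkpoint directory, as in the example below

# INT4 loading; for INT8, use BitsAndBytesConfig(load_in_8bit=True) instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
```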

Example

Using BF16

```python
import transformers
import torch

pipeline = transformers.pipeline(
    model="outputs_starcoder3b_4e",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def respond(instruction: str, response_prefix: str) -> str:
    messages = [{"role": "user", "content": instruction}]
    prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += response_prefix

    # Stop generation at EOS or at the "###" delimiter used to split off the response.
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("###"),
    ]

    result = pipeline(
        prompt,
        max_length=1024,
        num_return_sequences=1,
        do_sample=False,
        eos_token_id=terminators,
        pad_token_id=pipeline.tokenizer.eos_token_id,
        truncation=True,
    )
    response = response_prefix + result[0]["generated_text"][len(prompt) :].split("###")[0].rstrip()
    return response


instruction = "Write the Transformer encoder in PyTorch."
response_prefix = ""

print(respond(instruction, response_prefix))
```

Output:

```python
import torch
import torch.nn as nn

class TransformerEncoder(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers)

    def forward(self, src):
        return self.transformer_encoder(src)
```

Training

  • 4 epochs
  • Training type: full fine-tuning
  • Training time: ~4 hours
  • Batch size: 2
  • Gradient accumulation steps: 256
  • Sequence length: 1280
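
With a per-device batch size of 2 and 256 gradient accumulation steps on the single GPU listed under Hardware, the effective batch size works out to 2 × 256 = 512 sequences per optimizer step.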

Exact Training Command Used

See the repository for setup details.

```bash
MODEL_KEY=bigcode/starcoder2-3b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=outputs_starcoder3b_4e
DATASET_FILE=train_data.jsonl
accelerate launch -m star_align.train \
    --model_key $MODEL_KEY \
    --model_name_or_path $MODEL_KEY \
    --use_flash_attention True \
    --datafile_paths $DATASET_FILE \
    --output_dir $OUTPUT_DIR \
    --bf16 True \
    --num_train_epochs $EPOCH \
    --max_training_seq_length $SEQ_LEN \
    --pad_to_max_length False \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 256 \
    --group_by_length False \
    --ddp_find_unused_parameters False \
    --logging_steps 1 \
    --log_level info \
    --optim adafactor \
    --max_grad_norm -1 \
    --warmup_ratio $WARMUP_RATIO \
    --learning_rate $LR \
    --lr_scheduler_type linear \
    --attention_dropout 0.0 \
    --residual_dropout 0.0 \
    --embedding_dropout 0.0
```

Hardware

  • 40 GB NVIDIA A100

License

The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement here.
