Fine-tuned Qwen2.5-Coder-7B for Function Writing

Model Description

This model is a fine-tuned version of Qwen2.5-Coder-7B, specifically optimized for function writing tasks. The base model Qwen2.5-Coder-7B is part of the Qwen2.5-Coder family, which was trained on 5.5 trillion tokens including source code, text-code grounding, and synthetic data.

Base Model Details

  • Type: Causal Language Model
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Parameters: 7.61B (6.53B Non-Embedding)
  • Layers: 28
  • Attention Heads: 28 for Q and 4 for KV
  • Context Length: Up to 131,072 tokens
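
These figures can be checked against the published configuration of the base checkpoint. A minimal sketch, assuming the public Qwen/Qwen2.5-Coder-7B repository on the Hugging Face Hub:

from transformers import AutoConfig

# Fetch only the configuration of the base model; no weights are downloaded
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B")

print(config.num_hidden_layers)        # 28 transformer layers
print(config.num_attention_heads)      # 28 query heads
print(config.num_key_value_heads)      # 4 key/value heads (grouped-query attention)
print(config.max_position_embeddings)  # maximum context length in tokens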

Fine-tuning Specifications

The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following configuration; an illustrative code sketch follows each list below.

Training Parameters

  • Training Data: 30,000 examples
  • Batch Size: 1 per device
  • Gradient Accumulation Steps: 24
  • Learning Rate: 1e-6
  • Number of Epochs: 2
  • Warmup Ratio: 0.05
  • Maximum Sequence Length: 4,096 tokens
  • Weight Decay: 0.01
  • Maximum Gradient Norm: 0.5
  • Learning Rate Scheduler: Cosine
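
For reference, these hyperparameters map onto Hugging Face TrainingArguments roughly as sketched below. The output_dir is a hypothetical placeholder, and the 4,096-token limit is enforced at tokenization time rather than here:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-coder-7b-function-writing",  # hypothetical placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=24,  # effective batch size of 24 per device
    learning_rate=1e-6,
    num_train_epochs=2,
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.5,
    lr_scheduler_type="cosine",
    bf16=True,  # BF16 mixed precision, per the LoRA configuration below
)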

LoRA Configuration

  • Rank (r): 32
  • Alpha: 32
  • Dropout: 0.05
  • Target Modules: q_proj, v_proj, o_proj, gate_proj, up_proj
  • Training Mode: BF16 mixed precision
  • RS-LoRA: Enabled
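
In PEFT terms, this corresponds roughly to the LoraConfig below; a minimal sketch, not the exact training script:

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "o_proj", "gate_proj", "up_proj"],
    use_rslora=True,  # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",
)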

Training Infrastructure

  • Quantization: 4-bit (NF4)
  • Attention Implementation: Flash Attention 2
  • Memory Optimization: Gradient checkpointing enabled
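
Together these settings imply a QLoRA-style setup. A minimal sketch of how the base model would be loaded under them, assuming the bitsandbytes integration in Transformers; the compute dtype is an assumption chosen to match the BF16 training mode:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: matches BF16 training
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention 2
    device_map="auto",
)
model.gradient_checkpointing_enable()  # memory optimization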

Usage

This model is optimized for function writing tasks and can be loaded using the Hugging Face Transformers library. Here's a basic example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "path_to_your_model",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "path_to_your_model",
    trust_remote_code=True
)

# Generate text
input_text = "Write a function that..."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
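
Greedy decoding (the default above) is deterministic; for more varied completions you can enable sampling. The values below are illustrative defaults, not settings tuned for this model:

outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))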

Limitations

  • The model is specifically fine-tuned for function writing tasks and may not perform optimally for general code generation or other tasks
  • Maximum context length during fine-tuning was limited to 4,096 tokens
  • While the base model supports contexts of up to 128K (131,072) tokens, behavior on inputs longer than 4,096 tokens has not been validated for this fine-tune

License

This model inherits the Apache 2.0 license from its base model Qwen2.5-Coder-7B.

Citation

If you use this model, please cite the original Qwen2.5-Coder paper and acknowledge the fine-tuning work:

@article{hui2024qwen2,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}