---
license: apache-2.0
tags:
- code
---
# Fine-tuned Qwen2.5-Coder-7B for Function Writing

## Model Description

This model is a fine-tuned version of Qwen2.5-Coder-7B, optimized specifically for function writing tasks. The base model is part of the Qwen2.5-Coder family, which was trained on 5.5 trillion tokens of source code, text-code grounding data, and synthetic data.
### Base Model Details

* **Type**: Causal Language Model
* **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
* **Parameters**: 7.61B (6.53B Non-Embedding)
* **Layers**: 28
* **Attention Heads**: 28 for Q and 4 for KV
* **Context Length**: Up to 131,072 tokens

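
These values can be sanity-checked against the model's configuration. A minimal sketch, assuming the fine-tuned weights live at the placeholder path used in the Usage section below:

```python
from transformers import AutoConfig

# "path_to_your_model" is a placeholder for the local directory or Hub ID of this fine-tune.
config = AutoConfig.from_pretrained("path_to_your_model", trust_remote_code=True)

print(config.num_hidden_layers)    # expected: 28
print(config.num_attention_heads)  # expected: 28 query heads
print(config.num_key_value_heads)  # expected: 4 KV heads (grouped-query attention)
```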
## Fine-tuning Specifications

The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following configuration:
### Training Parameters

* **Training Data**: 30,000 examples
* **Batch Size**: 1 per device
* **Gradient Accumulation Steps**: 24
* **Learning Rate**: 1e-6
* **Number of Epochs**: 2
* **Warmup Ratio**: 0.05
* **Maximum Sequence Length**: 4,096 tokens
* **Weight Decay**: 0.01
* **Maximum Gradient Norm**: 0.5
* **Learning Rate Scheduler**: Cosine

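
For reference, these hyperparameters map onto a Hugging Face `TrainingArguments` object roughly as follows. This is a hedged sketch rather than the original training script; the output directory is a placeholder, and the 4,096-token maximum sequence length is assumed to be enforced during tokenization/packing rather than here.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the training setup listed above (not the original script).
training_args = TrainingArguments(
    output_dir="qwen2.5-coder-7b-function-writing",  # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=24,  # effective batch size of 24 per device
    learning_rate=1e-6,
    num_train_epochs=2,
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.5,
    lr_scheduler_type="cosine",
    bf16=True,                       # BF16 mixed precision (see LoRA Configuration)
    gradient_checkpointing=True,     # see Training Infrastructure
)
```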
### LoRA Configuration

* **Rank (r)**: 32
* **Alpha**: 32
* **Dropout**: 0.05
* **Target Modules**: q_proj, v_proj, o_proj, gate_proj, up_proj
* **Training Mode**: BF16 mixed precision
* **RS-LoRA**: Enabled

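
Expressed as a `peft` `LoraConfig`, the adapter setup above would look roughly like the sketch below; the task type is an assumption (causal language modeling), and this is a reconstruction rather than the original code.

```python
from peft import LoraConfig

# Approximate LoRA adapter configuration matching the values listed above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "o_proj", "gate_proj", "up_proj"],
    use_rslora=True,          # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",    # assumed task type
)
```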
### Training Infrastructure

* **Quantization**: 4-bit quantization (NF4)
* **Attention Implementation**: Flash Attention 2
* **Memory Optimization**: Gradient checkpointing enabled

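
A minimal sketch of how these pieces are typically wired together when loading the base model for QLoRA-style training with `transformers` and `bitsandbytes`; the compute dtype is an assumption, and the exact loading code used for this fine-tune may differ.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization (bfloat16 compute dtype assumed).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B",                  # base model this fine-tune starts from
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
base_model.gradient_checkpointing_enable()    # trade extra compute for activation memory
```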
## Usage

This model is optimized for function writing tasks and can be loaded using the Hugging Face Transformers library. Here's a basic example:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "path_to_your_model",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "path_to_your_model",
    trust_remote_code=True
)

# Generate text
input_text = "Write a function that..."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
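
The call above uses the model's default generation settings. If completions come out too deterministic or repetitive, standard sampling arguments can be passed to `generate`; the values below are illustrative only and have not been tuned for this fine-tune.

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    do_sample=True,     # sample instead of greedy decoding
    temperature=0.7,    # illustrative value
    top_p=0.9,          # illustrative value
)
```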
## Limitations

* The model is fine-tuned specifically for function writing tasks and may not perform optimally on general code generation or other tasks
* The maximum sequence length during fine-tuning was limited to 4,096 tokens
* While the base model supports contexts of up to 131,072 tokens, prompts longer than 4,096 tokens may require additional validation with this fine-tune
## License

This model inherits the Apache 2.0 license from its base model, Qwen2.5-Coder-7B.
## Citation

If you use this model, please cite the original Qwen2.5-Coder paper and acknowledge the fine-tuning work:

```bibtex
@article{hui2024qwen2,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}
```