---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
library_name: peft
---

# Model Card for Reviewer-14B

## Model Details

### Model Description

Reviewer-14B is a fine-tuned version of [**DeepSeek-R1-Distill-Qwen-14B**](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), optimized for selecting the best patch among multiple candidate patches generated by our DARS agent while solving software engineering problems.

### Model Sources

- **Repository:** [DARS-14B Repository](https://github.com/darsagent/DARS-Agent)
- **Paper:** ["DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal"](https://drive.google.com/file/d/1DMAZ-fkirC8uKl8819cOq9J3BQ4E7GXR/view?usp=drive_link)

## How to Get Started with the Model

We use vLLM to deploy and serve the model. Please follow the vLLM LoRA tutorial [here](https://docs.vllm.ai/en/latest/features/lora.html) to use our LoRA weights with vLLM.

## Training Details

### Dataset

We use our [code review dataset](https://huggingface.co/datasets/AGENTDARS/generated-critiques), where each instance contains several git patches along with a critique for each patch. The model learns to generate critiques for multiple patches and select the best one.

### Training Procedure

| Hyperparameter | Value |
|------------------------|-----------------------------------------------------|
| Training regime | BF16 mixed precision |
| Optimizer | AdamW with cosine learning rate scheduler |
| LoRA Configuration | rank=8, alpha=32, dropout=0.1 |
| Batch Size | 48 |
| Learning Rate | 5e-6 |
| Sequence Length | 14K tokens |
| Fine-tuning Epochs | 1 |
| Compute Environment | DeepSpeed for memory-efficient distributed training |
| Compute Infrastructure | 8x H100 |

We use the training script provided in the [Qwen2.5-Coder codebase](https://github.com/QwenLM/Qwen2.5-Coder).

## Results

Using this model as a reviewer over DARS trajectories generated with Claude 3.5 Sonnet V2 achieves 41.7% on SWE-Bench Lite.
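As a minimal sketch of the vLLM serving flow referenced above: the `vllm serve` command in the comment and the adapter name `reviewer-14b` are illustrative assumptions, not part of the official release. The helper below only builds the JSON body that would be POSTed to a running vLLM server's OpenAI-compatible endpoint; setting `model` to the LoRA module name routes the request through the adapter.

```python
# Hypothetical serve command (adapter name "reviewer-14b" is an assumption):
#
#   vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
#       --enable-lora \
#       --lora-modules reviewer-14b=AGENTDARS/Reviewer-14B

import json


def build_review_request(prompt: str, adapter: str = "reviewer-14b") -> str:
    """Build the JSON body for vLLM's OpenAI-compatible
    /v1/chat/completions endpoint. Setting `model` to the LoRA module
    name registered at serve time selects the adapter."""
    body = {
        "model": adapter,  # LoRA module name, not the base model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": 2048,
    }
    return json.dumps(body)


payload = build_review_request("Review the following candidate patches...")
print(json.loads(payload)["model"])  # -> reviewer-14b
```

The request can then be sent to `http://<host>:8000/v1/chat/completions` with any HTTP client; see the vLLM LoRA documentation linked above for the authoritative serving options.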
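The reviewer task described above (critique several patches, then pick the best) can be sketched as a prompt builder plus an answer parser. The template and the `Best patch: <number>` output convention below are hypothetical illustrations, not the released training format.

```python
import re


def build_reviewer_prompt(problem: str, patches: list[str]) -> str:
    """Concatenate candidate patches into a single review prompt.
    The template is a hypothetical illustration of the task format."""
    parts = [f"Problem:\n{problem}\n"]
    for i, patch in enumerate(patches, start=1):
        parts.append(f"Patch {i}:\n{patch}\n")
    parts.append("Critique each patch, then answer with 'Best patch: <number>'.")
    return "\n".join(parts)


def parse_best_patch(model_output: str, num_patches: int) -> int | None:
    """Extract the selected patch index from the model's final answer,
    returning None if no valid selection is found."""
    match = re.search(r"Best patch:\s*(\d+)", model_output)
    if match:
        idx = int(match.group(1))
        if 1 <= idx <= num_patches:
            return idx
    return None
```

For example, `parse_best_patch("...critiques... Best patch: 2", 3)` returns `2`, while an output with no recognizable selection returns `None` so the caller can fall back to a default.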