---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
library_name: peft
---

# Model Card for Reviewer-14B

## Model Details

### Model Description

Reviewer-14B is a fine-tuned version of [**DeepSeek-R1-Distill-Qwen-14B**](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), optimized for selecting the best patch among multiple candidate patches generated by our DARS agent while solving software engineering problems.

### Model Sources

- **Repository:** [DARS-14B Repository](https://github.com/darsagent/DARS-Agent)
- **Paper:** ["DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal"](https://drive.google.com/file/d/1DMAZ-fkirC8uKl8819cOq9J3BQ4E7GXR/view?usp=drive_link)

## How to Get Started with the Model

We use vLLM to deploy and serve the model. Please follow the vLLM LoRA tutorial [here](https://docs.vllm.ai/en/latest/features/lora.html) to use our LoRA weights with vLLM.

## Training Details

### Dataset

We use our [code review dataset](https://huggingface.co/datasets/AGENTDARS/generated-critiques), where each instance contains several git patches along with a critique for each patch. The model learns to generate critiques for multiple patches and select the best one.

### Training Procedure

| Hyperparameter | Value |
|------------------------|-----------------------------------------------------|
| Training regime | BF16 mixed precision |
| Optimizer | AdamW with cosine learning rate scheduler |
| LoRA Configuration | rank=8, alpha=32, dropout=0.1 |
| Batch Size | 48 |
| Learning Rate | 5e-6 |
| Sequence Length | 14K tokens |
| Fine-tuning Epochs | 1 |
| Compute Environment | DeepSpeed for memory-efficient distributed training |
| Compute Infrastructure | 8x H100 |

We use the training script provided in the [Qwen2.5-Coder codebase](https://github.com/QwenLM/Qwen2.5-Coder).

## Results

Using this model as a reviewer over DARS trajectories generated with Claude 3.5 Sonnet V2 achieves 41.7% on SWE-Bench Lite.
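As a minimal sketch of the vLLM serving flow referenced above: the `vllm serve` command in the comment and the adapter name `reviewer-14b` are illustrative assumptions, not part of the official release. The helper below only builds the JSON body that would be POSTed to a running vLLM server's OpenAI-compatible endpoint; setting `model` to the LoRA module name routes the request through the adapter.

```python
# Hypothetical serve command (adapter name "reviewer-14b" is an assumption):
#
#   vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
#       --enable-lora \
#       --lora-modules reviewer-14b=AGENTDARS/Reviewer-14B

import json


def build_review_request(prompt: str, adapter: str = "reviewer-14b") -> str:
    """Build the JSON body for vLLM's OpenAI-compatible
    /v1/chat/completions endpoint. Setting `model` to the LoRA module
    name registered at serve time selects the adapter."""
    body = {
        "model": adapter,  # LoRA module name, not the base model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": 2048,
    }
    return json.dumps(body)


payload = build_review_request("Review the following candidate patches...")
print(json.loads(payload)["model"])  # -> reviewer-14b
```

The request can then be sent to `http://<host>:8000/v1/chat/completions` with any HTTP client; see the vLLM LoRA documentation linked above for the authoritative serving options.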
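The reviewer task described above (critique several patches, then pick the best) can be sketched as a prompt builder plus an answer parser. The template and the `Best patch: <number>` output convention below are hypothetical illustrations, not the released training format.

```python
import re


def build_reviewer_prompt(problem: str, patches: list[str]) -> str:
    """Concatenate candidate patches into a single review prompt.
    The template is a hypothetical illustration of the task format."""
    parts = [f"Problem:\n{problem}\n"]
    for i, patch in enumerate(patches, start=1):
        parts.append(f"Patch {i}:\n{patch}\n")
    parts.append("Critique each patch, then answer with 'Best patch: <number>'.")
    return "\n".join(parts)


def parse_best_patch(model_output: str, num_patches: int) -> int | None:
    """Extract the selected patch index from the model's final answer,
    returning None if no valid selection is found."""
    match = re.search(r"Best patch:\s*(\d+)", model_output)
    if match:
        idx = int(match.group(1))
        if 1 <= idx <= num_patches:
            return idx
    return None
```

For example, `parse_best_patch("...critiques... Best patch: 2", 3)` returns `2`, while an output with no recognizable selection returns `None` so the caller can fall back to a default.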