# Model Card for Reviewer-14B

## Model Details

### Model Description
Reviewer-14B is fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B and optimized for selecting the best patch among the multiple candidate patches generated by our DARS agent while solving software engineering problems.
### Model Sources
- Repository: DARS-14B Repository
- Paper: "DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal"
## How to Get Started with the Model
We use vLLM to deploy and run inference with the model. Please follow the tutorial here to use our LoRA weights with vLLM.
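Below is a minimal offline-inference sketch using vLLM's LoRA support; the local adapter path, the prompt, and the generation settings are assumptions for illustration, not the official setup.

```python
# Minimal vLLM offline-inference sketch for the Reviewer-14B LoRA weights.
# The adapter path, prompt, and sampling settings below are assumptions.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",  # base model
    enable_lora=True,
    max_model_len=14336,  # roomy enough for the 14K-token training length
)

params = SamplingParams(temperature=0.0, max_tokens=2048)

outputs = llm.generate(
    ["<problem statement + candidate patches, in the DARS review format>"],
    params,
    lora_request=LoRARequest("reviewer", 1, "/path/to/Reviewer-14B-lora"),
)
print(outputs[0].outputs[0].text)
```

The same adapter can also be served over HTTP via `vllm serve` with `--enable-lora`; see the vLLM LoRA documentation for details.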
## Training Details

### Dataset
We use our code review dataset, in which each instance contains several git patches together with a critique for each patch. The model learns to generate critiques for multiple patches and select the best one.
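For illustration, one training instance might look like the sketch below; the field names and structure are our assumptions about the schema, not confirmed by the source.

```python
# Illustrative shape of one training instance (field names are assumptions).
instance = {
    "problem_statement": "Fix the off-by-one error in pagination ...",
    "patches": [
        {
            "patch": "diff --git a/app/pager.py b/app/pager.py\n...",
            "critique": "Handles the boundary case but breaks empty pages.",
        },
        {
            "patch": "diff --git a/app/pager.py b/app/pager.py\n...",
            "critique": "Correctly clamps the index; minimal and safe.",
        },
    ],
    "best_patch_index": 1,  # supervision target: the patch the reviewer should pick
}
```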
### Training Procedure
| Hyperparameter | Value |
|---|---|
| Training regime | BF16 mixed precision |
| Optimizer | AdamW with cosine learning-rate scheduler |
| LoRA configuration | rank=8, alpha=32, dropout=0.1 |
| Batch size | 48 |
| Learning rate | 5e-6 |
| Sequence length | 14K tokens |
| Fine-tuning epochs | 1 |
| Compute environment | DeepSpeed for memory-efficient distributed training |
| Compute infrastructure | 8× H100 GPUs |
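For concreteness, the LoRA row in the table maps onto a Hugging Face peft configuration like the sketch below; `target_modules` is our assumption (a common choice for Qwen-style attention projections) and is not specified by the source.

```python
# Sketch of the LoRA configuration from the table above, using Hugging Face peft.
# target_modules is an assumption; the source does not say which layers are adapted.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,               # LoRA rank
    lora_alpha=32,     # scaling factor
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```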
We use the training script provided in the Qwen2.5 codebase.
## Results
Using this model as the reviewer over DARS trajectories generated with Claude 3.5 Sonnet V2 achieves a 41.7% resolve rate on SWE-Bench Lite.