Model Card for Reviewer-14B

Model Details

Model Description

Reviewer-14B is fine-tuned from DeepSeek-R1-Distill-Qwen-14B and optimized for selecting the best patch among multiple patches generated by our DARS agent while solving software engineering problems.

Model Sources

How to Get Started with the Model

We use vLLM to deploy the model and run inference. Please follow the vLLM LoRA tutorial to use our LoRA weights with vLLM.
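
As a rough sketch, the offline vLLM API below loads the base model with LoRA enabled and attaches the adapter per request. The adapter path and prompt are placeholders, not the official ones.

```python
# Minimal sketch: serving the Reviewer-14B LoRA adapter with vLLM's offline API.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",  # base model
    enable_lora=True,
)
params = SamplingParams(temperature=0.0, max_tokens=2048)
# Adapter path is a placeholder; point it at the downloaded LoRA weights.
lora = LoRARequest("reviewer-14b", 1, "/path/to/Reviewer-14B")

outputs = llm.generate(
    ["Review the following patches and select the best one:\n..."],  # placeholder prompt
    params,
    lora_request=lora,
)
print(outputs[0].outputs[0].text)
```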

Training Details

Dataset

We use our code review dataset, where each instance contains several git patches with a critique for each patch. The model learns to generate critiques for multiple patches and select the best patch.
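
For illustration only, one training instance might be shaped like the Python dict below; the field names and values are assumptions, not the dataset's actual schema.

```python
# Hypothetical shape of one training instance (illustrative only).
instance = {
    "problem_statement": "Fix the off-by-one error in pagination ...",
    "patches": [
        {"patch": "diff --git a/app/pager.py b/app/pager.py\n...",
         "critique": "Handles the boundary case but breaks empty pages ..."},
        {"patch": "diff --git a/app/pager.py b/app/pager.py\n...",
         "critique": "Correctly clamps the index; existing tests pass ..."},
    ],
    "best_patch_index": 1,  # supervision target: which patch to select
}
```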

Training Procedure

| Hyperparameter | Value |
|---|---|
| Training regime | BF16 mixed precision |
| Optimizer | AdamW with cosine learning rate scheduler |
| LoRA configuration | rank=8, alpha=32, dropout=0.1 |
| Batch size | 48 |
| Learning rate | 5e-6 |
| Sequence length | 14K tokens |
| Fine-tuning epochs | 1 |
| Compute environment | DeepSpeed for memory-efficient distributed training |
| Compute infrastructure | 8x H100 |

We use the training script provided in the Qwen-2.5 codebase.
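
A minimal sketch of an equivalent LoRA setup using the peft library is shown below; the rank, alpha, and dropout match the table above, but the target_modules are an assumption, not taken from the actual training script.

```python
# Sketch of a LoRA configuration matching the hyperparameters above (peft).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    torch_dtype="bfloat16",  # BF16 mixed precision, per the table
)
lora_config = LoraConfig(
    r=8,              # rank=8
    lora_alpha=32,    # alpha=32
    lora_dropout=0.1, # dropout=0.1
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```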

Results

Using this model as the reviewer over DARS trajectories generated with Claude 3.5 Sonnet V2 achieves 41.7% on SWE-Bench Lite.
