Model Card for Reviewer-14B

Model Details

Model Description

Reviewer-14B is fine-tuned from DeepSeek-R1-Distill-Qwen-14B and optimized for selecting the best patch among multiple patches generated by our DARS agent while solving software engineering problems.

Model Sources

How to Get Started with the Model

We use vLLM to deploy the model and run inference. Please follow the vLLM LoRA tutorial to use our LoRA weights with vLLM.
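
As a rough sketch, the offline vLLM API below loads the base model with LoRA enabled and attaches the adapter per request. The adapter path and prompt are placeholders, not the official ones.

```python
# Minimal sketch: serving the Reviewer-14B LoRA adapter with vLLM's offline API.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",  # base model
    enable_lora=True,
)
params = SamplingParams(temperature=0.0, max_tokens=2048)
# Adapter path is a placeholder; point it at the downloaded LoRA weights.
lora = LoRARequest("reviewer-14b", 1, "/path/to/Reviewer-14B")

outputs = llm.generate(
    ["Review the following patches and select the best one:\n..."],  # placeholder prompt
    params,
    lora_request=lora,
)
print(outputs[0].outputs[0].text)
```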

Training Details

Dataset

We use our code review dataset, where each instance contains several git patches with a critique for each patch. The model learns to generate critiques for multiple patches and select the best patch.
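
For illustration only, one training instance might be shaped like the Python dict below; the field names and values are assumptions, not the dataset's actual schema.

```python
# Hypothetical shape of one training instance (illustrative only).
instance = {
    "problem_statement": "Fix the off-by-one error in pagination ...",
    "patches": [
        {"patch": "diff --git a/app/pager.py b/app/pager.py\n...",
         "critique": "Handles the boundary case but breaks empty pages ..."},
        {"patch": "diff --git a/app/pager.py b/app/pager.py\n...",
         "critique": "Correctly clamps the index; existing tests pass ..."},
    ],
    "best_patch_index": 1,  # supervision target: which patch to select
}
```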

Training Procedure

| Hyperparameter | Value |
|---|---|
| Training regime | BF16 mixed precision |
| Optimizer | AdamW with cosine learning rate scheduler |
| LoRA configuration | rank=8, alpha=32, dropout=0.1 |
| Batch size | 48 |
| Learning rate | 5e-6 |
| Sequence length | 14K tokens |
| Fine-tuning epochs | 1 |
| Compute environment | DeepSpeed for memory-efficient distributed training |
| Compute infrastructure | 8x H100 |

We use the training script provided in the Qwen-2.5 codebase.
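
A minimal sketch of an equivalent LoRA setup using the peft library is shown below; the rank, alpha, and dropout match the table above, but the target_modules are an assumption, not taken from the actual training script.

```python
# Sketch of a LoRA configuration matching the hyperparameters above (peft).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    torch_dtype="bfloat16",  # BF16 mixed precision, per the table
)
lora_config = LoraConfig(
    r=8,              # rank=8
    lora_alpha=32,    # alpha=32
    lora_dropout=0.1, # dropout=0.1
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```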

Results

Using this model as the reviewer over DARS trajectories generated with Claude 3.5 Sonnet V2 achieves 41.7% on SWE-Bench Lite.
