Text Generation
Safetensors
qwen2
code
conversational
nielsr's picture
nielsr HF staff
Improve Model Card: Add library_name and abstract
639048a verified
|
raw
history blame
2.8 kB
metadata
base_model:
  - Qwen/Qwen2.5-72B
datasets:
  - internlm/SWE-Fixer-Eval
  - internlm/SWE-Fixer-Train-110K
license: mit
pipeline_tag: text-generation
tags:
  - code
library_name: transformers

SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution

πŸ“ƒ Paper

πŸš€ GitHub

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. One significant application of LLMs is in tackling software engineering challenges, particularly in resolving real-world tasks on GitHub by fixing code based on the issues reported by the users. However, many current approaches rely on proprietary LLMs, which limits reproducibility, accessibility, and transparency. The critical components of LLMs for addressing software engineering issues and how their capabilities can be effectively enhanced remain unclear. To address these challenges, we introduce SWE-Fixer, a novel open-source LLM designed to effectively and efficiently resolve GitHub issues. SWE-Fixer comprises two essential modules: a code file retrieval module and a code editing module. The retrieval module employs BM25 along with a lightweight LLM model to achieve coarse-to-fine file retrieval. Subsequently, the code editing module utilizes the other LLM model to generate patches for the identified files. Then, to mitigate the lack of publicly available datasets, we compile an extensive dataset that includes 110K GitHub issues along with their corresponding patches, and train the two modules of SWE-Fixer separately. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models with scores of 23.3% and 30.2%, respectively. These outcomes highlight the efficacy of our approach. We will make our model, dataset, and code publicly available at https://github.com/InternLM/SWE-Fixer.

SWE-Fixer is a simple yet effective solution for addressing real-world GitHub issues by training open-source LLMs. It features a streamlined retrieve-then-edit pipeline with two core components: a code file retriever and a code editor.

This repo holds the SWE-Fixer-Editor-72B model, which is finetuned on the Qwen2.5-7B.

For more information, please visit our project page.

πŸ“š Citation

@article{xie2025swefixer,
  title={SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution}, 
  author={Xie, Chengxing and Li, Bowen and Gao, Chang and Du, He and Lam, Wai and Zou, Difan and Chen, Kai},
  journal={arXiv preprint arXiv:2501.05040},
  year={2025}
}