base_model:
  - Qwen/Qwen2.5-7B-Instruct
datasets:
  - liuwenhan/reasonrank_data_sft
  - liuwenhan/reasonrank_data_rl
  - liuwenhan/reasonrank_data_13k
language:
  - en
license: mit
pipeline_tag: text-ranking
library_name: transformers
tags:
  - qwen
  - reranker
  - passage-ranking

ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

🤗 reasonrank-7B ｜ 🤗 reasonrank-32B

🤗 reasonrank_data_13k ｜ 🤗 reasonrank_data_sft ｜ 🤗 reasonrank_data_rl

If you like our project, please give us a star ⭐ on GitHub.

📣 Latest News

[Aug 9, 2025]: 🏆 Our ReasonRank (32B) achieved a SOTA score of 40.8 on the BRIGHT leaderboard!
[Aug 9, 2025]: 📄 We uploaded our paper to the arXiv and Hugging Face.
[Aug 9, 2025]: 🔥 We released our 🤗full reasonrank training data (13k), 🤗cold-start SFT data and 🤗RL data.
[Aug 9, 2025]: 🔥 We released our reasoning-intensive reranker 🤗reasonrank-7B and 🤗reasonrank-32B.
[Aug 9, 2025]: 🚀 We released our full codebase, including inference, SFT training, and RL training.

1. ReasonRank

💡 1.1 Overview

ReasonRank is a passage reranker tailored for reasoning-intensive ranking tasks. To train it, we first design an automated framework for synthesizing reasoning-intensive training data and use it to synthesize 13k high-quality training examples.

Based on this training data, we design a two-stage training approach, consisting of cold-start SFT followed by RL with a multi-view ranking reward, to inject listwise ranking ability into ReasonRank.

📊 1.2 Overall Performance

When using ReasonIR as the initial passage retriever, ReasonRank demonstrates strong overall ranking performance on the BRIGHT benchmark while being more efficient than the pointwise reasoning-intensive reranker Rank1.

In addition, when using higher-quality retrieval results (the RaDeR + BM25 hybrid, provided by RaDeR), ReasonRank (32B) achieves a SOTA score of 40.8 on the BRIGHT leaderboard.

📂 2. Introduction to the ReasonRank Training Data

An important contribution of our work is our reasoning-intensive training data (reasonrank_data_13k). The dataset fields of training_data_all.jsonl are as follows:

Dataset Fields & Descriptions

dataset (str)
- The dataset name of each piece of data (e.g., "math-qa").
qid (str)
- The query ID. The query text is provided in the id_query/ directory.
initial_list (List[str])
- The initial list of passage IDs before DeepSeek-R1 reranking. The text of each passage ID is provided in the id_doc/ directory.
final_list (List[str])
- The re-ranked list of passage IDs after listwise reranking with DeepSeek-R1.
- Reflects the improved ranking based on reasoning-enhanced relevance scoring.
reasoning (str)
- The step-by-step reasoning chain produced by DeepSeek-R1 while performing the listwise reranking.
relevant_docids (List[str])
- The IDs of the relevant passages in initial_list, as mined by DeepSeek-R1. The remaining passage IDs in initial_list are irrelevant.
- Note that DeepSeek-R1 does not necessarily rank relevant_docids at the top of final_list, which may stem from inconsistencies in DeepSeek-R1’s judgments. To address this, you can apply the self-consistency data filtering technique proposed in our paper to select higher-quality data.

The dataset statistics are shown in the figure below:

Example Entry

{
  "dataset": "math-qa",
  "qid": "math_1001",
  "initial_list": ["math_test_intermediate_algebra_808", "math_train_intermediate_algebra_1471", ...],
  "final_list": ["math_test_intermediate_algebra_808", "math_test_intermediate_algebra_1678", ...],
  "reasoning": "Okay, I need to rank the 20 passages based on their relevance...",
  "relevant_docids": ["math_test_intermediate_algebra_808", "math_train_intermediate_algebra_1471", "math_train_intermediate_algebra_993"]
}
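One way to check how well final_list agrees with relevant_docids is to score each entry with binary-relevance NDCG@k and keep only entries above a cutoff. The sketch below is a minimal illustration of that idea, not necessarily the exact self-consistency filter from the paper; the helper names, the file path argument, and the threshold value are all assumptions chosen for the example.

```python
import json
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k of a ranked ID list against a set of relevant IDs."""
    relevant = set(relevant_ids)
    # Discounted gain: each relevant ID at 0-based position r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    # Ideal ranking places all relevant IDs (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def load_entries(path):
    """Read one JSON object per non-empty line (the JSONL layout shown above)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def filter_consistent(entries, threshold=0.5, k=10):
    """Keep entries whose final_list ranks relevant_docids well.

    The threshold is a hypothetical value for illustration only.
    """
    return [
        e for e in entries
        if ndcg_at_k(e["final_list"], e["relevant_docids"], k) >= threshold
    ]
```

For example, `filter_consistent(load_entries("training_data_all.jsonl"))` would return the subset of entries whose reranked list places the mined relevant passages near the top.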

Application

Training a passage reranker: Using the reranked passage list (final_list), one can train a listwise reranker.
Training a passage retriever: Using relevant_docids as positives and the remaining IDs in initial_list as negatives, one can train a passage retriever.
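As a rough sketch of the retriever use case, the snippet below turns one dataset entry into (qid, positive ID, negative ID) triples. The function name `build_triples`, the sampling strategy, and the `negatives_per_positive` parameter are assumptions made for illustration; the query and passage text would still need to be looked up in id_query/ and id_doc/.

```python
import random

def build_triples(entry, negatives_per_positive=1, seed=0):
    """Pair each relevant passage ID with sampled irrelevant IDs from initial_list."""
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    relevant = set(entry["relevant_docids"])
    positives = [d for d in entry["initial_list"] if d in relevant]
    negatives = [d for d in entry["initial_list"] if d not in relevant]

    triples = []
    for pos in positives:
        # Sample without replacement; cap at the number of available negatives.
        sampled = rng.sample(negatives, min(negatives_per_positive, len(negatives)))
        for neg in sampled:
            triples.append((entry["qid"], pos, neg))
    return triples
```

Each triple can then be fed to a standard contrastive or pairwise training loop once the IDs are resolved to text.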

⚡ 3. Quick Start

This section provides a general guide on how to use ReasonRank for inference. For detailed environment setup, specific inference commands (including usage with ReasonIR or custom retrieval results), and in-depth training procedures (cold-start SFT, multi-view ranking reward RL), please refer to the official GitHub repository.

Sample Usage

This model can be loaded and used with the transformers library. Below is a basic example demonstrating how to use the model for passage re-ranking. The model expects a specific chat-like format for input, including a system prompt and a user query with listed passages.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "liuwenhan/reasonrank-7B" # Or "liuwenhan/reasonrank-32B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
).eval()

# System prompt as used in the paper for reasoning-intensive ranking
system_prompt = (
    "You are a helpful and harmless AI assistant. You will be provided with a search query and a list of passages, "
    "and your task is to re-rank the passages based on their relevance to the query. "
    "You should follow a chain of thought to determine the most relevant passages. "
    "Your final answer should be a list of the re-ranked passage numbers, separated by commas. "
    "Do not include any other information or explanation in your final answer."
)

query = "What is the capital of France?"
passages = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower is a famous landmark in Paris.",
    "France is a country located in Western Europe.",
    "London is the capital of the United Kingdom."
]

# Construct the user message with query and numbered passages
user_content = f"Search Query: {query}\n"
for i, passage in enumerate(passages, start=1):
    user_content += f"[{i}] {passage}\n"
user_content += "Please re-rank the passages based on their relevance to the query. Provide a chain of thought and then the final re-ranked list."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_content}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize input
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)

# Generate response (greedy decoding, so no sampling temperature is passed)
output_ids = model.generate(
    input_ids,
    max_new_tokens=256, # Adjust as needed for reasoning length
    do_sample=False,    # Deterministic decoding for ranking/reasoning
    repetition_penalty=1.05,
    eos_token_id=tokenizer.eos_token_id
)

# Decode output
generated_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(generated_text)
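The generated text contains the reasoning followed by the final ranking, so a small post-processing step is usually needed to recover a usable order. The sketch below assumes the final answer appears on the last non-empty line as passage numbers (for example "1, 3, 2, 4"); that output layout and the helper name `parse_ranking` are assumptions, so adjust the parsing if the model's actual output differs.

```python
import re

def parse_ranking(generated_text, num_passages):
    """Extract a 1-based passage ranking from the last non-empty line of the output."""
    lines = [ln for ln in generated_text.strip().splitlines() if ln.strip()]
    final_line = lines[-1] if lines else ""

    ranking = []
    for token in re.findall(r"\d+", final_line):
        idx = int(token)
        # Skip out-of-range numbers and duplicates.
        if 1 <= idx <= num_passages and idx not in ranking:
            ranking.append(idx)

    # Append any passages the model omitted, preserving their original order.
    missing = [i for i in range(1, num_passages + 1) if i not in ranking]
    return ranking + missing
```

With the example above, `parse_ranking(generated_text, len(passages))` yields a list of passage numbers that can be used to reorder `passages`.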

Citation

If you find this work helpful, please cite our paper:

@misc{liu2025reasonrankempoweringpassageranking,
      title={ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability}, 
      author={Wenhan Liu and Xinyu Ma and Weiwei Sun and Yutao Zhu and Yuchen Li and Dawei Yin and Zhicheng Dou},
      year={2025},
      eprint={2508.07050},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2508.07050}, 
}

🤝 Acknowledgements

The inference code and training implementation build upon RankLLM, Llama Factory, and verl. Our work is based on the Qwen2.5 model series, and we sincerely thank the Qwen team for their outstanding contributions to the open-source community.
