|
--- |
|
license: mit |
|
datasets: |
|
- jhu-clsp/rank1-training-data |
|
base_model: |
|
- Qwen/Qwen2.5-7B |
|
pipeline_tag: text-generation |
|
tags: |
|
- reranker |
|
- retrieval |
|
language: |
|
- en |
|
--- |
|
|
|
# rank1-7b: Test-Time Compute for Reranking in Information Retrieval |
|
|
|
[Paper](https://arxiv.org/abs/2502.18418) | [GitHub Repository](https://github.com/orionw/rank1)
|
|
|
rank1 is a reasoning reranker model that "thinks" before making relevance judgments. This 7B parameter model is trained from the Qwen2.5-7B base model and leverages test-time compute to generate reasoning chains before deciding if a document is relevant to a query. |
|
|
|
## Model Description |
|
|
|
rank1 introduces a novel approach to information retrieval by generating explicit reasoning chains before making relevance judgments. Unlike traditional rerankers that directly output scores, rank1: |
|
|
|
1. Receives a query and document pair |
|
2. Generates a reasoning chain within a `<think>...</think>` section |
|
3. Makes a binary relevance judgment (`true` or `false`) |
|
4. Returns a confidence score based on the logits of the true/false tokens |
|
|
|
This approach helps the model break down complex relevance decisions into logical steps, improving performance across diverse retrieval tasks. |
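
For illustration, step 4 amounts to a two-way softmax over the log-probabilities of the `true` and `false` tokens at the final generation step. The snippet below is a minimal sketch of that normalization only; the `relevance_from_logprobs` helper is a hypothetical name, not part of the official code (see the full usage example further down).

```python
import math

def relevance_from_logprobs(true_logprob: float, false_logprob: float) -> float:
    """Normalize the final-step logprobs of " true" and " false" into a relevance score."""
    true_p = math.exp(true_logprob)
    false_p = math.exp(false_logprob)
    return true_p / (true_p + false_p)

# Example values: logprob(" true") = -0.2, logprob(" false") = -1.8
print(relevance_from_logprobs(-0.2, -1.8))  # ~0.83
```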
|
|
|
## Model Family |
|
|
|
| Model | Base | Description | |
|
|:------|:-----|:------------| |
|
| [rank1-7b](https://huggingface.co/jhu-clsp/rank1-7b) | Qwen2.5-7B | Current model (7B parameters) | |
|
| [rank1-14b](https://huggingface.co/jhu-clsp/rank1-14b) | Qwen2.5-14B | Larger variant (14B parameters) | |
|
| [rank1-32b](https://huggingface.co/jhu-clsp/rank1-32b) | Qwen2.5-32B | Largest variant (32B parameters) |
|
| [rank1-mistral-2501-24b](https://huggingface.co/jhu-clsp/rank1-mistral-2501-24b) | Mistral-Small 2501 24B | Trained from Mistral base | |
|
| [rank1-llama3-8b](https://huggingface.co/jhu-clsp/rank1-llama3-8b) | Llama 3.1 8B | Trained from Llama 3.1 base | |
|
|
|
### Quantized Variants |
|
|
|
| Model | Description | |
|
|:------|:------------| |
|
| [rank1-7b-awq](https://huggingface.co/jhu-clsp/rank1-7b-awq) | Quantized version of rank1-7b | |
|
| [rank1-14b-awq](https://huggingface.co/jhu-clsp/rank1-14b-awq) | Quantized version of rank1-14b | |
|
| [rank1-32b-awq](https://huggingface.co/jhu-clsp/rank1-32b-awq) | Quantized version of rank1-32b | |
|
| [rank1-mistral-2501-24b-awq](https://huggingface.co/jhu-clsp/rank1-mistral-2501-24b-awq) | Quantized version of rank1-mistral-2501-24b |
|
| [rank1-llama3-8b-awq](https://huggingface.co/jhu-clsp/rank1-llama3-8b-awq) | Quantized version of rank1-llama3-8b | |
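
The AWQ checkpoints should load with vLLM's built-in AWQ support. The snippet below is a minimal sketch, not an official recipe; settings such as `max_model_len` and `gpu_memory_utilization` are placeholders to adjust for your hardware.

```python
from vllm import LLM

# Minimal sketch: load the AWQ-quantized 7B reranker with vLLM.
model = LLM(
    model="jhu-clsp/rank1-7b-awq",
    quantization="awq",          # use vLLM's AWQ kernels
    tensor_parallel_size=1,      # number of GPUs
    trust_remote_code=True,
    max_model_len=16000,
    gpu_memory_utilization=0.9,
)
```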
|
|
|
## Associated Data and Resources |
|
|
|
| Resource | Description | |
|
|:---------|:------------| |
|
| [rank1-r1-msmarco](https://huggingface.co/datasets/jhu-clsp/rank1-r1-msmarco) | All R1 output examples from MS MARCO | |
|
| [rank1-training-data](https://huggingface.co/datasets/jhu-clsp/rank1-training-data) | Training data used for rank1 models | |
|
| [rank1-run-files](https://huggingface.co/datasets/jhu-clsp/rank1-run-files) | Pre-computed run files for use in top 100 doc reranking | |
|
| [GitHub Repository](https://github.com/orionw/rank1) | Official rank1 repository | |
|
|
|
## Usage |
|
The official usage code in the [GitHub repository](https://github.com/orionw/rank1) handles edge cases; for simple use cases, the minimal example below is sufficient.
|
|
|
<details> |
|
<summary>Click to expand: Minimal example with vLLM</summary> |
|
|
|
```python |
|
from vllm import LLM, SamplingParams |
|
import math |
|
|
|
# Initialize the model with vLLM |
|
model = LLM( |
|
model="jhu-clsp/rank1-7b", |
|
tensor_parallel_size=1, # Number of GPUs |
|
trust_remote_code=True, |
|
max_model_len=16000, # Context length |
|
gpu_memory_utilization=0.9, |
|
dtype="float16", |
|
) |
|
|
|
# Set up sampling parameters |
|
sampling_params = SamplingParams( |
|
temperature=0, |
|
max_tokens=8192, |
|
logprobs=20, |
|
stop=["</think> true", "</think> false"], |
|
skip_special_tokens=False |
|
) |
|
|
|
# Prepare the prompt |
|
def create_prompt(query, document): |
|
return ( |
|
"Determine if the following passage is relevant to the query. " |
|
"Answer only with 'true' or 'false'.\n" |
|
f"Query: {query}\n" |
|
f"Passage: {document}\n" |
|
"<think>" |
|
) |
|
|
|
# Example usage |
|
query = "What are the effects of climate change?" |
|
document = "Climate change leads to rising sea levels, extreme weather events, and disruptions to ecosystems. These effects are caused by increasing greenhouse gas concentrations in the atmosphere due to human activities." |
|
|
|
# Generate prediction |
|
prompt = create_prompt(query, document) |
|
outputs = model.generate([prompt], sampling_params) |
|
|
|
# Extract score |
|
output = outputs[0].outputs[0] |
|
text = output.text |
|
final_logits = output.logprobs[-1] |
|
|
|
# Get token IDs for "true" and "false" tokens |
|
from transformers import AutoTokenizer |
|
tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/rank1-7b") |
|
true_token = tokenizer(" true", add_special_tokens=False).input_ids[0] |
|
false_token = tokenizer(" false", add_special_tokens=False).input_ids[0] |
|
|
|
# Calculate relevance score (probability of "true") |
|
true_logit = final_logits[true_token].logprob |
|
false_logit = final_logits[false_token].logprob |
|
true_score = math.exp(true_logit) |
|
false_score = math.exp(false_logit) |
|
relevance_score = true_score / (true_score + false_score) |
|
|
|
print(f"Reasoning chain: {text}") |
|
print(f"Relevance score: {relevance_score}") |
|
``` |
|
|
|
</details> |
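
To rerank a list of candidate documents, the same scoring logic can be applied to each query-document pair and the candidates sorted by relevance score. The sketch below assumes the `model`, `sampling_params`, `create_prompt`, `query`, `true_token`, and `false_token` objects from the minimal example above; `score_document` is a hypothetical helper, not part of the official API.

```python
import math

def score_document(query: str, document: str) -> float:
    """Score one query-document pair as the probability of " true" at the final step."""
    outputs = model.generate([create_prompt(query, document)], sampling_params)
    final_logits = outputs[0].outputs[0].logprobs[-1]
    # Note: this indexing can raise KeyError if the token is not in the top-20
    # logprobs; the official repository handles such edge cases.
    true_p = math.exp(final_logits[true_token].logprob)
    false_p = math.exp(final_logits[false_token].logprob)
    return true_p / (true_p + false_p)

candidates = [
    "Climate change leads to rising sea levels and more extreme weather.",
    "The 2022 FIFA World Cup was hosted by Qatar.",
]
ranked = sorted(candidates, key=lambda d: score_document(query, d), reverse=True)
for rank, doc in enumerate(ranked, start=1):
    print(rank, doc)
```

In practice you would batch all prompts into a single `model.generate` call rather than scoring documents one at a time.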
|
|
|
## Performance |
|
|
|
rank1-7b demonstrates strong performance on retrieval benchmarks, particularly on tasks requiring complex reasoning. Generating an explicit reasoning chain before judging relevance makes the model especially effective on queries that call for nuanced relevance judgments.
|
|
|
For specific benchmark results and comparisons with other models, please refer to the paper and the official GitHub repository. |
|
|
|
## Installation |
|
|
|
Please see the [GitHub repository](https://github.com/orionw/rank1) for detailed installation instructions.
|
|
|
## MTEB Integration |
|
|
|
rank1 is compatible with the [MTEB benchmarking framework](https://github.com/embeddings-benchmark/mteb): |
|
|
|
```python |
|
from mteb import MTEB |
|
from rank1 import rank1 # From the official repo |
|
|
|
# Initialize the model |
|
model = rank1( |
|
model_name_or_path="jhu-clsp/rank1-7b", |
|
num_gpus=1, |
|
device="cuda" |
|
) |
|
|
|
# Run evaluation on specific tasks |
|
evaluation = MTEB(tasks=["NevIR"]) |
|
results = evaluation.run(model) |
|
``` |
|
|
|
## Citation |
|
|
|
If you use rank1 in your research, please cite our work: |
|
|
|
```bibtex |
|
@misc{weller2025rank1testtimecomputereranking, |
|
title={Rank1: Test-Time Compute for Reranking in Information Retrieval}, |
|
author={Orion Weller and Kathryn Ricci and Eugene Yang and Andrew Yates and Dawn Lawrie and Benjamin Van Durme}, |
|
year={2025}, |
|
eprint={2502.18418}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.IR}, |
|
url={https://arxiv.org/abs/2502.18418}, |
|
} |
|
``` |
|
|
|
## License |
|
|
|
[MIT License](https://github.com/orionw/rank1/blob/main/LICENSE) |