---
library_name: transformers
license: apache-2.0
language:
- en
- zh
- es
- de
- ar
- ru
- ja
- ko
- hi
- sk
- vi
- tr
- fi
- id
- fa
- 'no'
- th
- sv
- pt
- da
- bn
- te
- ro
- it
- fr
- nl
- sw
- pl
- hu
- cs
- el
- uk
- mr
- ta
- tl
- bg
- lt
- ur
- he
- gu
- kn
- am
- kk
- hr
- uz
- jv
- ca
- az
- ms
- sr
- sl
- yo
- lv
- is
- ha
- ka
- et
- bs
- hy
- ml
- pa
- mt
- km
- sq
- or
- as
- my
- mn
- af
- be
- ga
- mk
- cy
- gl
- ceb
- la
- yi
- lb
- tg
- gd
- ne
- ps
- eu
- ky
- ku
- si
- ht
- eo
- lo
- fy
- sd
- mg
- so
- ckb
- su
- nn
datasets:
- lightblue/reranker_continuous_filt_max7_train
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
tags:
- reranker
widget:
- text: "<<<Query>>>\nHow many languages has LB-Reranker been trained on?\n\n\n<<<Context>>>\nLB-Reranker has been trained on more than 95 languages."
  example_title: Positive example (7/7)
- text: "<<<Query>>>\nHow many languages has LB-Reranker been trained on?\n\n\n<<<Context>>>\nAA-Reranker is applicable to a broad range of use cases."
  example_title: Negative example (2/7)

---

# LB Reranker v1.0

<div style="width: 100%; height: 160px; 
            display: flex; align-items: center; 
            justify-content: center; 
            border: 8px solid black; 
            font-size: 120px; font-weight: bold; 
            text-align: center;
            color: #438db8; 
            font-family: 'Helvetica Neue', sans-serif;">
  LBR-r
</div>


This is a reversed version of the original LB Reranker, [lightblue/lb-reranker-0.5B-v1.0](https://huggingface.co/lightblue/lb-reranker-0.5B-v1.0).
In this version, you input the text first and then the query, which allows the text (rather than the query) to be cached.
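
To illustrate why this ordering helps with caching: when several queries are scored against the same text, every user message starts with an identical context block (and the system message is shared too). An inference engine with prefix caching, such as vLLM's automatic prefix caching, can therefore reuse the work done on that shared prefix. The minimal sketch below only builds the prompt strings, using the same input format as the scripts in the How to use section, with hypothetical text and queries. It does not call the model. In practice, engines reuse cached prefixes at the token or block level rather than character by character.

```python
import os

def make_reranker_input(t, q):
    # Reversed format: the context comes first, then the query.
    return f"<<<Context>>>\n{t}\n\n<<<Query>>>\n{q}"

# Hypothetical text and queries, for illustration only.
text = "An apple is a round, edible fruit produced by an apple tree."
queries = [
    "What is the scientific name of apples?",
    "How many moons does Mars have?",
]

prompts = [make_reranker_input(text, q) for q in queries]

# Everything before the query is identical across the prompts,
# so this is the part a prefix cache could reuse between requests.
shared_prefix = os.path.commonprefix(prompts)
print(shared_prefix == make_reranker_input(text, ""))  # True
```

With the original (query-first) model, the shared part would instead be the query, which only helps when one query is scored against many texts.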

The LB Reranker has been trained to determine how related a given query is to a piece of text, which allows it to be used as a ranker or reranker in various retrieval-based tasks.

This model is fine-tuned from a [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model checkpoint and was trained for roughly 5.5 hours using an 8 x L20 instance ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)) on [Alibaba Cloud](https://www.alibabacloud.com/).

The training data for this model can be found at [lightblue/reranker_continuous_filt_max7_train](https://huggingface.co/datasets/lightblue/reranker_continuous_filt_max7_train), and the code for generating this data and for training the model can be found in [our GitHub repo](https://github.com/lightblue-tech/lb-reranker).

Trained on data in over 95 languages, this model is applicable to a broad range of use cases.

This model has three main benefits over comparable rerankers.
1. It has shown slightly higher performance on evaluation benchmarks.
2. It has been trained on more languages than any previous model.
3. It is a simple causal language model (LM) trained to output a string between "1" and "7".

This last point means that this model can be used natively with many widely available inference packages, including vLLM and LMDeploy.
This in turn allows our reranker to benefit from inference improvements as and when these packages release them.

Update: We have also found that this model works well as a code snippet reranker (P@1 of 96%). See our [Colab](https://colab.research.google.com/drive/1ABL1xaarekLIlVJKbniYhXgYu6ZNwfBm?usp=sharing) for more details.
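
For reference, P@1 (precision at 1) is the fraction of queries for which the top-ranked candidate is a relevant one. The sketch below computes it from reranker scores under that standard definition. It is not the evaluation code from the Colab, and the scores and relevance labels are made up for illustration.

```python
def precision_at_1(scores_per_query, relevant_per_query):
    """Fraction of queries whose highest-scoring candidate is relevant.

    scores_per_query: one list of reranker scores per query.
    relevant_per_query: one set of relevant candidate indices per query.
    """
    hits = 0
    for scores, relevant in zip(scores_per_query, relevant_per_query):
        # Index of the highest-scoring candidate for this query.
        top = max(range(len(scores)), key=scores.__getitem__)
        hits += top in relevant
    return hits / len(scores_per_query)

# Hypothetical scores for 3 queries, each with 3 candidate snippets.
scores = [[6.7, 1.9, 1.0], [2.1, 5.8, 3.3], [4.0, 4.5, 1.2]]
relevant = [{0}, {1}, {0}]
print(precision_at_1(scores, relevant))  # 2 of the 3 top-ranked candidates are relevant
```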

# How to use

The model was trained to expect an input such as:

```
<<<Context>>>
{your_context_here}

<<<Query>>>
{your_query_here}
```

It was also trained to output a single number between 1 and 7, as a string.

To obtain a continuous score that can be used for reranking query-context pairs (i.e. one that produces few ties), we calculate the expected value of the score, weighting each possible score from 1 to 7 by the probability the model assigns to it.
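
Concretely, if p_i is the probability of the token for score i, the continuous score is 1·p_1 + 2·p_2 + ... + 7·p_7. A minimal sketch of that calculation, assuming the top log-probabilities for the first generated token have already been collected into a dict mapping token strings to log-probabilities (the dict below is hypothetical). Like the scripts in this section, it does not renormalize, so scores missing from the returned top log-probabilities simply contribute zero.

```python
import math

SCORE_TOKENS = {str(i) for i in range(1, 8)}

def expected_score(top_logprobs):
    """Expected 1-7 score from a {token: log-probability} mapping.

    Tokens other than "1"-"7" are ignored, and scores absent from the
    mapping contribute zero probability (no renormalization).
    """
    return sum(
        int(tok) * math.exp(lp)
        for tok, lp in top_logprobs.items()
        if tok in SCORE_TOKENS
    )

# Hypothetical top log-probabilities for the first generated token.
example = {
    "7": math.log(0.6),
    "6": math.log(0.3),
    "1": math.log(0.09),
    " ": math.log(0.01),  # non-score token, ignored
}
print(round(expected_score(example), 4))  # 6.09
```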

We include scripts for doing this with vLLM, LMDeploy, and OpenAI (using an endpoint hosted for free on Hugging Face):


<ul>
  <li><b>vLLM</b>

Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.

<details open>
  <summary>Show vLLM code</summary>
  
```python
from vllm import LLM, SamplingParams
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Context>>>\n{t}\n\n<<<Query>>>\n{q}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a piece of text and a query, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

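def get_prob(logprob_dict, tok_id):
    # Probability of a token, or 0 if it is not among the returned top logprobs.
    return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict else 0

llm = LLM("lightblue/lb-reranker-0.5B-v1.0-rev")
# Generate one token and return its top-14 logprobs, so the seven score tokens are likely to be included.
sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
tok = llm.get_tokenizer()
# Token IDs of the strings "1" to "7".
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]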

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = llm.chat(chats, sampling_params)
# probs[j, i - 1] is the probability the model gives to score i for pair j.
probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

# Expected score for each pair: sum over i of i * P(score = i).
expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]
```

</details></li>
  <li><b>LMDeploy</b>

Install [LMDeploy](https://github.com/InternLM/lmdeploy) using `pip install lmdeploy`.

<details>
  <summary>Show LMDeploy code</summary>
  
```python
# Un-comment this if running in a Jupyter notebook, Colab etc.
# import nest_asyncio
# nest_asyncio.apply()

from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Context>>>\n{t}\n\n<<<Query>>>\n{q}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a piece of text and a query, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # Probability of a token, or 0 if it is not among the returned top logprobs.
    return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict else 0

pipe = pipeline(
    "lightblue/lb-reranker-0.5B-v1.0-rev",
    chat_template_config=ChatTemplateConfig(
        model_name='qwen2d5',
        capability='chat',
    ),
)
tok = pipe.tokenizer.model
# Token IDs of the strings "1" to "7".
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
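# Only the logprobs of the first generated token are used, so the sampled token itself does not matter.
responses = pipe(
    chats,
    gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
)
# probs[j, i - 1] is the probability the model gives to score i for pair j.
probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])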

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66415229 1.84342025 1.01133205]
```

</details></li>
  <li><b>OpenAI (Hosted on Hugging Face)</b>

Install [openai](https://github.com/openai/openai-python) using `pip install openai`.

<details>
  <summary>Show OpenAI + Hugging Face Inference code</summary>
  
```python
from openai import OpenAI
import numpy as np
from multiprocessing import Pool
from tqdm.auto import tqdm

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # Change this to an access token from https://huggingface.co/settings/tokens
)

def make_reranker_input(t, q):
    return f"<<<Context>>>\n{t}\n\n<<<Query>>>\n{q}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a piece of text and a query, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_reranker_score(question_context_tuple):
    question, context = question_context_tuple

    messages = make_reranker_inference_conversation(context, question)

    completion = client.chat.completions.create(
        model="lightblue/lb-reranker-0.5B-v1.0-rev",
        messages=messages,
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
        top_logprobs=5,  # Max allowed by this endpoint (it requires top_n_tokens >= 0 and <= 5). If the limit is raised, set this to 7 or more so all seven scores are covered.
    )

    logprobs = completion.choices[0].logprobs.content[0].top_logprobs

    # Expected score over the returned score tokens; skip any token that is not "1"-"7".
    score_tokens = {str(i) for i in range(1, 8)}
    calculated_score = sum(int(x.token) * np.exp(x.logprob) for x in logprobs if x.token in score_tokens)

    return calculated_score

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

with Pool(processes=16) as p: # Allows for parallel processing
    expected_vals = list(tqdm(p.imap(get_reranker_score, query_texts), total=len(query_texts)))

print(expected_vals)
# [6.64866580, 1.85144404, 1.010719508]
```

</details></li>
</ul>	
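
Whichever backend is used, the expected values can then be used for reranking by sorting the candidates by score, highest first. A minimal sketch with hypothetical passages and made-up scores; in practice the scores would come from one of the scripts above, with the same query paired with each candidate text. The `rerank` helper is illustrative, not part of any library.

```python
def rerank(candidates, scores, top_k=None):
    """Return (candidate, score) pairs sorted by reranker score, most related first."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical candidate texts for a single query, with made-up scores.
passages = [
    "Apples are rich in fibre.",
    "Malus domestica is the domestic apple.",
    "The square root of 999 is about 31.6.",
]
scores = [3.2, 6.7, 1.0]

for text, score in rerank(passages, scores, top_k=2):
    print(f"{score:.2f}  {text}")
```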

# License

We share this model under an Apache 2.0 license.

# Developed by

<a href="https://www.lightblue-tech.com">
<img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
</a>

This model was trained by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue.