Update README.md

And to output a string of a number between 1-7.

To produce a continuous score that can be used for reranking query-context pairs (i.e. a method with few ties), we calculate the expected value of the scores.
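
Concretely, if $p_k$ is the probability the model assigns to the score token $k$ (recovered by exponentiating the returned logprobs), the continuous score is the expectation (notation ours, summarising what the scripts below compute):

```math
\mathbb{E}[s] = \sum_{k=1}^{7} k \, p_k, \qquad p_k = \exp(\mathrm{logprob}_k)
```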

We include scripts to do this in vLLM, LMDeploy, and OpenAI (hosted for free on Huggingface):

### vLLM

Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.

```python
# … (script body not shown in this diff)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]
```
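
The body of the vLLM script falls outside this diff's hunks. As a rough sketch of the same expectation computation against vLLM's offline `LLM` API (our reconstruction, not the repo's script; the prompt format and model name are taken from the OpenAI example below):

```python
from vllm import LLM, SamplingParams
import numpy as np

llm = LLM(model="lightblue/lb-reranker-0.5B-v1.0")
tokenizer = llm.get_tokenizer()

# One generated token is enough for the score; logprobs=14 comfortably covers "1"-"7".
sampling_params = SamplingParams(temperature=0.0, max_tokens=1, logprobs=14)

def expected_score(question: str, context: str) -> float:
    messages = [
        {"role": "system", "content": "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."},
        {"role": "user", "content": f"<<<Query>>>\n{question}\n\n<<<Context>>>\n{context}"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    output = llm.generate([prompt], sampling_params)[0].outputs[0]
    # output.logprobs[0] maps token_id -> Logprob(logprob=..., decoded_token=...)
    return sum(
        int(lp.decoded_token) * np.exp(lp.logprob)
        for lp in output.logprobs[0].values()
        if lp.decoded_token and lp.decoded_token.strip().isdigit()
    )
```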

### LMDeploy

Install [LMDeploy](https://github.com/InternLM/lmdeploy) using `pip install lmdeploy`.

```python
# … (script body not shown in this diff)
print(expected_vals)
# [6.66415229 1.84342025 1.01133205]
```
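
Likewise for LMDeploy, whose script body is also elided here. A minimal sketch, assuming LMDeploy's `pipeline` API with `GenerationConfig(logprobs=...)` (the logprobs plumbing may differ across LMDeploy versions, so treat this as illustrative rather than the repo's code):

```python
from lmdeploy import pipeline, GenerationConfig
from transformers import AutoTokenizer
import numpy as np

model_name = "lightblue/lb-reranker-0.5B-v1.0"
pipe = pipeline(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# top_k=1 for greedy decoding; logprobs=14 to cover the score tokens "1"-"7".
gen_config = GenerationConfig(max_new_tokens=1, top_k=1, logprobs=14)

def expected_score(question: str, context: str) -> float:
    messages = [
        {"role": "system", "content": "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."},
        {"role": "user", "content": f"<<<Query>>>\n{question}\n\n<<<Context>>>\n{context}"},
    ]
    response = pipe(messages, gen_config=gen_config)
    # response.logprobs[0] maps token_id -> logprob for the single generated token
    score = 0.0
    for token_id, logprob in response.logprobs[0].items():
        token = tokenizer.decode([token_id]).strip()
        if token.isdigit():
            score += int(token) * np.exp(logprob)
    return score
```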

### OpenAI (Hosted on Huggingface)

Install [openai](https://github.com/openai/openai-python) using `pip install openai`.
272 |
+
|
273 |
+
```python
|
274 |
+
from openai import OpenAI
|
275 |
+
import numpy as np
|
276 |
+
from multiprocessing import Pool
|
277 |
+
from tqdm.auto import tqdm
|
278 |
+
|
279 |
+
client = OpenAI(
|
280 |
+
base_url="https://api-inference.huggingface.co/v1/",
|
281 |
+
api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Change this to an access token from https://huggingface.co/settings/tokens
|
282 |
+
)
|
283 |
+
|
284 |
+
def make_reranker_input(t, q):
|
285 |
+
return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"
|
286 |
+
|
287 |
+
def make_reranker_inference_conversation(context, question):
|
288 |
+
system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."
|
289 |
+
|
290 |
+
return [
|
291 |
+
{"role": "system", "content": system_message},
|
292 |
+
{"role": "user", "content": make_reranker_input(context, question)},
|
293 |
+
]
|
294 |
+
|
295 |
+
def get_reranker_score(context_question_tuple):
|
296 |
+
question, context = context_question_tuple
|
297 |
+
|
298 |
+
messages = make_reranker_inference_conversation(context, question)
|
299 |
+
|
300 |
+
completion = client.chat.completions.create(
|
301 |
+
model="lightblue/lb-reranker-0.5B-v1.0",
|
302 |
+
messages=messages,
|
303 |
+
max_tokens=1,
|
304 |
+
temperature=0.0,
|
305 |
+
logprobs=True,
|
306 |
+
top_logprobs=5, # Max allowed by the openai API as top_n_tokens must be >= 0 and <= 5. If this gets changed, fix to > 7.
|
307 |
+
)
|
308 |
+
|
309 |
+
logprobs = completion.choices[0].logprobs.content[0].top_logprobs
|
310 |
+
|
311 |
+
calculated_score = sum([int(x.token) * np.exp(x.logprob) for x in logprobs])
|
312 |
+
|
313 |
+
return calculated_score
|
314 |
+
|
315 |
+
query_texts = [
|
316 |
+
("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
|
317 |
+
("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
|
318 |
+
("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
|
319 |
+
]
|
320 |
+
|
321 |
+
with Pool(processes=16) as p: # Allows for parallel processing
|
322 |
+
expected_vals = list(tqdm(p.imap(get_reranker_score, query_texts), total=len(query_texts)))
|
323 |
+
|
324 |
+
print(expected_vals)
|
325 |
+
# [6.64866580, 1.85144404, 1.010719508]
|
326 |
+
```
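
Note one caveat of the hosted route: the API caps `top_logprobs` at 5, so the expectation is taken over at most five of the seven score tokens, and any probability mass on the remaining two is dropped. In practice the mass concentrates on a few neighbouring scores, which is presumably why the values above stay close to the vLLM and LMDeploy outputs.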

# Evaluation

We perform an evaluation on 9 datasets from the [BEIR benchmark](https://github.com/beir-cellar/beir) that none of the evaluated models have been trained upon (to our knowledge).