  Trained on data in over 95 languages, this model is applicable to a broad range of use cases.
This model has three main benefits over comparable rerankers:

1. It has shown slightly higher performance on evaluation benchmarks.
2. It has been trained on more languages than any previous model.
3. It is a simple causal language model trained to output a string from "1" to "7".

This last point means that this model can be used natively with many widely available inference packages, including vLLM and LMDeploy.
This in turn allows our reranker to benefit from improvements to inference as and when these packages release them.
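Because the model emits a single score token, its output can be converted into a continuous relevance score by taking the expected value over the seven label tokens. Below is a minimal sketch of that computation with made-up probabilities; the vLLM and LMDeploy examples that follow do the same thing with real logprobs:

```python
import numpy as np

# Hypothetical probabilities for the score tokens "1".."7"
# (these numbers are illustrative, not real model output)
probs = np.array([0.02, 0.03, 0.05, 0.10, 0.20, 0.35, 0.25])
scores = np.arange(1, 8)

# Expected relevance score: sum over i of i * P(score = i), a value in [1, 7]
expected_score = (probs * scores).sum()
print(round(expected_score, 2))  # 5.48
```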

# How to use

#### vLLM

Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.

```python
from vllm import LLM, SamplingParams
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_training_datum(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # Probability of a token ID, or 0 if it is not among the returned logprobs
    return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict else 0

llm = LLM("lightblue/lb-reranker-v1.0")
sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
tok = llm.llm_engine.tokenizer.tokenizer
# Token IDs of the score labels "1".."7"
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_training_datum(c, q) for q, c in query_texts]
responses = llm.chat(chats, sampling_params)
# Probability of each score label for each (query, context) pair
probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

# Expected score for each pair, i.e. sum over i of i * P(score = i)
expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]
```
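In a real retrieval pipeline you would typically score many candidate contexts against a single query and keep the highest-scoring ones. A hypothetical continuation of the script above that sorts the scored pairs:

```python
# Sort the scored pairs by descending expected score and keep the top k
# (hypothetical follow-up; here each pair happens to share the same context text)
top_k = 2
order = np.argsort(expected_vals)[::-1][:top_k]
for rank, i in enumerate(order, start=1):
    print(f"{rank}. score={expected_vals[i]:.2f} query={query_texts[i][0]!r}")
```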

#### LMDeploy

Install [LMDeploy](https://github.com/InternLM/lmdeploy) using `pip install lmdeploy`.

```python
# Un-comment this if running in a Jupyter notebook, Colab etc.
# import nest_asyncio
# nest_asyncio.apply()

from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_training_datum(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # LMDeploy returns plain float logprobs rather than objects
    return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict else 0

pipe = pipeline(
    "lightblue/lb-reranker-v1.0",
    chat_template_config=ChatTemplateConfig(
        model_name='qwen2d5',
        capability='chat'
    )
)
tok = pipe.tokenizer.model
# Token IDs of the score labels "1".."7"
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_training_datum(c, q) for q, c in query_texts]
responses = pipe(
    chats,
    gen_config=GenerationConfig(temperature=0.0, logprobs=14, max_new_tokens=1)
)
# Probability of each score label for each (query, context) pair
probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

# Expected score for each pair, i.e. sum over i of i * P(score = i)
expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [7. 2. 1.]
```
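Note that in both scripts `get_prob` returns 0 for any score token that falls outside the top logprobs returned by the engine, so a small amount of probability mass can be dropped and the expectation can drift slightly low. If you want scores averaged over exactly the seven labels, you can renormalize each row first; this is a hypothetical post-processing step, not part of the original scripts:

```python
import numpy as np

def normalized_expected_scores(probs):
    # Renormalize each row to sum to 1 before taking the expectation,
    # guarding against probability mass lost to truncated logprobs.
    probs = np.asarray(probs, dtype=float)
    row_sums = probs.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)  # avoid division by zero
    labels = np.arange(1, probs.shape[1] + 1)
    return ((probs / safe_sums) * labels).sum(axis=1)
```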

# Evaluation

We perform an evaluation on 9 datasets from the [BEIR benchmark](https://github.com/beir-cellar/beir) that none of the evaluated models have been trained upon (to our knowledge).