# InRanker-small (60M parameters)

InRanker is a version of monoT5 distilled from [monoT5-3B](https://huggingface.co/castorini/monot5-3b-msmarco-10k) with increased effectiveness on out-of-domain scenarios.
Our key insight was to use language models and rerankers to generate as much
synthetic "in-domain" training data as possible, i.e., data that closely resembles
the data that will be seen at retrieval time. The pipeline used for training consists of
two distillation phases that do not require additional user queries
or manual annotations: (1) training on existing supervised soft
teacher labels, and (2) training on teacher soft labels for synthetic
queries generated using a large language model.
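
To make the distillation phases concrete, below is a minimal, hedged sketch of training a student on teacher soft labels. It is not the released training code: the `t5-small` student, the single hand-written example, and the precomputed `teacher_probs` tensor are placeholders, and the `Query: ... Document: ... Relevant:` prompt with `true`/`false` output tokens follows the standard monoT5 convention.

```python
import torch
import torch.nn.functional as F
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder student; the actual InRanker students are monoT5-style rerankers.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
student = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(student.parameters(), lr=3e-5)

# monoT5-style input for one (query, document) pair.
text = "Query: how to learn deep learning Document: Deep learning is ... Relevant:"
# Hypothetical precomputed soft labels [p(true), p(false)] from the teacher
# (e.g., monoT5-3B); in practice these are generated offline for the corpus.
teacher_probs = torch.tensor([[0.92, 0.08]])

inputs = tokenizer(text, return_tensors="pt")
true_id = tokenizer.encode("true")[0]
false_id = tokenizer.encode("false")[0]
decoder_input_ids = torch.full((1, 1), student.config.decoder_start_token_id)

# Student distribution over {true, false} at the first decoding step.
logits = student(**inputs, decoder_input_ids=decoder_input_ids).logits[:, 0, :]
student_log_probs = F.log_softmax(logits[:, [true_id, false_id]], dim=-1)

# Distillation loss: KL divergence between student and teacher soft labels.
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
optimizer.zero_grad()
loss.backward()
optimizer.step()
```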

The paper with further details can be found [here](). The code and library are available at
https://github.com/unicamp-dl/InRanker

## Usage
The library was tested with Python 3.10 and can be installed with:
```bash
pip install inranker
```

The code for inference is:
```python
from inranker import T5Ranker

model = T5Ranker(model_name_or_path="unicamp-dl/InRanker-small")

docs = [
    "The capital of France is Paris",
    "Learn deep learning with InRanker and transformers"
]
scores = model.get_scores(
    query="What is the best way to learn deep learning?",
    docs=docs
)
# Scores are relevance probabilities in the range [0, 1], returned in the
# same order as `docs`. Sort to rank documents from most to least relevant:
sorted_scores = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)

""" InRanker-small:
sorted_scores = [
    (0.4844, 'Learn deep learning with InRanker and transformers'),
    (7.83e-06, 'The capital of France is Paris')
]
"""
```
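
Since InRanker is a reranker, a typical deployment pairs it with a cheap first-stage retriever. The sketch below is an illustrative assumption, not part of the inranker library: BM25 comes from the third-party `rank_bm25` package and the corpus is toy data; only `T5Ranker` and `get_scores` come from the usage shown above.

```python
from rank_bm25 import BM25Okapi  # third-party BM25, not part of inranker
from inranker import T5Ranker

corpus = [
    "Deep learning is best learned through courses, books, and practice",
    "The capital of France is Paris",
    "Transformers are a neural network architecture for sequence modeling",
]
query = "What is the best way to learn deep learning?"

# Stage 1: cheap lexical retrieval with BM25 over whitespace tokens.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=2)

# Stage 2: rerank the candidates with InRanker.
model = T5Ranker(model_name_or_path="unicamp-dl/InRanker-small")
scores = model.get_scores(query=query, docs=candidates)
for score, doc in sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True):
    print(f"{score:.4f}  {doc}")
```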