---

license: apache-2.0
language: "en"
tags:
- bag-of-words
- dense-passage-retrieval
- knowledge-distillation
datasets:
- ms_marco
---


# Uni-ColBERTer (Dim: 1) for Passage Retrieval

If you want to know more about our (Uni-)ColBERTer architecture, check out our paper: https://arxiv.org/abs/2203.13088 🎉

For more information, source code, and a minimal usage example please visit: https://github.com/sebastian-hofstaetter/colberter
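The linked repository contains the official usage example. As a rough orientation only, the sketch below shows the generic ColBERT-style late-interaction (MaxSim) scoring that (Uni-)ColBERTer builds on. It assumes the checkpoint loads as a plain `transformers` encoder and uses a placeholder model id; the released model class in the repository additionally aggregates subword vectors into unique whole-word vectors and, for Uni-ColBERTer, reduces them to a single dimension, so treat this as an illustration rather than the official scoring code.

```python
# Minimal sketch of ColBERT-style late-interaction (MaxSim) scoring.
# NOTE: the model id below is a placeholder (assumption) -- see the linked
# repository for the released checkpoint name and the exact ColBERTer
# model class, which also performs whole-word aggregation and reduction.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "sebastian-hofstaetter/colberter"  # placeholder id, not verified
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

def encode(text: str) -> torch.Tensor:
    """Return L2-normalized per-token contextualized vectors for a text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)

query_vecs = encode("what is dense passage retrieval?")
passage_vecs = encode("Dense passage retrieval encodes queries and passages "
                      "into vectors and matches them in embedding space.")

# MaxSim: each query token is matched to its best passage token;
# the relevance score is the sum of these per-token maxima.
sim = query_vecs @ passage_vecs.transpose(1, 2)  # (1, q_len, p_len)
score = sim.max(dim=-1).values.sum().item()
print(f"late-interaction score: {score:.4f}")
```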

## Limitations & Bias

- The model is trained on English text only.

- The model inherits social biases from both DistilBERT and MSMARCO.

- The model is trained only on the relatively short passages of MSMARCO (roughly 60 words on average), so it may struggle with longer text.

## Citation

If you use our model checkpoint, please cite our work as:

```
@article{Hofstaetter2022_colberter,
 author = {Sebastian Hofst{\"a}tter and Omar Khattab and Sophia Althammer and Mete Sertkan and Allan Hanbury},
 title = {Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction},
 publisher = {arXiv},
 url = {https://arxiv.org/abs/2203.13088},
 doi = {10.48550/ARXIV.2203.13088},
 year = {2022},
}
```