---
license: apache-2.0
language: "en"
tags:
- bag-of-words
- dense-passage-retrieval
- knowledge-distillation
datasets:
- ms_marco
---

# Uni-ColBERTer (Dim: 1) for Passage Retrieval
If you want to know more about our (Uni-)ColBERTer architecture, check out our paper: https://arxiv.org/abs/2203.13088 🎉

For more information, the source code, and a minimal usage example, please visit: https://github.com/sebastian-hofstaetter/colberter
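
As a rough intuition for how such a model scores a query–passage pair: ColBERTer-style models use ColBERT's late interaction, where every contextualized query token (or whole word) is matched against its best passage counterpart and the per-token maxima are summed; in Uni-ColBERTer these vectors are reduced to a single dimension. The snippet below is only a conceptual sketch of that max-sim aggregation in PyTorch with random stand-in vectors (the function name and shapes are illustrative, not the repository's API); the actual encoders, indexing, and a ready-to-run usage example are in the repository linked above.

```python
# Conceptual sketch only (not the ColBERTer API from the repository):
# ColBERT-style max-sim late interaction over per-token / whole-word vectors.
# In Uni-ColBERTer these vectors have a single dimension, so each word
# contributes one scalar to the final score.
import torch

def max_sim_score(query_vecs: torch.Tensor, passage_vecs: torch.Tensor) -> torch.Tensor:
    """query_vecs: [q_len, dim], passage_vecs: [p_len, dim]; dim = 1 for Uni-ColBERTer."""
    # Similarity of every query token/word to every passage token/word: [q_len, p_len]
    sim = query_vecs @ passage_vecs.T
    # Keep each query entry's best passage match, then sum (MaxSim aggregation)
    return sim.max(dim=-1).values.sum()

# Toy example with random vectors standing in for real contextualized embeddings
query = torch.randn(6, 1)     # 6 query words, 1-dim vectors
passage = torch.randn(60, 1)  # 60 passage words, 1-dim vectors
print(max_sim_score(query, passage).item())
```
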
## Limitations & Bias

- The model is trained only on English text.
- The model inherits social biases from both DistilBERT and MSMARCO.
- The model is trained only on the relatively short passages of MSMARCO (about 60 words on average), so it might struggle with longer text.
## Citation

If you use our model checkpoint, please cite our work as:

```
@article{Hofstaetter2022_colberter,
  author = {Sebastian Hofst{\"a}tter and Omar Khattab and Sophia Althammer and Mete Sertkan and Allan Hanbury},
  title = {Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction},
  publisher = {arXiv},
  url = {https://arxiv.org/abs/2203.13088},
  doi = {10.48550/ARXIV.2203.13088},
  year = {2022},
}
```