File size: 1,217 Bytes
bad2ce0 6d34d25 bad2ce0 6d34d25 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
---
license: apache-2.0
language: "en"
tags:
- bag-of-words
- dense-passage-retrieval
- knowledge-distillation
datasets:
- ms_marco
---
# Uni-ColBERTer (Dim: 1) for Passage Retrieval
If you want to know more about our (Uni-)ColBERTer architecture check out our paper: https://arxiv.org/abs/2203.13088 🎉
For more information, source code, and a minimal usage example please visit: https://github.com/sebastian-hofstaetter/colberter
## Limitations & Bias
- The model is only trained on english text.
- The model inherits social biases from both DistilBERT and MSMARCO.
- The model is only trained on relatively short passages of MSMARCO (avg. 60 words length), so it might struggle with longer text.
## Citation
If you use our model checkpoint please cite our work as:
```
@article{Hofstaetter2022_colberter,
author = {Sebastian Hofst{\"a}tter and Omar Khattab and Sophia Althammer and Mete Sertkan and Allan Hanbury},
title = {Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction},
publisher = {arXiv},
url = {https://arxiv.org/abs/2203.13088},
doi = {10.48550/ARXIV.2203.13088},
year = {2022},
}
``` |