
# Basic Information

This is the Dr. Decr model used in the white-box submission to Task 1 of the XOR-TyDi leaderboard:

https://nlp.cs.washington.edu/xorqa/


Implementation details of the model are described in the paper:

https://arxiv.org/pdf/2112.08185.pdf

Source code for training the model is available in PrimeQA's IR component:
https://github.com/primeqa/primeqa/tree/main/examples/drdecr

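It is a neural IR model built on top of the ColBERT v1 API and is not directly compatible with the Hugging Face API. Its retrieval results on the XOR-TyDi dev set, measured as R@2kt and R@5kt (the percentage of questions whose answer appears within the top 2,000 and 5,000 retrieved tokens, respectively), are: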
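
| Language | R@2kt | R@5kt |
|----------|------:|------:|
| te       | 66.67 | 70.88 |
| bn       | 70.23 | 75.08 |
| fi       | 82.24 | 86.18 |
| ja       | 65.92 | 72.93 |
| ko       | 67.93 | 71.73 |
| ru       | 63.07 | 69.71 |
| ar       | 78.15 | 82.77 |
| **Avg**  | 70.60 | 75.61 |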
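To make the R@kt columns concrete, here is a simplified sketch of the metric as XOR-TyDi defines it: concatenate the retrieved passages in rank order, keep the first 2,000 (or 5,000) tokens, and count the question as a hit if a gold answer occurs in that window. This is not the official evaluation script, which may tokenize and normalize text differently (whitespace splitting, for instance, is a poor fit for Japanese); all function names are hypothetical. The `macro_average` helper also shows that the Avg row matches the unweighted mean of the seven per-language scores.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def answer_in_top_tokens(ranked_passages: Sequence[str], answers: Iterable[str], max_tokens: int) -> bool:
    """Concatenate retrieved passages in rank order, keep only the first
    `max_tokens` whitespace-separated tokens, and report whether any gold
    answer string occurs in that truncated window (simplified assumption)."""
    tokens: List[str] = []
    for passage in ranked_passages:
        tokens.extend(passage.split())
        if len(tokens) >= max_tokens:
            break
    window = " ".join(tokens[:max_tokens])
    return any(answer in window for answer in answers)


def recall_at_kt(examples: Sequence[Tuple[Sequence[str], List[str]]], max_tokens: int) -> float:
    """Percentage of questions whose answer appears within the first
    `max_tokens` retrieved tokens (2000 for R@2kt, 5000 for R@5kt)."""
    hits = sum(answer_in_top_tokens(p, a, max_tokens) for p, a in examples)
    return 100.0 * hits / len(examples)


def macro_average(per_language: Dict[str, float]) -> float:
    """Unweighted mean over languages."""
    return sum(per_language.values()) / len(per_language)


# Per-language R@2kt values copied from the table above.
r_at_2kt = {"te": 66.67, "bn": 70.23, "fi": 82.24, "ja": 65.92,
            "ko": 67.93, "ru": 63.07, "ar": 78.15}
print(round(macro_average(r_at_2kt), 2))  # 70.6
```

The same averaging over the R@5kt column gives 75.61, matching the table.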
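Because the model is built on ColBERT v1, retrieval relies on ColBERT's late-interaction scoring: each query token embedding is matched against its most similar passage token embedding (MaxSim), and these maxima are summed. The sketch below illustrates only that scoring function on toy 2-d vectors, assuming token embeddings have already been produced by the encoder; it is not the PrimeQA or ColBERT API, and it omits the real pipeline's projection layer, query augmentation, and indexing. Passage ids and helper names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embs: List[Vector], passage_embs: List[Vector]) -> float:
    """Late-interaction score: for every query token embedding, take its best
    cosine similarity against all passage token embeddings, then sum over
    query tokens."""
    q = [l2_normalize(v) for v in query_embs]
    p = [l2_normalize(v) for v in passage_embs]
    return sum(max(dot(qv, pv) for pv in p) for qv in q)


def rank_passages(query_embs: List[Vector], passages: Dict[str, List[Vector]]) -> List[str]:
    """Return (hypothetical) passage ids sorted by descending MaxSim score."""
    scores = {pid: maxsim_score(query_embs, embs) for pid, embs in passages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-d "token embeddings" standing in for real encoder outputs.
query = [[1.0, 0.0], [0.0, 1.0]]
passages = {
    "p1": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "p2": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
}
print(rank_passages(query, passages))  # ['p1', 'p2']
```

Scoring token by token, rather than comparing single pooled vectors, lets passage embeddings be precomputed and indexed offline while still rewarding passages that cover every part of the query.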

# Limitations and Bias

This model is initialized from the pre-trained XLM-R base model and fine-tuned on the 7 languages of the XOR-TyDi leaderboard. Its performance on other languages has not been tested.

Since the model was fine-tuned from the large pre-trained language model XLM-RoBERTa, biases present in XLM-RoBERTa may also be present in our fine-tuned model, Dr. Decr.

# Citation
```
@article{Li2021_DrDecr,
  doi = {10.48550/ARXIV.2112.08185},
  url = {https://arxiv.org/abs/2112.08185},
  author = {Li, Yulong and Franz, Martin and Sultan, Md Arafat and Iyer, Bhavani and Lee, Young-Suk and Sil, Avirup},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Learning Cross-Lingual IR from an English Retriever},
  publisher = {arXiv},
  year = {2021}
}
```