---
language:
- fr
- en
metrics:
- bleu
pipeline_tag: translation
model-index:
- name: NMT-EN-FR
  results:
  - task:
      type: translation
    dataset:
      name: UN Corpus
      type: bilingual
    metrics:
    - name: BLEU
      type: BLEU
      value: 49
library_name: ctranslate2
license: cc-by-sa-4.0
---

# Model Details

French-to-English machine translation model trained by Yasmin Moslem.
The model uses the Transformer (base) architecture.
It was originally trained with OpenNMT-py and then converted to the CTranslate2 format for efficient inference.

## Tools

- OpenNMT-py
- CTranslate2

## Data

This model was trained on the French-to-English portion of the [UN Corpus](https://conferences.unite.un.org/UNCorpus/), consisting of approximately 20 million segments.

## Tokenizer

The tokenizer was trained with [SentencePiece](https://github.com/google/sentencepiece) on a shared vocabulary.
Hence, there is a single SentencePiece model that can be used to tokenize both the source and target texts.
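
As a rough illustration, the sketch below loads the shared SentencePiece model and uses it for both languages; the file name `sentencepiece.model` is a placeholder rather than the actual file name in this repository.

```python
# A minimal sketch, assuming the shared SentencePiece model is saved locally
# as "sentencepiece.model" (placeholder name).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")

# The same model segments both the French source and the English target.
src_tokens = sp.encode("Le rapport a été adopté sans vote.", out_type=str)
tgt_tokens = sp.encode("The report was adopted without a vote.", out_type=str)

# Detokenization also goes through the same model.
print(sp.decode(src_tokens))
print(sp.decode(tgt_tokens))
```
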
## Demo

A demo of this model is available at: https://www.machinetranslation.io/

The demo also illustrates word-level auto-suggestions with teacher forcing.

## Inference

If you want to run this model locally, you can use the [CTranslate2](https://github.com/OpenNMT/CTranslate2) library.
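
A minimal sketch of the typical CTranslate2 + SentencePiece workflow is shown below; the paths `ct2_model` and `sentencepiece.model` are placeholders, not the actual file names in this repository.

```python
# A minimal sketch, assuming the converted model directory is "ct2_model" and
# the shared SentencePiece model is "sentencepiece.model" (both placeholders).
import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")
translator = ctranslate2.Translator("ct2_model", device="cpu")

source = "Le rapport a été adopté sans vote."
source_tokens = sp.encode(source, out_type=str)

# Translate a batch of one sentence and detokenize the best hypothesis.
results = translator.translate_batch([source_tokens], beam_size=5)
target_tokens = results[0].hypotheses[0]
print(sp.decode(target_tokens))
```

`translate_batch` also accepts a `target_prefix` argument (one token list, or `None`, per source sentence) that forces the decoder through a partial translation before it continues, which is one way to produce word-level suggestions like those shown in the demo.
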
## Citation

```
@inproceedings{moslem-etal-2022-translation,
    title = "Translation Word-Level Auto-Completion: What Can We Achieve Out of the Box?",
    author = "Moslem, Yasmin and
      Haque, Rejwanul and
      Way, Andy",
    booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wmt-1.119",
    pages = "1176--1181",
}
```