---
language:
- fr
- en
metrics:
- bleu
pipeline_tag: translation
model-index:
- name: NMT-EN-FR
  results:
  - task:
      type: translation
    dataset:
      name: UN Corpus
      type: bilingual
    metrics:
    - name: BLEU
      type: BLEU
      value: 49
library_name: ctranslate2
license: cc-by-sa-4.0
---

# Model Details

French-to-English machine translation model trained by Yasmin Moslem.
The model uses the Transformer (base) architecture.
It was originally trained with OpenNMT-py and then converted to the CTranslate2 format for efficient inference.

## Tools

- OpenNMT-py
- CTranslate2

## Data

This model was trained on the French-to-English portion of the [UN Corpus](https://conferences.unite.un.org/UNCorpus/), consisting of approximately 20 million segments.

## Tokenizer

The tokenizer was trained with [SentencePiece](https://github.com/google/sentencepiece) on a shared vocabulary.
Hence, there is a single SentencePiece model that can be used to tokenize both the source and target texts.
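
As a rough illustration, the sketch below loads the shared SentencePiece model and uses it for both languages; the file name `sentencepiece.model` is a placeholder rather than the actual file name in this repository.

```python
# A minimal sketch, assuming the shared SentencePiece model is saved locally
# as "sentencepiece.model" (placeholder name).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")

# The same model segments both the French source and the English target.
src_tokens = sp.encode("Le rapport a été adopté sans vote.", out_type=str)
tgt_tokens = sp.encode("The report was adopted without a vote.", out_type=str)

# Detokenization also goes through the same model.
print(sp.decode(src_tokens))
print(sp.decode(tgt_tokens))
```
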
## Demo

A demo of this model is available at: https://www.machinetranslation.io/

The demo also illustrates word-level auto-suggestions with teacher forcing.

## Inference

If you want to run this model locally, you can use the [CTranslate2](https://github.com/OpenNMT/CTranslate2) library.
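
A minimal sketch of the typical CTranslate2 + SentencePiece workflow is shown below; the paths `ct2_model` and `sentencepiece.model` are placeholders, not the actual file names in this repository.

```python
# A minimal sketch, assuming the converted model directory is "ct2_model" and
# the shared SentencePiece model is "sentencepiece.model" (both placeholders).
import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")
translator = ctranslate2.Translator("ct2_model", device="cpu")

source = "Le rapport a été adopté sans vote."
source_tokens = sp.encode(source, out_type=str)

# Translate a batch of one sentence and detokenize the best hypothesis.
results = translator.translate_batch([source_tokens], beam_size=5)
target_tokens = results[0].hypotheses[0]
print(sp.decode(target_tokens))
```

`translate_batch` also accepts a `target_prefix` argument (one token list, or `None`, per source sentence) that forces the decoder through a partial translation before it continues, which is one way to produce word-level suggestions like those shown in the demo.
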
## Citation

```
@inproceedings{moslem-etal-2022-translation,
    title = "Translation Word-Level Auto-Completion: What Can We Achieve Out of the Box?",
    author = "Moslem, Yasmin and
      Haque, Rejwanul and
      Way, Andy",
    booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wmt-1.119",
    pages = "1176--1181",
}
```