
opus-mt-tc-bible-big-iir-deu_eng_fra_por_spa


Model Details

Neural machine translation model for translating from Indo-Iranian languages (iir) to German, English, French, Portuguese, and Spanish (deu+eng+fra+por+spa).

This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the Marian NMT framework, an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from OPUS, and the training pipelines follow the procedures of OPUS-MT-train.

This is a multilingual translation model with multiple target languages. Every input sentence must start with a language token of the form >>id<<, where id is a valid target language ID, e.g. >>deu<< for German.
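As a minimal sketch of the token requirement above, the target-language prefix could be added with a small helper before tokenization. The names `add_target_token` and `TARGET_LANGS` are illustrative assumptions, not part of the transformers API; the set of IDs is taken from the target languages in the model name.

```python
# Target language IDs taken from the model name (deu+eng+fra+por+spa).
TARGET_LANGS = {"deu", "eng", "fra", "por", "spa"}


def add_target_token(text: str, target_lang: str) -> str:
    """Prefix `text` with the sentence-initial token >>target_lang<<.

    Hypothetical helper for illustration; it only builds the input string
    and does not call the model.
    """
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"unsupported target language ID: {target_lang!r}")
    return f">>{target_lang}<< {text.strip()}"
```

Rejecting unknown IDs early is a design choice of this sketch: it surfaces a typo in the language ID before any text reaches the model.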

Uses

This model can be used for translation and text-to-text generation.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that the model is trained on various public data sets that may contain content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

How to Get Started With the Model

A short code example:

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    ">>deu<< Replace this with text in an accepted source language.",
    ">>spa<< This is the second sentence."
]

model_name = "Helsinki-NLP/opus-mt-tc-bible-big-iir-deu_eng_fra_por_spa"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print( tokenizer.decode(t, skip_special_tokens=True) )

You can also use OPUS-MT models with the transformers pipelines, for example:

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-iir-deu_eng_fra_por_spa")
print(pipe(">>deu<< Replace this with text in an accepted source language."))
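Because the target language is selected per sentence, one source sentence can be sent to all five targets in a single batch. The following is a sketch under that assumption; `build_batch` and `translate_to_all_targets` are hypothetical helper names, while the MarianMTModel and MarianTokenizer calls are the same ones used in the example above.

```python
from typing import Dict, List, Sequence

MODEL_NAME = "Helsinki-NLP/opus-mt-tc-bible-big-iir-deu_eng_fra_por_spa"
TARGET_LANGS = ("deu", "eng", "fra", "por", "spa")


def build_batch(sentence: str, langs: Sequence[str] = TARGET_LANGS) -> List[str]:
    """Return one input string per target language, each with its >>id<< prefix."""
    return [f">>{lang}<< {sentence}" for lang in langs]


def translate_to_all_targets(sentence: str) -> Dict[str, str]:
    """Translate one source sentence into every target language in one batch."""
    # Imported here so build_batch can be used without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME)
    inputs = tokenizer(build_batch(sentence), return_tensors="pt", padding=True)
    outputs = model.generate(**inputs)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Map each target language ID to its translation, in batch order.
    return dict(zip(TARGET_LANGS, decoded))
```

Calling `translate_to_all_targets(...)` with a sentence in an accepted source language downloads the model on first use and returns a dictionary keyed by target language ID.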

Training

Evaluation

langpair testset chr-F BLEU #sent #words
awa-eng tatoeba-test-v2021-08-07 0.60240 40.8 279 1335
ben-eng tatoeba-test-v2021-08-07 0.64471 49.3 2500 13978
fas-deu tatoeba-test-v2021-08-07 0.58631 34.7 3185 25590
fas-eng tatoeba-test-v2021-08-07 0.59868 41.8 3762 31480
fas-fra tatoeba-test-v2021-08-07 0.57181 35.8 376 3377
hin-eng tatoeba-test-v2021-08-07 0.65417 49.5 5000 33943
kur_Latn-deu tatoeba-test-v2021-08-07 0.42694 27.0 223 1323
kur_Latn-eng tatoeba-test-v2021-08-07 0.42721 25.6 290 1708
mar-eng tatoeba-test-v2021-08-07 0.64493 48.3 10396 67527
pes-eng tatoeba-test-v2021-08-07 0.59959 41.9 3757 31411
urd-eng tatoeba-test-v2021-08-07 0.53679 35.4 1663 12029
ben-deu flores101-devtest 0.46873 16.4 1012 25094
ben-eng flores101-devtest 0.57508 30.0 1012 24721
ben-spa flores101-devtest 0.44010 15.1 1012 29199
ckb-deu flores101-devtest 0.41546 13.0 1012 25094
ckb-por flores101-devtest 0.44178 17.6 1012 26519
fas-por flores101-devtest 0.54077 26.1 1012 26519
guj-deu flores101-devtest 0.45906 16.5 1012 25094
guj-spa flores101-devtest 0.43928 15.2 1012 29199
hin-eng flores101-devtest 0.62807 36.6 1012 24721
hin-por flores101-devtest 0.52825 25.1 1012 26519
mar-deu flores101-devtest 0.44767 14.8 1012 25094
npi-deu flores101-devtest 0.46178 15.9 1012 25094
pan-fra flores101-devtest 0.50909 23.4 1012 28343
pan-por flores101-devtest 0.50634 23.0 1012 26519
pus-deu flores101-devtest 0.42645 13.5 1012 25094
pus-fra flores101-devtest 0.45719 18.0 1012 28343
urd-deu flores101-devtest 0.46102 16.5 1012 25094
urd-eng flores101-devtest 0.56356 28.4 1012 24721
asm-eng flores200-devtest 0.48589 21.5 1012 24721
awa-deu flores200-devtest 0.47071 16.0 1012 25094
awa-eng flores200-devtest 0.53069 26.6 1012 24721
awa-fra flores200-devtest 0.49700 21.1 1012 28343
awa-por flores200-devtest 0.49950 21.8 1012 26519
awa-spa flores200-devtest 0.43831 15.4 1012 29199
ben-deu flores200-devtest 0.47434 17.0 1012 25094
ben-eng flores200-devtest 0.58408 31.4 1012 24721
ben-fra flores200-devtest 0.50930 23.2 1012 28343
ben-por flores200-devtest 0.50661 22.4 1012 26519
ben-spa flores200-devtest 0.44485 15.7 1012 29199
bho-deu flores200-devtest 0.42463 12.8 1012 25094
bho-eng flores200-devtest 0.50545 22.6 1012 24721
bho-fra flores200-devtest 0.45264 17.4 1012 28343
bho-por flores200-devtest 0.44737 17.0 1012 26519
bho-spa flores200-devtest 0.40585 13.0 1012 29199
ckb-deu flores200-devtest 0.42110 13.6 1012 25094
ckb-eng flores200-devtest 0.50543 24.7 1012 24721
ckb-fra flores200-devtest 0.45847 19.1 1012 28343
ckb-por flores200-devtest 0.44567 17.8 1012 26519
guj-deu flores200-devtest 0.46758 17.3 1012 25094
guj-eng flores200-devtest 0.61139 34.4 1012 24721
guj-fra flores200-devtest 0.50349 22.5 1012 28343
guj-por flores200-devtest 0.49828 22.4 1012 26519
guj-spa flores200-devtest 0.44472 15.5 1012 29199
hin-deu flores200-devtest 0.50772 20.8 1012 25094
hin-eng flores200-devtest 0.63234 37.3 1012 24721
hin-fra flores200-devtest 0.53933 26.5 1012 28343
hin-por flores200-devtest 0.53523 26.1 1012 26519
hin-spa flores200-devtest 0.46183 17.4 1012 29199
hne-deu flores200-devtest 0.49946 19.0 1012 25094
hne-eng flores200-devtest 0.63640 38.1 1012 24721
hne-fra flores200-devtest 0.53419 25.7 1012 28343
hne-por flores200-devtest 0.53735 25.9 1012 26519
hne-spa flores200-devtest 0.45610 16.9 1012 29199
mag-deu flores200-devtest 0.50681 20.0 1012 25094
mag-eng flores200-devtest 0.63966 38.0 1012 24721
mag-fra flores200-devtest 0.53810 25.9 1012 28343
mag-por flores200-devtest 0.54065 26.6 1012 26519
mag-spa flores200-devtest 0.46131 17.1 1012 29199
mai-deu flores200-devtest 0.47686 16.8 1012 25094
mai-eng flores200-devtest 0.57552 30.2 1012 24721
mai-fra flores200-devtest 0.50909 22.4 1012 28343
mai-por flores200-devtest 0.51249 22.9 1012 26519
mai-spa flores200-devtest 0.44694 15.9 1012 29199
mar-deu flores200-devtest 0.45295 14.8 1012 25094
mar-eng flores200-devtest 0.58203 31.0 1012 24721
mar-fra flores200-devtest 0.48254 20.4 1012 28343
mar-por flores200-devtest 0.48368 20.4 1012 26519
mar-spa flores200-devtest 0.42799 14.7 1012 29199
npi-deu flores200-devtest 0.47267 17.2 1012 25094
npi-eng flores200-devtest 0.59559 32.5 1012 24721
npi-fra flores200-devtest 0.50869 22.5 1012 28343
npi-por flores200-devtest 0.50900 22.5 1012 26519
npi-spa flores200-devtest 0.44304 15.6 1012 29199
pan-deu flores200-devtest 0.48342 18.6 1012 25094
pan-eng flores200-devtest 0.60328 33.4 1012 24721
pan-fra flores200-devtest 0.51953 24.4 1012 28343
pan-por flores200-devtest 0.51428 23.9 1012 26519
pan-spa flores200-devtest 0.44615 16.3 1012 29199
pes-deu flores200-devtest 0.51124 21.0 1012 25094
pes-eng flores200-devtest 0.60538 33.7 1012 24721
pes-fra flores200-devtest 0.55157 27.8 1012 28343
pes-por flores200-devtest 0.54372 26.6 1012 26519
pes-spa flores200-devtest 0.47561 18.8 1012 29199
prs-deu flores200-devtest 0.50273 20.7 1012 25094
prs-eng flores200-devtest 0.60144 34.5 1012 24721
prs-fra flores200-devtest 0.54241 27.0 1012 28343
prs-por flores200-devtest 0.53562 26.6 1012 26519
prs-spa flores200-devtest 0.46497 18.1 1012 29199
sin-deu flores200-devtest 0.45041 14.7 1012 25094
sin-eng flores200-devtest 0.54060 26.3 1012 24721
sin-fra flores200-devtest 0.48163 19.9 1012 28343
sin-por flores200-devtest 0.47780 19.6 1012 26519
sin-spa flores200-devtest 0.42546 14.2 1012 29199
tgk-deu flores200-devtest 0.45203 15.6 1012 25094
tgk-eng flores200-devtest 0.53740 25.3 1012 24721
tgk-fra flores200-devtest 0.50153 22.1 1012 28343
tgk-por flores200-devtest 0.49378 21.9 1012 26519
tgk-spa flores200-devtest 0.44099 15.9 1012 29199
urd-deu flores200-devtest 0.46894 17.2 1012 25094
urd-eng flores200-devtest 0.56967 29.3 1012 24721
urd-fra flores200-devtest 0.50616 22.6 1012 28343
urd-por flores200-devtest 0.49398 21.7 1012 26519
urd-spa flores200-devtest 0.43800 15.4 1012 29199
hin-eng newstest2014 0.59024 30.3 2507 55571
guj-eng newstest2019 0.53977 27.2 1016 17757
ben-deu ntrex128 0.45551 15.0 1997 48761
ben-eng ntrex128 0.56878 29.0 1997 47673
ben-fra ntrex128 0.47077 18.6 1997 53481
ben-por ntrex128 0.46049 17.1 1997 51631
ben-spa ntrex128 0.48833 21.3 1997 54107
fas-deu ntrex128 0.46991 16.1 1997 48761
fas-eng ntrex128 0.55119 25.9 1997 47673
fas-fra ntrex128 0.49626 21.2 1997 53481
fas-por ntrex128 0.47499 18.6 1997 51631
fas-spa ntrex128 0.50178 22.8 1997 54107
guj-deu ntrex128 0.43998 14.3 1997 48761
guj-eng ntrex128 0.58481 31.0 1997 47673
guj-fra ntrex128 0.45468 17.3 1997 53481
guj-por ntrex128 0.44223 15.8 1997 51631
guj-spa ntrex128 0.47798 20.7 1997 54107
hin-deu ntrex128 0.46580 15.0 1997 48761
hin-eng ntrex128 0.59832 31.6 1997 47673
hin-fra ntrex128 0.48328 19.5 1997 53481
hin-por ntrex128 0.46833 17.8 1997 51631
hin-spa ntrex128 0.49517 21.9 1997 54107
mar-deu ntrex128 0.43713 13.5 1997 48761
mar-eng ntrex128 0.55132 27.4 1997 47673
mar-fra ntrex128 0.44797 16.9 1997 53481
mar-por ntrex128 0.44342 16.1 1997 51631
mar-spa ntrex128 0.46950 19.7 1997 54107
nep-deu ntrex128 0.43568 13.5 1997 48761
nep-eng ntrex128 0.55954 28.8 1997 47673
nep-fra ntrex128 0.45083 16.9 1997 53481
nep-por ntrex128 0.44458 16.0 1997 51631
nep-spa ntrex128 0.46832 19.4 1997 54107
pan-deu ntrex128 0.44327 14.6 1997 48761
pan-eng ntrex128 0.57665 30.5 1997 47673
pan-fra ntrex128 0.45815 17.7 1997 53481
pan-por ntrex128 0.44608 16.3 1997 51631
pan-spa ntrex128 0.47289 20.0 1997 54107
prs-deu ntrex128 0.45067 14.6 1997 48761
prs-eng ntrex128 0.54767 26.6 1997 47673
prs-fra ntrex128 0.47453 19.3 1997 53481
prs-por ntrex128 0.45843 17.1 1997 51631
prs-spa ntrex128 0.48317 20.9 1997 54107
pus-eng ntrex128 0.44698 17.6 1997 47673
pus-spa ntrex128 0.41132 14.6 1997 54107
sin-deu ntrex128 0.42541 12.5 1997 48761
sin-eng ntrex128 0.51853 23.5 1997 47673
sin-fra ntrex128 0.44099 15.9 1997 53481
sin-por ntrex128 0.43010 14.4 1997 51631
sin-spa ntrex128 0.46225 18.4 1997 54107
tgk_Cyrl-deu ntrex128 0.40368 11.4 1997 48761
tgk_Cyrl-eng ntrex128 0.47132 18.2 1997 47673
tgk_Cyrl-fra ntrex128 0.43311 15.8 1997 53481
tgk_Cyrl-por ntrex128 0.42095 13.8 1997 51631
tgk_Cyrl-spa ntrex128 0.44279 17.3 1997 54107
urd-deu ntrex128 0.45708 15.5 1997 48761
urd-eng ntrex128 0.56560 28.5 1997 47673
urd-fra ntrex128 0.47536 19.0 1997 53481
urd-por ntrex128 0.45911 16.7 1997 51631
urd-spa ntrex128 0.48986 21.6 1997 54107
ben-eng tico19-test 0.64578 38.7 2100 56824
ben-fra tico19-test 0.50165 22.8 2100 64661
ben-por tico19-test 0.55662 27.7 2100 62729
ben-spa tico19-test 0.56795 29.6 2100 66563
ckb-eng tico19-test 0.51623 27.4 2100 56315
ckb-fra tico19-test 0.42405 17.1 2100 64661
ckb-por tico19-test 0.45405 19.0 2100 62729
ckb-spa tico19-test 0.46976 21.7 2100 66563
fas-eng tico19-test 0.62079 34.2 2100 56315
fas-fra tico19-test 0.52041 24.4 2100 64661
fas-por tico19-test 0.56780 29.2 2100 62729
fas-spa tico19-test 0.58248 32.3 2100 66563
hin-eng tico19-test 0.70535 46.8 2100 56323
hin-fra tico19-test 0.53833 26.6 2100 64661
hin-por tico19-test 0.60246 33.2 2100 62729
hin-spa tico19-test 0.61504 35.7 2100 66563
mar-eng tico19-test 0.59247 31.4 2100 56315
mar-fra tico19-test 0.46895 19.3 2100 64661
mar-por tico19-test 0.51945 23.8 2100 62729
mar-spa tico19-test 0.52914 26.2 2100 66563
nep-eng tico19-test 0.65865 40.1 2100 56824
nep-fra tico19-test 0.50473 23.2 2100 64661
nep-por tico19-test 0.56185 28.0 2100 62729
nep-spa tico19-test 0.57270 30.2 2100 66563
prs-eng tico19-test 0.59536 32.1 2100 56824
prs-fra tico19-test 0.50044 23.1 2100 64661
prs-por tico19-test 0.54448 27.3 2100 62729
prs-spa tico19-test 0.56311 30.2 2100 66563
pus-eng tico19-test 0.56711 31.4 2100 56315
pus-fra tico19-test 0.45951 19.4 2100 64661
pus-por tico19-test 0.50225 23.7 2100 62729
pus-spa tico19-test 0.51246 25.4 2100 66563
urd-eng tico19-test 0.57786 30.8 2100 56315
urd-fra tico19-test 0.46807 20.1 2100 64661
urd-por tico19-test 0.51567 24.1 2100 62729
urd-spa tico19-test 0.52820 26.4 2100 66563
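The rows above are whitespace-separated, so they can be summarized with a few lines of Python. This sketch parses rows in the column order of the table header (langpair, testset, chr-F, BLEU, #sent, #words) and averages BLEU per test set. The function names are illustrative, and `SAMPLE_ROWS` contains only four rows copied from the table, so its averages do not describe the full table.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict

# Four rows copied from the table; paste all rows to analyze the full table.
SAMPLE_ROWS = """\
awa-eng tatoeba-test-v2021-08-07 0.60240 40.8 279 1335
ben-eng tatoeba-test-v2021-08-07 0.64471 49.3 2500 13978
hin-eng flores200-devtest 0.63234 37.3 1012 24721
hin-deu flores200-devtest 0.50772 20.8 1012 25094
"""


def parse_row(line: str) -> dict:
    """Split one table row into typed fields."""
    langpair, testset, chrf, bleu, n_sent, n_words = line.split()
    # Split on the last hyphen so IDs such as kur_Latn-deu parse correctly.
    src, tgt = langpair.rsplit("-", 1)
    return {
        "src": src,
        "tgt": tgt,
        "testset": testset,
        "chrf": float(chrf),
        "bleu": float(bleu),
        "n_sent": int(n_sent),
        "n_words": int(n_words),
    }


def mean_bleu_by_testset(text: str) -> Dict[str, float]:
    """Average the BLEU column for each test set found in `text`."""
    scores = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            row = parse_row(line)
            scores[row["testset"]].append(row["bleu"])
    return {testset: mean(values) for testset, values in scores.items()}
```

An unweighted mean treats every language pair equally regardless of test-set size; weighting by `n_sent` would be an equally reasonable choice.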

Citation Information

@article{tiedemann2023democratizing,
  title={Democratizing neural machine translation with {OPUS-MT}},
  author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
  journal={Language Resources and Evaluation},
  number={58},
  pages={713--755},
  year={2023},
  publisher={Springer Nature},
  issn={1574-0218},
  doi={10.1007/s10579-023-09704-w}
}

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

Acknowledgements

The work is supported by the HPLT project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland, and the EuroHPC supercomputer LUMI.

Model conversion info

  • transformers version: 4.45.1
  • OPUS-MT git hash: 0882077
  • port time: Tue Oct 8 11:32:44 EEST 2024
  • port machine: LM0-400-22516.local
Model size: 240M params (Safetensors, tensor type F32)