
opus-mt-tc-bible-big-itc-deu_eng_fra_por_spa

Table of Contents

  • Model Details
  • Uses
  • Risks, Limitations and Biases
  • How to Get Started With the Model
  • Training
  • Evaluation
  • Citation Information
  • Acknowledgements
  • Model conversion info

Model Details

Neural machine translation model for translating from Italic languages (itc) to German, English, French, Portuguese and Spanish (deu+eng+fra+por+spa).

This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained with Marian NMT, an efficient NMT implementation written in pure C++, and have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from OPUS, and training pipelines follow the procedures of OPUS-MT-train.

Model Description:

This is a multilingual translation model with multiple target languages. A sentence-initial language token is required in the form >>id<< (where id is a valid target-language ID), e.g. >>deu<<.
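
If you are unsure which target-language tokens the model accepts, you can list them from the tokenizer vocabulary. A minimal sketch, assuming only the >>id<< token convention described above:

from transformers import MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-tc-bible-big-itc-deu_eng_fra_por_spa")
# Target-language tokens follow the >>id<< pattern described above.
lang_tokens = [t for t in tokenizer.get_vocab() if t.startswith(">>") and t.endswith("<<")]
print(sorted(lang_tokens))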

Uses

This model can be used for translation and text-to-text generation.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that the model is trained on various public data sets that may contain content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

How to Get Started With the Model

A short code example:

from transformers import MarianMTModel, MarianTokenizer

# Each input sentence starts with a >>id<< target-language token.
src_text = [
    ">>deu<< Replace this with text in an accepted source language.",
    ">>spa<< This is the second sentence."
]

model_name = "Helsinki-NLP/opus-mt-tc-bible-big-itc-deu_eng_fra_por_spa"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch and generate translations.
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

You can also use OPUS-MT models with the transformers pipelines, for example:

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-itc-deu_eng_fra_por_spa")
print(pipe(">>deu<< Replace this with text in an accepted source language."))
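
Because the target language is selected purely by the >>id<< prefix, a single pipeline instance can serve all five target languages. A minimal sketch (the Italian example sentence is illustrative):

targets = ["deu", "eng", "fra", "por", "spa"]
text = "Buongiorno, come stai?"  # illustrative Italian source sentence
for tgt in targets:
    # Reuse the same pipeline; only the >>id<< prefix changes.
    print(tgt, pipe(f">>{tgt}<< {text}")[0]["translation_text"])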

Training

Evaluation

langpair testset chr-F BLEU #sent #words
cat-deu tatoeba-test-v2021-08-07 0.66856 47.9 723 5676
cat-eng tatoeba-test-v2021-08-07 0.72313 57.9 1631 12627
cat-fra tatoeba-test-v2021-08-07 0.71565 53.8 700 5664
cat-por tatoeba-test-v2021-08-07 0.75797 58.7 747 6119
cat-spa tatoeba-test-v2021-08-07 0.87610 77.7 1534 12094
fra-deu tatoeba-test-v2021-08-07 0.68638 50.0 12418 100545
fra-eng tatoeba-test-v2021-08-07 0.72664 58.0 12681 101754
fra-fra tatoeba-test-v2021-08-07 0.62093 40.6 1000 7757
fra-por tatoeba-test-v2021-08-07 0.70764 52.0 10518 77650
fra-spa tatoeba-test-v2021-08-07 0.72229 55.0 10294 78406
glg-eng tatoeba-test-v2021-08-07 0.70552 55.7 1015 8421
glg-por tatoeba-test-v2021-08-07 0.77067 62.1 433 3105
glg-spa tatoeba-test-v2021-08-07 0.82795 72.1 2121 17443
ita-deu tatoeba-test-v2021-08-07 0.68325 49.4 10094 79762
ita-eng tatoeba-test-v2021-08-07 0.81176 70.5 17320 119214
ita-fra tatoeba-test-v2021-08-07 0.78299 64.4 10091 66377
ita-por tatoeba-test-v2021-08-07 0.74169 55.6 3066 25668
ita-spa tatoeba-test-v2021-08-07 0.77673 63.0 5000 34937
lad_Latn-eng tatoeba-test-v2021-08-07 0.54247 36.7 672 3665
lad_Latn-spa tatoeba-test-v2021-08-07 0.59790 40.4 239 1239
lat-deu tatoeba-test-v2021-08-07 0.42548 24.8 2016 13326
lat-eng tatoeba-test-v2021-08-07 0.42385 24.3 10298 100152
lat-spa tatoeba-test-v2021-08-07 0.45821 25.2 3129 34036
oci-eng tatoeba-test-v2021-08-07 0.40921 22.4 841 5299
oci-fra tatoeba-test-v2021-08-07 0.49044 28.4 806 6302
pcd-fra tatoeba-test-v2021-08-07 0.41500 15.0 266 1677
pms-eng tatoeba-test-v2021-08-07 0.39308 20.8 269 2059
por-deu tatoeba-test-v2021-08-07 0.68379 48.8 10000 81246
por-eng tatoeba-test-v2021-08-07 0.77089 64.2 13222 105351
por-fra tatoeba-test-v2021-08-07 0.75364 58.7 10518 80459
por-por tatoeba-test-v2021-08-07 0.71396 50.3 2500 19220
por-spa tatoeba-test-v2021-08-07 0.79684 65.2 10947 87335
ron-deu tatoeba-test-v2021-08-07 0.68217 50.3 1141 7893
ron-eng tatoeba-test-v2021-08-07 0.73059 59.0 5508 40717
ron-fra tatoeba-test-v2021-08-07 0.70724 54.1 1925 13347
ron-por tatoeba-test-v2021-08-07 0.73085 53.3 681 4593
ron-spa tatoeba-test-v2021-08-07 0.73813 57.6 1959 12679
spa-deu tatoeba-test-v2021-08-07 0.68124 49.3 10521 86430
spa-eng tatoeba-test-v2021-08-07 0.74977 61.0 16583 138123
spa-fra tatoeba-test-v2021-08-07 0.73392 56.6 10294 83501
spa-por tatoeba-test-v2021-08-07 0.77280 61.1 10947 87610
spa-spa tatoeba-test-v2021-08-07 0.68111 50.9 2500 21469
ast-deu flores101-devtest 0.53243 24.2 1012 25094
ast-eng flores101-devtest 0.61235 36.0 1012 24721
ast-fra flores101-devtest 0.56687 31.2 1012 28343
ast-por flores101-devtest 0.57033 30.6 1012 26519
ast-spa flores101-devtest 0.49637 21.2 1012 29199
cat-fra flores101-devtest 0.63271 38.4 1012 28343
fra-deu flores101-devtest 0.58433 28.9 1012 25094
fra-eng flores101-devtest 0.67826 43.3 1012 24721
glg-deu flores101-devtest 0.56897 27.1 1012 25094
glg-spa flores101-devtest 0.53183 24.2 1012 29199
ita-por flores101-devtest 0.57961 28.4 1012 26519
kea-deu flores101-devtest 0.48105 18.3 1012 25094
kea-eng flores101-devtest 0.60362 35.0 1012 24721
kea-por flores101-devtest 0.57808 29.0 1012 26519
kea-spa flores101-devtest 0.46648 17.6 1012 29199
oci-deu flores101-devtest 0.57391 28.0 1012 25094
oci-eng flores101-devtest 0.72351 49.4 1012 24721
por-eng flores101-devtest 0.70724 47.4 1012 24721
por-fra flores101-devtest 0.64103 39.2 1012 28343
por-spa flores101-devtest 0.53268 25.0 1012 29199
ron-deu flores101-devtest 0.57980 28.1 1012 25094
ron-eng flores101-devtest 0.67583 41.6 1012 24721
ron-spa flores101-devtest 0.53082 24.3 1012 29199
spa-fra flores101-devtest 0.57039 27.1 1012 28343
spa-por flores101-devtest 0.55607 25.0 1012 26519
ast-deu flores200-devtest 0.53776 24.8 1012 25094
ast-eng flores200-devtest 0.61482 36.8 1012 24721
ast-fra flores200-devtest 0.56504 31.3 1012 28343
ast-por flores200-devtest 0.57158 31.1 1012 26519
ast-spa flores200-devtest 0.49579 21.2 1012 29199
cat-deu flores200-devtest 0.58203 29.2 1012 25094
cat-eng flores200-devtest 0.69165 44.6 1012 24721
cat-fra flores200-devtest 0.63612 38.9 1012 28343
cat-por flores200-devtest 0.62911 37.7 1012 26519
cat-spa flores200-devtest 0.53320 24.6 1012 29199
fra-deu flores200-devtest 0.58592 29.1 1012 25094
fra-eng flores200-devtest 0.68067 43.8 1012 24721
fra-por flores200-devtest 0.62388 37.0 1012 26519
fra-spa flores200-devtest 0.52983 24.4 1012 29199
fur-deu flores200-devtest 0.51969 21.8 1012 25094
fur-eng flores200-devtest 0.60793 34.3 1012 24721
fur-fra flores200-devtest 0.56989 30.0 1012 28343
fur-por flores200-devtest 0.56207 29.3 1012 26519
fur-spa flores200-devtest 0.48436 20.0 1012 29199
glg-deu flores200-devtest 0.57369 27.7 1012 25094
glg-eng flores200-devtest 0.66358 40.0 1012 24721
glg-fra flores200-devtest 0.62487 36.5 1012 28343
glg-por flores200-devtest 0.60267 32.7 1012 26519
glg-spa flores200-devtest 0.53227 24.3 1012 29199
hat-deu flores200-devtest 0.49916 19.1 1012 25094
hat-eng flores200-devtest 0.59656 32.5 1012 24721
hat-fra flores200-devtest 0.61574 35.4 1012 28343
hat-por flores200-devtest 0.55195 27.7 1012 26519
hat-spa flores200-devtest 0.47382 18.4 1012 29199
ita-deu flores200-devtest 0.55779 24.1 1012 25094
ita-eng flores200-devtest 0.61563 32.2 1012 24721
ita-fra flores200-devtest 0.60210 31.2 1012 28343
ita-por flores200-devtest 0.58279 28.8 1012 26519
ita-spa flores200-devtest 0.52348 23.2 1012 29199
kea-deu flores200-devtest 0.49089 19.3 1012 25094
kea-eng flores200-devtest 0.60553 35.5 1012 24721
kea-fra flores200-devtest 0.54027 26.6 1012 28343
kea-por flores200-devtest 0.57696 28.9 1012 26519
kea-spa flores200-devtest 0.46974 18.0 1012 29199
lij-deu flores200-devtest 0.51695 22.7 1012 25094
lij-eng flores200-devtest 0.62347 36.2 1012 24721
lij-fra flores200-devtest 0.57498 31.4 1012 28343
lij-por flores200-devtest 0.56183 29.4 1012 26519
lij-spa flores200-devtest 0.48038 20.0 1012 29199
lmo-deu flores200-devtest 0.45516 15.4 1012 25094
lmo-eng flores200-devtest 0.53540 25.5 1012 24721
lmo-fra flores200-devtest 0.50076 22.2 1012 28343
lmo-por flores200-devtest 0.50134 22.9 1012 26519
lmo-spa flores200-devtest 0.44053 16.2 1012 29199
oci-deu flores200-devtest 0.57822 28.7 1012 25094
oci-eng flores200-devtest 0.73030 50.7 1012 24721
oci-fra flores200-devtest 0.64900 39.7 1012 28343
oci-por flores200-devtest 0.63318 36.9 1012 26519
oci-spa flores200-devtest 0.52269 22.9 1012 29199
pap-deu flores200-devtest 0.53166 23.2 1012 25094
pap-eng flores200-devtest 0.68541 44.6 1012 24721
pap-fra flores200-devtest 0.57224 30.5 1012 28343
pap-por flores200-devtest 0.59064 33.2 1012 26519
pap-spa flores200-devtest 0.49601 21.7 1012 29199
por-deu flores200-devtest 0.59047 30.3 1012 25094
por-eng flores200-devtest 0.71096 48.0 1012 24721
por-fra flores200-devtest 0.64555 40.1 1012 28343
por-spa flores200-devtest 0.53400 25.1 1012 29199
ron-deu flores200-devtest 0.58428 28.7 1012 25094
ron-eng flores200-devtest 0.67719 41.8 1012 24721
ron-fra flores200-devtest 0.63678 37.6 1012 28343
ron-por flores200-devtest 0.62371 36.1 1012 26519
ron-spa flores200-devtest 0.53150 24.5 1012 29199
scn-deu flores200-devtest 0.48102 19.2 1012 25094
scn-eng flores200-devtest 0.55782 29.6 1012 24721
scn-fra flores200-devtest 0.52773 26.1 1012 28343
scn-por flores200-devtest 0.51894 25.2 1012 26519
scn-spa flores200-devtest 0.45724 17.9 1012 29199
spa-deu flores200-devtest 0.53451 21.5 1012 25094
spa-eng flores200-devtest 0.58896 28.5 1012 24721
spa-fra flores200-devtest 0.57406 27.6 1012 28343
spa-por flores200-devtest 0.55749 25.2 1012 26519
srd-deu flores200-devtest 0.49238 19.9 1012 25094
srd-eng flores200-devtest 0.59392 34.2 1012 24721
srd-fra flores200-devtest 0.54003 27.6 1012 28343
srd-por flores200-devtest 0.53842 27.9 1012 26519
srd-spa flores200-devtest 0.46002 18.2 1012 29199
vec-deu flores200-devtest 0.48795 19.3 1012 25094
vec-eng flores200-devtest 0.56840 30.7 1012 24721
vec-fra flores200-devtest 0.54164 27.3 1012 28343
vec-por flores200-devtest 0.53482 26.2 1012 26519
vec-spa flores200-devtest 0.46588 18.4 1012 29199
fra-deu generaltest2022 0.66476 42.4 2006 37696
fra-deu multi30k_test_2016_flickr 0.61797 32.6 1000 12106
fra-eng multi30k_test_2016_flickr 0.66271 47.2 1000 12955
fra-deu multi30k_test_2017_flickr 0.59701 29.4 1000 10755
fra-eng multi30k_test_2017_flickr 0.69422 50.3 1000 11374
fra-deu multi30k_test_2017_mscoco 0.55509 25.7 461 5158
fra-eng multi30k_test_2017_mscoco 0.67791 48.7 461 5231
fra-deu multi30k_test_2018_flickr 0.55237 24.0 1071 13703
fra-eng multi30k_test_2018_flickr 0.64722 43.8 1071 14689
fra-eng newsdiscusstest2015 0.61385 38.4 1500 26982
fra-deu newssyscomb2009 0.53530 23.7 502 11271
fra-eng newssyscomb2009 0.57297 31.3 502 11818
fra-spa newssyscomb2009 0.60233 34.1 502 12503
ita-deu newssyscomb2009 0.53590 22.4 502 11271
ita-eng newssyscomb2009 0.59976 34.8 502 11818
ita-fra newssyscomb2009 0.61232 33.5 502 12331
ita-spa newssyscomb2009 0.60782 35.3 502 12503
spa-deu newssyscomb2009 0.52853 21.8 502 11271
spa-eng newssyscomb2009 0.57347 31.0 502 11818
spa-fra newssyscomb2009 0.61436 34.3 502 12331
fra-deu newstest2008 0.53180 22.9 2051 47447
fra-eng newstest2008 0.54379 26.5 2051 49380
fra-spa newstest2008 0.58804 33.1 2051 52586
spa-deu newstest2008 0.52221 21.6 2051 47447
spa-eng newstest2008 0.55331 27.9 2051 49380
spa-fra newstest2008 0.58769 32.0 2051 52685
fra-deu newstest2009 0.52771 22.5 2525 62816
fra-eng newstest2009 0.56679 30.2 2525 65399
fra-spa newstest2009 0.58921 32.1 2525 68111
ita-deu newstest2009 0.53022 22.8 2525 62816
ita-eng newstest2009 0.59309 33.8 2525 65399
ita-fra newstest2009 0.59309 32.0 2525 69263
ita-spa newstest2009 0.59760 33.5 2525 68111
spa-deu newstest2009 0.52822 22.3 2525 62816
spa-eng newstest2009 0.56989 30.4 2525 65399
spa-fra newstest2009 0.59150 32.2 2525 69263
fra-deu newstest2010 0.53765 24.0 2489 61503
fra-eng newstest2010 0.59251 32.6 2489 61711
fra-spa newstest2010 0.62480 37.6 2489 65480
spa-deu newstest2010 0.55161 26.0 2489 61503
spa-eng newstest2010 0.61562 36.3 2489 61711
spa-fra newstest2010 0.62021 35.7 2489 66022
fra-deu newstest2011 0.53025 23.1 3003 72981
fra-eng newstest2011 0.59636 32.9 3003 74681
fra-spa newstest2011 0.63203 39.9 3003 79476
spa-deu newstest2011 0.52934 23.3 3003 72981
spa-eng newstest2011 0.59606 33.8 3003 74681
spa-fra newstest2011 0.61079 34.9 3003 80626
fra-deu newstest2012 0.52957 24.0 3003 72886
fra-eng newstest2012 0.59352 33.6 3003 72812
fra-spa newstest2012 0.62641 39.2 3003 79006
spa-deu newstest2012 0.53519 24.6 3003 72886
spa-eng newstest2012 0.62284 37.4 3003 72812
spa-fra newstest2012 0.61076 33.8 3003 78011
fra-deu newstest2013 0.54167 25.4 3000 63737
fra-eng newstest2013 0.59236 34.0 3000 64505
fra-spa newstest2013 0.59347 34.9 3000 70528
spa-deu newstest2013 0.55130 26.3 3000 63737
spa-eng newstest2013 0.60681 34.6 3000 64505
spa-fra newstest2013 0.59816 33.2 3000 70037
fra-eng newstest2014 0.63499 37.9 3003 70708
ron-eng newstest2016 0.63996 39.5 1999 47562
fra-deu newstest2019 0.60468 28.6 1701 36446
fra-deu newstest2020 0.61401 28.8 1619 30265
fra-deu newstest2021 0.65950 39.5 1026 26077
cat-deu ntrex128 0.54096 24.0 1997 48761
cat-eng ntrex128 0.63516 36.5 1997 47673
cat-fra ntrex128 0.56385 28.1 1997 53481
cat-por ntrex128 0.56246 28.7 1997 51631
cat-spa ntrex128 0.61311 35.8 1997 54107
fra-deu ntrex128 0.53059 23.4 1997 48761
fra-eng ntrex128 0.61285 34.7 1997 47673
fra-por ntrex128 0.54075 25.8 1997 51631
fra-spa ntrex128 0.56863 30.6 1997 54107
glg-deu ntrex128 0.53724 23.6 1997 48761
glg-eng ntrex128 0.64481 38.7 1997 47673
glg-fra ntrex128 0.55856 27.8 1997 53481
glg-por ntrex128 0.56322 28.7 1997 51631
glg-spa ntrex128 0.61794 36.8 1997 54107
ita-deu ntrex128 0.54678 25.0 1997 48761
ita-eng ntrex128 0.64636 39.2 1997 47673
ita-fra ntrex128 0.57428 30.0 1997 53481
ita-por ntrex128 0.56858 29.7 1997 51631
ita-spa ntrex128 0.58886 33.0 1997 54107
por-deu ntrex128 0.54833 24.6 1997 48761
por-eng ntrex128 0.65223 39.7 1997 47673
por-fra ntrex128 0.56793 28.9 1997 53481
por-spa ntrex128 0.59218 33.8 1997 54107
ron-deu ntrex128 0.53249 22.4 1997 48761
ron-eng ntrex128 0.61807 33.8 1997 47673
ron-fra ntrex128 0.55575 26.4 1997 53481
ron-por ntrex128 0.55086 27.2 1997 51631
ron-spa ntrex128 0.57787 31.9 1997 54107
spa-deu ntrex128 0.54309 23.8 1997 48761
spa-eng ntrex128 0.64416 37.4 1997 47673
spa-fra ntrex128 0.57320 29.4 1997 53481
spa-por ntrex128 0.56751 29.0 1997 51631
fra-eng tico19-test 0.62364 39.7 2100 56323
fra-por tico19-test 0.58563 34.2 2100 62729
fra-spa tico19-test 0.59556 36.5 2100 66563
por-eng tico19-test 0.74420 51.8 2100 56315
por-fra tico19-test 0.60081 34.5 2100 64661
por-spa tico19-test 0.68156 44.8 2100 66563
spa-eng tico19-test 0.73454 50.3 2100 56315
spa-fra tico19-test 0.60441 34.9 2100 64661
spa-por tico19-test 0.67749 42.7 2100 62729
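
Scores like the chr-F and BLEU columns above can be computed with sacrebleu-style corpus scoring. A minimal sketch, assuming sacrebleu as the scoring toolkit; the sentence pair is illustrative, and real evaluation runs over the full test sets listed above:

import sacrebleu

hypotheses = ["Guten Morgen, wie geht es dir?"]      # system output
references = [["Guten Morgen, wie geht es Ihnen?"]]  # one reference stream

# Corpus-level BLEU and chr-F. Note: depending on the sacrebleu version,
# chr-F may be reported on a 0-100 scale, while the table above uses 0-1.
print("BLEU :", sacrebleu.corpus_bleu(hypotheses, references).score)
print("chr-F:", sacrebleu.corpus_chrf(hypotheses, references).score)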

Citation Information

@article{tiedemann2023democratizing,
  title={Democratizing neural machine translation with {OPUS-MT}},
  author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
  journal={Language Resources and Evaluation},
  volume={58},
  pages={713--755},
  year={2023},
  publisher={Springer Nature},
  issn={1574-0218},
  doi={10.1007/s10579-023-09704-w}
}

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

Acknowledgements

The work is supported by the HPLT project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland, and the EuroHPC supercomputer LUMI.

Model conversion info

  • transformers version: 4.45.1
  • OPUS-MT git hash: 0882077
  • port time: Tue Oct 8 11:57:19 EEST 2024
  • port machine: LM0-400-22516.local