Opus Tatoeba English-German

This model was obtained by running the script convert_marian_to_pytorch.py (instructions available here). The original models were trained by Jörg Tiedemann using the MarianNMT library. See all available MarianMTModel models on the profile of the Helsinki NLP group.

This is the conversion of checkpoint opus-2021-02-22.zip.
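Since the converted checkpoint is a MarianMTModel, it can presumably be loaded with the Transformers library like other Marian conversions. The following is a minimal sketch under stated assumptions: `MODEL_ID` is a placeholder (the card does not state the repository id or a local path), and `batched` and `translate` are hypothetical helper names introduced only for this illustration.

```python
from typing import List

# Assumption: placeholder only. Replace with the Hub id or local directory
# of this converted checkpoint; the model card does not state it.
MODEL_ID = "path-or-hub-id-of-this-checkpoint"


def batched(items: List[str], size: int) -> List[List[str]]:
    """Split a list of sentences into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def translate(sentences: List[str], model_id: str = MODEL_ID,
              batch_size: int = 8) -> List[str]:
    """Translate English sentences to German with a Marian checkpoint."""
    # Imported here so the helper above works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    outputs: List[str] = []
    for batch in batched(sentences, batch_size):
        # The tokenizer applies the SentencePiece model shipped with the checkpoint.
        encoded = tokenizer(batch, return_tensors="pt",
                            padding=True, truncation=True)
        generated = model.generate(**encoded)
        outputs.extend(tokenizer.batch_decode(generated,
                                              skip_special_tokens=True))
    return outputs
```

Calling `translate(["How are you?"])` with a valid `model_id` should return a one-element list containing the German translation. Default generation settings are used here; beam size and maximum length are left to the checkpoint's own configuration.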


eng-deu

  • source language name: English

  • target language name: German

  • OPUS readme: README.md

  • model: transformer

  • source language code: en

  • target language code: de

  • dataset: opus

  • release date: 2021-02-22

  • pre-processing: normalization + SentencePiece (spm32k,spm32k)

  • download original weights: opus-2021-02-22.zip

  • Training data:

    • deu-eng: Tatoeba-train (86845165)
  • Validation data:

    • deu-eng: Tatoeba-dev, 284809
    • total-size-shuffled: 284809
    • devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled!
  • Test data:

    • newssyscomb2009.eng-deu: 502/11271
    • news-test2008.eng-deu: 2051/47427
    • newstest2009.eng-deu: 2525/62816
    • newstest2010.eng-deu: 2489/61511
    • newstest2011.eng-deu: 3003/72981
    • newstest2012.eng-deu: 3003/72886
    • newstest2013.eng-deu: 3000/63737
    • newstest2014-deen.eng-deu: 3003/62964
    • newstest2015-ende.eng-deu: 2169/44260
    • newstest2016-ende.eng-deu: 2999/62670
    • newstest2017-ende.eng-deu: 3004/61291
    • newstest2018-ende.eng-deu: 2998/64276
    • newstest2019-ende.eng-deu: 1997/48969
    • Tatoeba-test.eng-deu: 10000/83347
  • test set translations file: test.txt

  • test set scores file: eval.txt

  • BLEU-scores

    Test set                      BLEU
    newstest2018-ende.eng-deu     46.4
    Tatoeba-test.eng-deu          45.8
    newstest2019-ende.eng-deu     42.4
    newstest2016-ende.eng-deu     37.9
    newstest2015-ende.eng-deu     32.0
    newstest2017-ende.eng-deu     30.6
    newstest2014-deen.eng-deu     29.6
    newstest2013.eng-deu          27.6
    newstest2010.eng-deu          25.9
    news-test2008.eng-deu         23.9
    newstest2012.eng-deu          23.8
    newssyscomb2009.eng-deu       23.3
    newstest2011.eng-deu          22.9
    newstest2009.eng-deu          22.7
  • chr-F-scores

    Test set                      chr-F
    newstest2018-ende.eng-deu     0.697
    newstest2019-ende.eng-deu     0.664
    Tatoeba-test.eng-deu          0.655
    newstest2016-ende.eng-deu     0.644
    newstest2015-ende.eng-deu     0.601
    newstest2014-deen.eng-deu     0.595
    newstest2017-ende.eng-deu     0.593
    newstest2013.eng-deu          0.558
    newstest2010.eng-deu          0.550
    newssyscomb2009.eng-deu       0.539
    news-test2008.eng-deu         0.533
    newstest2009.eng-deu          0.533
    newstest2012.eng-deu          0.530
    newstest2011.eng-deu          0.528
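To give a sense of what the chr-F column measures, here is a simplified sketch of a character n-gram F-score on the same 0 to 1 scale. It is an illustration only: the function names are hypothetical, the choices of `max_n=6` and `beta=2.0` follow common chrF defaults, and the scores above were produced by the original evaluation tooling, which differs in details such as whitespace handling and averaging. This sketch will not reproduce the listed numbers.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str,
         max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level character n-gram F-score in [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders that are longer than one of the strings.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and strings with no characters in common score 0.0; partial matches fall in between.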
Model size: 111M params (Safetensors, tensor type FP16)