lgrobol/m2m100_418M_br_fr

lgrobol committed on Oct 23, 2022

Commit bf6456b • 1 Parent(s): f1d5efc

Update README.md

Files changed (1)

README.md +1 -1

@@ -31,7 +31,7 @@ The training dataset consists of:
 - The [Tatoeba corpus v2022-03-03](https://opus.nlpl.eu/Tatoeba-v2022-03-03.php)
 - Part of the [OpenSubtitles corpus v2018](https://opus.nlpl.eu/OpenSubtitles-v2018.php)
-These are obtained from the [OPUS](https://opus.nlpl.eu/) base (Tiedemann, 2012) and filtered using [OpusFilter](https://helsinki-nlp.github.io/OpusFilter), see [`dl_opus.yaml`](dl_opus.yaml) for the details. The filtering is slightly non-deterministic due to the retraining of a statistical alignment model, but in my experience, different runs tend to give extremely similar results.
+These are obtained from the [OPUS](https://opus.nlpl.eu/) base (Tiedemann, 2012) and filtered using [OpusFilter](https://helsinki-nlp.github.io/OpusFilter), see [`dl_opus.yaml`](dl_opus.yaml) for the details. The filtering is slightly non-deterministic due to the retraining of a statistical alignment model, but in my experience, different runs tend to give extremely similar results. Do not hesitate to reach out if you experience difficulties in using this to collect data.
 ## Training procedure