sandrarrey committed
Commit 95ef0ae
1 Parent(s): 8aba977

Update README_English.md

Files changed (1)
  1. README_English.md +3 -3
README_English.md CHANGED
@@ -29,12 +29,12 @@ onmt_translate -src input_text -model NOS-MT-en-gl -output ./output_file.txt -r
 
 **Training**
 
- In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of english-portuguese translations, which we have converted into english-galician by means of portuguese-galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
+ In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of English-Portuguese translations, which we have converted into English-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
 
 **Training process**
 
 + Tokenization of the datasets made with linguakit tokeniser https://github.com/citiususc/Linguakit
- + The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) da open NMT
+ + The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) of OpenNMT
 + Using .yaml in this repository you can replicate the training process as follows
 
 ```bash
@@ -48,7 +48,7 @@ The parameters used for the development of the model can be directly consulted i
 
 **Evaluation**
 
- The BLEU evaluation of the models is made with a mixture of internally developed tests (gold1, gold2, test-suite) with other datasets available in Galician (Flores).
+ The BLEU evaluation of the models is made with a mixture of internally developed tests (gold1, gold2, test-suite) and other datasets available in Galician (Flores).
 
 | GOLD 1 | GOLD 2 | FLORES | TEST-SUITE|
 | ------------- |:-------------:| -------:|----------:|
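The vocabulary and training steps touched by this change can be reproduced with OpenNMT-py's standard entry points. Below is a minimal sketch, assuming a BPE size of 32,000 and a configuration file named `config.yaml`; the corpus and config file names are placeholders, not confirmed by this commit:

```bash
# Learn BPE merge codes with OpenNMT-py's copy of learn_bpe.py
# (train.en-gl.txt and 32000 merge operations are placeholder choices)
python tools/learn_bpe.py -s 32000 < train.en-gl.txt > bpe.codes

# Build the vocabulary and train from the repository's .yaml configuration
# (config.yaml stands in for the actual file name in the repository)
onmt_build_vocab -config config.yaml -n_sample -1
onmt_train -config config.yaml
```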
 
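For the BLEU scores in the table, translating a test set and scoring it could look like the sketch below; the commit does not name the scoring tool, so sacrebleu and the file names are assumptions:

```bash
# Translate a test set with the released model and score the output
# (file names are placeholders; sacrebleu is an assumed scoring tool)
onmt_translate -src flores.en.txt -model NOS-MT-en-gl -output flores.hyp.gl.txt
sacrebleu flores.ref.gl.txt -i flores.hyp.gl.txt -m bleu
```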