milmor commited on
Commit
9833d5a
1 Parent(s): f54e327

Update app.py

Browse files
Files changed (1) hide show
  1. app.py +1 -1
app.py CHANGED
@@ -6,7 +6,7 @@ os.environ["TOKENIZERS_PARALLELISM"] = "false"
6
 
7
  article='''
8
  # Spanish Nahuatl Automatic Translation
9
- Nahuatl is the most widely spoken indigenous language in Mexico. However, training a neural network for the neural machine translation task is challenging due to the lack of structured data. The most popular datasets, such as the Axolot and bible-corpus, only consist of ~16,000 and ~7,000 samples, respectively. Moreover, there are multiple variants of Nahuatl, which makes this task even more difficult. For example, it is possible to find a single word from the Axolot dataset written in more than three different ways. Therefore, we leverage the T5 text-to-text prefix training strategy to compensate for the lack of data. We first train the multilingual model to learn Spanish and then adapt the model to Nahuatl. The resulting model successfully translates short sentences. Finally, we report Chrf and BLEU results.
10
 
11
  ## Motivation
12
 
 
6
 
7
  article='''
8
  # Spanish Nahuatl Automatic Translation
9
+ Nahuatl is the most widely spoken indigenous language in Mexico. However, training a neural network for the neural machine translation task is challenging due to the lack of structured data. The most popular datasets, such as the Axolot and bible-corpus, only consist of ~16,000 and ~7,000 samples, respectively. Moreover, there are multiple variants of Nahuatl, which makes this task even more difficult. For example, it is possible to find a single word from the Axolot dataset written in more than three different ways. Therefore, we leverage the T5 text-to-text prefix training strategy to compensate for the lack of data. We first train the multilingual model to learn Spanish and then adapt it to Nahuatl. The resulting T5 Transformer successfully translates short sentences. Finally, we report Chrf and BLEU results.
10
 
11
  ## Motivation
12