---
language:
- en
- tok
- multilingual
license: apache-2.0
tags:
- generated_from_trainer
- translation
widget:
- text: Hello, my name is Tom.
- text: Can the cat speak English?
base_model: Helsinki-NLP/opus-mt-en-ROMANCE
model-index:
- name: en-toki-mt
  results: []
---
# en-toki-mt

This model is a fine-tuned version of [Helsinki-NLP/opus-mt-en-ROMANCE](https://huggingface.co./Helsinki-NLP/opus-mt-en-ROMANCE) on an English-Toki Pona translation dataset collected from Tatoeba.
## Model description

Toki Pona is a minimalist constructed language created in 2001 by Sonja Lang. It features a very small vocabulary (~130 words) and a very simple grammar.
## Intended uses & limitations

This model is intended to translate English to Toki Pona.
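A minimal inference sketch using the `transformers` pipeline API follows; the repository id below is a placeholder for wherever this checkpoint is actually hosted.

```python
from transformers import pipeline

# Placeholder Hub id; replace with the actual path of this checkpoint.
translator = pipeline("translation", model="your-username/en-toki-mt")

print(translator("Hello, my name is Tom.")[0]["translation_text"])
print(translator("Can the cat speak English?")[0]["translation_text"])
```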
## Training and evaluation data

The training data consists of all English-Toki Pona sentence pairs on [Tatoeba](https://tatoeba.org/en) (~20,000 pairs), used without any filtering. Since this data mostly covers the core vocabulary (pu), the model may produce inaccurate translations for rarer or more complex words. The model achieved a BLEU score of 54 on the test set.
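As a point of reference, the sketch below shows one way such a BLEU score could be computed with the `evaluate` library; the prediction and reference strings are illustrative stand-ins, not the actual test set.

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Illustrative stand-ins for model outputs and Tatoeba reference translations.
predictions = ["toki! nimi mi li Ton."]
references = [["toki! nimi mi li Ton."]]

result = sacrebleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacreBLEU reports scores on a 0-100 scale
```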
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
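For readers reproducing the run, these settings map roughly onto the `Seq2SeqTrainingArguments` sketched below; the output directory is an assumption, and the Adam betas/epsilon listed above are the `transformers` defaults.

```python
from transformers import Seq2SeqTrainingArguments

# Rough reconstruction of the hyperparameters above; output_dir is assumed.
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
training_args = Seq2SeqTrainingArguments(
    output_dir="en-toki-mt",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,  # "Native AMP" mixed-precision training
)
```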
### Framework versions

- Transformers 4.20.1
- Pytorch 1.11.0
- Datasets 2.3.2
- Tokenizers 0.12.1