utrobinmv's picture
feat add readme
6956362
|
raw
history blame
2.18 kB
---
language:
- ru
- zh
- en
tags:
- translation
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
---
# T5 English, Russian and Chinese multilingual machine translation
This model represents a conventional T5 transformer in multitasking mode for translation into the required language, precisely configured for machine translation for pairs: ru-zh, zh-ru, en-zh, zh-en, en-ru, ru-en.
The model can perform direct translation between any pair of Russian, Chinese or English languages. For translation into the target language, the target language identifier is specified as a prefix 'translate to <lang>:'. In this case, the source language may not be specified, in addition, the source text may be multilingual.
Example translate Russian to Chinese
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to zh: '
src_text = prefix + "Съешь ещё этих мягких французских булок."
# translate Russian to Chinese
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# 再吃这些法国的甜蜜的面包。
```
and Example translate Chinese to Russian
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to ru: '
src_text = prefix + "再吃这些法国的甜蜜的面包。"
# translate Russian to Chinese
input_ids = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**input_ids)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# Съешьте этот сладкий хлеб из Франции.
```
##
## Languages covered
Russian (ru_RU), Chinese (zh_CN), English (en_US)