---
library_name: transformers
license: mit
language:
- fa
tags:
- persian
- mt5-small
- mt5
- persian translation
- seq2seq
- farsi
---

# Model Card: English to Persian Translation using MT5-Small

## Model Details

**Model Description:**

This model translates text from English to Persian (Farsi). It is built on the MT5-Small architecture. mT5 is a multilingual variant of T5, pretrained on the mC4 corpus covering 101 languages.

**Intended Use:**

The model is intended for applications that require automatic translation from English to Persian. It can be used to translate documents, web pages, or any other text-based content.

**Model Architecture:**

- **Model Type:** MT5-Small
- **Language Pair:** English (en) to Persian (fa)

## Training Data

**Dataset:**

The model was trained on 100,000 parallel English-Persian sentence pairs. The data was drawn from a variety of sources to cover a wide range of topics.

**Data Preprocessing:**

- Text normalization was performed to ensure consistency.
- Tokenization was done using the SentencePiece tokenizer.
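
The card does not list the normalization rules that were applied. As an illustration only, a common Persian normalization step maps Arabic letter variants to their Persian forms and collapses whitespace. The mapping and the `normalize_persian` helper below are hypothetical, not the author's preprocessing code.

```python
import re

# Hypothetical mapping of Arabic code points to the forms standard in Persian text.
# The model card does not specify its normalization rules; this is illustrative only.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_persian(text: str) -> str:
    """Map Arabic letter variants to Persian forms and collapse whitespace."""
    text = text.translate(ARABIC_TO_PERSIAN)
    # Collapse runs of whitespace into one space and trim the ends.
    # The zero-width non-joiner (U+200C) is not whitespace, so it is preserved.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_persian("\u0643\u062A\u0627\u0628  \u064A\u06A9 "))
```

Keeping the zero-width non-joiner intact matters for Persian, where it separates morphemes inside a single word (e.g. the prefix "می" in verbs).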

## Training Procedure

**Training Configuration:**

- **Number of Epochs:** 4
- **Batch Size:** 8
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW

**Hardware:**

- **Training Environment:** NVIDIA P100 GPU
- **Training Time:** Approximately 4 hours
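
As a rough sanity check, the figures above imply the step counts and throughput below. This assumes all 100,000 pairs were used for training on a single GPU with no gradient accumulation; the card does not say either way.

```python
import math

# Figures from the model card.
num_examples = 100_000
batch_size = 8
num_epochs = 4
training_hours = 4  # "approximately"

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
steps_per_second = total_steps / (training_hours * 3600)

print(f"steps/epoch: {steps_per_epoch}, total steps: {total_steps}, steps/s: {steps_per_second:.2f}")
# prints "steps/epoch: 12500, total steps: 50000, steps/s: 3.47"
```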

## How To Use

```python
from transformers import pipeline

# Build the text2text-generation pipeline once and reuse it across calls.
translator = pipeline(
    "text2text-generation",
    model="NLPclass/mt5_en_fa_translation",
    tokenizer="NLPclass/mt5_en_fa_translation",
)

# Function to translate using the pipeline
def translate_with_pipeline(text):
    # Beam search with 4 beams; the output is capped at 128 tokens.
    return translator(text, max_length=128, num_beams=4)[0]["generated_text"]

# Example usage
text = "Hello, how are you?"

# Using pipeline
print("Pipeline Translation:", translate_with_pipeline(text))
```
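
The pipeline bundles tokenization, beam-search generation, and decoding. Below is a minimal sketch of the same steps done explicitly, which also translates several sentences in one batch. It assumes the checkpoint loads through `AutoTokenizer` and `AutoModelForSeq2SeqLM`; the `translate` helper is hypothetical.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "NLPclass/mt5_en_fa_translation"

# Load the tokenizer and the MT5 encoder-decoder weights once.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
model.eval()  # disable dropout for inference


def translate(texts, max_length=128, num_beams=4):
    """Hypothetical helper: translate a list of English strings into Persian."""
    # Pad the batch to a common length and truncate inputs longer than max_length tokens.
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length
    )
    # generate() runs beam search without tracking gradients.
    output_ids = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    # Drop <pad> and </s> when turning token ids back into text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


sentences = ["Hello, how are you?", "The weather is nice today."]
translations = translate(sentences)
for source, target in zip(sentences, translations):
    print(source, "->", target)
```

Batching amortizes model overhead across sentences; padding is needed because every row of the input tensor must have the same length.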

## Ethical Considerations

- The model's translations are only as good as the data it was trained on, and biases present in the training data may propagate through the model's outputs.
- Users should be cautious when using the model for critical tasks, as automatic translations can sometimes be inaccurate or misleading.

## Citation

If you use this model in your research or applications, please cite it as follows:

```bibtex
@misc{mt5_en_fa_translation,
  author = {mansoorhamidzadeh},
  title = {English to Persian Translation using MT5-Small},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mansoorhamidzadeh/mt5_en_fa_translation}},
}
```