---
library_name: transformers
license: mit
language:
- fa
tags:
- persian
- mt5-small
- mt5
- persian translation
- seq2seq
- farsi
---
# Model Card: English to Persian Translation using MT5-Small

## Model Details

**Model Description:**
This model translates text from English to Persian (Farsi) using the MT5-Small architecture. MT5 is a multilingual variant of the T5 model, pretrained on the mC4 corpus, which covers 101 languages.

**Intended Use:**
The model is intended for use in applications where automatic translation from English to Persian is required. It can be used for translating documents, web pages, or any other text-based content.

**Model Architecture:**
- **Model Type:** MT5-Small
- **Language Pair:** English (en) to Persian (fa)

## Training Data

**Dataset:**
The model was trained on a parallel corpus of 100,000 English–Persian sentence pairs drawn from a variety of sources to cover a wide range of topics and ensure diversity.

**Data Preprocessing:**
- Text normalization was performed to ensure consistency.
- Tokenization was done using the SentencePiece tokenizer.
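
The exact normalization rules are not detailed here, but the SentencePiece step can be reproduced with the tokenizer shipped in this repository. A minimal sketch, assuming the standard Hugging Face seq2seq preprocessing pattern (the Persian example sentence and the 128-token cap are illustrative):

```python
from transformers import MT5Tokenizer

# The repo ships the SentencePiece vocabulary used at training time;
# loading the tokenizer from it reproduces the tokenization step.
tokenizer = MT5Tokenizer.from_pretrained("NLPclass/mt5_en_fa_translation")

# Tokenize a source/target pair the way a seq2seq data collator expects.
batch = tokenizer(
    "Hello, how are you?",
    text_target="سلام، حال شما چطور است؟",  # illustrative Persian reference
    max_length=128,
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape, batch["labels"].shape)
```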

## Training Procedure

**Training Configuration:**
- **Number of Epochs:** 4
- **Batch Size:** 8
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW

**Hardware:**
- **Training Environment:** NVIDIA P100 GPU
- **Training Time:** Approximately 4 hours
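
A minimal sketch of how the configuration above maps onto the Hugging Face `Seq2SeqTrainer` API; the output directory and `tokenized_train` dataset are hypothetical placeholders, and the actual training script may have differed:

```python
from transformers import (
    MT5ForConditionalGeneration,
    MT5Tokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")

# Hyperparameters from the card; everything else is a placeholder.
args = Seq2SeqTrainingArguments(
    output_dir="mt5_en_fa_translation",
    num_train_epochs=4,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    optim="adamw_torch",  # AdamW, as listed above
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,  # hypothetical: the tokenized parallel corpus
    tokenizer=tokenizer,
)
trainer.train()
```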

## How To Use
```python
from transformers import pipeline

# Build a translation pipeline from the hosted checkpoint and translate
def translate_with_pipeline(text):
    translator = pipeline(
        "text2text-generation",
        model="NLPclass/mt5_en_fa_translation",
        tokenizer="NLPclass/mt5_en_fa_translation",
    )
    return translator(text, max_length=128, num_beams=4)[0]["generated_text"]


# Example usage
text = "Hello, how are you?"
print("Pipeline Translation:", translate_with_pipeline(text))
```
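
If you prefer to avoid the pipeline wrapper, the same result can be obtained by calling `model.generate` directly; a minimal equivalent sketch using the same checkpoint and generation settings:

```python
import torch
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

# Same translation without the pipeline wrapper, for explicit control
# over device placement and generation settings.
model = MT5ForConditionalGeneration.from_pretrained("NLPclass/mt5_en_fa_translation")
tokenizer = MT5Tokenizer.from_pretrained("NLPclass/mt5_en_fa_translation")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```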



## Ethical Considerations

- The model's translations are only as good as the data it was trained on, and biases present in the training data may propagate through the model's outputs.
- Users should be cautious when using the model for critical tasks, as automatic translations can sometimes be inaccurate or misleading.

## Citation

If you use this model in your research or applications, please cite it as follows:

```bibtex
@misc{mt5_en_fa_translation,
  author = {mansoorhamidzadeh},
  title = {English to Persian Translation using MT5-Small},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co./mansoorhamidzadeh/mt5_en_fa_translation}},
}
```