---
license: apache-2.0
language:
- fa
- en
library_name: transformers
pipeline_tag: text-generation
datasets:
- myrkur/persian-alpaca-deep-clean
---

# Paya (Aya 23 8B Instruction-Tuned on Farsi)

<a href="https://ibb.co/fHmCngh"><img src="https://i.ibb.co/jD7LWNc/paya.png" alt="paya" border="0"></a>

Welcome to PAYA, a Persian text generation model built on Aya 23 8B, a multilingual language model. PAYA was trained with supervised fine-tuning, using DoRA (Weight-Decomposed Low-Rank Adaptation) for parameter-efficient training on Persian data, primarily the [persian-alpaca-deep-clean](https://huggingface.co/datasets/myrkur/persian-alpaca-deep-clean) dataset.

## Features

- **Advanced Text Generation**: Generates coherent, contextually relevant Persian text.
- **Efficient Fine-Tuning**: Uses the DoRA method, which trains a small set of adapter parameters instead of updating every weight of the base model.
- **Optimized Tokenization**: The model's tokenizer represents Persian words accurately, which improves the quality of the generated text.

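To make the DoRA bullet more concrete, here is a minimal NumPy sketch of the weight decomposition that DoRA is built on: the adapted weight is a trainable magnitude times the column-normalized sum of the frozen weight and a low-rank update. The matrix sizes, the rank, and the helper name `dora_weight` are illustrative assumptions; this is not the training code or configuration used for PAYA.

```python
import numpy as np


def dora_weight(w0, a, b, m):
    """Return m * V / ||V||_c, where V = w0 + b @ a (column-wise norm)."""
    v = w0 + b @ a  # direction: frozen weight plus low-rank update
    col_norm = np.linalg.norm(v, axis=0, keepdims=True)  # shape (1, k)
    return m * v / col_norm


rng = np.random.default_rng(0)
d, k, r = 6, 4, 2  # toy sizes chosen for illustration; r is the low rank

w0 = rng.normal(size=(d, k))                    # frozen pretrained weight
a = rng.normal(size=(r, k)) * 0.01              # trainable low-rank factor A
b = np.zeros((d, r))                            # trainable factor B, zero-initialized
m = np.linalg.norm(w0, axis=0, keepdims=True)   # trainable magnitude, starts at ||w0||_c

# With B at zero and m at the column norms of w0, the adapted weight equals w0.
w_init = dora_weight(w0, a, b, m)

# After a (here random, purely illustrative) update to B, the direction changes
# while each column's norm is still given by m.
b_updated = rng.normal(size=(d, r))
w_updated = dora_weight(w0, a, b_updated, m)
```

Only `a`, `b`, and `m` would be trained in this scheme; `w0` stays frozen, which is where the efficiency comes from.
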
## Usage

You can get started with PAYA using the following sample code:

```python
import transformers
import torch

model_id = "myrkur/paya"

# Load the model in bfloat16 and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# A single-turn conversation ("Which is better, knowledge or wealth?").
messages = [
    {"role": "user", "content": "علم بهتر است یا ثروت؟"},
]

# Render the conversation with the model's chat template as a plain string,
# ending with the marker that tells the model to start its reply.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generation stops when any of these token ids is produced.
terminators = [
    pipeline.tokenizer.eos_token_id,
]

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.4,
    top_p=0.9,
    repetition_penalty=1.1,
)

# The pipeline returns the prompt followed by the completion; print only the completion.
print(outputs[0]["generated_text"][len(prompt):])
```

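The `temperature` and `top_p` arguments in the call above control how each next token is sampled. As a rough illustration of the idea, the sketch below applies temperature scaling and nucleus (top-p) filtering to a made-up four-token vocabulary. It is a simplified stand-in, not the `transformers` implementation, and the function name `next_token_probs` and the example logits are assumptions for demonstration.

```python
import numpy as np


def next_token_probs(logits, temperature=0.4, top_p=0.9):
    """Temperature-scale the logits, then keep the smallest top-p nucleus."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()

    order = np.argsort(-probs)             # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep a token if the probability mass before it is still below top_p.
    keep = (cumulative - probs[order]) < top_p

    filtered = np.zeros_like(probs)
    filtered[order[keep]] = probs[order[keep]]
    return filtered / filtered.sum()       # renormalize over the nucleus


logits = [2.0, 1.8, 0.5, -1.0]  # made-up scores for a four-token vocabulary
probs = next_token_probs(logits)  # same temperature and top_p as the call above

rng = np.random.default_rng(0)
token_id = rng.choice(len(probs), p=probs)  # sample one token id from the nucleus
```

A lower temperature sharpens the distribution before the nucleus is cut, so fewer tokens survive the `top_p` filter and the output becomes more deterministic; raising either value admits more candidates.
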
## Why PAYA?

PAYA's tokenizer handles Persian text well and captures the nuances of the language. Combined with parameter-efficient DoRA fine-tuning on a cleaned Persian instruction dataset, this makes the model well suited to Persian text generation tasks.

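One simple way to check the tokenization claim on your own text is to measure how many tokens the tokenizer produces per whitespace-separated word (lower usually means Persian words are split into fewer pieces). The sketch below assumes a tokenizer object exposing a `tokenize(text)` method, as Hugging Face tokenizers do; the `WhitespaceTokenizer` class is a hypothetical stand-in so the example runs without downloading anything, and `tokens_per_word` is an illustrative helper name.

```python
def tokens_per_word(tokenizer, text):
    """Average number of tokens per whitespace-separated word in `text`."""
    words = text.split()
    tokens = tokenizer.tokenize(text)
    return len(tokens) / len(words)


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""

    def tokenize(self, text):
        return text.split()


sample = "علم بهتر است یا ثروت؟"
ratio = tokens_per_word(WhitespaceTokenizer(), sample)
```

To measure PAYA itself, pass `transformers.AutoTokenizer.from_pretrained("myrkur/paya")` in place of the stand-in and compare the ratio against other tokenizers on the same Persian text.
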
## Contributions

Contributions to PAYA are welcome! Whether you extend the model's capabilities, improve its performance on specific tasks, or evaluate it, your contributions help advance Persian natural language processing.

## Contact

For questions or further information, please contact:

- Amir Masoud Ahmadi: [[email protected]](mailto:[email protected])
- Sahar Mirzapour: [[email protected]](mailto:[email protected])