aidal/Persian-Mistral-7B


aidal committed on Apr 13

Commit afc9f0c • 1 Parent(s): a5d14ed

Update README.md

Files changed (1)

README.md +1 -1

@@ -63,7 +63,7 @@ print(tokenizer.decode(outputs[0]))
 # Training and finetuning
 - **Extend tokenzer:** The base Mistral tokenizer does not support Persian. As an initial step, we trained a SentencePiece tokenizer on the Farsi Wikipedia corpus and subsequently integrated it with the Mistral tokenizer.
 - **Pre-training:**  In the following step, we expanded the embedding layer of the base model to match the size of the Persian tokenizer. We then employed the LoRA method to train the model on three distinct datasets: Wikipedia-Farsi, an Islamic book collection, and content from Khamenei.ir.
-- <p align="center">
+<p align="center">
   <picture>
     <img alt="Hugging Face Transformers Library" src="https://i.postimg.cc/LXSD4HnZ/Stakehozlder-Map-1-page-0001-modified.png" width="270" height="270" style="max-width: 100%;">
   </picture>