---
datasets:
- taesiri/TinyStories-Farsi
library_name: transformers
model_name: LLaMA-3.1-8B-Persian-Instruct
pipeline_tag: text-generation
tags:
- language-model
- fine-tuned
- instruction-following
- PEFT
- LoRA
- BitsAndBytes
- Persian
- Farsi
- text-generation
---
# LLaMA-3.1-8B-Persian-Instruct
This model is a fine-tuned version of the `meta-llama/Meta-Llama-3.1-8B-Instruct` model, specifically tailored for generating and understanding Persian text. The fine-tuning was conducted using the [TinyStories-Farsi](https://huggingface.co./datasets/taesiri/TinyStories-Farsi) dataset, which includes a diverse set of short stories in Persian. The primary goal of this fine-tuning was to enhance the model's performance in instruction-following tasks within the Persian language.
## Model Details
### Model Description
This model is a fine-tuned version of Meta's Llama-3.1-8B-Instruct. By training on Persian short stories, the model develops a more meaningful understanding of Persian and of its relationship to English.
- **Developed by:** AmirMohseni (fine-tune); Meta AI (base model)
- **Model type:** Language Model
- **License:** Apache 2.0
- **Base Model:** `meta-llama/Meta-Llama-3.1-8B-Instruct`
### Model Sources
- **Repository:** [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co./meta-llama/Meta-Llama-3.1-8B-Instruct)
## Training Details
### Training Data
The model was fine-tuned using the [TinyStories-Farsi](https://huggingface.co./datasets/taesiri/TinyStories-Farsi) dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian.
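For reference, the dataset can be inspected with the `datasets` library. A minimal sketch, assuming the default split layout on the Hub (split and column names should be verified against the dataset card):

```python
from datasets import load_dataset

# Load the Persian TinyStories dataset from the Hugging Face Hub
dataset = load_dataset("taesiri/TinyStories-Farsi")

# Inspect the available splits and a sample record
print(dataset)
print(dataset["train"][0])  # assumes a "train" split exists
```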
### Training Procedure
The fine-tuning process was conducted using the following setup (a sketch of the corresponding training configuration follows the list):
- **Epochs:** 4
- **Batch Size:** 8
- **Gradient Accumulation Steps:** 2
- **Hardware:** NVIDIA A100 GPU
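As a rough illustration, these hyperparameters map onto Hugging Face `TrainingArguments` as shown below. This is a hedged reconstruction, not the exact training script; `output_dir` is a placeholder, and arguments not reported above (learning rate, scheduler, precision) are assumptions or defaults:

```python
from transformers import TrainingArguments

# Approximate settings matching the reported setup; unreported
# arguments are left at their defaults.
training_args = TrainingArguments(
    output_dir="llama-3.1-8b-persian-sft",  # placeholder path
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size of 16
    bf16=True,  # assumption: bfloat16 on an A100, not confirmed
)
```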
### Fine-Tuning Strategy
To make fine-tuning efficient and effective, PEFT (Parameter-Efficient Fine-Tuning) was employed: the base model was quantized to 4-bit precision via `BitsAndBytesConfig(load_in_4bit=True)` and trained with LoRA adapters. This significantly reduced the computational resources required while maintaining high performance, bringing the total training time to approximately 2 hours and, in turn, lowering the environmental footprint of the run.
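A minimal sketch of this QLoRA-style setup, assuming typical defaults; the exact `LoraConfig` values used for this model (such as `r` and `target_modules`) are not documented, so the ones below are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base model
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter on top; rank and target modules are illustrative guesses
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trained
```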
## Uses
### Direct Use
This model is well-suited for generating text in Persian, particularly for instruction-following tasks. It can be used in chatbots, customer support systems, educational tools, and other applications where accurate, context-aware Persian language generation is needed.
### Out-of-Scope Use
The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian.
## How to Get Started with the Model
Here is how you can use this model:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define the base model and the adapter model
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_model = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft"

# Use CUDA if available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model and apply the LoRA adapter using PEFT
model = AutoModelForCausalLM.from_pretrained(base_model, device_map={"": device})
model = PeftModel.from_pretrained(model, adapter_model)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Llama 3.1 has no pad token by default; reuse EOS so no new embedding
# is needed (adding a brand-new '[PAD]' token would require resizing
# the model's embedding matrix)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Example usage ("How can I find information about American companies' stocks?")
input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟"

# Tokenize the input, producing both input IDs and the attention mask
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True).to(device)

# Generate up to 512 new tokens beyond the prompt
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    pad_token_id=tokenizer.pad_token_id,
)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
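Since the base model is an instruct/chat model, you may get better results by formatting the prompt with the tokenizer's chat template rather than passing raw text. Whether the adapter was trained with this chat format is not documented, so treat this as a hedged variant (it continues from the setup above):

```python
# Build a chat-formatted prompt using the model's built-in template
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header before generating
    return_tensors="pt",
).to(device)

outputs = model.generate(input_ids, max_new_tokens=512, pad_token_id=tokenizer.pad_token_id)

# Strip the prompt tokens so only the model's reply is printed
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```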