---
model-index:
- name: xmanii/llama-3-8b-instruct-bnb-4bit-persian
---
**Model Information**

**Developed by:** xmanii

**License:** Apache-2.0

**Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit

**Model Description**

This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca-style chat conversations, consisting of approximately 8,000 rows. Training ran on two H100 GPUs and completed in just under one hour. We used Unsloth together with Hugging Face's TRL library, which sped up training by roughly 2x.

![Unsloth Made with Love](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)

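The training script itself is not reproduced on this card. As a rough reference, the sketch below shows a typical Unsloth + TRL supervised fine-tuning setup of the kind described above; the dataset file name, LoRA configuration, and hyperparameters are illustrative assumptions, not the exact values used for this model.

```python
# Hedged sketch of an Unsloth + TRL SFT run like the one described above.
# Dataset path, LoRA settings, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model this card was fine-tuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are common defaults, assumed here).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical JSONL file with a pre-formatted "text" column (~8,000 Persian rows).
dataset = load_dataset("json", data_files="persian_alpaca_chat.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```
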
**Training Resources**

* 2x H100 GPUs
* Unsloth and Hugging Face's TRL library

**Dataset**

* Unique Persian dataset of Alpaca-style chat conversations (an illustrative row is sketched below)
* Approximately 8,000 rows

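To make the expected data layout concrete, here is a purely illustrative example of a single conversation row in the ShareGPT-style `from`/`value` format that the inference mapping further down assumes. The `conversations` field name and the Persian text are invented for illustration and are not taken from the actual dataset.

```python
# Purely illustrative example of one chat row in the "from"/"value" layout
# referenced by the chat-template mapping in the inference code below.
# The "conversations" column name and the text are assumptions, not real dataset rows.
example_row = {
    "conversations": [
        {"from": "human", "value": "پایتخت ایران کجاست؟"},   # "What is the capital of Iran?"
        {"from": "gpt", "value": "پایتخت ایران تهران است."},  # "The capital of Iran is Tehran."
    ]
}
```
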
**Open-Source Contribution**

This model is open source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversational capabilities, and we hope it will contribute to the advancement of natural language processing for the Persian language.

**Using Adapters with Unsloth**

To run the model with its adapters, you can use the following code:

```python
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Path to the downloaded Hugging Face snapshot (or the repo id itself).
model_save_path = "path to the download folder"

# Load the 4-bit model together with its LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_save_path,
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # Enable Unsloth's native 2x faster inference

# Apply the Llama-3 chat template and map the ShareGPT-style keys
# ("from"/"value", "human"/"gpt") onto it.
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3",
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)

messages = [{"from": "human", "value": "your prompt"}]  # add your prompt here as the human turn
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # must be added for generation
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response)
```

**Full 16-bit Merged Model**

For a full 16-bit merged model, please check out xmanii/Llama3-8b-simorgh-16bit.

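If you prefer not to go through Unsloth and the 4-bit adapters, the merged repository should load like any standard causal language model. The snippet below is a minimal sketch using plain `transformers`; it assumes the merged repo ships a Llama-3 chat template and that a bf16-capable GPU is available, and it is not an official example from that repository.

```python
# Minimal sketch: loading the merged 16-bit model with plain transformers.
# Assumes the repo ships a Llama-3 chat template and a bf16-capable GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xmanii/Llama3-8b-simorgh-16bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "your prompt"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
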
**Future Work**

We are working on quantizing the models and bringing them to Ollama.