model-index:
- name: xmanii/llama-3-8b-instruct-bnb-4bit-persian
description: |
**Model Information**
**Developed by:** xmanii
**License:** Apache-2.0
**Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit
**Model Description**
This Llama 3 model was fine-tuned on a unique Persian dataset of Alpaca-style chat conversations containing roughly 8,000 rows. Training ran on two H100 GPUs and finished in just under an hour, using Unsloth together with Hugging Face's TRL library to speed up training by about 2x.
![Unsloth Made with Love](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)
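For readers who want to reproduce a similar run, the sketch below shows the general shape of an Unsloth + TRL fine-tune. The dataset file, LoRA settings, and hyperparameters are illustrative assumptions rather than our exact configuration, and on newer TRL versions some `SFTTrainer` arguments may need to move into `SFTConfig`.
```python
# Illustrative sketch only: the dataset path, LoRA ranks, and hyperparameters are assumptions.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Start from the same 4-bit base model this card was fine-tuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset file; the Persian Alpaca chat data is assumed to be
# pre-formatted into a "text" column with the Llama-3 chat template applied.
dataset = load_dataset("json", data_files="persian_alpaca_chat.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```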
**Open-Source Contribution**
This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.
**Using Adapters with Unsloth**
To run the model with adapters, you can use the following code:
```python
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
model_save_path = "path/to/the/downloaded/folder"  # local path of the Hugging Face repo you pulled

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_save_path,
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's native 2x faster inference

tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3",  # use the Llama-3 chat template
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # map message keys to the template
)

messages = [{"from": "human", "value": "your prompt"}]  # add your prompt here as the human turn
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # must be added for generation
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response)
```
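Note that `model.generate` returns the prompt tokens followed by the newly generated tokens, so the `batch_decode` call above prints the full conversation. If you only want the assistant's reply, you can slice the prompt off first:
```python
# Keep only the tokens generated after the prompt (uses `inputs` and `outputs` from the snippet above).
new_tokens = outputs[:, inputs.shape[1]:]
reply = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(reply)
```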
**Future Work**
We are working on quantized versions of the model and on bringing them to Ollama.
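As a rough sketch of that direction, Unsloth exposes a GGUF export helper that produces files loadable by llama.cpp and Ollama; the output directory name and quantization method below are illustrative choices, not our final pipeline:
```python
# Sketch: export the model to GGUF so it can be served by llama.cpp / Ollama.
# "q4_k_m" is a common 4-bit quantization choice; adjust as needed.
model.save_pretrained_gguf("llama-3-8b-persian-gguf", tokenizer, quantization_method="q4_k_m")
```
The resulting `.gguf` file can then be referenced from an Ollama Modelfile via its `FROM` directive.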