xmanii committed on
Commit 6107dec
1 Parent(s): 0f6287a

Update README.md

Files changed (1)
  1. README.md +49 -34
README.md CHANGED
@@ -1,45 +1,60 @@
- Model Information

- Developed by: xmanii License: Apache-2.0 Finetuned from model: unsloth/llama-3-8b-instruct-bnb-4bit

- This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca chat conversations, consisting of approximately 8,000 rows. Our training process utilized two H100 GPUs, completing in just under 1 hour. We leveraged the power of Unsloth and Hugging Face's TRL library to accelerate our training process by 2x.

- <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>

- This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.

- Using Adapters with Unsloth
- To run the model with adapters, you can use the following code:
- (you need unsloth package)
- import torch
- from unsloth import FastLanguageModel
- from unsloth.chat_templates import get_chat_template

- model_save_path = "path to the download folder" # Adjust this path as needed

- model, tokenizer = FastLanguageModel.from_pretrained(
- model_name=model_save_path,
- max_seq_length=4096,
- load_in_4bit=True,
- )
- FastLanguageModel.for_inference(model) # Enable native 2x faster inference

- tokenizer = get_chat_template(
- tokenizer,
- chat_template="llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
- mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"}, # ShareGPT style
- )

- messages = [ {"from": "human", "value": "your prompt"},]
- inputs = tokenizer.apply_chat_template(
- messages,
- tokenize=True,
- add_generation_prompt=True, # Must add for generation
- return_tensors="pt",
- ).to("cuda")

- outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
- response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
- print(response)

- We are working on quantizing the models and bringing them to ollama.

+ model-index:
+ - name: xmanii/llama-3-8b-instruct-bnb-4bit-persian
+ description: |
+ **Model Information**

+ **Developed by:** xmanii
+ **License:** Apache-2.0
+ **Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit

+ **Model Description**

+ This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca chat conversations, consisting of approximately 8,000 rows. Our training process utilized two H100 GPUs, completing in just under 1 hour. We leveraged the power of Unsloth and Hugging Face's TRL library to accelerate our training process by 2x.

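The card does not include the training script itself. For readers who want to reproduce a comparable setup, the snippet below is a minimal sketch of LoRA supervised fine-tuning with Unsloth and TRL's SFTTrainer; the dataset file, column name, LoRA rank, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Illustrative fine-tuning sketch (not the authors' actual script).
# Assumes a Persian Alpaca-style dataset with a pre-rendered "text" column,
# plus the unsloth, trl, transformers, and datasets packages.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model named on this card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are common defaults, not the authors' choices.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset file; the card only says "~8,000 rows of Persian Alpaca chat".
dataset = load_dataset("json", data_files="persian_alpaca_chat.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumes prompts are already rendered into one text column
    max_seq_length=4096,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
        bf16=True,
    ),
)
trainer.train()
```

Depending on your TRL version, `dataset_text_field` and `max_seq_length` may need to be passed through an `SFTConfig` instead of directly to `SFTTrainer`.
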
 
+ ![Unsloth Made with Love](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)

+ **Open-Source Contribution**

+ This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.

+ **Using Adapters with Unsloth**

+ To run the model with adapters, you can use the following code:

+ ```python
+ import torch
+ from unsloth import FastLanguageModel
+ from unsloth.chat_templates import get_chat_template
+
+ model_save_path = "path to the download folder"  # Adjust this path as needed
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name=model_save_path,
+     max_seq_length=4096,
+     load_in_4bit=True,
+ )
+ FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
+
+ tokenizer = get_chat_template(
+     tokenizer,
+     chat_template="llama-3",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
+     mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
+ )
+
+ messages = [{"from": "human", "value": "your prompt"}]
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,  # Must add for generation
+     return_tensors="pt",
+ ).to("cuda")
+
+ outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
+ response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+ print(response)
+ ```
+
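If installing Unsloth is not an option, the same adapter can usually be loaded with the plain PEFT and Transformers stack instead. The sketch below assumes the downloaded folder is a standard PEFT LoRA adapter with tokenizer files saved alongside it, and it uses the Llama-3 tokenizer's built-in chat template rather than Unsloth's ShareGPT-style mapping; those are assumptions, not something this card guarantees.

```python
# Alternative loading path without Unsloth (assumes a standard PEFT LoRA adapter folder).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_path = "path to the download folder"  # adjust as needed

# Loads the base model referenced in the adapter config and applies the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# If no tokenizer files were saved with the adapter, load the base model's tokenizer instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_path)

# Standard role/content messages; the Llama-3 tokenizer ships a chat template.
messages = [{"role": "user", "content": "your prompt"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=2048)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
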
+ **Future Work**
+
+ We are working on quantizing the models and bringing them to Ollama.
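
Until official quantized builds are published, one plausible (unofficial) route is to export a merged GGUF with Unsloth's save helper and register it with Ollama. The quantization method, output file name, and Modelfile contents below are assumptions for illustration only, not the release process the authors have announced.

```python
# Unofficial sketch: export a quantized GGUF and register it with Ollama.
# Assumes the fine-tuned checkpoint loads exactly as in the usage snippet above.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path to the download folder",  # adjust as needed
    max_seq_length=4096,
    load_in_4bit=True,
)

# Merges the LoRA weights and writes a llama.cpp-compatible GGUF;
# "q4_k_m" is a common choice, not the quantization the authors will ship.
model.save_pretrained_gguf("persian-llama-3-8b-gguf", tokenizer, quantization_method="q4_k_m")

# Minimal Ollama Modelfile; the GGUF file name is hypothetical, check the export folder.
# For chat use, add a TEMPLATE stanza matching the Llama-3 chat format (see the Ollama docs).
with open("Modelfile", "w") as f:
    f.write("FROM ./persian-llama-3-8b-gguf/unsloth.Q4_K_M.gguf\n")

# Then, from a shell:
#   ollama create persian-llama3 -f Modelfile
#   ollama run persian-llama3
```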