model-index:
  - name: xmanii/llama-3-8b-instruct-bnb-4bit-persian
    description: |
      **Model Information**

      **Developed by:** xmanii
      **License:** Apache-2.0
      **Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit

      **Model Description**

      This LLaMA model was fine-tuned on a Persian dataset of Alpaca-style chat conversations containing approximately 8,000 rows. Training ran on two H100 GPUs and completed in just under one hour, using Unsloth together with Hugging Face's TRL library for a roughly 2x speed-up.
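
      A minimal training sketch is shown below. This is not our exact script: the dataset identifier and its `conversations` column, the LoRA settings, and the hyperparameters are illustrative assumptions, and the argument names follow the Unsloth example notebooks built on TRL's `SFTTrainer` (newer TRL releases may expect an `SFTConfig` instead).

      ```python
      # Hedged sketch of the fine-tuning setup described above, not the authors' script.
      # Assumptions: dataset id "your-username/persian-alpaca-chat" (hypothetical) with a
      # ShareGPT-style "conversations" column; LoRA rank/alpha and hyperparameters are
      # illustrative defaults taken from common Unsloth notebooks.
      import torch
      from datasets import load_dataset
      from transformers import TrainingArguments
      from trl import SFTTrainer
      from unsloth import FastLanguageModel
      from unsloth.chat_templates import get_chat_template

      max_seq_length = 4096

      # Start from the 4-bit base model named in this card.
      model, tokenizer = FastLanguageModel.from_pretrained(
          model_name="unsloth/llama-3-8b-instruct-bnb-4bit",
          max_seq_length=max_seq_length,
          load_in_4bit=True,
      )

      # Attach LoRA adapters so only a small fraction of the weights is trained.
      model = FastLanguageModel.get_peft_model(
          model,
          r=16,
          lora_alpha=16,
          lora_dropout=0,
          bias="none",
          target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"],
          use_gradient_checkpointing="unsloth",
      )

      # Same llama-3 chat template and ShareGPT-style key mapping as in the
      # inference example further down.
      tokenizer = get_chat_template(
          tokenizer,
          chat_template="llama-3",
          mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
      )

      def to_text(examples):
          # Render each conversation into a single training string.
          texts = [
              tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
              for convo in examples["conversations"]
          ]
          return {"text": texts}

      dataset = load_dataset("your-username/persian-alpaca-chat", split="train")  # hypothetical id
      dataset = dataset.map(to_text, batched=True)

      trainer = SFTTrainer(
          model=model,
          tokenizer=tokenizer,
          train_dataset=dataset,
          dataset_text_field="text",
          max_seq_length=max_seq_length,
          args=TrainingArguments(
              per_device_train_batch_size=2,
              gradient_accumulation_steps=4,
              num_train_epochs=1,
              learning_rate=2e-4,
              bf16=torch.cuda.is_bf16_supported(),
              fp16=not torch.cuda.is_bf16_supported(),
              optim="adamw_8bit",
              logging_steps=10,
              output_dir="outputs",
          ),
      )
      trainer.train()
      ```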

      ![Unsloth Made with Love](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)

      **Open-Source Contribution**

      This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.

      **Using Adapters with Unsloth**

      To run the model with adapters, you can use the following code:

      ```python
      import torch
      from unsloth import FastLanguageModel
      from unsloth.chat_templates import get_chat_template

      model_save_path = "path/to/the/downloaded/folder"  # local path to the folder pulled from Hugging Face

      model, tokenizer = FastLanguageModel.from_pretrained(
          model_name=model_save_path,
          max_seq_length=4096,
          load_in_4bit=True,
      )
      FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

      tokenizer = get_chat_template(
          tokenizer,
          chat_template="llama-3",  # use the llama-3 template
          mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # mapping the messages.
      )

      messages = [{"from": "human", "value": "your prompt"}]  # add your prompt here as the human turn
      inputs = tokenizer.apply_chat_template(
          messages,
          tokenize=True,
          add_generation_prompt=True,  # Must add for generation
          return_tensors="pt",
      ).to("cuda")

      outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
      response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
      print(response)
      ```

      **Future Work**

      We are working on quantizing the models and bringing them to Ollama.
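
      As a hedged sketch (not necessarily the pipeline we will ship), Unsloth's `save_pretrained_gguf` can export a quantized GGUF file that an Ollama Modelfile can then reference; the output directory name and the `q4_k_m` quantization method below are assumptions:

      ```python
      # Illustrative GGUF export; directory name and quantization method are assumptions.
      model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
      ```

      The exported `.gguf` file can then be pointed to from an Ollama Modelfile via its `FROM` line.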