---
language:
- en
- hi
license: gemma
tags:
- text-generation
- transformers
- unsloth
- gemma
- trl
base_model: unsloth/gemma-2b-bnb-4bit
datasets:
- yahma/alpaca-cleaned
- ravithejads/samvaad-hi-filtered
- HydraIndicLM/hindi_alpaca_dolly_67k
pipeline_tag: text-generation
---

# 🔥 Gemma-2B-Hinglish-LORA-v1.0 model
### 🚀 Try out this model's inference in this HF Space: https://huggingface.co./spaces/kirankunapuli/Gemma-2B-Hinglish-Model-Inference-v1.0

- **Developed by:** [Kiran Kunapuli](https://www.linkedin.com/in/kirankunapuli/)
- **License:** apache-2.0
- **Finetuned from model:** unsloth/gemma-2b-bnb-4bit
- **Model usage:** Use the Python code below (a reusable helper that wraps this generate-and-extract pattern is sketched after the training details):
  ```python
  import re

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0")
  model = AutoModelForCausalLM.from_pretrained("kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0")

  device = "cuda:0" if torch.cuda.is_available() else "cpu"
  model = model.to(device)

  # Alpaca-style prompt template used during fine-tuning.
  alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

  ### Instruction:
  {}

  ### Input:
  {}

  ### Response:
  {}"""

  # Example 1: Hinglish input ("Where is the historical monument India Gate located?")
  inputs = tokenizer(
      [
          alpaca_prompt.format(
              "Please answer the following sentence as requested",  # instruction
              "ऐतिहासिक स्मारक India Gate कहाँ स्थित है?",  # input
              "",  # output - leave this blank for generation!
          )
      ],
      return_tensors="pt",
  ).to(device)

  outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
  output = tokenizer.batch_decode(outputs)[0]

  # Extract the answer between "### Response:" and the <eos> token.
  response_start = output.find("### Response:") + len("### Response:")
  response_end = output.find("<eos>", response_start)
  response = output[response_start:response_end].strip()
  print(response)

  # Example 2: Hindi input asking for an answer in English
  # ("Where is the historical monument India Gate located? Tell me in English.")
  inputs = tokenizer(
      [
          alpaca_prompt.format(
              "Please answer the following sentence as requested",  # instruction
              "ऐतिहासिक स्मारक इंडिया गेट कहाँ स्थित है? मुझे अंग्रेजी में बताओ",  # input
              "",  # output - leave this blank for generation!
          )
      ],
      return_tensors="pt",
  ).to(device)

  outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
  output = tokenizer.batch_decode(outputs)[0]

  # Extract the answer with a regex instead of string slicing.
  response_pattern = re.compile(r"### Response:\n(.*?)<eos>", re.DOTALL)
  response_match = response_pattern.search(output)
  if response_match:
      print(response_match.group(1).strip())
  else:
      print("Response not found")
  ```
- **Model config:** The LoRA adapter configuration used with Unsloth (the resulting trainable-parameter count is checked in a sketch after the training details):
  ```python
  from unsloth import FastLanguageModel

  # `model` here is the unsloth/gemma-2b-bnb-4bit base model (typically loaded with
  # FastLanguageModel.from_pretrained); LoRA adapters are attached on top of it.
  model = FastLanguageModel.get_peft_model(
      model,
      r=16,
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
      lora_alpha=32,
      lora_dropout=0,
      bias="none",
      use_gradient_checkpointing=True,
      random_state=42,
      use_rslora=True,  # rank-stabilized LoRA
      loftq_config=None,
  )
  ```
- **Training parameters:**
  ```python
  import torch
  from transformers import TrainingArguments
  from trl import SFTTrainer

  trainer = SFTTrainer(
      model=model,
      tokenizer=tokenizer,
      train_dataset=dataset,
      dataset_text_field="text",
      max_seq_length=max_seq_length,
      dataset_num_proc=2,
      packing=True,  # pack short examples together for faster training
      args=TrainingArguments(
          per_device_train_batch_size=2,
          gradient_accumulation_steps=4,  # effective batch size = 2 * 4 = 8
          warmup_steps=5,
          max_steps=120,
          learning_rate=2e-4,
          fp16=not torch.cuda.is_bf16_supported(),
          bf16=torch.cuda.is_bf16_supported(),
          logging_steps=1,
          optim="adamw_8bit",
          weight_decay=0.01,
          lr_scheduler_type="linear",
          seed=42,
          output_dir="outputs",
          report_to="wandb",
      ),
  )
  ```
- **Training details:**
  ```
  ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
     \\   /|    Num examples = 14,343 | Num Epochs = 1
  O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
  \        /    Total batch size = 8 | Total steps = 120
   "-____-"     Number of trainable parameters = 19,611,648

  GPU = Tesla T4. Max memory = 14.748 GB.
  2118.7553 seconds used for training.
  35.31 minutes used for training.
  Peak reserved memory = 9.172 GB.
  Peak reserved memory for training = 6.758 GB.
  Peak reserved memory % of max memory = 62.191 %.
  Peak reserved memory for training % of max memory = 45.823 %.
  ```
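
For reference, the reported "Number of trainable parameters = 19,611,648" is exactly what rank-16 LoRA adapters on the seven target projections imply for Gemma-2B. The sketch below is only an arithmetic sanity check, not code from the training run; the layer shapes are assumed standard Gemma-2B config values (18 layers, hidden size 2048, MLP intermediate size 16384, multi-query attention with a single 256-dimensional KV head).

```python
# Arithmetic check of the trainable-parameter count reported above.
# Shapes are assumed standard Gemma-2B config values, not read from this repo.
hidden, intermediate, head_dim = 2048, 16384, 256
num_layers, num_kv_heads = 18, 1
r = 16  # LoRA rank used in the model config above

# A LoRA adapter on a Linear(fan_in -> fan_out) layer adds r * (fan_in + fan_out)
# parameters (A: fan_in x r, B: r x fan_out).
target_shapes = {
    "q_proj": (hidden, hidden),                   # 2048 -> 2048
    "k_proj": (hidden, num_kv_heads * head_dim),  # 2048 -> 256
    "v_proj": (hidden, num_kv_heads * head_dim),  # 2048 -> 256
    "o_proj": (hidden, hidden),                   # 2048 -> 2048
    "gate_proj": (hidden, intermediate),          # 2048 -> 16384
    "up_proj": (hidden, intermediate),            # 2048 -> 16384
    "down_proj": (intermediate, hidden),          # 16384 -> 2048
}

per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in target_shapes.values())
print(per_layer * num_layers)  # 19611648, matching the training log above
```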

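The usage snippet repeats the same format-prompt / generate / extract-answer steps for each query, so it can be convenient to wrap them in a small helper. The sketch below is illustrative only: the `get_hinglish_response` name is not part of the released model, and it reuses the `tokenizer`, `model`, `device`, and `alpaca_prompt` objects defined in the usage snippet above.

```python
import re


def get_hinglish_response(instruction: str, user_input: str, max_new_tokens: int = 64) -> str:
    """Format the Alpaca-style prompt, generate, and return only the model's answer."""
    prompt = alpaca_prompt.format(instruction, user_input, "")  # leave the response slot blank
    inputs = tokenizer([prompt], return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True)
    decoded = tokenizer.batch_decode(outputs)[0]
    # Stop at <eos> if present, otherwise take everything after "### Response:".
    match = re.search(r"### Response:\n(.*?)(?:<eos>|$)", decoded, re.DOTALL)
    return match.group(1).strip() if match else "Response not found"


# Example call ("Where is the historical monument India Gate located?"):
print(get_hinglish_response(
    "Please answer the following sentence as requested",
    "ऐतिहासिक स्मारक India Gate कहाँ स्थित है?",
))
```
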
This Gemma model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)