01GangaPutraBheeshma committed on
Commit 3a57615
1 Parent(s): df3f221

Update README.md

Files changed (1):
  1. README.md +47 -0
README.md CHANGED
@@ -31,6 +31,8 @@ colab_code_generator_FT_code_gen_UT, an instruction-following large language mod

# Getting Started

+
+ ## Installation
Loading the fine-tuned Code Generator
```
from peft import AutoPeftModelForCausalLM
@@ -38,6 +40,50 @@ test_model_UT = AutoPeftModelForCausalLM.from_pretrained("01GangaPutraBheeshma/c
test_tokenizer_UT = AutoTokenizer.from_pretrained("01GangaPutraBheeshma/colab_code_generator_FT_code_gen_UT")
```
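
As a quick check that the adapter loads correctly, it can be used for generation roughly as follows. This is a minimal sketch: the repo id and variable names come from the snippet above, the `from transformers import AutoTokenizer` import (which that snippet omits) is added here, and the prompt and generation settings are illustrative rather than values taken from this README.

```
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "01GangaPutraBheeshma/colab_code_generator_FT_code_gen_UT"
test_model_UT = AutoPeftModelForCausalLM.from_pretrained(repo_id)
test_tokenizer_UT = AutoTokenizer.from_pretrained(repo_id)

# Illustrative prompt in the instruction style described in the "Usage" section below.
prompt = "### Instruction:\nWrite a Python function that reverses a string.\n\n### Response:\n"
inputs = test_tokenizer_UT(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = test_model_UT.generate(**inputs, max_new_tokens=128)

print(test_tokenizer_UT.decode(output_ids[0], skip_special_tokens=True))
```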
 
+ ## Usage
+ For re-training this model, I would highly recommend using this format to provide input to the tokenizer.
+
+ ```
+ def prompt_instruction_format(sample):
+     return f"""### Instruction:
+ Use the Task below and the Input given to write the Response, which is programming code that can solve the following Task:
+
+ ### Task:
+ {sample['instruction']}
+
+ ### Input:
+ {sample['input']}
+
+ ### Response:
+ {sample['output']}
+ """
+ ```
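
To make the expected structure concrete, here is what the formatted prompt looks like for a single training sample. The dictionary below is hypothetical, made up purely for illustration; real samples come from the instruction/input/output fields of the training dataset.

```
# Hypothetical sample, only to show the rendered prompt; not taken from the actual dataset.
sample = {
    "instruction": "Write a function that returns the factorial of a number.",
    "input": "n = 5",
    "output": "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)",
}

print(prompt_instruction_format(sample))
```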
+
+ Then we can use the function above to format our input prompts so they can be pre-processed and used for model training with supervised fine-tuning via the SFTTrainer class.
+
+ ```
+ trainer = SFTTrainer(
+     model=model,
+     train_dataset=code_dataset,
+     peft_config=peft_config,
+     max_seq_length=2048,
+     tokenizer=tokenizer,
+     packing=True,
+     formatting_func=prompt_instruction_format,
+     args=trainingArgs,
+ )
+
+ ```
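
Note that `SFTTrainer` comes from the `trl` library (`from trl import SFTTrainer`), and the call above assumes that `model`, `code_dataset`, `peft_config`, `tokenizer`, and `trainingArgs` already exist. As a hedged sketch, `trainingArgs` could be built along these lines; the hyperparameter values below are illustrative assumptions, not the settings actually used to train this model.

```
from transformers import TrainingArguments

# Illustrative values only; the actual training configuration is not shown in this README.
trainingArgs = TrainingArguments(
    output_dir="colab_code_generator_FT_code_gen_UT",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="epoch",
)
```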
+
+ This is a crucial step when we perform Reinforcement Learning from Human Feedback (RLHF for short). Here are six reasons why it is important:
+ 1. Sample Efficiency
+ 2. Task Adaptation
+ 3. Transfer Learning
+ 4. Human Guidance
+ 5. Reducing Exploration Challenges
+ 6. Addressing Distribution Shift
+
+
# Documentation

This model was fine-tuned using LoRA because I wanted the model's weights to be efficient at solving other types of Python problems (ones that were not included in the training data).
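The truncated hunk below references `bnb_config = BitsAndBytesConfig(`, which suggests the base model was loaded with bitsandbytes quantization before the LoRA adapters were trained. The exact arguments are not visible in this diff, so the following is only a hedged sketch of a typical 4-bit quantization plus `LoraConfig` setup; every value here is an assumption and may differ from what was actually used.

```
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Assumed, commonly used 4-bit settings; the README's actual arguments are truncated in this diff.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Assumed LoRA settings for the peft_config passed to SFTTrainer above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```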
 
@@ -75,3 +121,4 @@ bnb_config = BitsAndBytesConfig(



+