01GangaPutraBheeshma committed "Update README.md" (commit 3a57615, parent: df3f221)

README.md (changed):
colab_code_generator_FT_code_gen_UT, an instruction-following large language model …

# Getting Started

## Installation

Loading the fine-tuned Code Generator:
```
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

test_model_UT = AutoPeftModelForCausalLM.from_pretrained("01GangaPutraBheeshma/colab_code_generator_FT_code_gen_UT")
test_tokenizer_UT = AutoTokenizer.from_pretrained("01GangaPutraBheeshma/colab_code_generator_FT_code_gen_UT")
```

## Usage

For re-training this model, I would highly recommend using this format to provide input to the tokenizer:
```
def prompt_instruction_format(sample):
    return f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task:

### Task:
{sample['instruction']}

### Input:
{sample['input']}

### Response:
{sample['output']}
"""
```
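To sanity-check the template, the function above can be run on a toy record (a sketch; the sample values here are illustrative, not from the training data):

```python
def prompt_instruction_format(sample):
    # Same template as above: Instruction / Task / Input / Response
    return f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task:

### Task:
{sample['instruction']}

### Input:
{sample['input']}

### Response:
{sample['output']}
"""

# Illustrative record (not from the actual dataset)
sample = {
    "instruction": "Reverse a string.",
    "input": "hello",
    "output": "print('hello'[::-1])",
}
formatted = prompt_instruction_format(sample)
print(formatted)
```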

Then, we can leverage the above function to format our input prompts, which can be pre-processed and used in model training with Supervised Fine-Tuning (the `SFTTrainer` class):
```
trainer = SFTTrainer(
    model=model,
    train_dataset=code_dataset,
    peft_config=peft_config,
    max_seq_length=2048,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=prompt_instruction_format,
    args=trainingArgs,
)
```
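`packing=True` tells the trainer to concatenate tokenized examples into fixed-length blocks rather than padding each example separately, which wastes fewer tokens per batch. A rough pure-Python sketch of the idea (my own illustration, not TRL's actual implementation):

```python
def pack_sequences(token_lists, block_size):
    """Concatenate token sequences into one stream and cut it into
    fixed-size blocks, dropping the ragged tail (the core idea of packing)."""
    stream = [tok for seq in token_lists for tok in seq]
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Three short "tokenized examples" packed into blocks of 4 tokens
blocks = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
print(blocks)  # → [[1, 2, 3, 4], [5, 6, 7, 8]]
```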

This is a crucial step when we perform Reinforcement Learning from Human Feedback (RLHF). Here are the six reasons why it is important:
1. Sample Efficiency
2. Task Adaptation
3. Transfer Learning
4. Human Guidance
5. Reducing Exploration Challenges
6. Addressing Distribution Shift

# Documentation

This model was fine-tuned using LoRA because I wanted the model's weights to be efficient at solving other types of Python problems (ones that were not included in the training data).
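For reference, LoRA freezes the base weight W and learns only a low-rank update, so the effective weight is W + (alpha / r) * B @ A, where B (d_out x r) and A (r x d_in) are small trainable matrices of rank r. A tiny numeric sketch of that composition (illustrative only; peft's internals differ):

```python
def matmul(X, Y):
    # Minimal matrix multiply over lists of rows (for the sketch only)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A: the frozen weight W plus the
    scaled low-rank LoRA update (B is d_out x r, A is r x d_in)."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

# Rank-1 update on a 2x2 identity weight
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
print(lora_effective_weight(W, A, B, alpha=1, r=1))
# → [[1.5, 0.5], [1.0, 2.0]]
```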