Update README.md
README.md
- size of dataset matters when you are finetuning on a base model, but matters less when finetuning on an already well-finetuned model - in fact, sometimes less is better in that case, or you may ruin a good previous finetune.
- alpha = 2x rank seems like something that came from the old days when people had potato VRAM at most and wanted to get there fast. I really don't feel like it makes much sense - it multiplies the weights and that's it. Making things louder also makes the noise louder (see the scaling sketch below).
- my favorite scheduler is warmup, hold for 1 epoch, then cosine down over the remaining epochs (a sketch is at the end of this section).
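On the alpha point: in the standard LoRA formulation, alpha only rescales the learned low-rank update before it is added to the frozen weight, so alpha = 2x rank amounts to a constant 2.0 gain on the delta. A minimal sketch of that scaling - tensor names and sizes here are illustrative, not taken from any particular library:

```python
# Minimal sketch of how LoRA's alpha interacts with rank (plain PyTorch).
import torch

d_in, d_out, rank, alpha = 512, 512, 16, 32  # alpha = 2x rank

W = torch.randn(d_out, d_in)         # frozen base weight
A = torch.randn(rank, d_in) * 0.01   # trainable down-projection
B = torch.zeros(d_out, rank)         # trainable up-projection (zero init)

scaling = alpha / rank               # = 2.0 here
# The effective weight: alpha does nothing but rescale the learned delta.
W_eff = W + scaling * (B @ A)
# Doubling alpha doubles the whole B @ A contribution - the useful signal
# and whatever noise training picked up are amplified equally.
```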
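And a sketch of the warmup / hold / cosine schedule from the last bullet, written as a plain PyTorch `LambdaLR`. The step counts, peak LR, and the choice to run the linear warmup inside the first held epoch are assumptions the note doesn't pin down:

```python
# Warmup -> hold for 1 epoch -> cosine decay over the remaining epochs.
# steps_per_epoch, warmup_steps and total_epochs are assumed hyperparameters.
import math
import torch

steps_per_epoch = 1000
warmup_steps = 100
total_epochs = 4
hold_steps = steps_per_epoch                        # hold for 1 epoch
decay_steps = (total_epochs - 1) * steps_per_epoch  # cosine over the rest

def lr_lambda(step: int) -> float:
    if step < warmup_steps:                  # linear warmup to peak LR
        return step / max(1, warmup_steps)
    if step < warmup_steps + hold_steps:     # hold at peak LR
        return 1.0
    # cosine decay from 1.0 down to 0 over the remaining epochs
    t = (step - warmup_steps - hold_steps) / max(1, decay_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(t, 1.0)))

model = torch.nn.Linear(8, 8)                # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
# call sched.step() once per optimizer step during training
```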