Update README.md
README.md
- size of dataset matters when you are finetuning on a base model, but matters less when finetuning on an already well-finetuned model - in fact, sometimes less is better in that case, or you may ruin a good previous finetune.
- alpha = 2x rank seems like something that came from the old days when people had potato VRAM at most and wanted to get there fast. I really don't feel like it makes much sense - it multiplies the weights and that's it. Making things louder also makes the noise louder (see the scaling sketch below).
- my favorite scheduler is warmup, hold for 1 epoch, then cosine down over the remaining epochs (a sketch is at the end of this section).
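On the alpha point: in the standard LoRA formulation, alpha only rescales the learned low-rank update before it is added to the frozen weight, so alpha = 2x rank amounts to a constant 2.0 gain on the delta. A minimal sketch of that scaling - tensor names and sizes here are illustrative, not taken from any particular library:

```python
# Minimal sketch of how LoRA's alpha interacts with rank (plain PyTorch).
import torch

d_in, d_out, rank, alpha = 512, 512, 16, 32  # alpha = 2x rank

W = torch.randn(d_out, d_in)         # frozen base weight
A = torch.randn(rank, d_in) * 0.01   # trainable down-projection
B = torch.zeros(d_out, rank)         # trainable up-projection (zero init)

scaling = alpha / rank               # = 2.0 here
# The effective weight: alpha does nothing but rescale the learned delta.
W_eff = W + scaling * (B @ A)
# Doubling alpha doubles the whole B @ A contribution - the useful signal
# and whatever noise training picked up are amplified equally.
```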
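And a sketch of the warmup / hold / cosine schedule from the last bullet, written as a plain PyTorch `LambdaLR`. The step counts, peak LR, and the choice to run the linear warmup inside the first held epoch are assumptions the note doesn't pin down:

```python
# Warmup -> hold for 1 epoch -> cosine decay over the remaining epochs.
# steps_per_epoch, warmup_steps and total_epochs are assumed hyperparameters.
import math
import torch

steps_per_epoch = 1000
warmup_steps = 100
total_epochs = 4
hold_steps = steps_per_epoch                        # hold for 1 epoch
decay_steps = (total_epochs - 1) * steps_per_epoch  # cosine over the rest

def lr_lambda(step: int) -> float:
    if step < warmup_steps:                  # linear warmup to peak LR
        return step / max(1, warmup_steps)
    if step < warmup_steps + hold_steps:     # hold at peak LR
        return 1.0
    # cosine decay from 1.0 down to 0 over the remaining epochs
    t = (step - warmup_steps - hold_steps) / max(1, decay_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(t, 1.0)))

model = torch.nn.Linear(8, 8)                # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
# call sched.step() once per optimizer step during training
```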