Idefics2-pretraining
Hi,
There does not seem to be any support for pre-training.
When I try it myself, I run into some instability in the connector. How did you initialize your weights?
Hi @orrzohar,
Can you say more about the instability you are seeing?
Our initialization scheme for newly initialized parameters is rather standard; the code snippet below should give you a good idea:
```python
import torch.nn as nn

# Runs inside the model's weight-init routine, where `module` is each submodule
# and self.config.hidden_size is the model's hidden dimension.
if isinstance(module, MLP):
    for sub_module_name, sub_module in module.named_modules():
        if isinstance(sub_module, nn.Linear):
            factor = 1.0
            if "down_proj" in sub_module_name:
                # down_proj has a larger fan-in, so halve the variance
                factor = 2.0
            init_a_linear(sub_module, std=(0.4 / (self.config.hidden_size * factor)) ** 0.5)
```
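(`init_a_linear` just fills the weight from a zero-mean normal with the given std; a minimal sketch of such a helper — the exact version in the codebase may differ slightly:)

```python
import torch
import torch.nn as nn

def init_a_linear(module: nn.Linear, std: float) -> None:
    # Zero-mean normal init for the weight, zeros for the bias.
    # (Sketch of the assumed helper, not the verbatim implementation.)
    with torch.no_grad():
        module.weight.normal_(mean=0.0, std=std)
        if module.bias is not None:
            module.bias.zero_()
```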
Hi Victor,
Thank you for your response!
What I am seeing is that the loss initially decreases, but then NaNs are detected after the connector (MLP + Perceiver pooler). I have tried xavier_uniform_/kaiming_uniform_ for all the connector weights (roughly the sketch below), but was unsuccessful.
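(For reference, the re-initialization I tried looked roughly like this; `model.connector` is just a stand-in for the connector module in my setup:)

```python
import torch.nn as nn

# Re-initialize every weight matrix in the connector; zero the biases.
# NOTE: `model.connector` is a hypothetical path — adjust to your model.
for name, param in model.connector.named_parameters():
    if param.dim() > 1:
        nn.init.xavier_uniform_(param)  # also tried nn.init.kaiming_uniform_
    elif "bias" in name:
        nn.init.zeros_(param)
```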
I have tried the obvious: varying batch sizes (2 to 1000) and learning rates (1e-3 to 1e-6).
It is extremely regular: it seems to happen at the same iteration for a given batch size, no matter the learning rate. The only time this does not occur is with batch size 1.
Have you ever experienced anything similar, and how did you debug it?
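(To localize where the NaNs first appear, I sweep the model with forward hooks, along these lines — a sketch, not my exact code:)

```python
import torch
import torch.nn as nn

def add_nan_hooks(model: nn.Module) -> None:
    """Flag the first module whose forward output contains NaN/Inf."""
    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, tuple) else (output,)
            for t in outs:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    raise RuntimeError(f"non-finite output after module: {name}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```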
Best,
Orr
Indeed, NaNs are never a good sign...
Before I answer, a few questions:
- are you fine-tuning or training from scratch?
- what data?
- mixed precision? what precision?
- is it specifically after the connector? any details as to where in the connector?
Hi @VictorSanh,
- I am training from scratch
- LLaVA 1.5
- BF16
- It is usually in the MLP of the Idefics2PerceiverLayer, most often right after "gate_proj" and very rarely after "down_proj".
I tried your initialization code, increasing the batch size to 4096 and reducing the learning rate to 1e-06, but with no luck. When investigating further, I noticed that the 'latents' remain all ones even after training continues for a few hundred iterations. I verified that these parameters are added to the optimizer (see the check below). I tried randomly initializing the latents instead, but that did not solve the issue.
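(The sanity check I ran looks roughly like this; the parameter path is from my setup and may differ in yours:)

```python
# Run after loss.backward(): is the latents parameter trainable,
# registered with the optimizer, and receiving gradients?
# NOTE: "connector.perceiver_resampler.latents" is an assumed path.
latents = dict(model.named_parameters())["connector.perceiver_resampler.latents"]
print("requires_grad:", latents.requires_grad)
print("in optimizer:", any(latents is p for group in optimizer.param_groups for p in group["params"]))
print("grad norm:", None if latents.grad is None else latents.grad.norm().item())
```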
Best,
Orr