Why are the weights separated?
Thanks for the great work!
I have some questions about the model.
Why are the model weights separated?
I saw that the LoRA weights are not merged, and that the model-loading code sets the LoRA adapter instead of merging it. What is the purpose of this? Does merging the weights harm performance? Are the LoRA weights the instruction-tuned weights?
Thank you for your question!
Currently we separate the base weights, the vision LoRA weights, and the speech LoRA weights, and use `set_lora_adapter` (https://huggingface.co./microsoft/Phi-4-multimodal-instruct/blob/main/modeling_phi4mm.py#L1980) to switch between them. The purpose is simply to make switching LoRA weights a bit easier. If your scenario only involves certain modalities (e.g., vision-language), I think it is better to merge the weights to get better speed. I don't think merging the weights will harm performance. Yes, the LoRA weights are instruction-tuned for the corresponding modalities.
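For anyone landing here later, a minimal sketch of what the adapter switching might look like, assuming the usual `transformers` loading path with `trust_remote_code=True` and that the `set_lora_adapter` method linked above takes the adapter name as a string; the `"vision"` / `"speech"` names and the placement of the generation calls are illustrative assumptions, so please check `modeling_phi4mm.py` for the exact interface:

```python
# Sketch only: adapter names below are assumptions; see modeling_phi4mm.py
# in the checkpoint for the actual adapter names and method signature.
from transformers import AutoModelForCausalLM

model_id = "microsoft/Phi-4-multimodal-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # pulls in modeling_phi4mm.py, which defines set_lora_adapter
    torch_dtype="auto",
)

# Activate the vision LoRA before a vision-language request ...
model.set_lora_adapter("vision")
# ... run image + text inference here ...

# ... then switch to the speech LoRA for a speech-language request.
model.set_lora_adapter("speech")
# ... run audio + text inference here ...
```

If you only ever use one modality, merging that adapter into the base weights (instead of switching at runtime) avoids the extra LoRA computation at inference time, which is where the speed benefit mentioned above comes from.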