custom_text_proj initialization

#4
by edesalve - opened

Hi all, I've been working with this model and noticed that the weights for custom_text_proj were not being loaded from the checkpoint and were instead initialized randomly, leading to very different results at each model instantiation.

To address this, I used a base model provided by the Vidore team (https://huggingface.co./vidore/colqwen2.5-base).

Do you recommend this solution, or is there a better alternative or a forthcoming update to ensure proper initialization for the projection layer within your model?

Thank you!

Hi @edesalve, thanks for the question.

We ran multiple evaluations previously, and the results were consistent across runs. I checked the initialization of the custom_text_proj layer, and for various runs, the standard deviation, max, and min values remained the same. The only difference was in the mean, which had a very small variation close to zero (e.g., 4.3869e-05, 6.5327e-05). This minor variation in the mean does not adversely affect model performance.
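For readers who want to run a similar check themselves, here is a minimal sketch of comparing a layer's weights across two instantiations. It is not the procedure used above: `summarize`, `max_abs_diff`, and `fake_init` are hypothetical helpers, and `fake_init` is only a stand-in that draws Gaussian values with an assumed standard deviation of 0.02. With a real model you would pass the flattened custom_text_proj weights instead (for a PyTorch tensor, something like `layer.weight.flatten().tolist()`). The elementwise comparison is included because matching summary statistics alone do not show that two weight tensors are identical.

```python
import random
import statistics


def summarize(weights):
    """Return summary statistics for a flat sequence of floats."""
    return {
        "mean": statistics.fmean(weights),
        "std": statistics.pstdev(weights),
        "min": min(weights),
        "max": max(weights),
    }


def max_abs_diff(a, b):
    """Largest elementwise difference between two equally sized weight lists."""
    if len(a) != len(b):
        raise ValueError("weight lists must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def fake_init(seed, n=4096, std=0.02):
    """Stand-in for one random initialization (assumed Gaussian, assumed std)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


# Two "instantiations" with different random states.
run_a = fake_init(seed=0)
run_b = fake_init(seed=1)

print(summarize(run_a))
print(summarize(run_b))
print("max elementwise difference:", max_abs_diff(run_a, run_b))
```

If the weights are actually loaded from a checkpoint, the elementwise difference between two instantiations should be exactly zero; a nonzero value suggests the layer is being re-initialized on each load.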


@edesalve hi again,
I have updated the configs and added https://huggingface.co./Metric-AI/colqwen2.5-base with the corresponding weights for custom_text_proj. You can now use the model and get deterministic scores :)

Markgazol changed discussion status to closed
