Questions on Training and Architecture
I’m exploring this model, particularly its training methods and architectural specifics, and I have a few questions:
How exactly is KaLM trained on top of Qwen?
What loss function or objective was used to train KaLM? Was a specific ranking or contrastive loss applied?
What metric was chosen to optimize embeddings, and how was it used in training?
Was a particular method of positional encoding used, given the multilingual scope and Qwen’s involvement?
Thank you in advance for any insights or resources on KaLM’s architecture and training processes.
Thank you for your interest in our model. We have trained it using the Qwen2 model without any architectural modifications. For detailed information on the architecture, please refer to the Qwen model documentation.
Regarding the loss function, we employ the widely used InfoNCE loss. The training code is currently available from FlagEmbedding.
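For readers unfamiliar with InfoNCE, here is a minimal sketch of the loss in plain Python. It is not the actual training code (that lives in FlagEmbedding). It assumes in-batch negatives, cosine similarity, and a temperature of 0.05, none of which are confirmed in this thread, and the function name `info_nce_loss` is made up for illustration.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(queries, passages, temperature=0.05):
    """InfoNCE with in-batch negatives (illustrative sketch, not the KaLM code).

    queries[i] and passages[i] form a positive pair; every other passage
    in the batch acts as a negative for queries[i].
    The temperature default is an assumed value.
    """
    q = [_normalize(v) for v in queries]
    p = [_normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every passage in the batch, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching passage (index i) as the target.
        total += log_denom - logits[i]
    return total / len(q)


if __name__ == "__main__":
    toy_queries = [[1.0, 0.0], [0.0, 1.0]]
    toy_passages = [[0.9, 0.1], [0.2, 0.8]]
    print(info_nce_loss(toy_queries, toy_passages))
```

The loss is small when each query is closest to its own passage and grows when another passage in the batch scores higher, which is what pushes matching pairs together in embedding space.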
We will be releasing more details about the training process and data soon.
Hi, I would like to know which data the model was trained on. Does it include medical, wiki, sports, or other domains? Thanks!
@rohitpanjwani03
Hi Rohit,
We utilized a relatively extensive training dataset, encompassing various domains and types. The specific list of training data can be found in Table 8 and Table 9 in the appendix of the technical report: KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model.