Question for the IBM Granite Team

by rollercoasterX

Hello IBM Granite Team,

Thank you for providing this model!

I have a question regarding the training approach you took with this model (granite-3.0-2b-instruct) and the 8B model (granite-3.0-8b-instruct).

I am curious to know how the inclusion of multiple languages affected the model's size and parameter count. Did the addition of many languages significantly increase the model's size, or were you able to find ways to mitigate this effect?

If it did affect the size and parameter count, my next question would be whether you considered training future models on a single language and then relying on post-training fine-tuning by community members to adapt the model to other languages.

I'm interested in understanding the trade-offs you made when designing this model, and whether you think a single-language approach with community-driven fine-tuning could be a viable path forward.

Thanks in advance for your time, and I look forward to hearing your thoughts on this!

Hey! About supporting multiple languages: the change is mostly in the data that goes into training, not in the model's size, architecture, or parameter count!

In terms of choosing base training or fine-tuning (or both), that is a good question to ask whenever we want to advance a model's capabilities or address any issues. Usually, for a significant extension of the model's scope (like a new programming language or a new natural language), the data you need is likely of a larger scale, and it is best to include it in the base model's training. But fine-tuning on the target tasks/language will definitely help! In summary, given where our model is now (12 languages), a combination of both approaches would be more effective.
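
For readers curious about the community fine-tuning path raised in the question, here is a minimal, hypothetical sketch of language adaptation with LoRA adapters, which keeps the base parameter count essentially unchanged. It is not an IBM recipe: the dataset name is a placeholder, the `ibm-granite/granite-3.0-2b-instruct` repo id, the LoRA target module names, and the hyperparameters are assumptions you would adjust for your own setup.

```python
# Hypothetical sketch: adapt the instruct model to an additional language
# with LoRA, so only a small set of adapter weights is trained on top of
# the frozen base model.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "ibm-granite/granite-3.0-2b-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id)

# LoRA keeps the base weights frozen; only a few million extra parameters
# are trained, regardless of how many languages you target.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder dataset: instruction-style text in the target language,
# with a single "text" column. Substitute your own data here.
dataset = load_dataset("your-org/target-language-sft", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="granite-2b-lang-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    # Causal LM collator pads batches and builds labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("granite-2b-lang-lora")  # saves only the adapter weights
```

The adapter-only approach is what makes community-driven adaptation practical: the resulting artifact is a few megabytes and can be merged into or loaded alongside the base model without changing its architecture.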
