While evaluating fine-tuned 7B Italian open-source LLMs I have collected many data points and put together a very simple exploratory analysis. My hypotheses, based on the data, are:
- MMLU is hard to improve when fine-tuning a base model on a different language.
- Fine-tuning, even on a single GPU, can improve the base model by 5% to 10% on common tasks, and by much more on specific use cases given the right training time and data.
- Fine-tuning can specialize a model well, but at the cost of losing some foundational knowledge.
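
As a rough illustration of the kind of exploratory comparison behind these hypotheses, here is a minimal sketch in Python. The file name (`results.csv`) and column names (`model`, `benchmark`, `base_score`, `finetuned_score`) are hypothetical placeholders for whatever format the collected data points are stored in, not the actual dataset used.

```python
# Minimal exploratory sketch, assuming a hypothetical results.csv with one
# row per (model, benchmark) pair and per-benchmark scores for the base
# model and its fine-tuned variant.
import pandas as pd

df = pd.read_csv("results.csv")

# Relative improvement of each fine-tuned model over its base model.
df["delta_pct"] = 100 * (df["finetuned_score"] - df["base_score"]) / df["base_score"]

# Aggregate per benchmark: the hypotheses above predict a small (or even
# negative) delta on MMLU and noticeably larger gains on task-specific
# benchmarks.
print(df.groupby("benchmark")["delta_pct"].agg(["mean", "std", "count"]))
```

Grouping the deltas by benchmark makes the pattern easy to eyeball: a near-zero mean on MMLU next to larger positive means on specialized tasks would be consistent with the three hypotheses above.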