While evaluating fine-tuned 7B Italian open-source LLMs I have collected many data points and put together a very simple exploratory analysis. My hypotheses, based on the data, are:
- MMLU is hard to improve when fine-tuning a base model on a different language.
- Fine-tuning, even on a single GPU, can improve the base model by 5% to 10% on common tasks, and by much more on specific use cases given the right training time and data.
- Fine-tuning can specialize a model well, but at the cost of losing some foundational knowledge.
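
As a rough illustration of the kind of exploratory comparison behind these hypotheses, here is a minimal sketch in Python. The file name (`results.csv`) and column names (`model`, `benchmark`, `base_score`, `finetuned_score`) are hypothetical placeholders for whatever format the collected data points are stored in, not the actual dataset used.

```python
# Minimal exploratory sketch, assuming a hypothetical results.csv with one
# row per (model, benchmark) pair and per-benchmark scores for the base
# model and its fine-tuned variant.
import pandas as pd

df = pd.read_csv("results.csv")

# Relative improvement of each fine-tuned model over its base model.
df["delta_pct"] = 100 * (df["finetuned_score"] - df["base_score"]) / df["base_score"]

# Aggregate per benchmark: the hypotheses above predict a small (or even
# negative) delta on MMLU and noticeably larger gains on task-specific
# benchmarks.
print(df.groupby("benchmark")["delta_pct"].agg(["mean", "std", "count"]))
```

Grouping the deltas by benchmark makes the pattern easy to eyeball: a near-zero mean on MMLU next to larger positive means on specialized tasks would be consistent with the three hypotheses above.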