Stefano Fiorucci

anakin87

AI & ML interests

Contributing to Haystack, the LLM Framework 🏗️. NLP / LLMs.

Posts
๐Œ๐ฒ ๐Ÿ๐ข๐ซ๐ฌ๐ญ ๐œ๐จ๐ฆ๐ฆ๐ฎ๐ง๐ข๐ญ๐ฒ ๐š๐ซ๐ญ๐ข๐œ๐ฅ๐ž! ๐’๐ž๐ฅ๐ž๐œ๐ญ๐ข๐ฏ๐ž ๐Ÿ๐ข๐ง๐ž-๐ญ๐ฎ๐ง๐ข๐ง๐  ๐ฐ๐ข๐ญ๐ก ๐’๐ฉ๐ž๐œ๐ญ๐ซ๐ฎ๐ฆ ๐ŸŽฏ

Full walkthrough on how to get started with Spectrum and TRL for efficient fine-tuning.
📔 👣 https://huggingface.co./blog/anakin87/spectrum

---

Looking to fine-tune Language Models efficiently and save on computational resources?

One popular method is QLoRA, which quantizes the original model and trains low-rank adapters on top.
It's quite effective and uses less GPU memory than full fine-tuning.
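As a rough sketch of what such a setup looks like in code (model name and hyperparameters below are illustrative placeholders, not a prescribed recipe):

```python
# Illustrative QLoRA setup: 4-bit quantized base model + trainable low-rank adapters.
# Requires transformers, peft and bitsandbytes; all values are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize the frozen base model to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",       # any causal LM would do here
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # adapter hyperparameters
    target_modules="all-linear",             # attach adapters to every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only the adapters are trainable
model.print_trainable_parameters()
```

The resulting model can then be trained with TRL's SFTTrainer like any other.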

However, QLoRA applies Low-Rank Adaptation uniformly across the entire model.

What if we could identify the most informative layers and only fine-tune those? 🤔

This is exactly what Spectrum does! 👇

🔬 Spectrum analyzes the weight matrices for all layers in a Language Model and calculates a Signal to Noise Ratio (SNR) for each one.
(It uses Random Matrix Theory and Marchenko-Pastur distribution to distinguish signal from noise.)
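As a toy sketch of the idea (not the official Spectrum implementation), one can treat singular values above the Marchenko-Pastur noise edge as signal and the rest as noise:

```python
# Toy SNR scoring for a single weight matrix (illustration only, not the Spectrum code).
import torch

def layer_snr(weight: torch.Tensor) -> float:
    w = weight.float()
    m, n = w.shape
    sigma = w.std()                          # rough estimate of the noise scale
    svals = torch.linalg.svdvals(w)          # singular values of the weight matrix
    mp_edge = sigma * (m ** 0.5 + n ** 0.5)  # largest singular value expected from pure noise
    signal = svals[svals > mp_edge].sum()    # "informative" part of the spectrum
    noise = svals[svals <= mp_edge].sum()    # part compatible with random noise
    return (signal / (noise + 1e-8)).item()
```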

🎯 Based on a chosen percentage (say, 25%), Spectrum selects the most informative layers of each type (mlp.down_proj, self_attn.o_proj, etc.).

You can then โ„๏ธ freeze the rest of the model and focus your ๐Ÿ‹๏ธโ€โ™‚๏ธ training on the chosen layers.


๐Ÿ† Results/Evaluation
- Spectrum is competitive with full fine-tuning and beats QLoRA on benchmarks.
- While QLoRA is more memory-efficient on a single GPU, Spectrum shines in distributed training setups.
- Great models trained with Spectrum: Dolphin models, Llama 3.1 Storm, numerous models by VAGO Solutions...

---

For a practical guide, check out the article above.
💬 🇮🇹 Phi 3.5 mini ITA: a Small Language Model for Italian

Lately, I've spent some time fine-tuning language models.

Now I am happy to release Phi 3.5 mini ITA: a fine-tuned version of Phi-3.5-mini-instruct to improve performance on the Italian language.

🔹 Small (3.82B parameters) but capable model
🔹 128k context length

Chat with it on 🤗 Spaces: anakin87/Phi-3.5-mini-ITA
Model card: anakin87/Phi-3.5-mini-ITA
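If you prefer to try it locally, something along these lines should work with a recent transformers version (a minimal sketch; generation settings are illustrative):

```python
# Minimal local usage sketch; generation parameters are illustrative.
from transformers import pipeline

pipe = pipeline("text-generation", model="anakin87/Phi-3.5-mini-ITA", device_map="auto")
messages = [{"role": "user", "content": "Qual è il capoluogo della Toscana?"}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"])
```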

๐Ÿ—ƒ๏ธ Data
Supervised fine-tuning using a good mix of English and Italian data (see the loading sketch below):
- mlabonne/FineTome-100k by @mlabonne
- efederici/capybara-claude-15k-ita by @efederici
๐Ÿ™ Thanks to the authors for the datasets.


🎯 Targeted training with Spectrum
I used Spectrum, a relatively new technique for parameter-efficient fine-tuning.
The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
I trained the top 30% of model layers.

๐Ÿ“ Spectrum paper: https://arxiv.org/abs/2406.06623


📊 Vibe check and performance on Italian benchmarks seem encouraging.