A theoretical formulation of how this works based on a paper by Sakana AI
Hey team, just wanted to share my findings on the theory behind how this works:
You can read the full article here: https://huggingface.co./blog/NagaSaiAbhinay/transformer-layers-as-painters-dit
Basically, by looking at the cosine similarity of the activations across layers, you notice that layers tend to group into three bands: first, middle, and last. The first and last layers act as translation layers, converting data between the external representation and the transformer's internal representation space, while the middle layers operate within that shared internal representation.
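If you want to poke at this yourself, here's a minimal sketch of the measurement, assuming a small LLM via the transformers library (the linked paper ran its experiments on LLMs; the blog applies the same idea to DiT blocks). "gpt2" is just a stand-in model, not what either source used:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Stand-in model for illustration; any model exposing hidden states works the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors of shape [batch, seq, dim]:
# the embedding output followed by each block's output.
hs = out.hidden_states
n = len(hs)

# Pairwise cosine similarity between layer activations, averaged over tokens.
# Blocks of high similarity in this matrix are what reveal the
# first / middle / last grouping described above.
sims = torch.zeros(n, n)
for i in range(n):
    for j in range(n):
        sims[i, j] = F.cosine_similarity(hs[i], hs[j], dim=-1).mean()

print(sims.round(decimals=2))
```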
This is consistent with Ostris's findings: skipping a couple of middle blocks is almost always fine, whereas removing the first or last layers is disastrous.
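For anyone who wants to reproduce the skipping experiment, here's one way to do it, a sketch rather than Ostris's actual tooling. The `transformer_blocks` attribute, the block indices, and the block call signature are assumptions that vary by architecture:

```python
import torch.nn as nn

class SkipBlock(nn.Module):
    """Stand-in module that passes hidden states straight through.

    Assumes the block's first argument is the hidden-state tensor; real
    blocks may take extra conditioning args (timestep embeddings, etc.)
    and return tuples, so adapt the signature to the model you test.
    """
    def forward(self, hidden_states, *args, **kwargs):
        return hidden_states

# Hypothetical usage on a DiT-style model that exposes a `transformer_blocks`
# ModuleList (the attribute path differs between architectures):
# for idx in [12, 13]:                      # a couple of middle blocks
#     model.transformer_blocks[idx] = SkipBlock()
```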
One more interesting outcome of this is that you don't have to train LoRAs on all layers! See TheLastBen's findings too: https://x.com/__theben/status/1829554120270987740?s=46
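As a concrete illustration of "not all layers", a peft `LoraConfig` can restrict LoRA to a band of blocks via `layers_to_transform`. The target module names, block range, and `layers_pattern` below are assumptions for a typical DiT-style model, not TheLastBen's exact recipe:

```python
from peft import LoraConfig

# Restrict LoRA to a band of middle blocks. The target_modules names are the
# usual diffusers attention projections and the index range is hypothetical;
# pick both to match the architecture you are training on.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    layers_to_transform=list(range(8, 20)),  # middle blocks only (assumed range)
    layers_pattern="transformer_blocks",     # name of the block ModuleList (assumed)
)
```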
The original paper, which ran the experiments on LLMs: https://arxiv.org/abs/2407.09298