matlok's Collections
LMM

Papers - Attention - NoPE - Long Context with SoftMax Temp

Uniform scaling of the softmax temperature does not perform as well as head-based scaling.
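To make the distinction concrete, here is a minimal sketch in plain Python, assuming that "scaling" means multiplying the attention logits by a factor before the softmax. The function names, toy logits, and scale values are all hypothetical and chosen only for illustration. How the per-head values would be chosen in practice is not covered here.

```python
import math

def softmax(logits, scale=1.0):
    """Softmax over a list of attention logits after multiplying by `scale`."""
    scaled = [scale * x for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of an attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uniform_scaling(head_logits, scale):
    """Apply one shared scale to the logits of every head."""
    return [softmax(logits, scale) for logits in head_logits]

def head_based_scaling(head_logits, scales):
    """Apply a separate scale to the logits of each head."""
    if len(scales) != len(head_logits):
        raise ValueError("need exactly one scale per head")
    return [softmax(logits, s) for logits, s in zip(head_logits, scales)]

# Toy logits for two heads over four key positions (hypothetical values).
head_logits = [
    [2.0, 1.0, 0.5, 0.1],  # head 0: already fairly peaked
    [0.3, 0.2, 0.1, 0.0],  # head 1: nearly flat
]

uniform = uniform_scaling(head_logits, 1.5)            # same factor everywhere
per_head = head_based_scaling(head_logits, [1.2, 3.0])  # one factor per head

for h, (u, p) in enumerate(zip(uniform, per_head)):
    print(f"head {h}: entropy uniform={entropy(u):.3f} head-based={entropy(p):.3f}")
```

In this sketch a larger scale makes a head's attention distribution sharper (lower entropy). Uniform scaling forces every head to share one factor, while head-based scaling lets each head be sharpened by a different amount.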