Downscaling the `W_q` and `W_k` matrices for repeated layers in franken-merges
14
#4 opened 10 months ago by jukofyork
Guidance on GPU VRAM Split?
5
#3 opened about 1 year ago by nmitchko
Performance
13
#2 opened about 1 year ago by KnutJaegersberg