MobileLLM Collection: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 103
Post: Working on a concept GPT-2 (small) that uses KANs instead of MLPs. The ckpt and training code will be on the Hub soon.