- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 37
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 62
- Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
  Paper • 2403.18795 • Published • 18
- Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
  Paper • 2404.04478 • Published • 12
Collections including paper arxiv:2403.03853
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138
- SparQ Attention: Bandwidth-Efficient LLM Inference
  Paper • 2312.04985 • Published • 38
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
  Paper • 2402.00159 • Published • 59
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
  Paper • 2311.09257 • Published • 45
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
  Paper • 2310.04378 • Published • 19
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118