MobileNetV4 pretrained weights Collection • Weights for MobileNet-V4 pretrained in timm • 13 items • Updated Jun 24 • 12
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12 • 22
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
2D Gaussian Splatting for Geometrically Accurate Radiance Fields Paper • 2403.17888 • Published Mar 26 • 26
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 58
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6 • 63
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 96
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 590
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT Paper • 2402.16840 • Published Feb 26 • 23
Scaling Laws for Downstream Task Performance of Large Language Models Paper • 2402.04177 • Published Feb 6 • 17
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 22
MM-LLMs: Recent Advances in MultiModal Large Language Models Paper • 2401.13601 • Published Jan 24 • 44
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4 • 61
Analyzing and Improving the Training Dynamics of Diffusion Models Paper • 2312.02696 • Published Dec 5, 2023 • 31
CCM: Adding Conditional Controls to Text-to-Image Consistency Models Paper • 2312.06971 • Published Dec 12, 2023 • 10
Papers to read - General Collection • Papers I want to read, at some point. • 8 items • Updated Apr 9 • 4
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138