- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 9
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
  Paper • 2406.04333 • Published • 36

Collections including paper arxiv:2405.05254

- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 143
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 73
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 27

- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 11
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 9
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126

- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 11
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
  Paper • 2404.00987 • Published • 21
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
  Paper • 2404.02893 • Published • 20

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 18
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 10
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 124
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 42
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 14
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- AutoMix: Automatically Mixing Language Models
  Paper • 2310.12963 • Published • 14
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
  Paper • 2310.03094 • Published • 12
- MatFormer: Nested Transformer for Elastic Inference
  Paper • 2310.07707 • Published • 1
- DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  Paper • 2310.08461 • Published • 1