- Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
  Paper • 2408.13233 • Published • 20
- Heterogeneous Multi-task Learning with Expert Diversity
  Paper • 2106.10595 • Published • 1
- Residual Mixture of Experts
  Paper • 2204.09636 • Published • 1
- Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
  Paper • 2307.05956 • Published • 1
Collections including paper arxiv:2211.11315
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers
  Paper • 2308.06093 • Published • 2
- Platypus: Quick, Cheap, and Powerful Refinement of LLMs
  Paper • 2308.07317 • Published • 23
- Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
  Paper • 2211.11315 • Published • 1
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
  Paper • 2307.13269 • Published • 31
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Paper • 2310.17157 • Published • 11
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Paper • 2305.15805 • Published • 1
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
  Paper • 2305.11186 • Published • 1
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer
  Paper • 2110.07560 • Published • 1
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 39
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 14
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 25
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 6
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20