Collections including paper arxiv:2308.10110

- Robust Mixture-of-Expert Training for Convolutional Neural Networks
  Paper • 2308.10110 • Published • 2
- HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
  Paper • 2308.06512 • Published • 2
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
  Paper • 2309.04354 • Published • 13
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5

- Robust Mixture-of-Expert Training for Convolutional Neural Networks
  Paper • 2308.10110 • Published • 2
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers
  Paper • 2308.06093 • Published • 2
- ConstitutionalExperts: Training a Mixture of Principle-based Prompts
  Paper • 2403.04894 • Published • 2
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1

- Adaptive sequential Monte Carlo by means of mixture of experts
  Paper • 1108.2836 • Published • 2
- Convergence Rates for Mixture-of-Experts
  Paper • 1110.2058 • Published • 2
- Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
  Paper • 2310.12008 • Published • 2
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
  Paper • 2308.11793 • Published • 2

- Trellis Networks for Sequence Modeling
  Paper • 1810.06682 • Published • 1
- Pruning Very Deep Neural Network Channels for Efficient Inference
  Paper • 2211.08339 • Published • 1
- LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch
  Paper • 2309.14157 • Published • 1
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138

- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 26
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  Paper • 2308.12066 • Published • 4
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
  Paper • 2112.14397 • Published • 1