arxiv:2409.02060

OLMoE: Open Mixture-of-Experts Language Models

Published on Sep 3
· Submitted by Muennighoff on Sep 4
#2 Paper of the day

Abstract

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.
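The abstract's point that only a fraction of the parameters run per token can be illustrated with a toy top-k router. This is a minimal sketch of generic sparse MoE routing, not the OLMoE implementation: the expert count, the value of k, the scaling "experts", and the names `route_top_k`, `moe_layer`, and `make_expert` are all assumptions made for illustration and do not come from this page.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, probability) pairs for the k most probable experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in top]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: it just scales its input."""
    return lambda x: [scale * xi for xi in x]


def moe_layer(x, experts, router_weights, k):
    """Run one token vector x through only the k experts the router selects.

    router_weights holds one row per expert; each logit is a dot product with x.
    Experts that are not selected are never evaluated, which is where the
    "active parameters" saving comes from.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    chosen = route_top_k(logits, k)
    out = [0.0] * len(x)
    for idx, weight in chosen:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in chosen]


if __name__ == "__main__":
    # Illustrative sizes only: 4 experts, 2 active per token.
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    output, used = moe_layer([2.0, 1.0], experts, router_weights, k=2)
    print("experts used:", used)
    print("output:", output)
```

In this toy setup each token touches 2 of 4 experts. The abstract's figures correspond to roughly 1B of 7B parameters, or about one seventh, being active per input token.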

Community

Hey, Amazing work :)
We've summarised this and a few other papers in our blog. Hope you like it

  1. KTO: The infamous alignment algorithm
  2. OLMoE: Open Data, Weights, Code Mixture of Experts models
  3. Mamba in the LlaMA: Distilling from Transformers to Mamba
  4. PlanSearch: Improving Code Generation via Planning

https://datta0.substack.com/p/ai-unplugged-19-kto-for-model-alignment

it is awesome

Models citing this paper 15

Datasets citing this paper 1

Spaces citing this paper 4

Collections including this paper 15