---
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
---

![olmoe](https://github.com/allenai/OLMoE/blob/main/visuals/logo/OLMoE_4.png?raw=true)

# Model Summary

**We strongly recommend using the instruct version at https://hf.co/OLMoE/OLMoE-1B-7B-0824-Instruct instead, which is based on this model with additional DPO (Direct Preference Optimization).**

- Code: https://github.com/allenai/OLMoE
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-sft-logs.txt

Important branches:
- `main`: Instruction-tuned / supervised finetuned (SFT) model of https://hf.co/OLMoE/OLMoE-1B-7B-0824 (`main` branch)
- `no-load-balancing`: Ablation without the load-balancing loss during SFT
- `non-annealed`: Ablation starting from the checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824) rather than the annealed checkpoint (branch `main` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)

A minimal loading sketch for selecting one of these branches is given after the citation below.

# Citation

```bibtex
TODO
```
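
# Loading a branch (sketch)

A minimal sketch of loading one of the branches listed above with Hugging Face `transformers`, using the `revision` argument to pick the branch. The repository id below is an assumption inferred from the links in this card (it is not stated explicitly here), and the snippet assumes a `transformers` version with OLMoE support; adjust both as needed.

```python
# Minimal sketch, not an official usage example.
# REPO_ID is assumed from the links in this card; replace it with the actual repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "OLMoE/OLMoE-1B-7B-0824-SFT"  # assumed repo id
BRANCH = "main"  # or "no-load-balancing" / "non-annealed" for the ablations above

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=BRANCH)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    revision=BRANCH,
    torch_dtype=torch.bfloat16,  # assumes bf16-capable hardware; use float32 on CPU
)

# Quick generation check with an arbitrary prompt.
inputs = tokenizer("Bitcoin is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```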