---
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
---
![olmoe](https://github.com/allenai/OLMoE/blob/main/visuals/logo/OLMoE_4.png?raw=true)
# Model Summary
**We strongly recommend using the instruct version at https://hf.co/OLMoE/OLMoE-1B-7B-0824-Instruct instead, which is based on this model with additional DPO (Direct Preference Optimization).**
- Code: https://github.com/allenai/OLMoE
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-sft-logs.txt
Important branches:
- `main`: Instruction tuned / supervised finetuned (SFT) version of https://hf.co/OLMoE/OLMoE-1B-7B-0824 (`main` branch)
- `no-load-balancing`: Ablation without load balancing loss during SFT
- `non-annealed`: Ablation starting from the checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824) rather than the annealed checkpoint (branch `main` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
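
The branches above can be selected via the `revision` argument when loading the model with Hugging Face `transformers`. The sketch below is illustrative only: the repository id is assumed from the naming of the linked models and may differ from this repo's actual id.

```python
# Minimal sketch (not part of the original model card): load one of the
# branches listed above with transformers. The repo id is an assumption;
# replace it with this repository's actual id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OLMoE/OLMoE-1B-7B-0824-SFT"  # assumed id; adjust as needed
revision = "main"                        # or "no-load-balancing" / "non-annealed"

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)

inputs = tokenizer("Bitcoin is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```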
# Citation
```bibtex
TODO
```