---
license: apache-2.0
language:
- en
tags:
- moe
- olmo
- olmoe
co2_eq_emissions: 1
---

![olmoe](https://github.com/allenai/OLMoE/blob/main/visuals/logo/OLMoE_4.png?raw=true)

# Model Summary

**We strongly recommend using the instruct version at https://hf.co/OLMoE/OLMoE-1B-7B-0824-Instruct instead, which is based on this model with additional DPO (Direct Preference Optimization).**

- Code: https://github.com/allenai/OLMoE
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-sft-logs.txt

Important branches (see the loading sketch after this list):
- `main`: Instruction tuned / supervised finetuned (SFT) model of https://hf.co/OLMoE/OLMoE-1B-7B-0824 (`main` branch)
- `no-load-balancing`: Ablation without load balancing loss during SFT
- `non-annealed`: Ablation starting from the checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824) rather than the annealed checkpoint (branch `main` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
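
A minimal loading sketch (not part of the original card): it assumes this checkpoint loads through the standard `transformers` Auto classes, uses a placeholder repo id for this model, and selects one of the branches above via the standard `revision` argument.

```python
# Minimal sketch for loading one of the branches listed above.
# Assumptions: the repo id is a placeholder for this model's Hub path,
# and the checkpoint is compatible with the transformers Auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "OLMoE/OLMoE-1B-7B-0824-SFT"  # placeholder: replace with this model's repo id
BRANCH = "no-load-balancing"            # or "main" / "non-annealed"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=BRANCH)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=BRANCH)

inputs = tokenizer("Bitcoin is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The `revision` argument simply checks out the corresponding branch of the model repository; a `transformers` release with OLMoE support is assumed.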

# Citation

```bibtex
TODO
```