amanrangapur
committed on
Update README.md
README.md CHANGED
@@ -2,6 +2,7 @@
license: apache-2.0
datasets:
- allenai/dolmino-mix-1124
language:
- en
---

@@ -97,56 +98,60 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/OLMo-Eval
- Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** [Link](https://arxiv.org/abs/2402.00838)
- **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
- **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)

<!-- TODO -->

## Evaluation

| Task | Random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
| ------------- | ------ | ----------------- | --------- | -------------- | ----------- | --------------------- |
| arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 | 36.5 |
| arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 | 55.3 |
| boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 | 67.5 |
| copa | 50 | 84 | 72 | 78 | 79 | 83.0 |
| hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 | 66.9 |
| openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 | 46.4 |
| piqa | 50 | 74 | 69.1 | 71.1 | 73.7 | 74.9 |
| sciq | 25 | 94.7 | 86 | 90.5 | 88.1 | 93.4 |
| winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 | 61.4 |
| Average | 36.11 | 68.41 | 56.44 | 61.48 | 62.42 | 65.0 |
-->

## Model Details

###

## Bias, Risks, and Limitations

@@ -2,6 +2,7 @@
license: apache-2.0
datasets:
- allenai/dolmino-mix-1124
- allenai/dolma
language:
- en
---

@@ -97,56 +98,60 @@
- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/OLMo-Eval
- Further fine-tuning code: https://github.com/allenai/open-instruct
<!-- - **Paper:** [Link](https://arxiv.org/abs/2402.00838)
- **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
- **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
-->
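
The released checkpoints can typically be run with Hugging Face `transformers`. The snippet below is a minimal inference sketch, assuming a recent `transformers` release with OLMo 2 support; it uses the 7B repo id from the evaluation table and is an illustration rather than the card's official usage example.

```python
# Minimal inference sketch (assumes a transformers version that ships OLMo 2 support).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # 7B repo id; swap in the 13B id as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```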

## Evaluation

Core model results for the OLMo 2 7B and 13B models are shown below.

| Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLU Pro | TriviaQA |
|---------------------|-------------|---------|-------|-------|-------|------|------|------|---------|-------|----------|----------|
| Gemma-2-9B | 4.4·10²³ | 52.9 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 1.1 | 42 | 0.9 |
| Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
| Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
| Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
| Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
| Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
| Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
| StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
| Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
| Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
| OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
| MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
| OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 |
| DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 |
| **OLMo-2-1124-7B** | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 |
| **OLMo-2-1124-13B** | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 |
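
As a rough cross-check on the Train FLOPs column, training compute for dense decoder-only models is often approximated as 6·N·D, where N is the parameter count and D the number of training tokens. The sketch below applies that rule of thumb to the rounded OLMo 2 budgets listed under Model Details; the approximation and the rounded counts are assumptions for orientation, not the accounting behind the table.

```python
# Back-of-the-envelope training compute via the common 6 * N * D approximation.
# Parameter counts and token budgets are rounded; this is not the card's own accounting.

def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for a dense decoder-only model as 6 * N * D."""
    return 6.0 * n_params * n_tokens

# OLMo 2 7B: ~7e9 parameters, ~4e12 Stage 1 tokens
print(f"OLMo 2 7B:  {approx_train_flops(7e9, 4e12):.1e}")   # ~1.7e+23, near the reported 1.8e23

# OLMo 2 13B: ~13e9 parameters, ~5e12 Stage 1 tokens
print(f"OLMo 2 13B: {approx_train_flops(13e9, 5e12):.1e}")  # ~3.9e+23; the table reports 4.6e23
```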
## Model Details

### Pretraining

|  | **OLMo 2 7B** | **OLMo 2 13B** |
|-------------------|------------|------------|
| Pretraining Stage 1<br>([OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124)) | 4 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) |
| Pretraining Stage 2<br>([Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124)) | 50B tokens (3 runs)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* |
| Post-training<br>([Tulu 3 SFT OLMo mix](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-mixture)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) |

#### Stage 1: Initial Pretraining
- Dataset: [OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)
- Coverage: 90%+ of total pretraining budget
- 7B Model: ~1 epoch
- 13B Model: 1.2 epochs (5T tokens)

#### Stage 2: Fine-tuning
- Dataset: [Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (843B tokens)
- Three training mixes:
  - 50B tokens
  - 100B tokens
  - 300B tokens
- Mix composition: 50% high-quality data + academic/Q&A/instruction/math content

#### Model Merging
- 7B Model: 3 versions trained on the 50B mix, merged via model souping (see the sketch below)
- 13B Model: 3 versions on the 100B mix + 1 version on the 300B mix, merged for the final checkpoint
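
Model souping here refers to averaging the weights of separately trained checkpoints into a single model. The sketch below shows uniform weight averaging over three same-architecture checkpoints; the local paths are hypothetical placeholders, and the actual merging recipe used for these releases may differ.

```python
# Minimal model-souping sketch: uniform average of same-architecture checkpoints.
# The checkpoint paths are hypothetical placeholders for three runs on the same data mix.
import torch
from transformers import AutoModelForCausalLM

checkpoint_paths = ["run1_50B", "run2_50B", "run3_50B"]

# Start from the first checkpoint and accumulate the floating-point weights of the rest.
soup = AutoModelForCausalLM.from_pretrained(checkpoint_paths[0], torch_dtype=torch.float32)
avg_state = {k: v.clone() for k, v in soup.state_dict().items()}

for path in checkpoint_paths[1:]:
    other = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32)
    for k, v in other.state_dict().items():
        if v.is_floating_point():
            avg_state[k] += v

for k, v in avg_state.items():
    if v.is_floating_point():
        avg_state[k] = v / len(checkpoint_paths)

soup.load_state_dict(avg_state)
soup.save_pretrained("olmo2-souped")  # single merged checkpoint
```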
## Bias, Risks, and Limitations