Safetensors
English
olmo2
amanrangapur committed
Commit 05f08bd · verified · 1 Parent(s): a3346e8

Update README.md

Files changed (1)
  1. README.md +47 -42
README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  datasets:
  - allenai/dolmino-mix-1124
  language:
  - en
  ---
@@ -97,56 +98,60 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
  - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
  - Evaluation code: https://github.com/allenai/OLMo-Eval
  - Further fine-tuning code: https://github.com/allenai/open-instruct
- - **Paper:** [Link](https://arxiv.org/abs/2402.00838)
  - **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
  - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
 
 
 
-
- <!-- TODO -->
  ## Evaluation
- `TODO`
- <!-- Core model results for OLMo 7B models are found below.
-
- | Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | OLMo 7B April 2024 | **OLMo 7B July 2024** |
- |-------------------|----------|-----------|-----------|--------|---------|------------|--------------------|-----------------------|
- | arc_c | 44.5 | 48.5 | 47.5 | 46.5 | 48.5 | 52.8 | 42.5 | 43.8 |
- | arc_e | 67.9 | 69.5 | 70.4 | 70.5 | 65.4 | 73.7 | 67.2 | 68.8 |
- | boolq | 75.4 | 80.2 | 74.6 | 74.2 | 73.4 | 82.2 | 83.7 | 78.9 |
- | copa | 91.0 | 86.0 | 86.0 | 85.0 | 90.0 | 90.0 | 86.0 | 84.0 |
- | hellaswag | 76.2 | 76.8 | 75.9 | 77.6 | 76.4 | 78.6 | 75.5 | 77.4 |
- | openbookqa | 51.2 | 48.4 | 53.0 | 48.6 | 50.4 | 51.8 | 50.0 | 48.2 |
- | piqa | 77.2 | 76.7 | 78.5 | 77.3 | 78.4 | 79.0 | 77.5 | 78.2 |
- | sciq | 93.9 | 94.5 | 93.9 | 93.7 | 93.8 | 95.5 | 96.7 | 97.0 |
- | winogrande | 70.5 | 69.4 | 68.9 | 69.9 | 67.9 | 73.5 | 69.8 | 68.8 |
- | truthfulQA (MC2) | 33.9 | 38.5 | 34.0 | 33.0 | 36.0 | 36.8 | 35.8 | 36.5 |
- | MMLU (5 shot MC) | 31.5 | 45.0 | 24.0 | 30.8 | 28.3 | 55.5 | 52.0 | 53.4 |
- | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | 35.0 |
- | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | 64.2 |
-
- And for 13B models:
-
- | Task | Random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
- | ------------- | ------ | ------ | ------ | ------ | ------ | ------ |
- | arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 | 36.5 |
- | arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 | 55.3 |
- | boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 | 67.5 |
- | copa | 50 | 84 | 72 | 78 | 79 | 83.0 |
- | hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 | 66.9 |
- | openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 | 46.4 |
- | piqa | 50 | 74 | 69.1 | 71.1 | 73.7 | 74.9 |
- | sciq | 25 | 94.7 | 86 | 90.5 | 88.1 | 93.4 |
- | winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 | 61.4 |
- | Average | 36.11 | 68.41 | 56.44 | 61.48 | 62.42 | 65.0 |
- -->
 
  ## Model Details
 
- ### Data
- `TODO`
-
- ### Staged training / annealing
- `TODO`
 
 
  ## Bias, Risks, and Limitations
 
  license: apache-2.0
  datasets:
  - allenai/dolmino-mix-1124
+ - allenai/dolma
  language:
  - en
  ---
 
  - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
  - Evaluation code: https://github.com/allenai/OLMo-Eval
  - Further fine-tuning code: https://github.com/allenai/open-instruct
+ <!-- - **Paper:** [Link](https://arxiv.org/abs/2402.00838)
  - **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
  - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
+ -->
 
 
  ## Evaluation
+ Core model results for OLMo 2 7B and 13B models are found below.
+
+ | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLU Pro | TriviaQA |
+ |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
+ | Gemma-2-9B | 4.4·10²³ | 52.9 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 1.1 | 42 | 0.9 |
+ | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
+ | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
+ | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
+ | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
+ | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
+ | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
+ | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
+ | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
+ | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
+ | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
+ | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
+ | OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 |
+ | DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 |
+ | **OLMo-2-1124-7B** | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 |
+ | **OLMo-2-1124-13B** | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 |
 
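+ The corresponding checkpoints can be loaded directly with Hugging Face `transformers`. A minimal sketch, assuming a recent `transformers` release with OLMo 2 support; this is a plain generation example, not the harness used to produce the table above:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "allenai/OLMo-2-1124-7B"  # 13B variant: "allenai/OLMo-2-1124-13B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+
+ # Greedy generation from a short prompt.
+ inputs = tokenizer("Language modeling is", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+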
  ## Model Details
 
+ ### Pretraining
+ | | **OLMo 2 7B** | **OLMo 2 13B** |
+ |-------------------|------------|------------|
+ | Pretraining Stage 1<br>([OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124)) | 4 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) |
+ | Pretraining Stage 2<br>([Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124)) | 50B tokens (3 runs)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* |
+ | Post-training<br>([Tulu 3 SFT OLMo mix](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-mixture)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) |
+
+ #### Stage 1: Initial Pretraining
+ - Dataset: [OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)
+ - Coverage: 90%+ of total pretraining budget
+ - 7B Model: ~1 epoch
+ - 13B Model: 1.2 epochs (5T tokens)
+
+ #### Stage 2: Fine-tuning
+ - Dataset: [Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (843B tokens)
+ - Three training mixes:
+   - 50B tokens
+   - 100B tokens
+   - 300B tokens
+ - Mix composition: 50% high-quality data + academic/Q&A/instruction/math content
+
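+ Both stage mixes above are published as datasets on the Hub and can be inspected directly. A minimal sketch using the `datasets` library in streaming mode (a default config and a `text` field are assumed; if the repo defines multiple configs or subsets, one has to be passed explicitly):
+
+ ```python
+ from datasets import load_dataset
+
+ # Stream Dolmino-Mix-1124 instead of downloading the full mix.
+ dolmino = load_dataset("allenai/dolmino-mix-1124", split="train", streaming=True)
+
+ # Peek at the first few documents.
+ for i, doc in enumerate(dolmino):
+     print(doc.get("text", "")[:200])
+     if i == 2:
+         break
+ ```
+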
+ #### Model Merging
+ - 7B Model: 3 versions trained on 50B mix, merged via model souping
+ - 13B Model: 3 versions on 100B mix + 1 version on 300B mix, merged for final checkpoint
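+
+ Model souping here means averaging the weights of the separately annealed runs into a single checkpoint. A minimal, illustrative sketch of uniform weight averaging in PyTorch (the file names are placeholders; this is not the actual merging script used for the release):
+
+ ```python
+ import torch
+
+ def soup(checkpoint_paths):
+     """Uniformly average state dicts from checkpoints with identical architectures."""
+     state_dicts = [torch.load(path, map_location="cpu") for path in checkpoint_paths]
+     averaged = {}
+     for key in state_dicts[0]:
+         averaged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
+     return averaged
+
+ # Placeholder file names for the three 50B-mix runs.
+ merged = soup(["anneal-run1.pt", "anneal-run2.pt", "anneal-run3.pt"])
+ torch.save(merged, "olmo-2-7b-souped.pt")
+ ```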
 
 
  ## Bias, Risks, and Limitations