amanrangapur
committed on
Update README.md
README.md CHANGED
@@ -2,6 +2,7 @@
license: apache-2.0
datasets:
- allenai/dolmino-mix-1124
language:
- en
---

@@ -97,56 +98,60 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/OLMo-Eval
- Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** [Link](https://arxiv.org/abs/2402.00838)
- **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
- **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)

<!-- TODO -->

## Evaluation

| Task | Random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
| ------------- | ------ | ----------------- | --------- | -------------- | ----------- | --------------------- |
| arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 | 36.5 |
| arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 | 55.3 |
| boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 | 67.5 |
| copa | 50 | 84 | 72 | 78 | 79 | 83.0 |
| hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 | 66.9 |
| openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 | 46.4 |
| piqa | 50 | 74 | 69.1 | 71.1 | 73.7 | 74.9 |
| sciq | 25 | 94.7 | 86 | 90.5 | 88.1 | 93.4 |
| winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 | 61.4 |
| Average | 36.11 | 68.41 | 56.44 | 61.48 | 62.42 | 65.0 |
-->

## Model Details

###

## Bias, Risks, and Limitations

@@ -2,6 +2,7 @@
license: apache-2.0
datasets:
- allenai/dolmino-mix-1124
- allenai/dolma
language:
- en
---

@@ -97,56 +98,60 @@
- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/OLMo-Eval
- Further fine-tuning code: https://github.com/allenai/open-instruct
<!-- - **Paper:** [Link](https://arxiv.org/abs/2402.00838)
- **Technical blog post:** https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
- **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
-->
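
The released checkpoints can typically be run with Hugging Face `transformers`. The snippet below is a minimal inference sketch, assuming a recent `transformers` release with OLMo 2 support; it uses the 7B repo id from the evaluation table and is an illustration rather than the card's official usage example.

```python
# Minimal inference sketch (assumes a transformers version that ships OLMo 2 support).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # 7B repo id; swap in the 13B id as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```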

## Evaluation

Core model results for the OLMo 2 7B and 13B models are shown below.

| Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLU Pro | TriviaQA |
|---------------------|-------------|---------|-------|-------|-------|------|------|------|---------|-------|----------|----------|
| Gemma-2-9B | 4.4·10²³ | 52.9 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 1.1 | 42 | 0.9 |
| Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
| Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
| Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
| Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
| Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
| Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
| StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
| Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
| Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
| OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
| MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
| OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 |
| DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 |
| **OLMo-2-1124-7B** | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 |
| **OLMo-2-1124-13B** | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 |
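
As a rough cross-check on the Train FLOPs column, training compute for dense decoder-only models is often approximated as 6·N·D, where N is the parameter count and D the number of training tokens. The sketch below applies that rule of thumb to the rounded OLMo 2 budgets listed under Model Details; the approximation and the rounded counts are assumptions for orientation, not the accounting behind the table.

```python
# Back-of-the-envelope training compute via the common 6 * N * D approximation.
# Parameter counts and token budgets are rounded; this is not the card's own accounting.

def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for a dense decoder-only model as 6 * N * D."""
    return 6.0 * n_params * n_tokens

# OLMo 2 7B: ~7e9 parameters, ~4e12 Stage 1 tokens
print(f"OLMo 2 7B:  {approx_train_flops(7e9, 4e12):.1e}")   # ~1.7e+23, near the reported 1.8e23

# OLMo 2 13B: ~13e9 parameters, ~5e12 Stage 1 tokens
print(f"OLMo 2 13B: {approx_train_flops(13e9, 5e12):.1e}")  # ~3.9e+23; the table reports 4.6e23
```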
## Model Details

### Pretraining

|  | **OLMo 2 7B** | **OLMo 2 13B** |
|-------------------|------------|------------|
| Pretraining Stage 1<br>([OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124)) | 4 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) |
| Pretraining Stage 2<br>([Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124)) | 50B tokens (3 runs)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* |
| Post-training<br>([Tulu 3 SFT OLMo mix](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-mixture)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) |

#### Stage 1: Initial Pretraining
- Dataset: [OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)
- Coverage: 90%+ of total pretraining budget
- 7B Model: ~1 epoch
- 13B Model: 1.2 epochs (5T tokens)

#### Stage 2: Fine-tuning
- Dataset: [Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (843B tokens)
- Three training mixes:
  - 50B tokens
  - 100B tokens
  - 300B tokens
- Mix composition: 50% high-quality data + academic/Q&A/instruction/math content

#### Model Merging
- 7B Model: 3 versions trained on the 50B mix, merged via model souping (see the sketch below)
- 13B Model: 3 versions on the 100B mix + 1 version on the 300B mix, merged for the final checkpoint
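
Model souping here refers to averaging the weights of separately trained checkpoints into a single model. The sketch below shows uniform weight averaging over three same-architecture checkpoints; the local paths are hypothetical placeholders, and the actual merging recipe used for these releases may differ.

```python
# Minimal model-souping sketch: uniform average of same-architecture checkpoints.
# The checkpoint paths are hypothetical placeholders for three runs on the same data mix.
import torch
from transformers import AutoModelForCausalLM

checkpoint_paths = ["run1_50B", "run2_50B", "run3_50B"]

# Start from the first checkpoint and accumulate the floating-point weights of the rest.
soup = AutoModelForCausalLM.from_pretrained(checkpoint_paths[0], torch_dtype=torch.float32)
avg_state = {k: v.clone() for k, v in soup.state_dict().items()}

for path in checkpoint_paths[1:]:
    other = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32)
    for k, v in other.state_dict().items():
        if v.is_floating_point():
            avg_state[k] += v

for k, v in avg_state.items():
    if v.is_floating_point():
        avg_state[k] = v / len(checkpoint_paths)

soup.load_state_dict(avg_state)
soup.save_pretrained("olmo2-souped")  # single merged checkpoint
```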
## Bias, Risks, and Limitations