Safetensors · English · olmo2
amanrangapur committed (verified)
Commit bb8557c · 1 Parent(s): 5dbe404

Update README.md

Files changed (1)
  1. README.md +15 -11
README.md CHANGED
@@ -13,22 +13,18 @@ language:
 
 # Model Card for OLMo2 13B
 
-OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
-The OLMo models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset.
-We release all code, checkpoints, logs (coming soon), and details involved in training these models.
+OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
+These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
+The core models released in this batch include the following:
 
-
-
-The core models released in this batch are the following:
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|--------|---------|-------------|-----------------|----------------|
 | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 4 Trillion | 32 | 4096 | 32 | 4096 |
 | [OLMo2- 13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion | 40 | 5120 | 42 | 4096 |
 
-
 ## Inference
 
-Proceed as usual with HuggingFace:
+You can use OLMo with the standard HuggingFace transformers library:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124")
@@ -43,8 +39,16 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
 >> 'Language modeling is the first step to build natural language generation...'
 ```
 
-Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
-The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
+For faster performance, you can quantize the model using the following method:
+```python
+AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124",
+    torch_dtype=torch.float16,
+    load_in_8bit=True)  # Requires bitsandbytes package
+```
+The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
+```python
+inputs.input_ids.to('cuda')
+```
 
 We have released checkpoints for these models, for every 1000 training steps.
 The naming convention is `stepXXX-tokensYYYB`.
@@ -122,7 +126,7 @@ Core model results for OLMo 7B models are found below.
 
 And for 13B models:
 
-| task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
+| Task | Random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
 | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ----------------- | --------- | -------------------------------------- | ------- | ------ |
 | arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 | 36.5 |
 | arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 | 55.3 |
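The hunks above show only the beginning and end of the README's inference snippet. For reference, a minimal end-to-end sketch of the usage it describes follows; the prompt and generation settings are illustrative assumptions rather than lines taken from the commit.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the repository id used in the diff.
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-13B-1124")

# Tokenize a prompt and sample a continuation (settings here are illustrative).
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
# >> 'Language modeling is the first step to build natural language generation...'
```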
 
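The added lines describe loading the model in 8-bit with `bitsandbytes` and passing the input ids to CUDA. Below is a sketch that puts those two pieces together, assuming a CUDA device and the `bitsandbytes` package are available; the generation settings are again illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 8-bit load as described in the changed lines; requires the bitsandbytes
# package and a CUDA-capable GPU.
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-13B-1124",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-13B-1124")

# Pass the input ids to CUDA explicitly, as the README recommends for the
# quantized model.
inputs = tokenizer(["Language modeling is "], return_tensors="pt")
input_ids = inputs.input_ids.to("cuda")
response = olmo.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True)

print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```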
 
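The README also notes checkpoints for every 1000 training steps, named `stepXXX-tokensYYYB`. These can be loaded through the `revision` argument of `from_pretrained`; the branch name below is a hypothetical example of that convention, not a verified checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical revision following the stepXXX-tokensYYYB convention;
# check the repository's branch list for the checkpoints that actually exist.
checkpoint = "step1000-tokens5B"

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124", revision=checkpoint)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-13B-1124", revision=checkpoint)
```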
 