amanrangapur
committed on
Update README.md
README.md CHANGED
# Model Card for OLMo2 13B

OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
The core models released in this batch include the following:

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
|------|-----------------|--------|-------------|-----------------|----------------|
| [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 4 Trillion | 32 | 4096 | 32 | 4096 |
| [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion | 40 | 5120 | 42 | 4096 |
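These architecture figures can be cross-checked against the released configuration; a minimal sketch, assuming the `allenai/OLMo2-13B-1124` checkpoint used in the inference example below exposes the standard `transformers` config fields:

```python
from transformers import AutoConfig

# Read the architecture hyperparameters straight from the hub config
config = AutoConfig.from_pretrained("allenai/OLMo2-13B-1124")
print(config.num_hidden_layers, config.hidden_size,
      config.num_attention_heads, config.max_position_embeddings)
```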
## Inference

You can use OLMo with the standard HuggingFace transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-13B-1124")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)  # sample a 100-token continuation
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
>> 'Language modeling is the first step to build natural language generation...'
```
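For quick experiments, the generic `transformers` text-generation pipeline should also work; a minimal sketch, assuming the same checkpoint ID as above:

```python
from transformers import pipeline

# Wrap the checkpoint in a text-generation pipeline; device_map="auto" uses a GPU when available
generator = pipeline("text-generation", model="allenai/OLMo2-13B-1124", device_map="auto")
print(generator("Language modeling is ", max_new_tokens=50)[0]["generated_text"])
```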
For faster performance, you can quantize the model using the following method:
```python
import torch

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124",
                                            torch_dtype=torch.float16,
                                            load_in_8bit=True)  # Requires bitsandbytes package
```
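If your installed `transformers` version has deprecated the bare `load_in_8bit` flag, the same setup can be expressed with an explicit `BitsAndBytesConfig`; a sketch under that assumption (8-bit loading also needs the `bitsandbytes` and `accelerate` packages):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Equivalent 8-bit setup expressed through an explicit quantization config
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-13B-1124",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```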
The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
```python
inputs.input_ids.to('cuda')
```
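Putting the pieces together, a minimal sketch of generation with the quantized model (it assumes the quantized `olmo` and the `tokenizer` from the examples above, plus a CUDA device):

```python
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
inputs = {k: v.to("cuda") for k, v in inputs.items()}  # move every input tensor to the GPU
response = olmo.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```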
We have released checkpoints for these models at every 1000 training steps.
The naming convention is `stepXXX-tokensYYYB`.
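A specific intermediate checkpoint can be selected with the `revision` argument of `from_pretrained`; a sketch, where the revision string is an illustrative name following the convention above (the actual names are listed under the repository's branches):

```python
from transformers import AutoModelForCausalLM

# "step1000-tokens5B" is a hypothetical revision following the stepXXX-tokensYYYB convention
olmo_ckpt = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-13B-1124",
    revision="step1000-tokens5B",
)
```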
And for 13B models:

| Task | Random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
| ------------- | ------ | ----------------- | --------- | -------------- | ------- | ------ |
| arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 | 36.5 |
| arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 | 55.3 |
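Scores of this kind are typically computed with EleutherAI's lm-evaluation-harness; a rough sketch of re-running the two ARC tasks above (the model ID and settings are illustrative, not the exact configuration behind this table):

```python
import lm_eval

# Evaluate a hub checkpoint on the two ARC tasks listed in the table
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allenai/OLMo2-13B-1124",
    tasks=["arc_challenge", "arc_easy"],
)
print(results["results"])
```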