patrickvonplaten
committed on
Merge branch 'main' of https://huggingface.co./facebook/opt-350m into main
README.md CHANGED
@@ -9,11 +9,11 @@ license: mit

# OPT : Open Pre-trained Transformer Language Models

- Feel free to test the whole generation capabilities here: https://transformer.huggingface.co/doc/opt-30b.
+ Feel free to test the whole generation capabilities here: https://transformer.huggingface.co/doc/opt-30b.

- The models were pretrained on the English language using a causal language modeling (CLM) objective. It was first introduced in [
+ The models were pretrained on the English language using a causal language modeling (CLM) objective. OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068.pdf) and first released in the [metaseq repository](https://github.com/facebookresearch/metaseq) on May 3rd, 2022 by the Meta AI team.

- Disclaimer
+ **Disclaimer**: The team releasing OPT also wrote a
[model card](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/model_card.md) for their model, which is available in appendix D of their paper. Content from this model card
has been written by the Hugging Face team to complete the information they provided and to give specific examples of how to use the model, as well as of its various biases.

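The paragraph above describes OPT as trained with a causal language modeling (CLM) objective, i.e. it predicts each token from the tokens to its left, which is what makes it directly usable for text generation. A minimal sketch of that idea with the `OPTForCausalLM` class imported in the snippet further down might look as follows; the prompt and `max_length` value are illustrative assumptions, not part of the diff:

```python
>>> from transformers import GPT2Tokenizer, OPTForCausalLM

>>> # the card pairs the OPT-350m checkpoint with a GPT-2 style BPE tokenizer
>>> tokenizer = GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
>>> model = OPTForCausalLM.from_pretrained("facebook/opt-350m")

>>> # causal LM: generate a continuation token by token, left to right
>>> inputs = tokenizer("Hello, I'm a language model,", return_tensors="pt")
>>> generated_ids = model.generate(**inputs, do_sample=True, max_length=30)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

With `do_sample=True` the continuation varies between runs, which is why the pipeline snippet in the next hunks also imports `set_seed`.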
@@ -38,8 +38,7 @@ You can use the raw model for text generation or fine-tune it to a downstream ta

### How to use

- You can use this model directly with a pipeline for text generation.
- set a seed for reproducibility:
+ You can use this model directly with a pipeline for text generation. Generation is deterministic by default, so in order to use top-k sampling, `do_sample` is set to `True`.

```python
>>> from transformers import pipeline, set_seed, OPTForCausalLM, GPT2Tokenizer
@@ -52,15 +51,48 @@ set a seed for reproducibility:
[{'generated_text': "Hello, I'm a language model, and I'm interested in learning more about the language model.\n\nI'm a language model, and I"}]
```

- Here is how to use this model to get the
+ Here is how to use this model to get the hidden states of a given text in PyTorch:

```python
- from transformers import GPT2Tokenizer, OPTModel
- tokenizer = GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
- model = OPTModel.from_pretrained("facebook/opt-350m")
- text = "
- encoded_input = tokenizer(text, return_tensors='pt')
- output = model(**encoded_input)
+ >>> from transformers import GPT2Tokenizer, OPTModel
+ >>> tokenizer = GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
+ >>> model = OPTModel.from_pretrained("facebook/opt-350m")
+ >>> text = "I am happy to be releasing a new model!"
+ >>> encoded_input = tokenizer(text, return_tensors='pt')
+ >>> output = model(**encoded_input)
+ BaseModelOutputWithPast(last_hidden_state=tensor([[[-2.4159,  0.7136, -4.6705,  ..., -1.3857,  0.4758, -1.5518],
+          [-1.4122, -2.0026, -9.4849,  ...,  1.3589,  3.1777,  0.8622],
+          [ 0.8425, -5.9863, -5.7204,  ...,  2.2054,  4.3147,  0.2039],
+          ...,
+          [-0.5943, -0.9686, -2.3670,  ...,  6.7386, -4.5704,  3.1795],
+          [ 0.0582, -5.4449, -3.1305,  ...,  3.9461, -2.2183,  1.1721],
+          [ 0.0547, -4.1437, -0.1780,  ..., -0.1648,  0.7273,  0.7006]]],
+        grad_fn=<UnsafeViewBackward0>), past_key_values=((tensor([[[[-0.4485,  0.4126,  0.3829,  ..., -0.4228,  0.5844,  0.4145],
+           [-0.8542,  0.8587,  0.8495,  ..., -0.8048,  0.7143,  0.8142],
+           [-0.6921,  0.6961,  0.6502,  ..., -0.6523,  0.5810,  0.6708],
+           ...,
+           [-0.6822,  0.6847,  0.6880,  ..., -0.6225,  0.5817,  0.6720],
+           [-0.7208,  0.7355,  0.6723,  ..., -0.6821,  0.6895,  0.7070],
+           [-0.6217,  0.6276,  0.6367,  ..., -0.5950,  0.5609,  0.6075]],
+
+          [[-0.0373, -0.4824,  0.0290,  ..., -0.5359,  0.5350,  0.1365],
+           [ 0.8295, -0.3887, -0.7507,  ..., -0.2576, -1.1691,  0.6727],
+           [ 0.5611, -0.3490, -0.5395,  ..., -0.2822, -0.7972,  0.5236],
+           ...,
+           [ 0.4013, -0.2377, -0.3478,  ..., -0.1679, -0.5556,  0.4043],
+           [ 0.5444, -0.3821, -0.4555,  ..., -0.2781, -0.6267,  0.4551],
+           [ 0.2731, -0.1157, -0.2134,  ..., -0.0131, -0.3230,  0.2420]],
+
+          [[-0.8761,  0.8668,  0.8488,  ..., -0.7307, -0.8133,  0.7668],
+           [-0.6488,  0.7369,  0.7716,  ..., -0.8711, -0.6874,  0.7305],
+           [-0.6605,  0.7629,  0.7675,  ..., -0.7790, -0.6908,  0.7493],
+           ...,
+           [-0.6542,  0.7252,  0.7787,  ..., -0.7739, -0.6742,  0.7018],
+           [-0.7012,  0.7739,  0.8003,  ..., -0.8420, -0.7059,  0.7675],
+           [-0.5077,  0.5662,  0.6203,  ..., -0.7885, -0.5262,  0.5924]],
+
+          ...,
+          ]]], hidden_states=None, attentions=None)
```

### Limitations and bias
|