avi-skowron committed
Commit f98c709 • 1 Parent(s): 6e35e21
updated the use section
README.md CHANGED
@@ -38,17 +38,36 @@ dimension is split into 16 heads, each with a dimension of 256. Rotary Position
 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as
 GPT-2/GPT-3.

-##

-GPT-J

-

-

-

-GPT-J

 ### How to use
@@ -61,13 +80,13 @@ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
 model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
 ```

-

-

-

-

 ## Evaluation results
@@ -38,17 +38,36 @@ dimension is split into 16 heads, each with a dimension of 256. Rotary Position
 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as
 GPT-2/GPT-3.

+## Intended Use and Limitations

+GPT-J learns an inner representation of the English language that can be used to
+extract features useful for downstream tasks. The model is best at what it was
+pretrained for, however, which is generating text from a prompt.

+### Out-of-scope use

+GPT-J-6B is **not** intended for deployment without fine-tuning, supervision,
+and/or moderation. It is not in itself a product and cannot be used for
+human-facing interactions. For example, the model may generate harmful or
+offensive text. Please evaluate the risks associated with your particular use case.

+GPT-J-6B was trained on an English-language only dataset, and is thus **not**
+suitable for translation or generating text in other languages.
+
+GPT-J-6B has not been fine-tuned for downstream contexts in which
+language models are commonly deployed, such as writing genre prose
+or commercial chatbots. This means GPT-J-6B will **not**
+respond to a given prompt the way a product like ChatGPT does. This is because,
+unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement
+Learning from Human Feedback (RLHF) to better “follow” human instructions.
+
+### Limitations and Biases
+
+The core functionality of GPT-J is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-J it is important to remember that the statistically most likely next token is often not the token that produces the most "accurate" text. Never depend upon GPT-J to produce factually accurate output.
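The gap between "most likely" and "accurate" can be illustrated with a toy sketch. It assumes a made-up three-token vocabulary and invented scores; nothing here comes from GPT-J itself, and the prompt and numbers are purely hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the token following "The capital of Australia is".
# Both the candidate tokens and the numbers are invented for illustration.
candidates = ["Sydney", "Canberra", "Melbourne"]
logits = [2.1, 1.7, 0.3]

probs = softmax(logits)

# Greedy decoding picks the highest-probability token, whether or not it is true.
greedy = candidates[probs.index(max(probs))]
print("greedy choice:", greedy)
```

In this invented case the highest-scoring token is not the factually correct one, which is the failure mode the paragraph above warns about.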

+GPT-J was trained on the Pile, a dataset known to contain profane, lewd, and otherwise abrasive language. Depending upon the use case, GPT-J may produce socially unacceptable text. See [Sections 5 and 6 of the Pile paper](https://arxiv.org/abs/2101.00027) for a more detailed analysis of the biases in the Pile.
+
+As with all language models, it is hard to predict in advance how GPT-J will respond to particular prompts, and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.
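One way to organize such a review step is sketched below. It is a minimal illustration, not a recommended moderation system: `BLOCKLIST`, `needs_review`, and `triage` are hypothetical names, the terms are placeholders, and a keyword match is only a first pass before a human looks at the text.

```python
import re

# Placeholder terms (hypothetical). A real deployment would rely on a
# maintained list or a trained classifier, followed by human review.
BLOCKLIST = {"badword", "slur"}

def needs_review(text, blocklist=BLOCKLIST):
    """Return True if any blocklisted term appears as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(blocklist)

def triage(generations):
    """Split generations into (auto-passed, held for a human reviewer)."""
    passed, held = [], []
    for text in generations:
        (held if needs_review(text) else passed).append(text)
    return passed, held

passed, held = triage(["A calm sentence.", "This has a BadWord in it."])
```

Text that passes a keyword check is not thereby safe, so the `passed` list would still benefit from human curation before release.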

 ### How to use

@@ -61,13 +80,13 @@ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
 model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
 ```

+## Training data

+GPT-J 6B was trained on [the Pile](https://pile.eleuther.ai), a large-scale curated dataset created by [EleutherAI](https://www.eleuther.ai).

+## Training procedure

+This model was trained for 402 billion tokens over 383,500 steps on a TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
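As a rough illustration of the objective, the sketch below computes a mean next-token cross-entropy over invented probabilities and derives the tokens-per-step figure implied by the numbers above. The helper name `next_token_loss` and the toy distributions are assumptions for illustration, not the actual training code.

```python
import math

def next_token_loss(step_probs, targets):
    """Mean cross-entropy: the average of -log p(correct next token)."""
    total = sum(math.log(p[t]) for p, t in zip(step_probs, targets))
    return -total / len(targets)

# Toy predicted distributions over a 3-token vocabulary at two positions.
# The numbers are invented for illustration.
step_probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
targets = [0, 2]  # index of the token that actually came next
loss = next_token_loss(step_probs, targets)

# Rough throughput implied by the quoted figures: tokens divided by steps.
tokens_per_step = 402e9 / 383_500
print(f"loss={loss:.4f}, tokens per step ~ {tokens_per_step:,.0f}")
```

Lowering this loss is the same as raising the probability assigned to the correct next token, and the division works out to roughly one million tokens per optimization step.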

 ## Evaluation results