stellaathena committed • Commit a7c07bf • Parent(s): a6259a0
Update README.md

README.md CHANGED
@@ -48,25 +48,31 @@ GPT-Neo was trained as an autoregressive language model. This means that its cor
 GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. Depending on your usecase GPT-Neo may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile.
 
 As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.
+
 ## Eval results
 
-###
+### Linguistic Reasoning
 
-| Model and Size   | Pile BPB      | Pile PPL      | Wikitext PPL   |
-| ---------------- | ------------- | ------------- | -------------- |
-| **GPT-Neo 1.3B** | **0.7527**    | **6.159**     | **13.10**      |
-| GPT-3 1.3B       | ------        | -----         | -----          |
-| GPT-2 1.5B       | 1.0468        | -----         | 17.48          |
-| GPT-Neo 2.7B     | 0.7165        | 5.646         | 11.39          |
-| GPT-3 2.7B       | 0.9631        | -----         | -----          |
-| GPT-3 175B       | 0.7177        | -----         | -----          |
+| Model and Size   | Pile BPB   | Pile PPL   | Wikitext PPL  | Lambada PPL | Lambada Acc | Winogrande | Hellaswag   |
+| ---------------- | ---------- | ---------- | ------------- | ----------- | ----------- | ---------- | ----------- |
+| **GPT-Neo 1.3B** | **0.7527** | **6.159**  | **13.10**     | **7.498**   | **57.23%**  | **55.01%** | **38.66%**  |
+| GPT-2 1.5B       | 1.0468     | -----      | 17.48         | 10.634      | 51.21%      | 59.40%     | 40.03%      |
+| GPT-Neo 2.7B     | 0.7165     | 5.646      | 11.39         | 5.626       | 62.22%      | 56.50%     | 42.73%      |
+| GPT-3 Ada        | 0.9631     | -----      | -----         | 9.954       | 51.60%      | 52.90%     | 35.93%      |
 
+### Physical and Scientific Reasoning
+
+| Model and Size   | MathQA     | PubMedQA   | Piqa        |
+| ---------------- | ---------- | ---------- | ----------- |
+| **GPT-Neo 1.3B** | **24.05%** | **54.40%** | **71.11%**  |
+| GPT-2 1.5B       | 23.64%     | 58.33%     | 70.78%      |
+| GPT-Neo 2.7B     | 24.72%     | 57.54%     | 72.14%      |
+| GPT-3 Ada        | 24.29%     | 52.80%     | 68.88%      |
+
 ### Down-Stream Applications
 
+TBD
+
 ### BibTeX entry and citation info
 
 ```bibtex
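The card's recommendation to have a human curate or filter outputs before release can be wired up in a few lines. Below is a minimal sketch, assuming the `transformers` library and the `EleutherAI/gpt-neo-1.3B` checkpoint on the Hugging Face Hub; the blocklist pre-filter is a hypothetical placeholder, not a real moderation system, and everything still goes to a human before release.

```python
from transformers import pipeline

# Draft-generate with GPT-Neo, then route everything through a human check,
# per the card's recommendation. The blocklist is a hypothetical placeholder.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

BLOCKLIST = {"term_a", "term_b"}  # placeholder terms, not a real moderation list

def draft_outputs(prompt: str, n: int = 3) -> list[str]:
    """Sample n candidate continuations for later human review."""
    outs = generator(prompt, max_length=100, do_sample=True, num_return_sequences=n)
    return [o["generated_text"] for o in outs]

def needs_review(text: str) -> bool:
    """Cheap pre-filter to triage; flagged or not, nothing ships unreviewed."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

for candidate in draft_outputs("EleutherAI has"):
    tag = "FLAGGED" if needs_review(candidate) else "pending review"
    print(f"[{tag}] {candidate!r}")
```

The pre-filter only prioritizes the review queue; given how unpredictably offensive content can surface, the human check is the actual gate.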
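For the tables above, a rough illustration of where a perplexity column comes from: perplexity is the exponentiated mean negative log-likelihood the model assigns to each token given its prefix, and Pile BPB (bits per byte) rescales that same loss from nats per token to bits per byte. This single-pass sketch is an assumption for illustration; the exact tokenization, striding, and dataset slicing behind the published numbers are not specified in the card.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
model.eval()

def mean_nll(text: str) -> tuple[float, int, int]:
    """Mean per-token NLL in nats, plus token and byte counts."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy
        # over next-token predictions (labels are shifted internally, so the
        # average is over n_tokens - 1 predicted positions).
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(loss), enc["input_ids"].shape[1], len(text.encode("utf-8"))

def perplexity(text: str) -> float:
    loss, _, _ = mean_nll(text)
    return math.exp(loss)

def bits_per_byte(text: str) -> float:
    # nats/token -> bits/token (divide by ln 2), then per-token -> per-byte.
    loss, n_tokens, n_bytes = mean_nll(text)
    return loss / math.log(2) * (n_tokens - 1) / n_bytes

print(perplexity("EleutherAI trains open language models."))
```

On a full corpus the per-document losses would be length-weighted rather than computed one sentence at a time, which is one reason a toy run will not reproduce the table.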