Update README.md
README.md
@@ -79,17 +79,17 @@ TokenFormer compared with Opensource Transformer-based LLMs.
 | Model | #Param | LAMBADA | HellaSwag | PIQA | Arc-E | Arc-C | WinoGrande | Average |
 | :----: | :------: | :------: | :-------: | :--: | :---: | :---: | :--------: | :------: |
 | Pythia | 150M | 35.4 | 30.3 | 62.3 | 43.6 | 23.6 | 51.3 | 40.1 |
-| TokenFormer | 150M | 45.0 | 35.5 | 64.9 | 47.3 | 24.9 | 50.4 | 44.7 |
+| **TokenFormer** | 150M | **45.0** | **35.5** | **64.9** | **47.3** | **24.9** | **50.4** | **44.7** |
 | Pythia | 410M | 51.4 | 40.6 | 66.9 | 52.1 | 24.6 | 53.8 | 48.2 |
-| TokenFormer | 450M | 57.3 | 47.5 | 69.5 | 56.2 | 26.7 | 54.6 | 52.0 |
+| **TokenFormer** | 450M | **57.3** | **47.5** | **69.5** | **56.2** | **26.7** | **54.6** | **52.0** |
 | Pythia | 1B | 56.1 | 47.2 | 70.7 | 57.0 | 27.1 | 53.5 | 51.9 |
-| TokenFormer | 900M | 64.0 | 55.3 | 72.4 | 59.9 | 30.6 | 56.4 | 56.4 |
+| **TokenFormer** | 900M | **64.0** | **55.3** | **72.4** | **59.9** | **30.6** | **56.4** | **56.4** |
 | GPT-Neo | 1.3B | 57.2 | 48.9 | 71.1 | 56.2 | 25.9 | 54.9 | 52.4 |
 | OPT | 1.3B | 58.0 | 53.7 | 72.4 | 56.7 | 29.6 | 59.5 | 55.0 |
 | Pythia | 1.3B | 61.7 | 52.1 | 71.0 | 60.5 | 28.5 | 57.2 | 55.2 |
 | GPT-Neo | 2.7B | 62.2 | 55.8 | 71.1 | 61.1 | 30.2 | 57.6 | 56.5 |
 | OPT | 2.7B | 63.6 | 60.6 | 74.8 | 60.8 | 31.3 | 61.0 | 58.7 |
 | Pythia | 2.8B | 64.7 | 59.3 | 74.0 | 64.1 | 32.9 | 59.7 | 59.1 |
-| TokenFormer | 1.5B | 64.7 | 60.0 | 74.8 | 64.8 | 32.0 | 59.7 | 59.3 |
+| **TokenFormer** | 1.5B | **64.7** | 60.0 | **74.8** | **64.8** | 32.0 | 59.7 | **59.3** |
 <figcaption>Zero-shot evaluation of Language Modeling. </figcaption>
 </figure>
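This commit only changes bold markup, so the numbers themselves can be sanity-checked independently of it. The sketch below assumes (the README does not say so) that the Average column is the unweighted mean of the six benchmark scores. `parse_row`, `check_averages`, `ROWS`, and `TOLERANCE` are hypothetical names for illustration, not part of the TokenFormer repository.

```python
# Hypothetical helper script (not part of the TokenFormer repo): recompute the
# "Average" column, ASSUMING it is the unweighted mean of the six benchmarks.
ROWS = """\
| Pythia | 150M | 35.4 | 30.3 | 62.3 | 43.6 | 23.6 | 51.3 | 40.1 |
| **TokenFormer** | 150M | **45.0** | **35.5** | **64.9** | **47.3** | **24.9** | **50.4** | **44.7** |
| Pythia | 410M | 51.4 | 40.6 | 66.9 | 52.1 | 24.6 | 53.8 | 48.2 |
| **TokenFormer** | 450M | **57.3** | **47.5** | **69.5** | **56.2** | **26.7** | **54.6** | **52.0** |
| Pythia | 1B | 56.1 | 47.2 | 70.7 | 57.0 | 27.1 | 53.5 | 51.9 |
| **TokenFormer** | 900M | **64.0** | **55.3** | **72.4** | **59.9** | **30.6** | **56.4** | **56.4** |
| GPT-Neo | 1.3B | 57.2 | 48.9 | 71.1 | 56.2 | 25.9 | 54.9 | 52.4 |
| OPT | 1.3B | 58.0 | 53.7 | 72.4 | 56.7 | 29.6 | 59.5 | 55.0 |
| Pythia | 1.3B | 61.7 | 52.1 | 71.0 | 60.5 | 28.5 | 57.2 | 55.2 |
| GPT-Neo | 2.7B | 62.2 | 55.8 | 71.1 | 61.1 | 30.2 | 57.6 | 56.5 |
| OPT | 2.7B | 63.6 | 60.6 | 74.8 | 60.8 | 31.3 | 61.0 | 58.7 |
| Pythia | 2.8B | 64.7 | 59.3 | 74.0 | 64.1 | 32.9 | 59.7 | 59.1 |
| **TokenFormer** | 1.5B | **64.7** | 60.0 | **74.8** | **64.8** | 32.0 | 59.7 | **59.3** |
"""

# Reported averages are rounded to one decimal, so allow up to 0.05 of slack.
TOLERANCE = 0.05


def parse_row(line: str) -> tuple[str, str, list[float], float]:
    """Split one Markdown table row into (model, size, scores, average)."""
    # Drop the outer pipes, split on inner pipes, and strip **bold** markers.
    cells = [c.strip().strip("*") for c in line.strip().strip("|").split("|")]
    model, size = cells[0], cells[1]
    scores = [float(c) for c in cells[2:8]]  # LAMBADA through WinoGrande
    return model, size, scores, float(cells[8])


def check_averages(table: str) -> list[tuple[str, str, float, float]]:
    """Return rows whose recomputed mean disagrees with the reported Average."""
    mismatches = []
    for line in table.splitlines():
        if not line.strip():
            continue  # skip blank lines
        model, size, scores, reported = parse_row(line)
        mean = sum(scores) / len(scores)
        if abs(mean - reported) > TOLERANCE:
            mismatches.append((model, size, round(mean, 2), reported))
    return mismatches


if __name__ == "__main__":
    for model, size, mean, reported in check_averages(ROWS):
        print(f"{model} {size}: recomputed {mean}, reported {reported}")
```

Run as a script, it prints two lines: `Pythia 150M: recomputed 41.08, reported 40.1` and `GPT-Neo 2.7B: recomputed 56.33, reported 56.5`. Every other row matches its reported Average within rounding. Those two rows may follow a different averaging convention or contain a transcription slip. The sketch cannot tell which, so the upstream evaluation results are the place to confirm.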