Update README.md
README.md
@@ -4,7 +4,7 @@ license: apache-2.0

The *TokenFormer* is a **fully attention-based architecture**
that unifies the computations of token-token and token-parameter interactions
-by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://
+by entirely employing the attention mechanism, **maximizing the flexibility of neural networks** [(see paper)](https://arxiv.org/pdf/2410.23168).
It contains four models of sizes
150M, 450M, 900M, and 1.5B. Each size is trained with the [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base on 300B tokens from the [Pile](https://huggingface.co/datasets/EleutherAI/pile).
All 4 model sizes are trained on the exact
@@ -19,7 +19,7 @@ same data, in the exact same order.
- Language: English
- Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
for training procedure, config files, and details on how to use.
-[See paper](https://
+[See paper](https://arxiv.org/pdf/2410.23168) for more evals and implementation
details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
@@ -68,7 +68,7 @@ TokenFormer uses the same tokenizer as [GPT-NeoX-

## Evaluations

-All
+All *TokenFormer* models were evaluated using the [LM Evaluation
Harness](https://github.com/EleutherAI/lm-evaluation-harness).
You can run the evaluation by following our [instructions](https://github.com/Haiyang-W/TokenFormer?tab=readme-ov-file#evaluations).<br>
Expand the sections below to see plots of evaluation results for all
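For readers of this card, the token-parameter attention mentioned in the description above can be pictured as follows: instead of multiplying inputs by fixed weight matrices, tokens attend over a set of learnable key/value "parameter tokens", so layer capacity can grow by adding parameter tokens. The sketch below is a toy illustration only, not the reference implementation from the linked repository; the class name `TokenParamAttention`, the sizes, and the plain softmax normalization are assumptions made for this example (the paper describes a modified normalization).

```python
# Illustrative sketch (not the official implementation): a projection layer
# where input tokens attend over learnable "parameter tokens" instead of
# being multiplied by a fixed weight matrix. Names, sizes, and the plain
# softmax normalization are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParamAttention(nn.Module):
    def __init__(self, dim_in: int, dim_out: int, num_param_tokens: int):
        super().__init__()
        # Learnable key/value parameter tokens replace a dense weight matrix;
        # adding parameter tokens scales the layer without changing dim_in/dim_out.
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim_in))
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim_in)
        scores = x @ self.param_keys.t() / (x.shape[-1] ** 0.5)  # (batch, seq, num_param_tokens)
        weights = F.softmax(scores, dim=-1)                      # the paper uses a modified normalization
        return weights @ self.param_values                       # (batch, seq, dim_out)

# Minimal usage check
layer = TokenParamAttention(dim_in=64, dim_out=64, num_param_tokens=128)
out = layer(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```

Because the projection is expressed as attention over parameter tokens, new parameter tokens can in principle be appended to an existing layer without changing its input or output dimensions, which is the flexibility the description above refers to.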
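For the Evaluations section above, a hedged sketch of how the LM Evaluation Harness is typically invoked from Python follows. The repository instructions linked in the README are the supported path for these checkpoints; the model identifier, the `trust_remote_code` flag, and the assumption that a TokenFormer checkpoint loads through the harness's Hugging Face backend are placeholders for illustration.

```python
# Hedged sketch of running the LM Evaluation Harness (v0.4+ Python API).
# The repository's own instructions are the supported evaluation path; the
# model id and the assumption that the checkpoint loads via the harness's
# Hugging Face backend are illustrative only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face causal-LM backend
    model_args="pretrained=Haiyang-W/TokenFormer-150M,trust_remote_code=True",  # placeholder id
    tasks=["lambada_openai", "piqa"],             # example zero-shot tasks
    batch_size=8,
)
print(results["results"])
```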