Update README.md
README.md CHANGED
@@ -1,3 +1,26 @@
---
license: apache-2.0
---

*TokenFormer* is a **fully attention-based architecture** that unifies the computations of token-token and token-parameter interactions by entirely employing the attention mechanism, **maximizing the flexibility of the neural network** [(see paper)](https://github.com/Haiyang-W/TokenFormer). It comes in four sizes: 150M, 450M, 900M, and 1.5B parameters. Each size is trained with the [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base on the [Pile](https://huggingface.co/datasets/EleutherAI/pile) for 300B tokens. All four model sizes are trained on the exact same data, in the exact same order.

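The token-parameter interaction described above can be pictured as cross-attention in which the input tokens act as queries over a set of learnable key/value "parameter tokens". Below is a minimal, illustrative PyTorch sketch of that idea; the class name, shapes, and the plain softmax normalization are assumptions made here for clarity, not the authors' implementation (see the repository linked above for the real code).

```python
import torch
import torch.nn.functional as F
from torch import nn


class TokenParameterAttention(nn.Module):
    """Toy sketch: input tokens attend over learnable "parameter tokens"
    (key/value pairs) instead of being multiplied by a fixed weight matrix.
    Hypothetical illustration only, not the TokenFormer reference code."""

    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        # Learnable parameter tokens acting as attention keys and values.
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); the input tokens play the role of queries.
        scores = x @ self.param_keys.T / (x.shape[-1] ** 0.5)
        weights = F.softmax(scores, dim=-1)   # (batch, seq_len, num_param_tokens)
        return weights @ self.param_values    # (batch, seq_len, dim)


layer = TokenParameterAttention(dim=64, num_param_tokens=128)
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because the parameters are themselves treated as tokens, the model can in principle be grown by appending new parameter tokens rather than by reshaping fixed projection matrices, which is the flexibility the description above refers to.
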
# TokenFormer-450M

## Model Details

- Developed by: [Haiyang Wang](https://haiyang-w.github.io/)
- Model type: TokenFormer-based Language Model
- Language: English
- Learn more: see [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer) for the training procedure, config files, and usage details. [See the paper](https://github.com/Haiyang-W/TokenFormer) for more evals and implementation details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
- Contact: to ask questions about this model, please email Haiyang Wang.