---
license: apache-2.0
---
*TokenFormer* is a **fully attention-based architecture**
that unifies the computations of token-token and token-parameter interactions
by employing the attention mechanism throughout, **maximizing the flexibility of the neural network** ([see paper](https://github.com/Haiyang-W/TokenFormer)).
The family comprises four model sizes:
150M, 450M, 900M, and 1.5B parameters. Each size is trained with the [gpt-neox](https://github.com/EleutherAI/gpt-neox) codebase on 300B tokens of the [Pile](https://huggingface.co/datasets/EleutherAI/pile).
All four model sizes are trained on the exact
same data, in the exact same order.
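
As a rough illustration of the token-parameter attention idea, the sketch below replaces a linear projection with attention between input tokens (acting as queries) and learnable key/value "parameter tokens". This is a simplified sketch, not the reference implementation: it uses plain scaled softmax where the paper uses a modified normalization, and the class name, shapes, and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenParameterAttention(nn.Module):
    """Illustrative token-parameter attention layer.

    The layer's weights are stored as learnable key/value parameter
    tokens, so the projection is computed with the same attention
    primitive used for token-token interaction.
    """

    def __init__(self, dim_in: int, dim_out: int, num_param_tokens: int):
        super().__init__()
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim_in))
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim_in); input tokens query the parameter tokens.
        scores = x @ self.param_keys.T / self.param_keys.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=-1)  # (batch, seq_len, num_param_tokens)
        return weights @ self.param_values   # (batch, seq_len, dim_out)


layer = TokenParameterAttention(dim_in=64, dim_out=64, num_param_tokens=128)
out = layer(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
```

Because the weights are tokens rather than fixed matrices, the model can be grown by appending new parameter tokens instead of reshaping weight matrices, which is the flexibility the paper emphasizes.
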
# TokenFormer-450M
## Model Details
- Developed by: [Haiyang Wang](https://haiyang-w.github.io/)
- Model type: TokenFormer-based Language Model
- Language: English
- Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
for the training procedure, config files, and details on how to use the models
(see the loading sketch after this list).
[See the paper](https://github.com/Haiyang-W/TokenFormer) for further evaluations and implementation
details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
- Contact: To ask questions about this model, please email Haiyang Wang.
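
Consult the GitHub repository above for the supported loading path. Purely as a hypothetical sketch, assuming the checkpoint were published on the Hugging Face Hub with custom modeling code (both the repo id and the availability of `trust_remote_code` loading are assumptions, not confirmed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; check the GitHub repository for the actual
# checkpoint location and supported loading path.
repo_id = "Haiyang-W/TokenFormer-450M"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True assumes the repo ships custom TokenFormer
# modeling code; this is an assumption, not confirmed by this card.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("TokenFormer is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```
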