license: apache-2.0

TokenFormer is a fully attention-based architecture that unifies the computation of token-token and token-parameter interactions by using the attention mechanism for both, which maximizes the flexibility of the neural network (see paper). The family contains four models with 150M, 450M, 900M, and 1.5B parameters. Each model is trained with the gpt-neox code base on 300B tokens from the Pile. All four model sizes are trained on the exact same data, in the exact same order.
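To make the idea of a token-parameter interaction concrete, here is a minimal sketch in plain Python. It assumes that the learnable parameters are stored as key and value vectors and that each input token acts as a query over them. The function name `token_parameter_attention`, the toy shapes, and the use of plain scaled softmax are illustrative assumptions, not the released implementation; the actual TokenFormer layer may normalize the scores differently, so see the paper and the GitHub repository for the real definition.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def token_parameter_attention(tokens, key_params, value_params):
    """Let every input token attend over a set of learnable parameter tokens.

    tokens:       T input vectors, each of length d_in (the queries)
    key_params:   N learnable key vectors, each of length d_in
    value_params: N learnable value vectors, each of length d_out
    Returns T output vectors, each of length d_out.

    Assumption: plain scaled softmax attention stands in for whatever
    normalization the real model uses.
    """
    d_in = len(key_params[0])
    d_out = len(value_params[0])
    outputs = []
    for x in tokens:
        # Similarity between this token and every parameter key.
        scores = [dot(x, k) / math.sqrt(d_in) for k in key_params]
        weights = softmax(scores)
        # Weighted sum of parameter values replaces a fixed linear projection.
        out = [sum(w * v[j] for w, v in zip(weights, value_params))
               for j in range(d_out)]
        outputs.append(out)
    return outputs


# Toy example: 3 input tokens of width 2, 4 parameter tokens, output width 3.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
key_params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
value_params = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
outputs = token_parameter_attention(tokens, key_params, value_params)
print(len(outputs), len(outputs[0]))
```

In this sketch the number of parameter tokens (4) is independent of the input and output widths, so more key-value pairs could be appended without changing either dimension.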

TokenFormer-450M

Model Details

  • Developed by: Haiyang Wang
  • Model type: TokenFormer-based Language Model
  • Language: English
  • Learn more: See TokenFormer's GitHub repository for the training procedure, config files, and usage details. See the paper for more evaluations and implementation details.
  • Library: GPT-NeoX
  • License: Apache 2.0
  • Contact: To ask questions about this model, please email Haiyang Wang.