license: apache-2.0

TokenFormer is a fully attention-based architecture that unifies token-token and token-parameter interactions by using the attention mechanism for both, with the aim of maximizing the flexibility of the neural network (see paper). The release contains four models with 150M, 450M, 900M, and 1.5B parameters. Each model is trained with the gpt-neox code base on the Pile with 300B tokens. All four model sizes are trained on the exact same data, in the exact same order.
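To make the idea of a token-parameter interaction concrete, here is a minimal sketch in plain Python. It is not the released implementation. It assumes that input tokens act as queries and that two sets of learnable "parameter tokens" act as keys and values. The function names, the shapes, and the use of a standard scaled softmax are illustrative assumptions; the paper and the GitHub repository define the actual normalization and layer layout.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_parameter_attention(tokens, key_params, value_params):
    """Hypothetical sketch: each input token attends over parameter tokens.

    tokens:       n rows of length d_in   (queries)
    key_params:   m rows of length d_in   (learnable key parameter tokens)
    value_params: m rows of length d_out  (learnable value parameter tokens)
    Returns n rows of length d_out.
    """
    d_in = len(key_params[0])
    d_out = len(value_params[0])
    outputs = []
    for x in tokens:
        # Similarity between this token and every key parameter token.
        scores = [
            sum(a * b for a, b in zip(x, k)) / math.sqrt(d_in)
            for k in key_params
        ]
        # Assumption: plain softmax; the real model may normalize differently.
        weights = softmax(scores)
        # Weighted sum of the value parameter tokens.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, value_params))
             for j in range(d_out)]
        )
    return outputs


def random_matrix(rows, cols, rng):
    """Helper that stands in for learned parameters in this sketch."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens = random_matrix(3, 4, rng)        # 3 input tokens, width 4
key_params = random_matrix(5, 4, rng)    # 5 key parameter tokens
value_params = random_matrix(5, 6, rng)  # 5 value parameter tokens, width 6
out = token_parameter_attention(tokens, key_params, value_params)
print(len(out), len(out[0]))
```

In this sketch the number of parameter tokens (5 here) is independent of the input and output widths, so the parameter set can change size without changing the shape of the layer's inputs or outputs.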

TokenFormer-450M

Model Details

Developed by: Haiyang Wang
Model type: TokenFormer-based Language Model
Language: English
Learn more: See TokenFormer's GitHub repository for the training procedure, config files, and usage details. See the paper for more evaluations and implementation details.
Library: GPT-NeoX
License: Apache 2.0
Contact: To ask questions about this model, please email Haiyang Wang.