Update README.md
README.md CHANGED
@@ -1,3 +1,26 @@
---
license: apache-2.0
---

*TokenFormer* is a **fully attention-based architecture** that unifies the computations of token-token and token-parameter interactions by entirely employing the attention mechanism, **maximizing the flexibility of the neural network** [(see paper)](https://github.com/Haiyang-W/TokenFormer). It comes in four sizes: 150M, 450M, 900M, and 1.5B parameters. Each size is trained with the [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base on the [Pile](https://huggingface.co/datasets/EleutherAI/pile) for 300B tokens. All four model sizes are trained on the exact same data, in the exact same order.

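The token-parameter interaction described above can be pictured as cross-attention in which the input tokens act as queries over a set of learnable key/value "parameter tokens". Below is a minimal, illustrative PyTorch sketch of that idea; the class name, shapes, and the plain softmax normalization are assumptions made here for clarity, not the authors' implementation (see the repository linked above for the real code).

```python
import torch
import torch.nn.functional as F
from torch import nn


class TokenParameterAttention(nn.Module):
    """Toy sketch: input tokens attend over learnable "parameter tokens"
    (key/value pairs) instead of being multiplied by a fixed weight matrix.
    Hypothetical illustration only, not the TokenFormer reference code."""

    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        # Learnable parameter tokens acting as attention keys and values.
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); the input tokens play the role of queries.
        scores = x @ self.param_keys.T / (x.shape[-1] ** 0.5)
        weights = F.softmax(scores, dim=-1)   # (batch, seq_len, num_param_tokens)
        return weights @ self.param_values    # (batch, seq_len, dim)


layer = TokenParameterAttention(dim=64, num_param_tokens=128)
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because the parameters are themselves treated as tokens, the model can in principle be grown by appending new parameter tokens rather than by reshaping fixed projection matrices, which is the flexibility the description above refers to.
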
# TokenFormer-450M

## Model Details

- Developed by: [Haiyang Wang](https://haiyang-w.github.io/)
- Model type: TokenFormer-based Language Model
- Language: English
- Learn more: see [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer) for the training procedure, config files, and usage details. [See the paper](https://github.com/Haiyang-W/TokenFormer) for more evals and implementation details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
- Contact: to ask questions about this model, please email Haiyang Wang.