Haiyang-W committed
Commit b0db186
1 Parent(s): 27e5f68

Update README.md

Files changed (1)
  1. README.md +26 -3
README.md CHANGED
@@ -1,3 +1,26 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ *TokenFormer* is a **fully attention-based architecture**
+ that unifies the computations of token-token and token-parameter interactions
+ by entirely employing the attention mechanism, **maximizing the flexibility of neural networks** [(see paper)](https://github.com/Haiyang-W/TokenFormer).
+ The release contains four models of sizes
+ 150M, 450M, 900M, and 1.5B. Each size is trained with the [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base on the [Pile](https://huggingface.co/datasets/EleutherAI/pile) for 300B tokens.
+ All four model sizes are trained on the exact
+ same data, in the exact same order.
+
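+ As a rough illustration of the token-parameter interaction described above, the
+ minimal PyTorch sketch below replaces a fixed linear projection with attention
+ between input tokens (queries) and learnable parameter tokens (keys/values).
+ It is a simplified sketch, not the reference implementation: the class and
+ argument names are invented here, and the normalization used in the paper may
+ differ from the plain softmax shown (see the GitHub repository for the actual code).
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class TokenParameterAttention(nn.Module):
+     """Illustrative layer: input tokens attend to learnable parameter tokens,
+     which stand in for a fixed weight matrix. Growing `num_param_tokens` grows
+     capacity without changing the layer's input/output interface."""
+
+     def __init__(self, dim_in: int, dim_out: int, num_param_tokens: int):
+         super().__init__()
+         # Learnable key/value "parameter tokens".
+         self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim_in) * 0.02)
+         self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim_out) * 0.02)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # x: (batch, seq_len, dim_in); every input token queries the parameter tokens.
+         scores = x @ self.param_keys.t() / (x.shape[-1] ** 0.5)
+         weights = F.softmax(scores, dim=-1)   # plain softmax; the paper's normalization may differ
+         return weights @ self.param_values    # (batch, seq_len, dim_out)
+
+ # Example: project 512-dim tokens using 1024 parameter tokens.
+ layer = TokenParameterAttention(dim_in=512, dim_out=512, num_param_tokens=1024)
+ out = layer(torch.randn(2, 16, 512))
+ print(out.shape)  # torch.Size([2, 16, 512])
+ ```
+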
+ # TokenFormer-450M
+
+ ## Model Details
+
+ - Developed by: [Haiyang Wang](https://haiyang-w.github.io/)
+ - Model type: TokenFormer-based Language Model
+ - Language: English
+ - Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
+ for the training procedure, config files, and details on how to use the model.
+ [See the paper](https://github.com/Haiyang-W/TokenFormer) for more evals and implementation
+ details.
+ - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
+ - License: Apache 2.0
+ - Contact: to ask questions about this model, please email Haiyang Wang.
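+
+ As a quick-start sketch (not an official usage guide), the checkpoint files in this
+ repository can be fetched with `huggingface_hub`. The repo id is assumed from this
+ model card and the filename below is a placeholder; check the repository's file list
+ (or the GitHub instructions above) for the actual checkpoint name and loading procedure.
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download one file from this model repository.
+ # NOTE: "pytorch_model.bin" is a placeholder; use the real checkpoint filename
+ # listed under the repository's "Files and versions" tab.
+ ckpt_path = hf_hub_download(
+     repo_id="Haiyang-W/TokenFormer-450M",
+     filename="pytorch_model.bin",
+ )
+ print(ckpt_path)  # local path of the cached file
+ ```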