---
license: apache-2.0
---

The *TokenFormer* is a **fully attention-based architecture** that unifies the computations of token-token and token-parameter interactions by entirely employing the attention mechanism, thereby **maximizing the flexibility of the neural network** [(see paper)](https://github.com/Haiyang-W/TokenFormer).
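
In a TokenFormer layer, the usual linear projections are replaced by token-parameter attention ("Pattention"): input tokens attend to sets of learnable key/value parameter tokens, so growing the model amounts to appending parameter tokens rather than resizing weight matrices. Below is a minimal PyTorch sketch of that idea, not the official implementation: it uses a plain softmax where the paper uses a modified normalization, and the class and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Pattention(nn.Module):
    """Sketch of token-parameter attention: inputs attend to learnable parameter tokens."""

    def __init__(self, d_in: int, d_out: int, num_param_tokens: int):
        super().__init__()
        # The layer's "weights" are themselves tokens: a set of key and value parameters.
        self.key_params = nn.Parameter(torch.randn(num_param_tokens, d_in) * 0.02)
        self.value_params = nn.Parameter(torch.randn(num_param_tokens, d_out) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        scores = x @ self.key_params.t()      # (batch, seq_len, num_param_tokens)
        weights = F.softmax(scores, dim=-1)   # simplified; the paper uses a modified softmax
        return weights @ self.value_params    # (batch, seq_len, d_out)


# Scaling the model means adding parameter tokens; input/output shapes are unchanged.
layer = Pattention(d_in=64, d_out=64, num_param_tokens=128)
out = layer(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```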

TokenFormer comes in four sizes: 150M, 450M, 900M, and 1.5B parameters. Each size is trained with the [gpt-neox](https://github.com/EleutherAI/gpt-neox) codebase on 300B tokens from the [Pile](https://huggingface.co./datasets/EleutherAI/pile). All four model sizes are trained on the exact same data, in the exact same order.

# TokenFormer-450M

## Model Details

- Developed by: [Haiyang Wang](https://haiyang-w.github.io/)
- Model type: TokenFormer-based language model
- Language: English
- Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer) for the training procedure, config files, and usage details; a minimal loading sketch also follows this list. [See the paper](https://github.com/Haiyang-W/TokenFormer) for more evaluations and implementation details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
- Contact: to ask questions about this model, please email Haiyang Wang.
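
If the checkpoint is published as a Hugging Face Hub repository with custom modeling code, loading it may look roughly like the sketch below. This is a hypothetical example, not the documented workflow: the repo id, the use of `trust_remote_code`, and the availability of a bundled tokenizer are all assumptions, and the [GitHub repository](https://github.com/Haiyang-W/TokenFormer) describes the supported GPT-NeoX-based setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Haiyang-W/TokenFormer-450M"  # assumed Hub id; check the actual repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("TokenFormer is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```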