jacobfulano committed
Commit: e27b4b2
Parent(s): ee3acd5
Update README.md
README.md CHANGED
````diff
@@ -79,7 +79,7 @@ Note: This model requires that `trust_remote_code=True` be passed to the `from_p
 This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
 `MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.
 
-To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention
+To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
 ```python
 config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b', trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton'
````
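The hunk window cuts the snippet off after the config change, so the `bfloat16` step mentioned in the new prose is not visible here. A minimal sketch of how such a load could continue, assuming only the standard `transformers` and `torch` APIs (the actual remainder of the README's block lies outside this hunk, and the variable `name` is introduced here for illustration):

```python
import torch
import transformers

name = 'mosaicml/mpt-7b'  # hypothetical variable; the hunk hard-codes the repo id

# Configure the custom MPT architecture to use the triton FlashAttention kernel
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'

# Load the weights in bfloat16, as the updated prose describes
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.to(device='cuda:0')  # move to GPU, where the triton kernel runs
```

The half-precision cast is not incidental: the triton FlashAttention path operates on half-precision tensors, which is presumably why the commit pairs `attn_impl='triton'` with moving the model to `bfloat16`.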