abhi-mosaic committed
Commit: 2f88b1b
Parent(s): 40e5047
Update README.md
README.md CHANGED
@@ -39,14 +39,16 @@ It includes options for many training efficiency features such as [FlashAttention
 
 ```python
 import transformers
-model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-storywriter', trust_remote_code=True
+model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-storywriter', trust_remote_code=True)
 ```
 
-To use the optimized triton implementation of FlashAttention, you can load with `attn_impl='triton'` and move the model to `bfloat16`:
-
+To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
 ```python
-
-
+config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b-storywriter', trust_remote_code=True)
+config.attn_config['attn_impl'] = 'triton'
+
+model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-storywriter', config=config, torch_dtype=torch.bfloat16, trust_remote_code=True)
+model.to(device='cuda:0')
 ```
 
 Although the model was trained with a sequence length of 2048 and finetuned with a sequence length of 65536,
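One detail worth noting about the new snippet: it passes `torch.bfloat16` but only imports `transformers`, so it assumes `torch` has already been imported. Below is a minimal self-contained sketch of the updated loading path; the appended generation call and the tokenizer choice are illustrative assumptions, not part of this diff.

```python
# Minimal self-contained sketch of the updated loading path.
# Assumes a CUDA GPU and the triton FlashAttention dependencies are installed.
import torch  # the diffed snippet uses torch.bfloat16 but never imports torch
import transformers

name = 'mosaicml/mpt-7b-storywriter'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # switch to the triton attention kernel

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # load the weights directly in bfloat16
    trust_remote_code=True,
)
model.to(device='cuda:0')

# Illustrative usage (assumption: the repo exposes a tokenizer via AutoTokenizer).
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
inputs = tokenizer('Once upon a time, ', return_tensors='pt').to('cuda:0')
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Passing `torch_dtype=torch.bfloat16` at load time materializes the weights in bfloat16 directly, rather than loading them in full precision and casting afterwards.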