flash_attn on gpu #20
by uglydumpling · opened
Can we run this model without using flash_attn on GPU?
Yes you can! Just use attn_impl: torch. You can do this by editing the config.json directly or by following the instructions in the README:
import torch
import transformers

# Load the MPT-7B config and force the pure-PyTorch attention implementation,
# so flash_attn does not need to be installed.
config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b',
    trust_remote_code=True
)
config.attn_config['attn_impl'] = 'torch'  # 'torch' is already the default; set it explicitly for clarity

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
model.to(device='cuda:0')
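
If you would rather edit config.json directly, the change is the same: set attn_impl to 'torch' inside the attn_config section. Here is a minimal sketch, assuming you have a local copy of the model and that ./mpt-7b/config.json is where your download lives (the path is an assumption; adjust it to your setup):

import json

# Hypothetical local path to your downloaded copy of the model
config_path = './mpt-7b/config.json'

# Read the existing config, switch the attention implementation, and write it back
with open(config_path) as f:
    cfg = json.load(f)

cfg['attn_config']['attn_impl'] = 'torch'

with open(config_path, 'w') as f:
    json.dump(cfg, f, indent=2)

After this, calling from_pretrained on the local directory will pick up the torch attention implementation without any code changes.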
abhi-mosaic changed discussion status to closed