Fix for RuntimeError: FlashAttention only support fp16 and bf16 data type during fine-tuning. 5b7216f verified moidhassan committed on Sep 30, 2024
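The error quoted in this commit is FlashAttention's dtype guard: it rejects anything other than fp16/bf16 inputs. A common fix during fine-tuning is to cast activations (or load the model) in bf16 before the attention call. A minimal sketch, assuming PyTorch; the helper name `ensure_flash_dtype` is hypothetical, not from this repo:

```python
import torch

def ensure_flash_dtype(x: torch.Tensor) -> torch.Tensor:
    """Cast a tensor to bf16 unless it is already fp16/bf16,
    mirroring the dtype check FlashAttention enforces."""
    if x.dtype in (torch.float16, torch.bfloat16):
        return x
    return x.to(torch.bfloat16)
```

At model scale the same idea is usually applied once at load time (e.g. passing `torch_dtype=torch.bfloat16` to `from_pretrained`) rather than per tensor.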
Resolve - 196 [rank0]: triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 101376. Reducing block sizes or `num_stages` may help. 794ffcf verified moidhassan committed on Sep 30, 2024
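The `OutOfResources` error means the kernel's tile buffers, multiplied by the number of pipeline stages, exceed the GPU's per-block shared memory, so the fix is exactly what the message suggests: shrink block sizes or `num_stages` in the autotune configs. A rough, hypothetical estimator (a simplified model of a tiled matmul-style kernel, not Triton's actual accounting) shows why both knobs scale the requirement:

```python
def smem_bytes(block_m: int, block_n: int, block_k: int,
               num_stages: int, dtype_bytes: int = 2) -> int:
    """Rough shared-memory estimate: each pipeline stage buffers
    one A tile (M x K) and one B tile (K x N) of the given dtype."""
    per_stage = (block_m * block_k + block_k * block_n) * dtype_bytes
    return num_stages * per_stage
```

Because the total is linear in `num_stages` and in each block dimension, dropping `num_stages` from 4 to 2 (or halving a block size) halves the estimate, which is typically enough to get back under a limit like the 101376 bytes reported here.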
Move flash_attn assert from __init__ into calling func (#32) ad85cab verified nguyenbh rogerxfeng8 committed on Sep 12, 2024
Update tokenization_phi3_small.py (#14) f80aaa3 verified bapatra damajercakms committed on Jun 3, 2024
Add attention_bias to make TGI work (#4) 5e0fbf0 verified bapatra philschmid committed on May 22, 2024
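Serving stacks such as TGI read `attention_bias` from the model config to decide whether the attention projections carry bias terms, and fail if the attribute is missing. A toy sketch of the pattern, assuming a hypothetical `SmallModelConfig` class (not this repo's actual config code):

```python
class SmallModelConfig:
    """Toy config sketch: expose attention_bias so downstream
    loaders (e.g. TGI) can query it instead of hitting a
    missing-attribute error."""
    def __init__(self, hidden_size: int = 4096, attention_bias: bool = True):
        self.hidden_size = hidden_size
        self.attention_bias = attention_bias

cfg = SmallModelConfig()
# defensive read on the consumer side: default if the field is absent
uses_bias = getattr(cfg, "attention_bias", False)
```

Adding the field to the config (and serializing it) is the minimal change; consumers that use `getattr` with a default stay compatible with older checkpoints.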