Instead of `flash_attn`, it should be `flash_attn_2_cuda`. This is causing a deployment issue in TGI/DJL.

#14
by monuminu - opened

```python
from flash_attn.flash_attn_interface import (
    flash_attn_func,
    flash_attn_kvpacked_func,
    flash_attn_qkvpacked_func,
    flash_attn_varlen_kvpacked_func,
)
```
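One common workaround for this class of import failure, sketched here as an assumption (not a confirmed fix for this model), is to guard the import so that environments missing the compiled CUDA extension fall back gracefully instead of crashing at load time:

```python
# Hedged sketch: wrap the flash_attn import in a try/except so that
# deployment images lacking the compiled flash_attn_2_cuda extension
# (e.g. some TGI/DJL base images) can fall back to standard attention.
try:
    from flash_attn.flash_attn_interface import (
        flash_attn_func,
        flash_attn_kvpacked_func,
        flash_attn_qkvpacked_func,
        flash_attn_varlen_kvpacked_func,
    )
    HAS_FLASH_ATTN = True
except ImportError:
    # flash_attn (or its flash_attn_2_cuda extension) is not installed;
    # downstream code should check HAS_FLASH_ATTN before using these.
    flash_attn_func = None
    flash_attn_kvpacked_func = None
    flash_attn_qkvpacked_func = None
    flash_attn_varlen_kvpacked_func = None
    HAS_FLASH_ATTN = False
```

Whether the model code can actually fall back to eager attention depends on how it is written; this only prevents the hard crash at import time.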

Together org

Hi @monuminu, thanks for bringing this up! Can you provide some more details about the issues this is causing?
