"'RobertaFlashAttention2' object has no attribute 'q_lin'"

#1
by wwydmanski - opened

Hello!
The model works great as long as I don't use flash attention, but as soon as I add the line that enables it, it throws the error from the title:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/runpod/serverless/modules/rp_job.py", line 164, in run_job
    handler_return = handler(job)
  File "/workspace/handler.py", line 44, in handler
    scores = model(**features).logits.squeeze(-1)
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/sdadas/polish-reranker-roberta-v2/7296f335a1f0a6bc771df58e6c1319e8068fa583/modeling_roberta.py", line 1577, in forward
    outputs = self.roberta(
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/sdadas/polish-reranker-roberta-v2/7296f335a1f0a6bc771df58e6c1319e8068fa583/modeling_roberta.py", line 1214, in forward
    encoder_outputs = self.encoder(
  File "/root/.cache/huggingface/modules/transformers_modules/sdadas/polish-reranker-roberta-v2/7296f335a1f0a6bc771df58e6c1319e8068fa583/modeling_roberta.py", line 790, in forward
    self_attention_outputs = self.attention(
  File "/root/.cache/huggingface/modules/transformers_modules/sdadas/polish-reranker-roberta-v2/7296f335a1f0a6bc771df58e6c1319e8068fa583/modeling_roberta.py", line 717, in forward
    self_outputs = self.self(
  File "/root/.cache/huggingface/modules/transformers_modules/sdadas/polish-reranker-roberta-v2/7296f335a1f0a6bc771df58e6c1319e8068fa583/modeling_roberta.py", line 411, in forward
    target_dtype = self.q_lin.weight.dtype
AttributeError: 'RobertaFlashAttention2' object has no attribute 'q_lin'

I'm using this flash attention build:
https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
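For context, this is roughly how the model is loaded in my handler. It's a simplified sketch (the torch_dtype, the tokenizer call, and the example texts are assumptions, not my exact handler code); the attn_implementation="flash_attention_2" argument is the line I mean:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "sdadas/polish-reranker-roberta-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Enabling flash attention here is what triggers the AttributeError;
# without attn_implementation="flash_attention_2" the model runs fine.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,       # assumed dtype
    trust_remote_code=True,           # the repo ships its own modeling_roberta.py
    attn_implementation="flash_attention_2",
).to("cuda").eval()

# Query-passage pair (example text, assumed input format);
# scoring as in handler.py line 44 from the traceback.
features = tokenizer(
    ["Jak dożyć 100 lat?"],
    ["Trzeba zdrowo się odżywiać i regularnie ćwiczyć."],
    padding=True,
    truncation=True,
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    scores = model(**features).logits.squeeze(-1)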

I have version 2.6.3 installed directly from PyPI (https://pypi.org/project/flash-attn/#history) and the error does not occur for me.
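In case it helps with debugging, a quick sketch for checking which flash-attn build is actually picked up at runtime (is_flash_attn_2_available is available in recent transformers releases):

import flash_attn
import transformers
from transformers.utils import is_flash_attn_2_available

print(flash_attn.__version__)        # expecting 2.6.3
print(transformers.__version__)
print(is_flash_attn_2_available())   # should be True if the wheel matches your torch/CUDA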
