eval error with LLaMA-Factory
python src/evaluate.py --model_name_or_path ~/model/Qwen-14B-Chat-LLaMAfied --finetuning_type full --template llama2 --task ceval --split validation --lang zh --n_shot 5 --batch_size 1
/models/llama/modeling_llama.py", line 726, in forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cutlassF: no kernel found to launch!
Installing FlashAttention-2 may help.
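For reference, FlashAttention-2 is usually installed from PyPI roughly like this (exact requirements depend on your CUDA/PyTorch build, and note that FlashAttention-2 only supports Ampere or newer GPUs):
pip install flash-attn --no-build-isolation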
Thanks
I'm running it on a V100; changing torch_dtype from bfloat16 to float16 in config.json fixed that error.
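For anyone else hitting the cutlassF error on a pre-Ampere card, the change is just the torch_dtype field in the model's config.json. Assuming the file has the usual Hugging Face entry "torch_dtype": "bfloat16", something like this does it:
# switch the checkpoint to float16 since V100 has no bfloat16 support
sed -i 's/"torch_dtype": "bfloat16"/"torch_dtype": "float16"/' ~/model/Qwen-14B-Chat-LLaMAfied/config.json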
But then I ran into a new error:
LLaMA-Factory/src/llmtuner/eval/evaluator.py", line 44, in
word_probs = torch.stack([logits[i, lengths[i] - 1] for i in range(len(lengths))], dim=0)
~~~~~~^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
/opt/conda/conda-bld/pytorch_1702400430266/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [176,0,0], thread: [0,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1702400430266/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [176,0,0], thread: [1,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1702400430266/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [176,0,0], thread: [2,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
use --template qwen
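That is, re-run the same command as above with the Qwen chat template:
python src/evaluate.py --model_name_or_path ~/model/Qwen-14B-Chat-LLaMAfied --finetuning_type full --template qwen --task ceval --split validation --lang zh --n_shot 5 --batch_size 1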
I changed to --template qwen and it's still the same. I switched to an A100 machine and the model runs normally there. I also just tested a fresh installation of the latest unsloth on this V100 server, but it looks like there are still issues with the environment.
Thanks a lot