Getting this error when copying it to my space

#51
by umair007 - opened

Are you running it on GPU?

This is working for CPU, but the processing takes 15 minutes for Redshift Render

https://huggingface.co./spaces/Omnibus/finetuned_diffusion_cpu

@anzorq :

I had that exact same error when running a duplicate of your space on a T4.

Just copying & running on a CPU produces this totally unhelpful error :

image.png

@umair007 & @anzorq :

You should be able to fix the "LayerNormKernelImpl" not implemented for 'Half' error by finding the following lines in your requirements.txt:

torch
torchvision==0.13.1+cu113

Now, replace them with this :

torch==1.12.1+cu113
torchvision==0.13.1+cu113

That should fix that particular bug.
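A quick way to catch this kind of mismatch before deploying is to compare the two pins. The sketch below is only an illustration: `KNOWN_PAIRS`, `split_pin` and `pins_match` are hypothetical names, and the table lists just a few torch/torchvision pairings (including the one above), not an exhaustive compatibility list.

```python
# Hypothetical helper: checks that a torch pin and a torchvision pin belong together.
# KNOWN_PAIRS is a small hand-written table, not an official compatibility list.
KNOWN_PAIRS = {
    "1.12.0": "0.13.0",
    "1.12.1": "0.13.1",  # the pair used in this thread
    "1.13.0": "0.14.0",
    "1.13.1": "0.14.1",
}


def split_pin(version):
    """Split '1.12.1+cu113' into ('1.12.1', 'cu113'); the tag is '' if absent."""
    base, _, tag = version.partition("+")
    return base, tag


def pins_match(torch_version, torchvision_version):
    """Return True if both pins share a build tag and form a known pair."""
    torch_base, torch_tag = split_pin(torch_version)
    vision_base, vision_tag = split_pin(torchvision_version)
    return torch_tag == vision_tag and KNOWN_PAIRS.get(torch_base) == vision_base


print(pins_match("1.12.1+cu113", "0.13.1+cu113"))
print(pins_match("1.13.1+cu117", "0.13.1+cu113"))
```

An unpinned `torch` line cannot be checked this way, which is part of why pinning both packages helps.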

After I fixed that, however, my app broke on the following line :

pipe.enable_xformers_memory_efficient_attention()

I tried fixing this by taking the following lines :

if torch.cuda.is_available():
  pipe = pipe.to("cuda")
  pipe.enable_xformers_memory_efficient_attention()

And then I replaced them with this :

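def to_cuda(torch, pipe):
    # Move the pipeline to the GPU and enable xformers attention.
    # Returns False instead of crashing if either step fails.
    try:
        if torch.cuda.is_available():
            pipe = pipe.to("cuda")
            pipe.enable_xformers_memory_efficient_attention()
        return True
    except Exception:
        return False

# Call the helper only after it has been defined.
to_cuda(torch, pipe)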

After that, the app started successfully. However, it was still quite unstable and kept throwing errors.
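One thing that may make the remaining errors hard to track down is that the try ... except wrapper above swallows every failure, including ones raised by pipe.to("cuda"). A possible refinement, sketched here with a hypothetical helper name `try_enable_xformers`, guards only the xformers call and prints what went wrong:

```python
def try_enable_xformers(pipe):
    """Try to enable xformers attention; keep default attention on failure.

    `pipe` is assumed to be a diffusers pipeline (or any object exposing
    enable_xformers_memory_efficient_attention). Returns True on success.
    """
    try:
        pipe.enable_xformers_memory_efficient_attention()
        return True
    except Exception as err:  # e.g. a CUDA RuntimeError from a mismatched build
        print(f"xformers disabled: {err}")
        return False


# Usage in app.py (assumes `torch` and `pipe` already exist there):
# if torch.cuda.is_available():
#     pipe = pipe.to("cuda")
#     try_enable_xformers(pipe)
```

With this shape, a failure while moving the pipeline to the GPU still surfaces normally, and only the optional optimization is skipped.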

@Omnibus :

At least your version works.

Unfortunately it's also very slow when running on a T4.

It should be optimized, so it runs slow on a CPU and fast on a GPU.

@johnslegers

I found that switching every "torch_dtype=torch.float16" to "torch_dtype=torch.get_default_dtype()" allows the program to run on CPU and removes the "LayerNormKernelImpl" not implemented for 'Half' error there. Changing the requirements to download CPU-compatible modules, as you mentioned, was also needed.

I suspect that for the program to perform well on both CPU and GPU, a toggle like this might work:

"if device == GPU: torch_dtype=torch.float16 elif device == CPU: torch_dtype=torch.get_default_dtype()"
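In Python, that toggle could look roughly like the sketch below. `pick_device_and_dtype` is a hypothetical helper, not part of the space's code; it takes the `torch` module as an argument so it can be tried out without a GPU.

```python
def pick_device_and_dtype(torch):
    """Return ("cuda", torch.float16) on GPU, else ("cpu", default dtype)."""
    if torch.cuda.is_available():
        # Half precision saves memory and runs fast on a GPU.
        return "cuda", torch.float16
    # Some CPU kernels (LayerNorm here) lack float16 support,
    # so fall back to the default dtype, which is normally float32.
    return "cpu", torch.get_default_dtype()


# Usage in app.py (assumes `import torch` and a diffusers pipeline class):
# device, dtype = pick_device_and_dtype(torch)
# pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype).to(device)
```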

This demo was meant to be run on GPU only. If you want to run it on CPU, follow the above instructions.

@anzorq :

Please read my previous comment.

As I already explained, I tried running it in a GPU-enabled environment (T4 small --- 4 vCPU / 15 GiB RAM / Nvidia T4) and I got the same error as the OP.

As I also already explained, I got rid of the "LayerNormKernelImpl" not implemented for 'Half' error by pinning a torch version compatible with the torchvision version.

However, after that it produces a different error :

Traceback (most recent call last):
  File "app.py", line 52, in <module>
    pipe.enable_xformers_memory_efficient_attention()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 870, in enable_xformers_memory_efficient_attention
    self.set_use_memory_efficient_attention_xformers(True, attention_op)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 895, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 886, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 208, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 204, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 204, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 204, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 201, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/attention.py", line 117, in set_use_memory_efficient_attention_xformers
    raise e
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/models/attention.py", line 111, in set_use_memory_efficient_attention_xformers
    _ = xformers.ops.memory_efficient_attention(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/xformers/ops/memory_efficient_attention.py", line 967, in memory_efficient_attention
    return op.forward_no_grad(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/xformers/ops/memory_efficient_attention.py", line 343, in forward_no_grad
    return cls.FORWARD_OPERATOR(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.

Wrapping a try ... except around pipe.enable_xformers_memory_efficient_attention() fixes that error as well, as I also already explained.

That's as far as I got fixing errors myself. Will continue testing / developing soon.

This space installs A10G-specific prebuilt xformers whl. To use it on T4 you need to either disable xformers or install xformers for T4.
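The mismatch could also be detected at startup instead of at the first xformers call. The sketch below assumes the prebuilt wheel only contains kernels for compute capability 8.6 (A10G); a T4 reports 7.5. `xformers_wheel_fits` and `WHEEL_CAPABILITY` are hypothetical names, not part of the space's code.

```python
WHEEL_CAPABILITY = (8, 6)  # assumption: the wheel was built for A10G (sm_86) only


def xformers_wheel_fits(torch, built_for=WHEEL_CAPABILITY):
    """Return True if the current GPU matches the capability the wheel targets."""
    if not torch.cuda.is_available():
        return False  # no GPU at all, so xformers should stay off
    # get_device_capability() returns (major, minor), e.g. (7, 5) on a T4.
    return tuple(torch.cuda.get_device_capability()) == tuple(built_for)


# Usage in app.py (assumes `import torch` and an existing `pipe`):
# if xformers_wheel_fits(torch):
#     pipe.enable_xformers_memory_efficient_attention()
```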

I kinda suspected that's why it broke on that line. I experienced a similar issue / the same issue with another app on Google Colab a while ago.

Either way, my fix will stop the app from breaking the moment the app is downgraded from A10G to T4.

This is especially important for people who, like myself, are running their environment on a community GPU grant, as Hugging Face can change the environment at any time without the author being aware of it.

If you don't care about this, that's fine, I guess. It's your app. But in that case you might want to make people aware that your app is designed to work as-is on Hugging Face A10G environments only and will break on both T4 and CPU-only environments unless certain adjustments are made. This would save lots of people from headaches when trying to duplicate your space.

@johnslegers

I found that switching every "torch_dtype=torch.float16" to "torch_dtype=torch.get_default_dtype()" allows the program to run on CPU and removes the "LayerNormKernelImpl" not implemented for 'Half' error there. Changing the requirements to download CPU-compatible modules, as you mentioned, was also needed.

I suspect that for the program to perform well on both CPU and GPU, a toggle like this might work:

"if device == GPU: torch_dtype=torch.float16 elif device == CPU: torch_dtype=torch.get_default_dtype()"

How should I do that last bit? The same way that I would switch all of the "torch_dtype=torch.float16" to "torch_dtype=torch.get_default_dtype()"?
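One way to do it, sketched under the assumption that every model in app.py is loaded through `from_pretrained`: rather than editing each "torch_dtype=..." by hand, route every load through a small helper that picks the dtype for the current hardware. `load_pipeline` is a hypothetical name, not something the space defines.

```python
def load_pipeline(pipeline_cls, model_id, torch, **kwargs):
    """Load a pipeline with a dtype and device that suit the current hardware."""
    on_gpu = torch.cuda.is_available()
    # float16 on GPU; the default dtype (normally float32) on CPU.
    dtype = torch.float16 if on_gpu else torch.get_default_dtype()
    pipe = pipeline_cls.from_pretrained(model_id, torch_dtype=dtype, **kwargs)
    return pipe.to("cuda") if on_gpu else pipe


# Usage (assumes `import torch` and `from diffusers import StableDiffusionPipeline`):
# pipe = load_pipeline(StableDiffusionPipeline, model_id, torch)
```

Each existing `from_pretrained(..., torch_dtype=torch.float16)` call would then become a `load_pipeline(...)` call, so the CPU/GPU decision lives in one place.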