Spaces:
Running
Runtime error: ZeroGPU with transformers MistralForCausalLM
Error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 709, in asyncgen_wrapper
response = await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/chat_interface.py", line 552, in _stream_fn
first_response = await async_iteration(generator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 576, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 294, in gradio_handler
raise res.value
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
Not sure if this will fix it, but you probably shouldn't be loading the model inside the @spaces.GPU-decorated function.
Also, to use CUDA, don't change PyTorch's default device; move the model explicitly with .to("cuda") instead.
I also encountered this issue.
Same error here. Is there any guide on how to fix it?