Getting an issue with CUDA
Hey,
I've deployed an instance of meditron-70b on 2x A100, and when testing the endpoint I keep getting the following CUDA error. Are there any workarounds or solutions?
Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: captures_underway == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1699449201336/work/c10/cuda/CUDACachingAllocator.cpp":2939, please report a bug to PyTorch.
I got the same error when trying to access the dedicated endpoint with an API key: 'Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: captures_underway == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1699449201336/work/c10/cuda/CUDACachingAllocator.cpp":2939, please report a bug to PyTorch.'
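For reference, this is roughly how I'm calling the endpoint (a minimal sketch; the endpoint URL, token, and prompt below are placeholders):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL and API key for the dedicated endpoint.
client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token="hf_xxx",
)

# The error above is returned while this generation request is being served.
print(client.text_generation("What are the symptoms of sepsis?", max_new_tokens=128))
```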
Hi @LLMHackathonNYC @marichka-dobko, thanks for reporting. We've taken a look and recommend selecting EETQ quantization (in place of bitsandbytes) to help resolve the reported error. Please let us know how it goes. Thanks again!
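If you'd rather recreate the endpoint programmatically than switch the quantization setting in the Endpoints UI, here is a minimal sketch using huggingface_hub's create_inference_endpoint with a text-generation-inference container and QUANTIZE set to eetq. The endpoint name, vendor, region, instance_type/instance_size strings, and token limits are placeholders; adjust them to the 2x A100 configuration available to your account.

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "meditron-70b-eetq",                   # placeholder endpoint name
    repository="epfl-llm/meditron-70b",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",                          # placeholder vendor
    region="us-east-1",                    # placeholder region
    type="protected",
    instance_type="nvidia-a100",           # placeholder: pick the 2x A100 option shown in the UI
    instance_size="x2",
    custom_image={
        "health_route": "/health",
        "env": {
            "MODEL_ID": "/repository",
            "QUANTIZE": "eetq",            # EETQ instead of bitsandbytes
            "MAX_INPUT_LENGTH": "2048",    # placeholder token limits
            "MAX_TOTAL_TOKENS": "4096",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
    },
)

# Block until the endpoint is running, then print its URL for API requests.
endpoint.wait()
print(endpoint.url)
```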