Getting an issue with CUDA
Hey,
I've deployed an instance of meditron-70b on 2x A100, and when testing the endpoint I keep getting the following CUDA error. Are there any workarounds or solutions?
Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: captures_underway == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1699449201336/work/c10/cuda/CUDACachingAllocator.cpp":2939, please report a bug to PyTorch.
I got the same error when trying to access the dedicated endpoint with an API key: 'Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: captures_underway == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1699449201336/work/c10/cuda/CUDACachingAllocator.cpp":2939, please report a bug to PyTorch.'
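For reference, this is roughly how I'm calling the endpoint (a minimal sketch; the endpoint URL, token, and prompt below are placeholders):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL and API key for the dedicated endpoint.
client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token="hf_xxx",
)

# The error above is returned while this generation request is being served.
print(client.text_generation("What are the symptoms of sepsis?", max_new_tokens=128))
```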
Hi @LLMHackathonNYC @marichka-dobko, thanks for reporting. We've taken a look and recommend selecting EETQ quantization (in place of bitsandbytes) to help resolve the reported error. Please let us know how it goes. Thanks again!
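If you'd rather recreate the endpoint programmatically than switch the quantization setting in the Endpoints UI, here is a minimal sketch using huggingface_hub's create_inference_endpoint with a text-generation-inference container and QUANTIZE set to eetq. The endpoint name, vendor, region, instance_type/instance_size strings, and token limits are placeholders; adjust them to the 2x A100 configuration available to your account.

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "meditron-70b-eetq",                   # placeholder endpoint name
    repository="epfl-llm/meditron-70b",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",                          # placeholder vendor
    region="us-east-1",                    # placeholder region
    type="protected",
    instance_type="nvidia-a100",           # placeholder: pick the 2x A100 option shown in the UI
    instance_size="x2",
    custom_image={
        "health_route": "/health",
        "env": {
            "MODEL_ID": "/repository",
            "QUANTIZE": "eetq",            # EETQ instead of bitsandbytes
            "MAX_INPUT_LENGTH": "2048",    # placeholder token limits
            "MAX_TOTAL_TOKENS": "4096",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
    },
)

# Block until the endpoint is running, then print its URL for API requests.
endpoint.wait()
print(endpoint.url)
```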