GPU Memory / RAM requirements
How much GPU memory does this model require to run? And in CPU mode, how much RAM? I'm currently trying to run it on a GTX 1080 (8 GB) and getting a "cannot allocate memory" error, so I suppose it needs at least 16 GB or so.
I would assume it takes about ~15 GB of VRAM without any optimizations! However, you can run it very successfully on a CPU with 5-bit quantization, using only ~5.3 GB of RAM!
In theory, you might be able to run it in bfloat16 mode, but I don't know how, sorry.
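For reference, here's a minimal sketch of what bfloat16 loading looks like with 🤗 Transformers (note the weights alone are still ~14 GB, so this won't fit on an 8 GB card anyway):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # 2 bytes/param instead of 4 -> ~14 GB of weights
    trust_remote_code=True,      # MPT ships custom modeling code on the Hub
).to("cuda")
```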
@Raspbfox I searched far and wide for a quantization example, but couldn't find one... =[
@danieldaugherty, just try searching for the GGML quantized models (usually q5_1) or GPTQ
Ah yeah, I found that. But I didn't really understand how to use it...
GPTQ doesn't support MPT yet =[
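For the GGML route, here's a minimal sketch using the `ctransformers` library — the repo and file names below are illustrative, so point them at whichever q5_1 build you actually download:

```python
from ctransformers import AutoModelForCausalLM

# Repo and file names are placeholders -- substitute your downloaded GGML build.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/MPT-7B-GGML",               # example Hub repo (assumption)
    model_file="mpt-7b.ggmlv3.q5_1.bin",  # the ~5 GB 5-bit quantized weights
    model_type="mpt",                     # architecture hint for ctransformers
)

print(llm("MosaicML is", max_new_tokens=32))
```

This runs entirely on CPU, which is how you get the ~5.3 GB RAM figure mentioned above.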
Running this MPT-7B model in FP16 consumes about 14 GB of GPU memory, so you would need at least 16 GB of GPU memory to run this model for inference.
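The 14 GB figure is just parameter count times bytes per parameter:

```python
n_params = 7e9        # MPT-7B: ~7 billion parameters
bytes_per_param = 2   # FP16 = 16 bits = 2 bytes
print(f"~{n_params * bytes_per_param / 1e9:.0f} GB")  # -> ~14 GB for the weights alone
```

The extra ~2 GB of headroom is for activations and the KV cache during generation.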
Closing as stale.
Also noting that we added `device_map` support as of this PR: https://huggingface.co./mosaicml/mpt-7b-instruct/discussions/41
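With `device_map` support, something like this (a sketch; requires `accelerate` to be installed) lets the weights spill over from GPU to CPU RAM instead of erroring out, at the cost of speed:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",        # accelerate fills the GPU first, then offloads to CPU RAM
    trust_remote_code=True,
)
```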