GPU requirements
#59
by thightower1 · opened
I have the demo code working, but it's very slow. My PC has a fast processor and an NVIDIA RTX 4060 GPU with 8 GB of VRAM installed, but I can't seem to find the minimum GPU memory requirements.
Any suggestions would be greatly appreciated!
Running inference alone with 4-bit quantization took me around 12.2 GB of VRAM.
Thanks, leo4life. Just curious: how was the performance with 4-bit quantization?
I didn't run any benchmarks, but I tried the prompts from the sample code on the model card page and got the same outputs. It produces reasonable output for other prompts as well.
Hi @leo4life, I tried to run it in 4-bit on Colab but got a CUDA out-of-memory error. Would you share a code snippet for running it? I might be missing an important parameter.
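For anyone following along, here is a minimal sketch of the usual way to load a model in 4-bit with Hugging Face transformers and bitsandbytes. The model id is a placeholder, and these are not necessarily the exact settings leo4life used; the parameters most often missed are the `quantization_config` itself and `device_map="auto"`, without which the model may first load in full precision and overflow Colab's GPU memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder id -- substitute the model from this repo's model card.
model_id = "org/model-name"

# 4-bit quantization via bitsandbytes; NF4 with fp16 compute is a common
# choice on Colab GPUs such as the T4 (which lacks native bf16 support).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # dispatch quantized weights straight to the GPU
)

prompt = "Hello, world!"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that even with 4-bit weights, the reported ~12.2 GB footprint would still exceed an 8 GB card like the original poster's RTX 4060, so a larger GPU (or CPU offload) may be unavoidable for this model.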