Is there a quantized version?
#5, opened by mrmikelevy
I was hoping there would be a quantized version. I know it would be less accurate, but the performance gain might make up for it. I'm doing zero-shot classification.
Hi, nice idea. Do you want this for CPU workloads? The model already fits on a small GPU.
Yes. I've been using a Tesla T4 GPU, but the model is so small that moving it to CPU might be worth it. Unquantized, it runs about 5 times faster on the GPU than on CPU. I think that if it were quantized, it would run at about the same speed on CPU.
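For anyone following along, here is a rough sketch of where the accuracy loss mentioned above comes from. It is a toy illustration of symmetric int8 quantization in plain Python, not how this model would actually be quantized. The function names and the sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (integer codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.003, 0.9, -0.56]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("scale:", scale)
print("max error:", max_error)
```

Each weight is stored in one byte instead of four, and small values such as 0.003 can collapse to zero. That is the trade-off between speed and accuracy being discussed here. Whether it actually closes the gap with the T4 would have to be measured.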