Quantization into int8
#11
by
SajjadIqbal
- opened
How can we quantize this model to int8? Has anyone worked with this?
Also, when served with FastAPI the model uses 300–400 MB more memory than with Gradio. Why is that?
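For PyTorch models, the usual starting point is dynamic quantization via `torch.quantization.quantize_dynamic`, which converts `nn.Linear` weights to int8. If it helps, here is a minimal NumPy sketch of the underlying idea (symmetric per-tensor int8 quantization); the function names are illustrative, not from any library:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = np.abs(w).max() / 127.0  # one float scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# reconstruction error is bounded by scale / 2 per element
```

The weights shrink 4x (float32 to int8) and the per-element rounding error stays below half the scale, which is why int8 usually costs little accuracy for linear layers.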