Is it worth hosting a quantized DeepSeek V3? Cost & performance insights?
Hi everyone,
I’m fairly new to working with quantized models and exploring whether it's worth self-hosting a quantized version of DeepSeek V3 on SageMaker while maintaining good quality results.
I’d appreciate insights from those with experience:
- Is it worth exploring this approach in terms of both quality and cost?
- What kind of computing resources (e.g., instance types, memory, GPU/CPU) would I need?
- Any rough cost estimates for running it effectively?
Any recommendations or shared experiences would be really helpful. Thanks in advance!
It really depends on which quant. I'd say hosting isn't a bad option if you really want a good local model. If I were you, I'd rather host Llama 3.3 70B: https://huggingface.co./collections/unsloth/llama-33-all-versions-67535d7d994794b9d7cf5e9f
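To get a feel for why the quant level matters so much for hosting cost, here's a rough back-of-envelope VRAM estimate. The 20% overhead figure and the bits-per-weight values are my own assumptions (covering KV cache, activations, and runtime buffers), not numbers from this thread — actual usage depends on context length, batch size, and serving stack.

```python
# Back-of-envelope VRAM estimate for hosting a quantized LLM.
# Assumption (not from the thread): weights dominate memory; add ~20%
# overhead for KV cache, activations, and runtime buffers.

def vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.20) -> float:
    """Approximate GPU memory in GB for a model with `params_b` billion
    parameters stored at `bits_per_weight` bits per weight."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params * bytes per param
    return weights_gb * (1 + overhead)

# Llama 3.3 70B at a few common quantization levels (bpw values approximate):
for name, bits in [("FP16", 16.0), ("Q8", 8.0), ("Q4_K_M (~4.8 bpw)", 4.8)]:
    print(f"{name}: ~{vram_gb(70, bits):.0f} GB")
```

So a ~4-bit quant of a 70B model lands around 50 GB, which fits on a single 80 GB GPU (e.g. one A100/H100-class instance), while FP16 would need multiple GPUs — that's the main lever on hourly cost.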
Hi @shimmyshimmer ,
Thanks a lot for your response. Still, hosting Llama 3.3 70B at around $1–2/hour with reasonable inference speed seems undoable, right?
Just trying to make sure that fine-tuning a smaller model like Llama 3.1 8B is the best option given my budget constraints.
Once again, thanks a lot for your kind support.
I know they are quite basic questions, but I am fairly new.