It's a bit difficult to deploy the 70B model for verification, so let's keep an eye on how things develop
#4 opened by wawoshashi
Deploying the 70B model personally for verification is a bit difficult; I'll keep watching how things develop.
Try this quantized version: https://huggingface.co./TheBloke/Xwin-LM-70B-V0.1-GGUF. It only needs a card with 48 GB of VRAM, or about 40 GB of system RAM for CPU-only inference.
You can try it now with llama.cpp.
There is also a 7B GPTQ version, https://huggingface.co./TheBloke/Xwin-LM-7B-V0.1-GPTQ, which only needs 6 GB of VRAM.
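For the GPTQ checkpoint, here is a minimal sketch of loading it with the `transformers` library, assuming `transformers`, `optimum`, and `auto-gptq` are installed and a CUDA GPU is available; the prompt and generation settings are just illustrative:

```python
# Minimal sketch: load the 7B GPTQ checkpoint with transformers.
# Assumes `pip install transformers optimum auto-gptq` and ~6 GB of free VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Xwin-LM-7B-V0.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```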
I can run the quantized 70B GGUF model (Q3_K_S, with 60 of 83 layers offloaded to the GPU) on a 3090 via llama.cpp.
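For reference, a minimal sketch of that setup using the llama-cpp-python bindings; the local filename is an assumption (download the Q3_K_S file from the GGUF repo above), and you should adjust `n_gpu_layers` to whatever fits your VRAM:

```python
# Minimal sketch: run the Q3_K_S 70B GGUF through llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="xwin-lm-70b-v0.1.Q3_K_S.gguf",  # hypothetical local path
    n_gpu_layers=60,  # offload 60 of 83 layers, as in the 3090 setup above
    n_ctx=2048,       # context window; raise it if you have spare memory
)
out = llm("Tell me a joke.", max_tokens=64)
print(out["choices"][0]["text"])
```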