4-bit Quantization (GPTQ or GGUF)
Are there plans to release this model in a 4-bit quantized version?
Currently, we have no short-term plans to develop a quantized version. However, we are working on training smaller models (e.g., 2-3B) to better serve different user needs and application scenarios.
That's sad to hear, because a 4-bit quantized version of this model should fit in 12 GB of VRAM (roughly 9B parameters at 4 bits per weight is about 4.5 GB, leaving headroom for the vision encoder and activations).
Why release smaller models when you could 4-bit quantize this one and let people run it locally, given that the best-selling video card model of all time is the GeForce RTX 3060 12 GB?
If you don't have someone who can handle this, then at least post some instructions on how to do it; I will do it myself and share the result with the whole community.
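For reference, the generic GPTQ workflow with the auto-gptq library looks roughly like the sketch below. This is only a sketch of the standard text-only recipe: the model id and calibration texts are placeholders, and since Ovis is multimodal, in practice only the LLM backbone would be quantized this way while the vision tower needs separate handling not shown here.

```python
# Generic 4-bit GPTQ sketch using the auto-gptq library (placeholders only).
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "some-org/some-9b-model"  # placeholder, not the real Ovis repo path
calibration_texts = [
    "GPTQ needs a small set of calibration samples.",
    "A few hundred short text snippets are typical.",
]

# Tokenize the calibration samples into the format auto-gptq expects.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
examples = [
    {"input_ids": enc.input_ids, "attention_mask": enc.attention_mask}
    for enc in (tokenizer(t, return_tensors="pt") for t in calibration_texts)
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization granularity
    desc_act=False,  # skip activation-order reordering for faster kernels
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)  # runs the GPTQ calibration pass
model.save_quantized("model-gptq-int4", use_safetensors=True)
```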
Thank you for the suggestion. Considering the community's feedback on the quantized version, we have decided to dedicate our efforts to developing it. We will strive to complete it within a month.
We've released quantized versions of Ovis1.6: Ovis1.6-Gemma2-9B-GPTQ-Int4 and Ovis1.6-Llama3.2-3B-GPTQ-Int4. Feel free to try them out and share your feedback!
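For anyone trying them out, a minimal loading sketch is below, assuming the quantized checkpoints follow the same trust_remote_code loading path as the full-precision Ovis1.6 releases; the repo id and any extra requirements (such as an installed GPTQ kernel package) should be checked against the model card.

```python
# Minimal loading sketch for the GPTQ-Int4 checkpoint (repo id assumed from
# the release note; verify against the model card).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4",  # assumed repo id
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()

# Image preprocessing, chat templating, and generation then follow the
# examples given in the Ovis1.6 model cards.
```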