4-bit Quantization (GPTQ or GGUF)
Are there plans to release this model in a 4-bit quantized version?
Currently, we have no short-term plans to develop a quantized version. However, we are working on training smaller models (e.g., 2-3B) to better serve different user needs and application scenarios.
That's sad to hear, because a 4-bit quantized version of this model should fit in 12 GB of VRAM (roughly 9B parameters at 4 bits per weight is about 4.5 GB, leaving headroom for the vision encoder and activations).
Why release smaller models when you could 4-bit quantize this one and let people run it locally, given that the best-selling video card model of all time is the GeForce RTX 3060 12 GB?
If you don't have someone who can handle this, then at least post some instructions on how to do it; I will do it myself and share the result with the whole community.
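For reference, the generic GPTQ workflow with the auto-gptq library looks roughly like the sketch below. This is only a sketch of the standard text-only recipe: the model id and calibration texts are placeholders, and since Ovis is multimodal, in practice only the LLM backbone would be quantized this way while the vision tower needs separate handling not shown here.

```python
# Generic 4-bit GPTQ sketch using the auto-gptq library (placeholders only).
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "some-org/some-9b-model"  # placeholder, not the real Ovis repo path
calibration_texts = [
    "GPTQ needs a small set of calibration samples.",
    "A few hundred short text snippets are typical.",
]

# Tokenize the calibration samples into the format auto-gptq expects.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
examples = [
    {"input_ids": enc.input_ids, "attention_mask": enc.attention_mask}
    for enc in (tokenizer(t, return_tensors="pt") for t in calibration_texts)
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization granularity
    desc_act=False,  # skip activation-order reordering for faster kernels
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)  # runs the GPTQ calibration pass
model.save_quantized("model-gptq-int4", use_safetensors=True)
```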
Thank you for the suggestion. Considering the community's feedback on the quantized version, we have decided to dedicate our efforts to developing it. We will strive to complete it within a month.
We've released quantized versions of Ovis1.6: Ovis1.6-Gemma2-9B-GPTQ-Int4 and Ovis1.6-Llama3.2-3B-GPTQ-Int4. Feel free to try them out and share your feedback!
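For anyone trying them out, a minimal loading sketch is below, assuming the quantized checkpoints follow the same trust_remote_code loading path as the full-precision Ovis1.6 releases; the repo id and any extra requirements (such as an installed GPTQ kernel package) should be checked against the model card.

```python
# Minimal loading sketch for the GPTQ-Int4 checkpoint (repo id assumed from
# the release note; verify against the model card).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4",  # assumed repo id
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()

# Image preprocessing, chat templating, and generation then follow the
# examples given in the Ovis1.6 model cards.
```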