Why duplicates?
GGUF files of this model were included in the original repo. Why the duplication?
mradermacher provides additional static quants beyond the ones in the original model, like Q4_0_4_4 optimised for ARM devices, and more importantly provides the much better weighted/imatrix quants under https://huggingface.co./mradermacher/ReWiz-Qwen-2.5-14B-i1-GGUF
Yep - and that is highly appreciated, but not what the question is about.
The reason some duplicate quants get computed is that mradermacher's process is fully automated. He currently has 11K models containing GGUF quants, so it would be infeasible to manually check, for each of them, which GGUF files already exist in the original repository or anywhere else on HuggingFace and then quantize only the missing ones.

It is also quite uncommon to mix SafeTensors and GGUF files in the same repository, and not a setup recommended by HuggingFace, since it inevitably causes many users who only want the SafeTensors weights to download all the GGUF quants as well. The model selection process is still mostly manual, but there is no time for an in-depth review of every candidate.

I believe quantizing this model was justified: weighted/imatrix quants add a lot of value for many users because they give much better quality per size, and since this is a 14B model, quantizing it is relatively cheap.
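For a sense of what such a per-repo check would involve, here is a minimal sketch (not part of the actual automation) that lists the GGUF files already present in a model repo using the huggingface_hub Python package; the repo id below is only a placeholder:

```python
# Minimal sketch, not the real pipeline: list the GGUF files already
# present in a model repo so duplicates could in principle be skipped.
from huggingface_hub import HfApi

api = HfApi()

def existing_gguf_files(repo_id: str) -> list[str]:
    """Return the paths of all .gguf files in the given model repo."""
    return [f for f in api.list_repo_files(repo_id) if f.lower().endswith(".gguf")]

# Placeholder repo id for illustration only.
repo = "some-user/ReWiz-Qwen-2.5-14B"
ggufs = existing_gguf_files(repo)
print(f"{repo}: {len(ggufs)} GGUF file(s) already present")
for name in ggufs:
    print("  ", name)
```

Running even a check this simple across 11K repos, and then deciding case by case which quant types are still worth producing, is exactly the kind of per-model review the automated pipeline avoids.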
Thanks for clarifying.