Please, can you quantize conceptofmind/LLongMA-3b and conceptofmind/Flan-Open-Llama-3b? I can't find them anywhere. And with k-quants too?
I don't really see what the origin or license is for those models.
But otherwise, is there anything difficult about converting these models to GGML? I think it should be possible to use the tools included in llama.cpp right now without any hacking or patching (which was not true when I first uploaded the 3B models).
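For reference, the standard conversion flow with the tools shipped in llama.cpp looked roughly like this at the time. This is a hedged sketch, not a command log from this thread: the local paths and the chosen quant type (`q4_0`) are illustrative assumptions.

```shell
# Build llama.cpp and its quantize tool
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the HF checkpoint (assumed downloaded to ../LLongMA-3b)
# to an f16 GGML file, then quantize it
python3 convert.py ../LLongMA-3b
./quantize ../LLongMA-3b/ggml-model-f16.bin ../LLongMA-3b/ggml-model-q4_0.bin q4_0
```

The resulting `ggml-model-q4_0.bin` should then load directly in llama.cpp or koboldcpp, no patching needed for plain (non-k) quants.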
K-quants for 3B models are problematic right now because they require a special build of llama.cpp to work.
I found more info on:
https://twitter.com/EnricoShippole/status/1672274141255180288?t=iDgpZy2ggF3xt9I4TlhTug&s=19
I asked you because I don't have any computer available at the moment to quantize, and I don't know how to. Thanks for answering!
I think it would be possible to quantize with k-quants; TheBloke did it with https://huggingface.co./TheBloke/Flan-OpenLlama-7B-GGML
K-quants are not well supported for 3B models. You basically have to compile a custom version of llama.cpp, which will probably be quite hard for you to do.
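To sketch why: the default k-quant super-block covers 256 weights, and the 3B OpenLLaMA feed-forward dimension is not divisible by 256, so llama.cpp offered a compile-time switch to a 64-element super-block. A hedged sketch of that custom build (flag name as provided by the llama.cpp Makefile of that era; paths are illustrative):

```shell
# Rebuild llama.cpp with 64-element k-quant super-blocks for 3B models
make clean
make LLAMA_QKK_64=1

# Quantize with a k-quant type; this file will only load in binaries
# that were also built with LLAMA_QKK_64=1
./quantize ggml-model-f16.bin ggml-model-q4_k_m.bin q4_k_m
```

That last point is the catch: a k-quantized 3B file made this way is incompatible with stock builds of llama.cpp and koboldcpp.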
I will try to add the models, though.
Alright, here they are:
I did manage to get LLongMA-3b working with the 8K context, but I needed to apply patches from GitHub, as they are not merged yet.
Not sure how helpful it is for you.
Thanks! I'm going to test it on koboldcpp.
Everything worked perfectly, thanks! Could you convert this one too: https://huggingface.co./syzymon/long_llama_3b
That model would only work with a 2048-token context in the current llama.cpp code, so I don't see the point right now.
Maybe later.