Could you help get llama.cpp to support the 128K version?
I'm very excited about this model, as it could be perfect for pure browser-based inference.
But at the moment there is no way to run it in the browser.
One solution would be for llama.cpp to support it; then it could, in theory, run via Wllama, roughly along the lines of the sketch below.
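For reference, browser-side inference through Wllama would look something like this. This is only a rough sketch based on Wllama's general usage pattern; the config paths, model URL, and option values here are my own placeholders, not a tested setup:

```ts
import { Wllama } from '@wllama/wllama';

// Hypothetical sketch: the asset paths and model URL below are placeholders.
// Wllama wraps llama.cpp compiled to WebAssembly, so running the 128K model
// this way ultimately depends on llama.cpp itself supporting it.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': './esm/multi-thread/wllama.wasm',
};

const wllama = new Wllama(CONFIG_PATHS);

// Load a (hypothetical) GGUF conversion of the 128K model from a URL.
await wllama.loadModelFromUrl('https://example.com/model-128k-q4_k_m.gguf');

// Run a simple completion entirely in the browser.
const output = await wllama.createCompletion('Hello, world!', {
  nPredict: 64,
});
console.log(output);
```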
Could you share some tips or other guidance in the GitHub thread below on how to implement the new long-context technique used in this model? People there seem to be hoping for a code release of some kind.
The GitHub thread can be found here:
https://github.com/ggerganov/llama.cpp/issues/6849
Thank you for any insight you could give.
Hello @BoscoTheDog!
We are looking into it! It is definitely our goal to support the 128K version in llama.cpp.
Thanks for sharing the issue as well; we will keep track of it.