Could you help get llama.cpp to support the 128K version?
I'm very excited about this model, as it could be perfect for pure browser-based inference.
But at the moment there is no way to run it in the browser.
One solution would be for llama.cpp to support it; then it could, in theory, run via Wllama, roughly along the lines of the sketch below.
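For reference, browser-side inference through Wllama would look something like this. This is only a rough sketch based on Wllama's general usage pattern; the config paths, model URL, and option values here are my own placeholders, not a tested setup:

```ts
import { Wllama } from '@wllama/wllama';

// Hypothetical sketch: the asset paths and model URL below are placeholders.
// Wllama wraps llama.cpp compiled to WebAssembly, so running the 128K model
// this way ultimately depends on llama.cpp itself supporting it.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': './esm/multi-thread/wllama.wasm',
};

const wllama = new Wllama(CONFIG_PATHS);

// Load a (hypothetical) GGUF conversion of the 128K model from a URL.
await wllama.loadModelFromUrl('https://example.com/model-128k-q4_k_m.gguf');

// Run a simple completion entirely in the browser.
const output = await wllama.createCompletion('Hello, world!', {
  nPredict: 64,
});
console.log(output);
```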
Could you share some tips or other guidance in the GitHub thread below on how to implement the new long-context technique used in this model? People there seem to be hoping for a code release of some kind.
The GitHub thread can be found here:
https://github.com/ggerganov/llama.cpp/issues/6849
Thank you for any insight you could give.
Hello @BoscoTheDog!
We are looking into it! It is definitely our goal to support the 128K version in llama.cpp.
Thanks for sharing the issue as well; we will keep track of it.