IHaveNoClueAndIMustPost/OpenAssistant_llama2-13b-orca-8k-GGML

This is OpenAssistant's llama2-13b-orca-8k-3319 in a couple of GGML formats.

I had to apply this workaround to pad the vocab and quantize the models; this may or may not affect performance.
I have no idea what I'm doing, so if something doesn't work as it should, or at all, that's likely on me and not the models themselves.

Below is the suggested prompt format from the original repo:

For the initial response use the following (the llama2 default system prompt works well as the system message):

<|system|>system message</s><|prompter|>user prompt</s><|assistant|>

For multi-turn conversations use:

<|system|>system message</s><|prompter|>Q1</s><|assistant|>A1</s><|prompter|>Q2</s><|assistant|>
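
If you are assembling these prompts by hand, a small helper along these lines may save some typos. This is only a rough sketch of the two formats above: the function name `build_prompt` and the shape of its `turns` argument are made up for illustration and are not part of the original repo or any library.

```python
def build_prompt(system, turns):
    """Assemble a prompt string in the format shown above.

    `turns` is a list of (user, assistant) pairs. Pass None as the
    assistant reply of the last pair to leave the prompt open for the
    model to answer. (Hypothetical helper, not from the original repo.)
    """
    parts = [f"<|system|>{system}</s>"]
    for user, assistant in turns:
        # Every user message is followed by the assistant tag.
        parts.append(f"<|prompter|>{user}</s><|assistant|>")
        if assistant is not None:
            # Completed assistant replies are closed with </s>.
            parts.append(f"{assistant}</s>")
    return "".join(parts)


# Initial response
single = build_prompt("system message", [("user prompt", None)])

# Multi-turn conversation
multi = build_prompt("system message", [("Q1", "A1"), ("Q2", None)])
```

Printing `single` and `multi` should reproduce the two template lines above, with your own text in place of the placeholders.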