gguf in llama cpp
#1 — opened by Bearsaerker
Would this also work quantized for long context in llama.cpp, or are there any special dependencies specific to the implementation in the model card?
Hi, I haven't used llama.cpp before. There are no special dependencies for this implementation other than pytorch==2.1.2, transformers==4.36.1, and accelerate==0.25.0.
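For reproducibility, the pinned versions above can be captured in a requirements file. One caveat (an assumption about packaging, not stated in the reply): the PyPI package name for PyTorch is `torch`.

```
# requirements.txt — pinned versions from the reply above
# PyPI package for PyTorch is "torch", not "pytorch"
torch==2.1.2
transformers==4.36.1
accelerate==0.25.0
```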
I get this error when trying to convert to GGUF:
raise Exception(f"Unexpected tensor name: {name}")
Exception: Unexpected tensor name: model.beacon_embed_tokens.weight
Does anyone know how we can use this model quantized?
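Not a full answer, but a possible workaround sketch: the conversion script rejects the checkpoint because `model.beacon_embed_tokens.weight` is not a tensor name it knows for the Llama architecture. One option, *assuming* you are willing to lose the beacon mechanism entirely (the result would behave like the underlying base model, not the long-context variant), is to filter the beacon-specific tensors out of the state dict before conversion. The shapes and names below are toy stand-ins for illustration:

```python
# Sketch: drop beacon-specific tensors so llama.cpp's converter only
# sees standard Llama tensor names. WARNING: this discards the beacon
# weights, so the model's long-context mechanism is lost.
import torch

def strip_beacon_tensors(state_dict):
    """Return a copy of state_dict without beacon-specific entries."""
    return {name: tensor for name, tensor in state_dict.items()
            if "beacon" not in name}

# Toy state dict standing in for the real checkpoint (hypothetical shapes).
sd = {
    "model.embed_tokens.weight": torch.zeros(4, 2),
    "model.beacon_embed_tokens.weight": torch.zeros(4, 2),  # the rejected tensor
    "model.layers.0.self_attn.q_proj.weight": torch.zeros(2, 2),
}
filtered = strip_beacon_tensors(sd)
print(sorted(filtered))  # beacon tensor is gone
```

In practice you would load the real checkpoint (e.g. with `torch.load` or `safetensors`), filter it, save it back, and then run the converter on the filtered copy. Whether the remaining weights still produce a usable model depends on how tightly the beacon tokens are integrated, so treat this as an experiment, not a supported path.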