OSError: mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
I tried to load the model with both the transformers library and hqq.engine.hf, but I got the error above in both cases. Code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from hqq.engine.hf import HQQModelForCausalLM
from hqq.engine.hf import AutoTokenizer as hqqtokenizer
import torch
tokenizer = hqqtokenizer.from_pretrained("mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ")
model = HQQModelForCausalLM.from_pretrained("mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    max_new_tokens=1000,
    model_kwargs={"do_sample": True, "temperature": 0.2},
)
print("Chat interface - Type exit() to quit.")
while True:
    question = input("User: ")
    prompt_template = f'''<s>[INST] {question} [/INST]
'''
    if question == "exit()":
        break
    else:
        output = pipe(prompt_template)
        print("Mistral:", output[0]['generated_text'])
Hi! Please follow the example on the model page. Replace from_pretrained with from_quantized to load a quantized model.
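For reference, here is a minimal sketch of the corrected loading code, assuming the hqq.engine.hf API shown on the model card and keeping the generation settings from the question above:

from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from transformers import pipeline

model_id = "mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ"

# from_quantized loads the HQQ-quantized weights published in the repo instead of
# looking for pytorch_model.bin, which is what triggers the OSError above.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = HQQModelForCausalLM.from_quantized(model_id)

# The quantized model can then be used as before, e.g. in a text-generation pipeline
# (settings carried over from the original snippet).
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1000,
    model_kwargs={"do_sample": True, "temperature": 0.2},
)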