OSError: mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
I tried to load the model with both the transformers library and hqq.engine.hf, but I got the error above in both cases. Code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from hqq.engine.hf import HQQModelForCausalLM
from hqq.engine.hf import AutoTokenizer as hqqtokenizer
import torch
tokenizer = hqqtokenizer.from_pretrained("mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ")
model = HQQModelForCausalLM.from_pretrained("mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    max_new_tokens=1000,
    model_kwargs={"do_sample": True, "temperature": 0.2},
)
print("Chat interface - Type exit() to quit.")
while True:
    question = input("User: ")
    prompt_template = f'''<s>[INST] {question} [/INST]
'''
    if question == "exit()":
        break
    else:
        output = pipe(prompt_template)
        print("Mistral:", output[0]['generated_text'])
Hi! Please follow the example on the model page. Replace from_pretrained with from_quantized to load a quantized model.
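For reference, here is a minimal sketch of the corrected loading code, assuming the hqq.engine.hf API shown on the model card and keeping the generation settings from the question above:

from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from transformers import pipeline

model_id = "mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ"

# from_quantized loads the HQQ-quantized weights published in the repo instead of
# looking for pytorch_model.bin, which is what triggers the OSError above.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = HQQModelForCausalLM.from_quantized(model_id)

# The quantized model can then be used as before, e.g. in a text-generation pipeline
# (settings carried over from the original snippet).
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1000,
    model_kwargs={"do_sample": True, "temperature": 0.2},
)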