OpenHermes-2.5-Mistral-7B

Model description

Please refer to the original model card for more details on OpenHermes-2.5-Mistral-7B.

Use with mlx-llm

Install mlx-llm from GitHub.

git clone https://github.com/riccardomusmeci/mlx-llm
cd mlx-llm
pip install .
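
A quick sanity check that the install worked (run it from the same Python environment you installed into):

python -c "import mlx_llm"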

Test with simple generation

from mlx_llm.model import create_model, create_tokenizer, generate

model = create_model("OpenHermes-2.5-Mistral-7B") # downloads the weights from this repository
tokenizer = create_tokenizer("OpenHermes-2.5-Mistral-7B")
generate(
  model=model,
  tokenizer=tokenizer,
  prompt="What's the meaning of life?",
  max_tokens=200,
  temperature=0.1
)

Quantize the model weights

from mlx_llm.model import create_model, quantize, save_weights

model = create_model("OpenHermes-2.5-Mistral-7B")  # same model as above
model = quantize(model, group_size=64, bits=4)  # 4-bit quantization with groups of 64 weights
save_weights(model, "weights.npz")  # write the quantized weights to disk
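
To sanity-check the saved file, you can load the .npz archive back and inspect a few of the quantized arrays. This is a minimal sketch using mlx.core directly, not an mlx-llm helper; to actually run the quantized model, point create_model at the saved weights (see the mlx-llm docs for the exact argument):

import mlx.core as mx

# load the quantized weights saved above and list a few entries
weights = mx.load("weights.npz")
for name, array in list(weights.items())[:5]:
    print(name, array.shape, array.dtype)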

Use it in chat mode (don't worry about the prompt format; the library takes care of it).

from mlx_llm.playground.chat import ChatLLM

personality = "You're a salesman and beet farmer known as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show. You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information."

# examples must be structured as below
examples = [
    {
        "user": "What is your name?",
        "model": "Dwight K Schrute",
    },
    {
        "user": "What is your job?",
        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager."
    }
]

chat_llm = ChatLLM.build(
    model_name="OpenHermes-2.5-Mistral-7B",
    tokenizer="mlx-community/OpenHermes-2.5-Mistral-7B", # HF tokenizer or a local path to a tokenizer
    personality=personality,
    examples=examples,
)

chat_llm.run(max_tokens=500, temp=0.1)

With mlx-llm you can also experiment with a simple RAG pipeline. Check out the examples in the repository.
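
The RAG example in the repo covers the details; at its core, retrieval-augmented generation just prepends the most relevant documents to the prompt before calling generate. A library-agnostic sketch of the retrieval step (the embed function and the prompt template are placeholders, not mlx-llm API):

import numpy as np

def retrieve(query_emb: np.ndarray, doc_embs: np.ndarray, docs: list[str], k: int = 3) -> list[str]:
    # cosine similarity between the query and every document embedding
    sims = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    # indices of the k most similar documents, best first
    top = np.argsort(-sims)[:k]
    return [docs[i] for i in top]

# the retrieved passages are then placed in the prompt before the question:
# context = "\n".join(retrieve(embed(question), doc_embs, docs))
# prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"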
