# Using MLX at Hugging Face
MLX is a model training and serving framework for Apple silicon made by Apple Machine Learning Research.
It comes with a variety of examples:
- Generating text with MLX-LM, including models in GGUF format.
- Large-scale text generation with LLaMA.
- Fine-tuning with LoRA.
- Generating images with Stable Diffusion.
- Speech recognition with OpenAI’s Whisper.
## Exploring MLX on the Hub
You can find MLX models by filtering on the left of the models page. There's also an open MLX community of contributors converting and publishing weights in MLX format.

Thanks to the MLX Hugging Face Hub integration, you can load MLX models with a few lines of code.
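As an illustration, here's a minimal sketch that lists a few MLX models programmatically with `huggingface_hub` (the `mlx` value passed to the filter is an assumption based on the Hub's library tags):

```python
from huggingface_hub import HfApi

# List a handful of Hub models tagged for MLX.
# The "mlx" filter value is an assumption; adjust it if the tag differs.
api = HfApi()
for model in api.list_models(filter="mlx", limit=5):
    print(model.id)
```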
## Installation
MLX comes as a standalone package, and there's a subpackage called MLX-LM with Hugging Face integration for large language models.
To install MLX-LM, use pip:

```bash
pip install mlx-lm
```
You can find more information about MLX-LM in its repository.
If you install `mlx-lm`, you don't need to install `mlx` separately. If you don't want to use `mlx-lm` but only MLX itself, you can install it as follows.
With `pip`:

```bash
pip install mlx
```
With `conda`:

```bash
conda install -c conda-forge mlx
```
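As a quick sanity check that the installation works, here's a minimal sketch using MLX's core array API:

```python
import mlx.core as mx

# Arrays live in unified memory, so no explicit device transfers are needed.
a = mx.array([1.0, 2.0, 3.0])
b = mx.ones(3)

# MLX evaluates lazily; printing forces the computation.
print(a + b)  # array([2, 3, 4], dtype=float32)
```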
## Using Existing Models
MLX-LM has useful utilities for text generation. The following command downloads the model from the Hub (if needed), loads it, and starts generating text:

```bash
python -m mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.2 --prompt "hello"
```
For a full list of generation options, run:

```bash
python -m mlx_lm.generate --help
```
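For example, you can cap the response length with `--max-tokens` (a sketch; confirm the flag against the `--help` output of your installed version):

```bash
# Generate at most 100 new tokens for the given prompt.
python -m mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.2 \
    --prompt "hello" --max-tokens 100
```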
You can also load a model and generate text from Python:

```python
from mlx_lm import load, generate

# Download (if needed) and load the model and tokenizer from the Hub.
model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.2")

# Generate a completion for the prompt; verbose=True streams the output.
response = generate(model, tokenizer, prompt="hello", verbose=True)
```
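For instruct-tuned models such as this one, you would normally format the prompt with the tokenizer's chat template first. A minimal sketch, assuming the loaded tokenizer exposes the standard `apply_chat_template` method from `transformers`:

```python
# Build a chat-formatted prompt (assumes the model ships a chat template).
messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```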
MLX-LM supports popular LLM architectures, including LLaMA, Phi-2, Mistral, and Qwen. For models beyond these, you can download the weights directly from the Hub as follows:
```bash
pip install huggingface_hub hf_transfer

export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download --local-dir <LOCAL FOLDER PATH> <USER_ID>/<MODEL_NAME>
```
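The same download can also be done from Python with `huggingface_hub`'s `snapshot_download`; the repo id and folder below are placeholders:

```python
from huggingface_hub import snapshot_download

# Download a full model repository from the Hub to a local folder.
snapshot_download(
    repo_id="<USER_ID>/<MODEL_NAME>",  # placeholder: the repo to download
    local_dir="<LOCAL FOLDER PATH>",   # placeholder: where to put the files
)
```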
## Converting and Sharing Models
You can convert, and optionally quantize, LLMs from the Hugging Face Hub as follows:
```bash
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
```
If you want to push the model to the Hub right after conversion, add the `--upload-repo` flag:
```bash
python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo <USER_ID>/<MODEL_NAME>
```
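Conversion is also exposed as a Python function. A minimal sketch, assuming the `quantize` and `upload_repo` keywords of `mlx_lm.convert` (verify against your installed version):

```python
from mlx_lm import convert

# Convert the Hub model to MLX format, quantize it, and
# optionally upload the result (the repo id is a placeholder).
convert(
    "mistralai/Mistral-7B-v0.1",
    quantize=True,
    upload_repo="<USER_ID>/<MODEL_NAME>",
)
```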