Has anybody been able to run their chat.py model on a Mac?

#3
by neodymion - opened

Thanks for uploading. But I am struggling to get chat.py to run on an M2 Pro with 32 GB.

It won't run on Apple Silicon MPS because it uses bfloat16. I tried changing that to float32, but then it did not run either. :D
Now it is running on CPU, but it takes ages to reply. All I entered was "hi".
Isn't this model supposed to be faster? Is there anything I need to change?

My modifications to chat.py:

import torch
from generate import generate
from transformers import AutoTokenizer, AutoModel

def chat():
    device = 'cpu'  # <-- force CPU instead of CUDA/MPS
    model = AutoModel.from_pretrained('GSAI-ML/LLaDA-8B-Instruct', trust_remote_code=True,
                                      torch_dtype=torch.bfloat16).to(device).eval()
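
For anyone who wants to retry MPS, here is a minimal device/dtype picker. This is only a sketch, not part of the repo's chat.py: whether LLaDA's remote code actually runs on MPS, and whether float16 is a workable dtype for it, are assumptions. (float32 likely failed simply because 8B parameters at 4 bytes each is roughly 32 GB of weights, which does not leave room on a 32 GB machine.)

import torch
from transformers import AutoModel

# Sketch only: pick device/dtype based on what is available.
# float16 on MPS is an assumption; older torch builds reject bfloat16 on MPS.
if torch.backends.mps.is_available():
    device, dtype = 'mps', torch.float16
elif torch.cuda.is_available():
    device, dtype = 'cuda', torch.bfloat16
else:
    device, dtype = 'cpu', torch.float32

model = AutoModel.from_pretrained('GSAI-ML/LLaDA-8B-Instruct', trust_remote_code=True,
                                  torch_dtype=dtype).to(device).eval()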

Linux Fedora / CPU: it works, but only on one thread; on a dual Xeon it never answered (5-10 minute waits).
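
If the CPU run really is stuck on one thread, it may be worth checking PyTorch's intra-op thread settings; a minimal sketch (the core count is just whatever os.cpu_count() reports, not a tuned value):

import os
import torch

# Sketch: let PyTorch use all reported cores for intra-op parallelism.
torch.set_num_threads(os.cpu_count())
print(torch.get_num_threads())  # confirm how many threads torch will actually use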
