Has anyone succeeded in running this model?
I get the following error running AutoTokenizer.from_pretrained():
ValueError: Unrecognized configuration class <class 'transformers_modules.iGeniusAI.Italia-9B-Instruct-v0.1.e821f1462547cca2a6ff4f7af102d37d9a79fafd.configuration_italia.ItaliaConfig'> to build an AutoTokenizer.
The model has been correctly downloaded.
This morning I cleared the cache (~/.cache/huggingface/hub/models--iGeniusAI--Italia-9B-Instruct-v0.1) and repeated the same operations.
The error disappeared, so the first download was probably corrupted somehow.
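For anyone hitting the same tokenizer error, a clean re-download can also be forced programmatically instead of deleting the cache folder by hand; a small sketch (untested on my side) using huggingface_hub:

```python
from huggingface_hub import snapshot_download

# Re-fetch every file of the repo, overwriting whatever (possibly corrupted)
# copies are already sitting in ~/.cache/huggingface/hub.
snapshot_download(
    repo_id="iGeniusAI/Italia-9B-Instruct-v0.1",
    force_download=True,
)
```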
However, on my MacBook Pro M1 with 32GB of RAM, the sample test application has been running for 30 minutes and is using 35GB of RAM. I'm using device='mps'.
My guess is that it is swapping heavily because there is not enough RAM.
So my new question is: was anybody able to run this model locally on Apple Silicon?
Hello code-runner,
Unfortunately, the model cannot be executed on the MPS backend because the `aten::isin.Tensor_Tensor_out` operator is not yet supported on MPS. However, you can still run the model on your CPU by either setting `device = "cpu"` or exporting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1`.
We tested the model on a MacBook Pro M3 Max with 36GB of memory, although the model itself requires significantly less. When loaded with `torch_dtype = torch.float16`, the process utilized approximately 16GB of RAM.
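Something along these lines should work (a minimal sketch; the prompt and generation arguments below are placeholders, not the exact snippet from the model card):

```python
import os
# Optional: let operators missing on MPS (e.g. aten::isin.Tensor_Tensor_out)
# fall back to the CPU. Set this before torch is imported.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "iGeniusAI/Italia-9B-Instruct-v0.1"
device = "cpu"  # or "mps" if you rely on the fallback variable above

# The model ships custom code (ItaliaConfig), so trust_remote_code is needed.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # roughly 16GB of RAM in half precision
    trust_remote_code=True,
).to(device)

inputs = tokenizer("Qual è la capitale d'Italia?", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```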
Running the model, I get this error:
/home/linux/myenv/lib/python3.12/site-packages/transformers/generation/utils.py:1220: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Traceback (most recent call last):
File "/home/linux/inference.py", line 19, in
print(generate_text(prompt))
^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/inference.py", line 14, in generate_text
outputs = model.generate(**inputs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2047, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3007, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 1178, in forward
outputs = self.gpt_neox(
^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 1007, in forward
outputs = layer(
^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linux/myenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: GPTNeoXLayer.forward() got an unexpected keyword argument 'cache_position'
Same 'cache_position' error for me too, using just the standard inference script. Is there a recommended transformers version?
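For reference, a quick way to see which versions are in play (newer transformers releases pass cache_position down into the decoder layers, so a version mismatch with the model code seems a plausible cause):

```python
# Hypothetical sanity check, not an official fix: print the installed versions
# before trying a different transformers release.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```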