Make it usable for CPU
0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 2046.97it/s]
e:\1b.env\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
User: Hi
Assistant:
e:\1b.env\lib\site-packages\transformers\generation\utils.py:1510: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example `input_ids = input_ids.to('cuda')` before running `.generate()`.
  warnings.warn(
Exception in thread Thread-3 (generate):
Traceback (most recent call last):
  File "C:\Users\CEDP\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\CEDP\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "e:\1b.env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\transformers\generation\utils.py", line 1622, in generate
    result = self._sample(
  File "e:\1b.env\lib\site-packages\transformers\generation\utils.py", line 2791, in _sample
    outputs = self(
  File "e:\1b.env\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1208, in forward
    outputs = self.model(
  File "e:\1b.env\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 974, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "e:\1b.env\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "e:\1b.env\lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
    return F.embedding(
  File "e:\1b.env\lib\site-packages\torch\nn\functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
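
The RuntimeError means the prompt tensors stayed on the CPU while the model weights live on cuda:0. A minimal sketch of the fix, assuming a standard transformers setup (the model id and prompt below are placeholders, not taken from the report): move the tokenized inputs to the model's device before calling .generate().

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model you are actually loading.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("User: Hi\nAssistant:", return_tensors="pt")
# Move every input tensor to the model's device before generating;
# this is exactly what the UserWarning above is asking for.
inputs = {k: v.to(model.device) for k, v in inputs.items()}

output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```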
Hi! Yeah, only the GPU runtime is supported. CPU would run very slowly with the current implementation, and we focus on GPU because the library is also intended to be used for training.
You can use a free GPU on Google Colab if you don't have access to a GPU-powered machine!
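
If you'd rather fail fast than hit the device-mismatch traceback mid-generation, a small guard can check for a GPU up front. This is just a sketch, not part of the library:

```python
import torch

# The current implementation assumes a CUDA device, so bail out early
# with a clear hint instead of a device-mismatch RuntimeError later on.
if not torch.cuda.is_available():
    raise SystemExit(
        "No CUDA device found. This library currently supports GPU runtimes only; "
        "a free GPU on Google Colab works if you don't have one locally."
    )
```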