Run in Google Colab on a T4 GPU.
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
# device_map="auto" lets accelerate place (and, if needed, offload) the quantized weights
pipe = pipeline(
    "text-generation",
    model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16",
    trust_remote_code=True,
    device_map="auto",
)
pipe(messages)
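If you want direct access to the model and tokenizer rather than the pipeline wrapper, the standard lower-level transformers API works too. A minimal sketch, reusing the messages list from above (not from the original run; the 64-token budget is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16")
model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16",
    trust_remote_code=True,
    device_map="auto",
)

# Render the chat with the model's template, generate, and decode only the new tokens.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))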
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co./settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
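This warning is harmless for public models like this one, but authenticating silences it. A minimal sketch of reading a Colab secret and logging in (assumes you stored a token under the name HF_TOKEN in the Colab secrets tab):

# Read the token from Colab's secrets store and log in to the Hub.
from google.colab import userdata
from huggingface_hub import login

login(token=userdata.get("HF_TOKEN"))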
config.json: 100% 959/959 [00:00<00:00, 24.6kB/s]
model.safetensors.index.json: 100% 171k/171k [00:00<00:00, 2.36MB/s]
Downloading shards: 100% 5/5 [09:16<00:00, 104.07s/it]
model-00001-of-00005.safetensors: 100% 4.99G/4.99G [01:59<00:00, 42.2MB/s]
model-00002-of-00005.safetensors: 100% 4.99G/4.99G [01:59<00:00, 41.2MB/s]
model-00003-of-00005.safetensors: 100% 4.99G/4.99G [02:00<00:00, 42.7MB/s]
model-00004-of-00005.safetensors: 100% 4.99G/4.99G [01:58<00:00, 42.4MB/s]
model-00005-of-00005.safetensors: 100% 3.17G/3.17G [01:15<00:00, 42.2MB/s]
Loading checkpoint shards: 100% 5/5 [00:52<00:00, 6.42s/it]
generation_config.json: 100% 242/242 [00:00<00:00, 13.7kB/s]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the disk and cpu.
tokenizer_config.json: 100% 1.29k/1.29k [00:00<00:00, 74.3kB/s]
vocab.json: 100% 2.78M/2.78M [00:00<00:00, 8.50MB/s]
merges.txt: 100% 1.67M/1.67M [00:00<00:00, 12.5MB/s]
tokenizer.json: 100% 7.03M/7.03M [00:00<00:00, 21.2MB/s]
Device set to use cuda:0
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
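The accelerate warning above means the T4's 16 GB of VRAM cannot hold all of the quantized weights, so some were spilled to CPU RAM and disk. If you want to make the disk spill location explicit, pipeline() forwards model_kwargs to from_pretrained(); a sketch under that assumption (the folder name "offload" is illustrative):

# Same pipeline as above, but with an explicit folder for disk-offloaded weights.
pipe = pipeline(
    "text-generation",
    model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16",
    trust_remote_code=True,
    device_map="auto",
    model_kwargs={"offload_folder": "offload"},
)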
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:20: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:33: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat_dequant")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:48: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat_dequant_transposed")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:62: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code2x8_matmat")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:75: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code2x8_matmat_dequant")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:88: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code2x8_matmat_dequant_transposed")
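These FutureWarnings come from the aqlm package still using the old torch.library.impl_abstract name; they are harmless for inference. A minimal sketch to silence them for cleaner notebook output (must run before the aqlm kernels are first imported, i.e. before building the pipeline):

# Suppress the deprecation noise emitted at import time by aqlm's CUDA kernels.
import warnings

warnings.filterwarnings("ignore", category=FutureWarning, module="aqlm")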
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'},
{'role': 'assistant',
'content': 'I am Qwen, a large language model created by Alibaba Cloud. I am here to assist you'}]}]
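The assistant's reply is cut off because the pipeline uses a small default generation budget. A sketch of raising it and making sampling explicit (the parameter values are illustrative, not from the original run):

# Allow a longer reply; in chat mode, generated_text is the message list
# with the new assistant turn appended last.
out = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"][-1]["content"])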
# Setup: pin a CUDA 12.1 PyTorch build and install the AQLM inference kernels
# (required before the model can load).
!pip uninstall torch torchvision torchaudio -y
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
!pip install aqlm[gpu,cpu]
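After reinstalling, a quick sanity check that the pinned build is active and the GPU is visible before loading the model:

# Confirm the environment: expect 2.4.0+cu121 and True on a T4 runtime.
import torch
import aqlm  # raises ImportError if the kernels did not install

print(torch.__version__)
print(torch.cuda.is_available())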