Fails to quantize T5 (xl and xxl) models

#116
by girishponkiya - opened

I successfully quantized T5-large (google/flan-t5-large), but my attempts to quantize T5-xl (google/flan-t5-xl) and T5-xxl (google/flan-t5-xxl) fail.

Error converting to fp16:

```
INFO:hf-to-gguf:Loading model: t5-11b
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model.bin'
Traceback (most recent call last):
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 4067, in <module>
    main()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 4061, in main
    model_instance.write()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 390, in write
    self.prepare_tensors()
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 264, in prepare_tensors
    for name, data_torch in self.get_tensors():
  File "/home/user/app/llama.cpp/convert_hf_to_gguf.py", line 155, in get_tensors
    ctx = contextlib.nullcontext(torch.load(str(self.dir_model / part_name), map_location="cpu", mmap=True, weights_only=True))
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/serialization.py", line 1032, in load
    raise RuntimeError("mmap can only be used with files saved with "
RuntimeError: mmap can only be used with files saved with `torch.save(_use_new_zipfile_serialization=True), please torch.save your checkpoint with this option in order to use mmap.
```
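The RuntimeError at the end suggests that `pytorch_model.bin` was saved in torch's legacy (non-zipfile) serialization format, which `torch.load(..., mmap=True)` cannot memory-map. A quick way to check this, sketched here with an illustrative path, relies on the fact that new-style checkpoints are standard zip archives:

```python
import zipfile

# New-style torch checkpoints (zipfile serialization) are zip archives;
# legacy-format checkpoints are not, and cannot be loaded with mmap=True.
print(zipfile.is_zipfile("t5-xxl/pytorch_model.bin"))  # hypothetical path
```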

I was monitoring the CPU and RAM usage; my observations:

  • RAM usage keeps increasing while CPU usage stays above 50%.
  • Eventually RAM is full (50/50 GB) and CPU usage drops to ~10%.
  • After some time, the process fails (a possible workaround is sketched below).
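
One possible workaround, under the assumption that the machine can hold the full state dict in RAM once, is to re-save the checkpoint in the zipfile format the error message asks for and then rerun the conversion (paths are illustrative):

```python
import torch

path = "t5-xxl/pytorch_model.bin"  # hypothetical checkpoint path

# Load the legacy-format file without mmap; this needs enough free RAM
# to hold the entire state dict at once.
state_dict = torch.load(path, map_location="cpu")

# Re-save in the zipfile-based format that torch.load(..., mmap=True)
# requires (this format has been torch.save's default since PyTorch 1.6).
torch.save(state_dict, path, _use_new_zipfile_serialization=True)
```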