Saving a quantised BERT model with safetensors throws ValueError
I am trying to quantise the bert-base-uncased
model using the following code:
import torch
from transformers import BertTokenizer, BertModel
import os
from safetensors.torch import save_file
bert_base_model_path = "google-bert/bert-base-uncased"
bert_tokenizer = BertTokenizer.from_pretrained(bert_base_model_path)
bert_model = BertModel.from_pretrained(bert_base_model_path, output_attentions=True)
bert_device = torch.device("cpu")
bert_model.to(bert_device)
quantized_bert_model = torch.quantization.quantize_dynamic(
    bert_model, {torch.nn.Linear}, dtype=torch.qint8
)
output_dir = "bert_base_uncased_quantised/"
SAFE_WEIGHTS_NAME = "model.safetensors"
os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
state_dict = quantized_bert_model.state_dict()
save_file(state_dict, os.path.join(output_dir, SAFE_WEIGHTS_NAME), metadata={"format": "pt"})
When I try to save the model this way, I get the following error:
ValueError: Key encoder.layer.0.attention.self.query._packed_params.dtype
is invalid, expected torch.Tensor but received <class 'torch.dtype'>
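As far as I can tell, the non-tensor values come from dynamic quantisation itself: each quantised Linear layer stores its weights in _packed_params, and some of the resulting state-dict entries (such as the dtype) are plain Python objects rather than tensors, which safetensors refuses to serialise. A quick check that lists the offending keys:

for key, value in quantized_bert_model.state_dict().items():
    # safetensors can only store torch.Tensor values, so print everything else
    if not isinstance(value, torch.Tensor):
        print(key, type(value))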
How can I save this quantised model so that I can reload it elsewhere with BertModel.from_pretrained?
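For reference, the only workaround I have found so far is torch.save, which pickles the non-tensor entries without complaint, but reloading then requires rebuilding and re-quantising the model first instead of a plain from_pretrained call (the file name quantized_state_dict.pt is just my choice):

# Save with pickle instead of safetensors.
torch.save(quantized_bert_model.state_dict(), os.path.join(output_dir, "quantized_state_dict.pt"))

# Reloading requires recreating the quantised architecture first.
reloaded = BertModel.from_pretrained(bert_base_model_path, output_attentions=True)
reloaded = torch.quantization.quantize_dynamic(reloaded, {torch.nn.Linear}, dtype=torch.qint8)
reloaded.load_state_dict(torch.load(os.path.join(output_dir, "quantized_state_dict.pt")))

This should round-trip the quantised weights, but it is not the from_pretrained workflow I am after.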