ValueError: Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with `pipe.tokenizer.pad_token_id = model.config.eos_token_id`.
Error description: ValueError: Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with pipe.tokenizer.pad_token_id = model.config.eos_token_id.
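For context, the fix the error message suggests is to give the pipeline's tokenizer a pad token, usually by reusing the EOS token. A minimal sketch follows (the checkpoint name is a placeholder; note that for some models, including Llama 3.1, config.eos_token_id is a list rather than a single id, which is what the replies below deal with):

import torch
import transformers

# Minimal sketch: build a batched text-generation pipeline, then give its
# tokenizer a pad token so batching can work. "your-checkpoint" is a placeholder.
pipe = transformers.pipeline(
    "text-generation",
    model="your-checkpoint",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    batch_size=4,
)

# Reuse the EOS token as the padding token (assumes eos_token_id is a single id).
pipe.tokenizer.pad_token_id = pipe.model.config.eos_token_id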
I am setting up my ReAct agent as outlined below. It works well for these models:
checkpoint = "mistralai/Mistral-7B-Instruct-v0.3"
checkpoint = "microsoft/Orca-2-13b"
checkpoint="internlm/internlm2_5-7b"
checkpoint="stabilityai/StableBeluga-13B"
checkpoint="mistralai/Mistral-Nemo-Instruct-2407"
checkpoint="meta-llama/Meta-Llama-3-8B-Instruct"
But it crashes for checkpoint = "meta-llama/Meta-Llama-3.1-8B-Instruct".
I tried setting the tokenizer's padding token at several spots in the code, but it doesn't seem to work. Has anyone seen and solved this error?
import torch
from transformers import AutoTokenizer
from langchain_huggingface import HuggingFacePipeline

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Set the tokenizer's pad_token_id to its eos_token_id
tokenizer.pad_token_id = tokenizer.eos_token_id

model_kwargs = {
    "max_length": 64,
    "offload_folder": "offload",
    "max_memory": {4: "11GB", 3: "12GB", 0: "8GB", 1: "8GB", 2: "8GB"},
    "quantization_config": quantization_config,  # BitsAndBytesConfig defined elsewhere
    "pad_token_id": tokenizer.pad_token_id,  # doesn't work
    "low_cpu_mem_usage": True,
}

llm = HuggingFacePipeline.from_model_id(
    model_id=checkpoint,
    task="text-generation",
    device_map="auto",
    batch_size=4,
    pipeline_kwargs={
        "top_p": 1,  # changed from 0.15
        "temperature": 0.3,
        "do_sample": True,
        "torch_dtype": torch.float16,  # bfloat16
        "use_fast": True,
    },
    model_kwargs=model_kwargs,
)
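One thing that may explain why the pad_token_id entry in model_kwargs is ignored: the batching check looks at the pipeline's own tokenizer, so a value passed through model_kwargs apparently never reaches it, and for Llama 3.1 the model config's eos_token_id is a list rather than a single id. A minimal sketch of setting it on the built llm object above, mirroring what the replies below do:

# Sketch: set the pad token on the pipeline's own tokenizer after building the LLM.
# For Llama 3.1, config.eos_token_id is a list, so pick a single id from it.
eos = llm.pipeline.model.config.eos_token_id
llm.pipeline.tokenizer.pad_token_id = eos[0] if isinstance(eos, list) else eos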
Hello, did you manage to solve it?
Hope this is helpful. I solved it with:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    batch_size=batch_size,
)
pipeline.tokenizer.pad_token_id = pipeline.model.config.eos_token_id[0]
That seemed to nudge me in the right direction, although now I get to enjoy a "torch.cuda.OutOfMemoryError: CUDA out of memory." error instead. Well (╯°□°)╯︵ ┻━┻
Edit: With the below config, it appears to work.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=25.0,
    llm_int8_enable_fp32_cpu_offload=True,
)

model_kwargs = {
    "offload_folder": "offload",
    "max_memory": {4: "11GB", 3: "12GB", 0: "8GB", 1: "8GB", 2: "8GB"},
    "quantization_config": quantization_config,
    "low_cpu_mem_usage": True,
    "device_map": "auto",
}

pipeline_kwargs = {
    "top_p": 1,  # changed from 0.15
    "temperature": 0.7,
    "do_sample": True,
    "torch_dtype": torch.float16,  # bfloat16
    "use_fast": True,
    "max_new_tokens": 800,
    "repetition_penalty": 1.1,  # without this the output starts repeating
}

pipe = transformers.pipeline(
    "text-generation",
    model=checkpoint,
    max_new_tokens=1000,
)
# eos_token_id is a list for Llama 3.1, so take the first entry.
pipe.tokenizer.pad_token_id = pipe.model.config.eos_token_id[0]

llm = HuggingFacePipeline(
    pipeline=pipe,
    model_kwargs=model_kwargs,
    pipeline_kwargs=pipeline_kwargs,
    batch_size=2,
)
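For what it's worth, a quick smoke test of the resulting llm (the prompt is just a hypothetical example) should show whether generation now runs without the pad_token ValueError:

# Hypothetical smoke test; the prompt text is only an example.
print(llm.invoke("Briefly explain what a padding token is used for."))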
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace
from langchain_community.agent_toolkits import create_sql_agent
from transformers import AutoTokenizer, AutoModelForCausalLM
HF_TOKEN = ""
model_id = 'chuanli11/Llama-3.2-3B-Instruct-uncensored'
# Load the tokenizer and set pad_token_id if necessary
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token_id = AutoModelForCausalLM.from_pretrained(model_id).config.eos_token_id

llm = HuggingFacePipeline.from_model_id(
    model_id=model_id,
    task="text-generation",
    device=None,
    model_kwargs=dict(
        device_map="auto",
    ),
    pipeline_kwargs=dict(
        token=HF_TOKEN,
        temperature=0.6,
        max_new_tokens=512,
        repetition_penalty=1.1,
    ),
)
chat_models = ChatHuggingFace(llm=llm)
chat_models.llm.pipeline.tokenizer.pad_token_id = chat_models.llm.pipeline.tokenizer.eos_token_id
This is the code that worked correctly for me.
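As a quick hypothetical check that the wrapped chat model responds (the message content is just an example):

from langchain_core.messages import HumanMessage

# Hypothetical usage of the ChatHuggingFace wrapper; the message is only an example.
response = chat_models.invoke([HumanMessage(content="Say hello in one sentence.")])
print(response.content)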