ValueError: Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with `pipe.tokenizer.pad_token_id = model.config.eos_token_id`.

#39
by jsemrau - opened

Error description: ValueError: Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with `pipe.tokenizer.pad_token_id = model.config.eos_token_id`.
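
As far as I can tell, the error only shows up once batching kicks in: the pipeline is called with a list of prompts while batch_size > 1 and the tokenizer has no pad token. A minimal reproduction (sketch; the model is gated and needs enough GPU memory, the prompts are just placeholders):

    import transformers

    pipe = transformers.pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        device_map="auto",
        batch_size=4,
    )

    # the Llama 3.1 tokenizer defines no pad token, so batching a list of prompts raises the ValueError
    pipe(["placeholder prompt one", "placeholder prompt two"])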

I am setting up my ReAct agent as outlined below. It works well for these models:

    checkpoint = "mistralai/Mistral-7B-Instruct-v0.3"
    checkpoint = "microsoft/Orca-2-13b"
    checkpoint = "internlm/internlm2_5-7b"
    checkpoint = "stabilityai/StableBeluga-13B"
    checkpoint = "mistralai/Mistral-Nemo-Instruct-2407"
    checkpoint = "meta-llama/Meta-Llama-3-8B-Instruct"

But it crashes for `checkpoint = "meta-llama/Meta-Llama-3.1-8B-Instruct"`.
I have tried adding the tokenizer padding settings at several spots throughout the code, but it doesn't seem to work. Has anyone seen and solved this error?

    import torch
    from transformers import AutoTokenizer
    from langchain_huggingface import HuggingFacePipeline

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Set the tokenizer's pad_token_id to its eos_token_id
    tokenizer.pad_token_id = tokenizer.eos_token_id

    # quantization_config is a BitsAndBytesConfig defined elsewhere in my code
    model_kwargs = {
        "max_length": 64,
        "offload_folder": "offload",
        "max_memory": {4: "11GB", 3: "12GB", 0: "8GB", 1: "8GB", 2: "8GB"},
        "quantization_config": quantization_config,
        "pad_token_id": tokenizer.pad_token_id,  # doesn't work
        "low_cpu_mem_usage": True,
    }

    llm = HuggingFacePipeline.from_model_id(
        model_id=checkpoint,
        task="text-generation",
        device_map="auto",
        batch_size=4,
        pipeline_kwargs={
            "top_p": 1,  # changed from 0.15
            "temperature": 0.3,
            "do_sample": True,  # changed from true
            "torch_dtype": torch.float16,  # bfloat16
            "use_fast": True,
        },
        model_kwargs=model_kwargs,
    )
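
Digging a bit, the 3.1 tokenizer seems to ship without a pad token, and the model config lists several EOS token ids rather than a single int. A quick check (sketch):

    from transformers import AutoConfig, AutoTokenizer

    checkpoint = "meta-llama/Meta-Llama-3.1-8B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    config = AutoConfig.from_pretrained(checkpoint)

    print(tokenizer.pad_token)   # None -> nothing for the pipeline to pad with
    print(config.eos_token_id)   # for 3.1 this is a list of ids, not a single int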

Hello, did you manage to solve it?

Hope this is helpful; I solved it with:

    import torch
    import transformers

    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
        batch_size=batch_size,
    )

    # Llama 3.1's config defines several EOS token ids, so eos_token_id is a list; use the first one as the pad token
    pipeline.tokenizer.pad_token_id = pipeline.model.config.eos_token_id[0]
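
With the pad token set, batched generation should go through. A quick check, with placeholder prompts (sketch):

    # placeholder prompts, just to exercise batching
    prompts = ["First test prompt.", "Second test prompt."]
    outputs = pipeline(prompts, max_new_tokens=64)
    for out in outputs:
        print(out[0]["generated_text"])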

That seemed to nudge me in the right direction, even though now I'm enjoying a "torch.cuda.OutOfMemoryError: CUDA out of memory." error. Well (╯°□°)╯︵ ┻━┻

Edit: With the below config, it appears to work.

    import torch
    import transformers
    from transformers import BitsAndBytesConfig
    from langchain_huggingface import HuggingFacePipeline

    quantization_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=25.0,
        llm_int8_enable_fp32_cpu_offload=True,
    )

    model_kwargs = {
        "offload_folder": "offload",
        "max_memory": {4: "11GB", 3: "12GB", 0: "8GB", 1: "8GB", 2: "8GB"},
        "quantization_config": quantization_config,
        "low_cpu_mem_usage": True,
        "device_map": "auto",
    }

    pipeline_kwargs = {
        "top_p": 1,  # changed from 0.15
        "temperature": 0.7,
        "do_sample": True,  # changed from true
        "torch_dtype": torch.float16,  # bfloat16
        "use_fast": True,
        "max_new_tokens": 800,
        "repetition_penalty": 1.1,  # without this the output begins repeating
    }

    pipe = transformers.pipeline(
        "text-generation",
        model=checkpoint,
        max_new_tokens=1000,
    )

    pipe.tokenizer.pad_token_id = pipe.model.config.eos_token_id[0]

    llm = HuggingFacePipeline(
        pipeline=pipe,
        model_kwargs=model_kwargs,
        pipeline_kwargs=pipeline_kwargs,
        batch_size=2,
    )
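
For a quick sanity check that the wrapper batches as expected, something like this should work (the prompts are just placeholders):

    # invoke a single prompt, then a small batch, through the LangChain wrapper
    print(llm.invoke("What is a pad token used for?"))

    answers = llm.batch([
        "Summarize the ReAct pattern in one sentence.",
        "Name one common cause of CUDA out-of-memory errors.",
    ])
    print(answers)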

https://jdsemrau.substack.com/

    from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace
    from langchain_community.agent_toolkits import create_sql_agent
    from transformers import AutoTokenizer, AutoModelForCausalLM

    HF_TOKEN = ""
    model_id = 'chuanli11/Llama-3.2-3B-Instruct-uncensored'

    # Load the tokenizer and set pad_token_id if necessary
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer.pad_token is None:
        tokenizer.pad_token_id = AutoModelForCausalLM.from_pretrained(model_id).config.eos_token_id

    llm = HuggingFacePipeline.from_model_id(
        model_id=model_id,
        task="text-generation",
        device=None,
        model_kwargs=dict(
            device_map="auto",
        ),
        pipeline_kwargs=dict(
            token=HF_TOKEN,
            temperature=0.6,
            max_new_tokens=512,
            repetition_penalty=1.1,
        ),
    )
    chat_models = ChatHuggingFace(llm=llm)

    # from_model_id builds its own tokenizer, so set the pad token on the pipeline's tokenizer as well
    chat_models.llm.pipeline.tokenizer.pad_token_id = chat_models.llm.pipeline.tokenizer.eos_token_id

This should be the correct code.
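
A quick smoke test (the prompt is just a placeholder):

    from langchain_core.messages import HumanMessage

    # the pad token is now set on the pipeline's tokenizer, so chat calls go through
    response = chat_models.invoke([HumanMessage(content="What does a pad token do?")])
    print(response.content)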
