Unexpected behavior of Phi-3.5-vision model with respect to 4K token length.
#14, opened by ita9naiwa
While testing the Phi-3.5 vision model, I noticed that when the prompt is shorter than 4096 tokens but len(prompt) + len(generated tokens) exceeds 4096, the model either produces meaningless output or stops generating.
In this case, the logits favor the <|end|> token, but the model keeps producing meaningless output because <|end|> is not a stop token in the HF implementation.
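One thing that might be worth trying is passing <|end|> as an additional stop token. Below is a minimal sketch, not a confirmed fix: `collect_stop_token_ids` is a hypothetical helper (not part of transformers or the model repo), and it assumes the tokenizer knows <|end|> as a single token. It relies on `generate` accepting a list of ids for `eos_token_id`.

```python
from typing import Iterable, List


def collect_stop_token_ids(tokenizer, extra_stop_tokens: Iterable[str] = ("<|end|>",)) -> List[int]:
    """Return the tokenizer's eos id plus the ids of extra stop tokens, without duplicates.

    Hypothetical helper for illustration; `tokenizer` is any object exposing
    `eos_token_id` and `convert_tokens_to_ids`, as Hugging Face tokenizers do.
    """
    stop_ids: List[int] = []
    if tokenizer.eos_token_id is not None:
        stop_ids.append(tokenizer.eos_token_id)
    for token in extra_stop_tokens:
        token_id = tokenizer.convert_tokens_to_ids(token)
        # Skip tokens that are missing from the vocabulary (None or the unk id).
        if token_id is None or token_id == getattr(tokenizer, "unk_token_id", None):
            continue
        if token_id not in stop_ids:
            stop_ids.append(token_id)
    return stop_ids


# Assumed usage with the reproduction script in this post:
# generate_ids = model.generate(
#     **inputs,
#     eos_token_id=collect_stop_token_ids(processor.tokenizer),
#     **generation_args,
# )
```

If generation then stops at <|end|>, that would support the stop-token explanation; it would not by itself explain why the behavior only shows up around the 4096 mark.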
Here is the script I used to reproduce this behavior. It is taken from the code snippet at https://huggingface.co./microsoft/Phi-3.5-vision-instruct
I only reduced the number of images in the prompt.
from PIL import Image
import requests
from transformers import AutoModelForCausalLM
from transformers import AutoProcessor
model_id = "/opt/models/phi/Phi-3.5-vision-instruct/"
# Note: set _attn_implementation='eager' if you don't have flash_attn installed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    trust_remote_code=True,
    torch_dtype="auto",
    _attn_implementation='eager'
)
# for best performance, use num_crops=4 for multi-frame, num_crops=16 for single-frame.
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    num_crops=4
)
images = []
placeholder = ""
# Note: if OOM, consider reducing the number of frames in this example.
for i in range(1, 6):
    url = f"https://image.slidesharecdn.com/azureintroduction-191206101932/75/Introduction-to-Microsoft-Azure-Cloud-{i}-2048.jpg"
    images.append(Image.open(requests.get(url, stream=True).raw))
    placeholder += f"<|image_{i}|>\n"
messages = [
    {"role": "user", "content": placeholder + "Summarize the deck of slides."},
]
prompt = processor.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = processor(prompt, images, return_tensors="pt").to("cuda:0")
generation_args = {
    "max_new_tokens": 512,
    "temperature": 0.0,
}
generate_ids = model.generate(
    **inputs,
    eos_token_id=processor.tokenizer.eos_token_id,
    **generation_args
)
print("input prompt size", inputs.input_ids.shape[1])
# Drop the prompt tokens so only the newly generated tokens remain.
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
print("output tokens size", generate_ids.shape[1])
response = processor.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]
print(response)
It prints:
input prompt size 3817
output tokens size 512
To encapsulate, the slides feature the following segments:
- Introduction to Azure:
The presentation introduces Microsoft Azure, a cloud computing platform. It highlights the three types of Azure services: Enterprise, Hybrid, and Hyper-scale. The presenter is Dinesh Kumar Wickramasinghe, a Senior Software Engineer from CMS Private Limited in Sri Lanka.
- Azure Services Overview:
Azure offers a continuously expanding set of cloud services to help organizations meet their current and future business challenges. It provides the freedom to build, manage, and deploy applications on a massive global network using favorite tools and frameworks.
- Cloud Computing Models:
The presentation explains the three main models of cloud computing: IaaS (Infrastructure-as-a-Service), PaaS (Platform-as-a-Service), and SaaS (Software-as-a-Service). Each model is represented by a unique icon and color.
- Cloud Service Comparison:
The presentation compares the roles of the user in different cloud service models using a dining table analogy. In IaaS, the user manages the infrastructure. In PaaS, the user manages the platform. In SaaS, the user manages, and and and services. The, and the and the service. The the and and the service.
s the service. The and and the service. The is the service. The service. The service. The service. The service. The service. The service. The and and the is the service. The service.
and the and the service. The and and and
and and the and ands.
in the service. The service. The in the
and and the to the
the the the home.
and and the the the vendor.
and and and ands and and and and the me.
and and and and and and and and and and and and and
and
and
and and and:
and
and
andS
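To relate the printed sizes to the 4096 mark, here is a small arithmetic sketch. Both function names are hypothetical helpers written for this post, and the default boundary of 4096 is simply the value observed above, not something read from the model config.

```python
def new_tokens_before_boundary(prompt_len: int, boundary: int = 4096) -> int:
    """Number of new tokens that can be generated before the total length reaches the boundary."""
    return max(boundary - prompt_len, 0)


def crosses_boundary(prompt_len: int, max_new_tokens: int, boundary: int = 4096) -> bool:
    """True when the prompt is shorter than the boundary but prompt + generation exceeds it."""
    return prompt_len < boundary < prompt_len + max_new_tokens


# Sizes printed by the run above: prompt of 3817 tokens, 512 new tokens.
print(crosses_boundary(3817, 512))        # True, since 3817 + 512 = 4329 > 4096
print(new_tokens_before_boundary(3817))   # 279
```

If the degradation really is tied to this boundary, it should begin roughly 279 tokens into the generation. Capping `max_new_tokens` at that value would be one way to check whether output stays clean below the boundary.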
How can this be solved, or is it intended behavior?
@ita9naiwa fixed, please check again
haipingwu changed discussion status to closed