Error processing images: 'MolmoProcessor' object is not callable

#23 opened by PadmajaVaishnavi

I have an issue using the Molmo models allenai/Molmo-7B-D-0924 and allenai/MolmoE-1B-0924 for my ICL and ZSL tasks. They continuously throw the following error:

```
TypeError                                 Traceback (most recent call last)
in <cell line: 0>()
     39
     40 # Step 7: Process the images using the processor
---> 41 inputs = processor(images=example_images, return_tensors="pt", padding=True)
     42 target_input = processor(images=target_image, return_tensors="pt")
     43

TypeError: 'MolmoProcessor' object is not callable
```

Could you please help me with this error?

@PadmajaVaishnavi Can you post your entire code snippet so that I can look into this?

@amanrangapur

Code:

```python
# Install required libraries
!pip install transformers huggingface_hub scikit-learn pillow

import logging
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Set logging
logging.basicConfig(level=logging.INFO)

# Define the model name
model_name = "allenai/MolmoE-1B-0924"

# Download model checkpoint
logging.info(f"Downloading {model_name} model checkpoint...")
model_path = snapshot_download(model_name)

# Load processor
logging.info("Loading processor...")
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Select the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model on the selected device
logging.info("Initializing model...")
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)

# Upload example images
from google.colab import files

logging.info("Please upload example images (ensure paths match captions):")
uploaded = files.upload()

# List of example images and their captions
example_images = [
    ("58658.jpg", "The screen is a tutorial screen having a large text element located at the top part of the screen."),
    ("59376.jpg", "A search app with a large background image located at the center part."),
    ("14085.jpg", "The interface looks like a maps app having a large background image component placed at the center part.")
]

# Upload target image
logging.info("Please upload the target image for evaluation:")
target_uploaded = files.upload()
target_image_path = list(target_uploaded.keys())[0]

# Load target image
target_image = Image.open(target_image_path)

# Function to compute the similarity score between a generated caption and an example caption
def compute_similarity(generated_caption, example_caption):
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Vectorize both captions
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([generated_caption, example_caption])

    # Compute cosine similarity
    cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
    return cosine_sim[0][0]

# Loop through the example images and captions
for img_name, caption in example_images:
    # Load the image
    image = Image.open(img_name)

    # Example text prompt
    text_prompt = "Please describe the content of this image."

    # Process the image and text
    logging.info(f"Processing image: {img_name} with caption: {caption}...")
    inputs = processor(images=image, text=text_prompt, return_tensors="pt").to(device)

    # Perform inference
    logging.info("Performing In-context Learning (ICL) inference...")
    with torch.no_grad():
        outputs = model.generate(**inputs)

    # Decode the output
    generated_caption = processor.decode(outputs[0], skip_special_tokens=True)

    # Compute similarity score between the generated caption and the example caption
    similarity_score = compute_similarity(generated_caption, caption)
    logging.info(f"Generated Caption: {generated_caption}")
    logging.info(f"Example Caption: {caption}")
    logging.info(f"Cosine Similarity Score: {similarity_score}")

# Now, process the target image similarly
logging.info("Processing target image for evaluation...")
inputs = processor(images=target_image, text="Please describe the content of this image.", return_tensors="pt").to(device)

# Perform inference for the target image
logging.info("Performing inference for target image...")
with torch.no_grad():
    target_outputs = model.generate(**inputs)

# Decode the output for the target image
target_generated_caption = processor.decode(target_outputs[0], skip_special_tokens=True)
logging.info(f"Generated Caption for Target Image: {target_generated_caption}")
```

ERROR:

```
TypeError                                 Traceback (most recent call last)
in <cell line: 0>()
     74 # Process the image and text
     75 logging.info(f"Processing image: {img_name} with caption: {caption}...")
---> 76 inputs = processor(images=image, text=text_prompt, return_tensors="pt").to(device)
     77
     78 # Perform inference

TypeError: 'MolmoProcessor' object is not callable
```

Hey, it looks like you need to use the appropriate method for preparing the image and text for the model. The correct way to invoke the Molmo processor is through its `process` method, e.g.:

```python
inputs = processor.process(images=image, text='Please describe the content of this image.', return_tensors="pt").to(device)
```

If you're still facing the issue, try printing the processor's attributes with `logging.info(dir(processor))`.
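For anyone landing here, below is a minimal end-to-end sketch of single-image Molmo inference built around `processor.process`, following the usage shown on the Molmo model cards; the image filename and prompt are placeholders. Note that `process` returns a dict of unbatched tensors, so each tensor needs a batch dimension before calling `generate_from_batch`:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_name = "allenai/MolmoE-1B-0924"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# process() takes a list of PIL images and a text prompt; it returns
# a dict of tensors *without* a batch dimension
inputs = processor.process(
    images=[Image.open("58658.jpg")],  # placeholder image path
    text="Please describe the content of this image.",
)

# Move every tensor to the model's device and add a batch dimension of 1
inputs = {k: v.to(device).unsqueeze(0) for k, v in inputs.items()}

# Molmo exposes generate_from_batch rather than plain generate(**inputs)
with torch.no_grad():
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )

# Drop the prompt tokens and decode only the newly generated ones
generated_tokens = output[0, inputs["input_ids"].size(1):]
generated_caption = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_caption)
```

This also explains the original traceback: `MolmoProcessor` is custom remote code that does not implement `__call__`, so the usual `processor(...)` pattern from other `transformers` processors raises `TypeError: 'MolmoProcessor' object is not callable`.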
