InternVL2-2B-int8-ov

Model creator: OpenGVLab
Original model: InternVL2-2B

Description

This is the OpenGVLab/InternVL2-2B model converted to the OpenVINO™ IR (Intermediate Representation) format, with weights compressed to INT8 by NNCF.

Quantization Parameters

Weight compression was performed using nncf.compress_weights with the following parameters:

mode: INT8_ASYM
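In NNCF this mode corresponds to nncf.CompressWeightsMode.INT8_ASYM. To show roughly what asymmetric INT8 weight compression does, the NumPy sketch below quantizes each output channel (row) of a weight matrix to uint8 with its own scale and zero point, then dequantizes it. It assumes per-channel granularity and a range widened to include zero. It is not NNCF's implementation, and NNCF's rounding and storage details may differ.

```python
import numpy as np

def compress_int8_asym(weight: np.ndarray):
    """Sketch of per-output-channel asymmetric INT8 quantization of a 2-D matrix.

    Each row gets its own scale and zero point so that the row's value range
    maps onto the unsigned integer range [0, 255].
    """
    # Widen each row's range to include 0 so the zero point stays within [0, 255]
    w_min = np.minimum(weight.min(axis=1, keepdims=True), 0.0)
    w_max = np.maximum(weight.max(axis=1, keepdims=True), 0.0)
    # Guard against all-zero rows, which would otherwise give a zero scale
    scale = np.maximum(w_max - w_min, 1e-8) / 255.0
    zero_point = np.clip(np.round(-w_min / scale), 0, 255)
    q = np.clip(np.round(weight / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point.astype(np.uint8)

def decompress(q: np.ndarray, scale: np.ndarray, zero_point: np.ndarray) -> np.ndarray:
    # Recover approximate float weights: w ≈ (q - zero_point) * scale
    return (q.astype(np.float32) - zero_point.astype(np.float32)) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, scale, zp = compress_int8_asym(w)
w_hat = decompress(q, scale, zp)
print(q.dtype, float(np.abs(w - w_hat).max()))
```

Because each weight is stored as one byte plus a shared per-row scale and zero point, the reconstruction error stays within about one quantization step (scale) of the original value.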

Compatibility

The provided OpenVINO™ IR model is compatible with:

OpenVINO version 2025.0.0 and higher
Optimum Intel 1.21.0 and higher
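To confirm that an environment meets these minimum versions, a small standard-library check like the one below can help. The helper names are hypothetical. The comparison deliberately keeps only the leading numeric release part, so a nightly build such as 2025.0.0.dev20241201 is treated as 2025.0.0; use packaging.version for strict PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions listed in this model card
MIN_VERSIONS = {"openvino": "2025.0.0", "optimum-intel": "1.21.0"}

def parse_version(text: str) -> tuple:
    """Extract the leading numeric release part, e.g. '2025.0.0.dev1' -> (2025, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '2025.0' compares equal to '2025.0.0'
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def check_environment() -> dict:
    """Return {package: True/False/None}; None means the package is not installed."""
    results = {}
    for package, minimum in MIN_VERSIONS.items():
        try:
            results[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            results[package] = None
    return results

print(check_environment())
```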

Running Model Inference with Optimum Intel

Install packages required for using Optimum Intel integration with the OpenVINO backend:

pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino_tokenizers openvino

pip install git+https://github.com/huggingface/optimum-intel.git

Run model inference

from PIL import Image
import requests
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoTokenizer, TextStreamer

model_id = "OpenVINO/InternVL2-2B-int8-ov"

# trust_remote_code=True lets transformers load the model's custom code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

ov_model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)
prompt = "What is unusual on this picture?"

url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image = Image.open(requests.get(url, stream=True).raw)

# Build the model inputs (token ids and image data) from the prompt and the image
inputs = ov_model.preprocess_inputs(text=prompt, image=image, tokenizer=tokenizer, config=ov_model.config)

generation_args = {
    "max_new_tokens": 100,
    # Print text as it is generated, without echoing the prompt
    "streamer": TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
}

generate_ids = ov_model.generate(**inputs, **generation_args)

# Drop the prompt tokens and decode only the newly generated text
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
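The slicing step above relies on generate() returning the prompt tokens followed by the newly generated ones, which is the usual behavior for decoder-only generation in transformers. With made-up token ids, the same slice keeps only the continuation:

```python
import numpy as np

# Made-up token ids: a batch of one prompt with 4 tokens
input_ids = np.array([[101, 7, 8, 9]])
# Shape of a typical generate() result: prompt tokens, then new tokens
generate_ids = np.array([[101, 7, 8, 9, 42, 43, 2]])

# Skip the first input_ids.shape[1] columns to keep only the new tokens
new_tokens = generate_ids[:, input_ids.shape[1]:]
print(new_tokens)
```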

Running Model Inference with OpenVINO GenAI

Install the packages required for using OpenVINO GenAI:

pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino openvino-tokenizers openvino-genai

pip install huggingface_hub

Download the model from the Hugging Face Hub:

import huggingface_hub as hf_hub

model_id = "OpenVINO/InternVL2-2B-int8-ov"
model_path = "InternVL2-2B-int8-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)

Run model inference:

import openvino_genai as ov_genai
import requests
from PIL import Image
from io import BytesIO
import numpy as np
import openvino as ov

device = "CPU"
pipe = ov_genai.VLMPipeline(model_path, device)

def load_image(image_file):
    # Accept either an http(s) URL or a local file path
    if isinstance(image_file, str) and image_file.startswith(("http://", "https://")):
        response = requests.get(image_file)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    # Pack pixels as unsigned 8-bit values in NHWC layout: (1, height, width, 3).
    # Use uint8 explicitly; np.byte is a signed 8-bit type.
    image_data = np.array(image, dtype=np.uint8).reshape(1, image.size[1], image.size[0], 3)
    return ov.Tensor(image_data)

prompt = "What is unusual on this picture?"

url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image_tensor = load_image(url)

def streamer(subword: str) -> bool:
    # Print each decoded chunk as it arrives; returning False lets generation continue
    print(subword, end="", flush=True)
    return False

pipe.start_chat()
output = pipe.generate(prompt, image=image_tensor, max_new_tokens=100, streamer=streamer)
pipe.finish_chat()
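The streamer callback above returns False to keep generating; in the OpenVINO GenAI 2025.0 Python API, a callback that returns True asks the pipeline to stop early. The hypothetical StopAfterChars class below uses that to cap the streamed output at a character budget. The loop at the end only simulates how the pipeline feeds it chunks, so the sketch runs without a model.

```python
class StopAfterChars:
    """Hypothetical streamer: prints chunks and requests a stop once
    at least `max_chars` characters have been streamed."""

    def __init__(self, max_chars: int):
        self.max_chars = max_chars
        self.chunks = []

    @property
    def text(self) -> str:
        return "".join(self.chunks)

    def __call__(self, subword: str) -> bool:
        self.chunks.append(subword)
        print(subword, end="", flush=True)
        # Returning True requests that generation stop; False continues
        return len(self.text) >= self.max_chars

# Simulate the pipeline calling the streamer with decoded chunks
streamer_with_limit = StopAfterChars(max_chars=10)
for piece in ["The ", "cat ", "is ", "on ", "the ", "roof"]:
    if streamer_with_limit(piece):
        break
```

With a loaded pipeline it could be passed in place of the function, e.g. streamer=StopAfterChars(200), assuming VLMPipeline.generate accepts any Python callable the same way it accepts the plain function above.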

More GenAI usage examples can be found in the OpenVINO GenAI library documentation and samples.

Limitations

Check the original model card for limitations.

Legal information

The original model is distributed under the MIT license. More details can be found in the original model card.
