This is the OpenGVLab/InternVL2-2B model converted to the OpenVINO™ IR (Intermediate Representation) format, with weights compressed to INT8 by NNCF.
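To make "weights compressed to INT8" concrete, here is a minimal pure-Python sketch of a generic asymmetric 8-bit scheme: each float is mapped to an integer code in [0, 255] using a scale and a zero point, and is approximately recovered at inference time. This is an illustration only. The helper names and the sample weights are made up, and it is an assumption that this scheme resembles the mode used for this model; NNCF's real implementation works per channel on tensors and is considerably more involved.

```python
# Illustrative only: a generic asymmetric INT8 scheme, not NNCF's implementation.

def quantize_int8(weights):
    """Map a list of floats to 8-bit codes plus the (scale, zero_point) pair."""
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant row, where the value range would be zero.
    scale = (w_max - w_min) / 255 or 1.0
    zero_point = round(-w_min / scale)
    codes = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate float weights from the 8-bit codes."""
    return [(c - zero_point) * scale for c in codes]


row = [-0.62, -0.11, 0.0, 0.27, 0.93]  # made-up weights for one output channel
codes, scale, zero_point = quantize_int8(row)
restored = dequantize_int8(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each weight now occupies one byte instead of two or four, at the cost of a rounding error of at most about half a quantization step (`scale / 2`) per value.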
Weight compression was performed using nncf.compress_weights with the following parameters:
The provided OpenVINO™ IR model is compatible with:
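Running model inference with Optimum Intel
Install the packages required for the Optimum Intel integration with the OpenVINO backend:
pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino_tokenizers openvino
pip install git+https://github.com/huggingface/optimum-intel.git
Run model inference: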
from PIL import Image
import requests
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoTokenizer, TextStreamer
model_id = "OpenVINO/InternVL2-2B-int8-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
ov_model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)
prompt = "What is unusual on this picture?"
url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image = Image.open(requests.get(url, stream=True).raw)
inputs = ov_model.preprocess_inputs(text=prompt, image=image, tokenizer=tokenizer, config=ov_model.config)
generation_args = {
    "max_new_tokens": 100,
    "streamer": TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
}
generate_ids = ov_model.generate(**inputs, **generation_args)
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
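The slicing step in the example above exists because, for decoder-only models, the generated sequences typically start with the prompt tokens followed by the newly generated ones. A minimal sketch with plain lists shows the idea; the token IDs and the `strip_prompt` helper are made up for illustration and stand in for the tensor slice `generate_ids[:, prompt_length:]`.

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first prompt_length tokens from every sequence in the batch."""
    return [seq[prompt_length:] for seq in sequences]


prompt_ids = [101, 7592, 2054]                # made-up prompt token IDs
generated = [prompt_ids + [2023, 2003, 102]]  # batch of one: prompt + new tokens
new_tokens = strip_prompt(generated, len(prompt_ids))
```

Only `new_tokens` is then decoded, so the response text does not repeat the prompt.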
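Running model inference with OpenVINO GenAI
Install the packages required for OpenVINO GenAI:
pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino openvino-tokenizers openvino-genai
pip install huggingface_hub
Download the model from the Hugging Face Hub: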
import huggingface_hub as hf_hub
model_id = "OpenVINO/InternVL2-2B-int8-ov"
model_path = "InternVL2-2B-int8-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)
Run model inference:
import openvino_genai as ov_genai
import requests
from PIL import Image
from io import BytesIO
import numpy as np
import openvino as ov
device = "CPU"
pipe = ov_genai.VLMPipeline(model_path, device)
def load_image(image_file):
    # Fetch the image from a URL or open it from a local path, then force RGB.
    if isinstance(image_file, str) and image_file.startswith(("http://", "https://")):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    # Pack the pixels as an unsigned 8-bit tensor laid out as (1, height, width, 3).
    image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
    return ov.Tensor(image_data)
prompt = "What is unusual on this picture?"
url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image_tensor = load_image(url)
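def streamer(subword: str) -> bool:
    # Print each piece of text as soon as it is generated.
    print(subword, end="", flush=True)
    # Returning False tells the pipeline to keep generating.
    return False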
pipe.start_chat()
output = pipe.generate(prompt, image=image_tensor, max_new_tokens=100, streamer=streamer)
pipe.finish_chat()
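The streamer callback above always returns False, which lets generation run until it finishes or hits max_new_tokens. The sketch below shows how a boolean return value can be used to stop early. Note that `fake_generate` and `make_stop_on` are hypothetical stand-ins written for this illustration; they only mimic the callback contract and are not part of openvino_genai.

```python
def fake_generate(subwords, streamer, max_new_tokens):
    """Hypothetical stand-in for a generation loop that honours a streamer callback."""
    emitted = []
    for subword in subwords[:max_new_tokens]:
        emitted.append(subword)
        # A True return value from the callback requests an early stop.
        if streamer(subword):
            break
    return "".join(emitted)


def make_stop_on(marker):
    """Build a streamer that stops once a chunk containing marker is seen."""
    def streamer(subword: str) -> bool:
        return marker in subword
    return streamer


chunks = ["A cat", " is", " sitting", ".", " It", " looks", " calm", "."]
full = fake_generate(chunks, lambda s: False, max_new_tokens=100)
first_sentence = fake_generate(chunks, make_stop_on("."), max_new_tokens=100)
```

With the real pipeline, the same pattern would let a callback cut the answer off after the first sentence instead of waiting for the token limit.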
More GenAI usage examples can be found in the OpenVINO GenAI library docs and samples.
Check the original model card for limitations.
The original model is distributed under the MIT license. More details can be found in the original model card.
Base model: OpenGVLab/InternVL2-2B