---
license: mit
language:
- multilingual
pipeline_tag: image-text-to-text
tags:
- nlp
- vision
- internvl
base_model:
- OpenGVLab/InternVL2-2B
base_model_relation: quantized
---

# InternVL2-2B-int4-ov

 * Model creator: [OpenGVLab](https://huggingface.co./OpenGVLab)
 * Original model: [InternVL2-2B](https://huggingface.co./OpenGVLab/InternVL2-2B)

## Description

This is the [OpenGVLab/InternVL2-2B](https://huggingface.co./OpenGVLab/InternVL2-2B) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format, with weights compressed to INT4 using Activation-aware Weight Quantization (AWQ) by [NNCF](https://github.com/openvinotoolkit/nncf).


## Quantization Parameters

Weight compression was performed using `nncf.compress_weights` with the following parameters:

* mode: **INT4_ASYM**
* ratio: **1.0**
* group_size: **128**
* awq: **True**
* dataset: **[contextual](https://huggingface.co./datasets/ucla-contextual/contextual_test)**
* num_samples: **32**
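
To make the `mode` and `group_size` parameters more concrete, here is a minimal pure-Python sketch of group-wise asymmetric 4-bit quantization. It is an illustration only and not NNCF's implementation: the helper names (`quantize_group_int4_asym`, `dequantize_group`, `quantize_row`) are hypothetical, the zero point is expressed as the group minimum for simplicity, and AWQ scaling and the calibration dataset are not modeled.

```python
def quantize_group_int4_asym(weights):
    """Quantize one group of floats to unsigned 4-bit codes in the range 0..15."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, so the group range is split into 15 steps.
    # A constant group has zero range; fall back to a scale of 1.0.
    scale = ((hi - lo) / 15) or 1.0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, group_size=128):
    """Quantize a weight row in consecutive groups, each with its own scale and offset."""
    return [
        quantize_group_int4_asym(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    # Hypothetical weight row with 256 values, giving two groups of 128.
    row = [i / 255 - 0.5 for i in range(256)]
    for codes, scale, lo in quantize_row(row):
        restored = dequantize_group(codes, scale, lo)
        print(f"scale={scale:.5f} offset={lo:.5f} first restored={restored[0]:.5f}")
```

Each group stores its own scale and offset, so a smaller `group_size` tracks local weight ranges more closely at the cost of more metadata. A `ratio` of 1.0 means all eligible layers use the 4-bit mode.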


## Compatibility

The provided OpenVINO™ IR model is compatible with:

* OpenVINO version 2025.0.0 and higher
* Optimum Intel 1.21.0 and higher
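
The minimum versions above can be checked in the current environment before loading the model. The following is a small sketch using only the standard library; it assumes the packages are installed under the PyPI distribution names `openvino` and `optimum-intel`, and the helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical. Pre-release suffixes are ignored, so a release candidate of a version counts as that version.

```python
import re
from importlib import metadata

# Minimum versions taken from the compatibility list above.
MINIMUM_VERSIONS = {"openvino": "2025.0.0", "optimum-intel": "1.21.0"}


def release_tuple(version):
    """Return the leading numeric components of a version string, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = tuple(int(p) for p in match.group().split(".")) if match else ()
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric release components."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_environment(requirements=MINIMUM_VERSIONS):
    """Report, per package, whether it is installed and new enough."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"too old (need >= {minimum})"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```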

## Running Model Inference with [Optimum Intel](https://huggingface.co./docs/optimum/intel/index)

1. Install packages required for using [Optimum Intel](https://huggingface.co./docs/optimum/intel/index) integration with the OpenVINO backend:

```
pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino_tokenizers openvino

pip install git+https://github.com/huggingface/optimum-intel.git
```

2. Run model inference:

```
from PIL import Image 
import requests 
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoTokenizer, TextStreamer

model_id = "OpenVINO/InternVL2-2B-int4-ov"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

ov_model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)
prompt = "What is unusual on this picture?"

url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image = Image.open(requests.get(url, stream=True).raw)

inputs = ov_model.preprocess_inputs(text=prompt, image=image, tokenizer=tokenizer, config=ov_model.config)

generation_args = { 
    "max_new_tokens": 100, 
    "streamer": TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
} 

generate_ids = ov_model.generate(**inputs, **generation_args)

generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]

```

## Running Model Inference with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai)

1. Install packages required for using OpenVINO GenAI:
```
pip install --pre -U --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release openvino openvino-tokenizers openvino-genai

pip install huggingface_hub
```

2. Download the model from the Hugging Face Hub:
   
```
import huggingface_hub as hf_hub

model_id = "OpenVINO/InternVL2-2B-int4-ov"
model_path = "InternVL2-2B-int4-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)

```

3. Run model inference:

```
import openvino_genai as ov_genai
import requests
from PIL import Image
from io import BytesIO
import numpy as np
import openvino as ov

device = "CPU"
pipe = ov_genai.VLMPipeline(model_path, device)

def load_image(image_file):
    # Accept either an http(s) URL or a local file path.
    if isinstance(image_file, str) and image_file.startswith(("http://", "https://")):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    # Pack the pixels as an unsigned 8-bit tensor with shape (1, height, width, 3).
    image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
    return ov.Tensor(image_data)

prompt = "What is unusual on this picture?"

url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image_tensor = load_image(url)

def streamer(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False

pipe.start_chat()
output = pipe.generate(prompt, image=image_tensor, max_new_tokens=100, streamer=streamer)
pipe.finish_chat()
```

More GenAI usage examples can be found in the OpenVINO GenAI library [docs](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md) and [samples](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#openvino-genai-samples).


## Limitations


Check the original [model card](https://huggingface.co./OpenGVLab/InternVL2-2B) for limitations.

## Legal information

The original model is distributed under the [MIT](https://huggingface.co./datasets/choosealicense/licenses/blob/main/markdown/mit.md) license. More details can be found in the [original model card](https://huggingface.co./OpenGVLab/InternVL2-2B).