---
license: mit
---

# **Phi-3.5 Instruct OpenVINO INT4 Model**

<b><span style="text-decoration:underline">Note: This is an unofficial version, intended only for testing and development.</span></b>

This is an INT4-quantized OpenVINO version of Microsoft's Phi-3.5-mini-instruct. You can run it with the Intel OpenVINO toolkit through Optimum Intel.
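
To reproduce the export or run the sample below, you need Optimum Intel with its OpenVINO extra (a minimal setup sketch; pin versions as your environment requires):

```bash
pip install "optimum[openvino]"
```

The INT4 weights in this repository were produced with the following `optimum-cli` export command: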

```bash
optimum-cli export openvino \
  --model "microsoft/Phi-3.5-mini-instruct" \
  --task text-generation-with-past \
  --weight-format int4 \
  --group-size 128 \
  --ratio 0.6 \
  --sym \
  --trust-remote-code \
  ./model/phi3.5-instruct/int4
```
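
Here `--weight-format int4` with `--sym` requests symmetric 4-bit weight compression, `--group-size 128` sets the size of the quantization groups, and `--ratio 0.6` keeps about 60% of the weight tensors in INT4 while the remaining 40% stay in INT8 to preserve accuracy.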

## **Sample Code**

```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_dir = 'Your Phi-3.5 OpenVINO Path'

# Compile for low-latency, single-stream inference; leave the model cache disabled.
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

# Load the INT4 OpenVINO model; use device='CPU' if no Intel GPU is available.
ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)

tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The prompt already carries Phi-3.5's special tags, so skip the tokenizer's own.
tokenizer_kwargs = {"add_special_tokens": False}

prompt = "<|user|>\nCan you introduce OpenVINO?<|end|>\n<|assistant|>\n"

input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)

answer = ov_model.generate(**input_tokens, max_new_tokens=1024)

print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```
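
Instead of hand-assembling the special tags, you can let the tokenizer's built-in chat template format the conversation and stream tokens as they are generated. This is a sketch using the standard `transformers` APIs `apply_chat_template` and `TextStreamer`, assuming the exported tokenizer ships Phi-3.5's chat template:

```python
from transformers import TextStreamer

# Build the prompt from a message list via the tokenizer's chat template.
messages = [{"role": "user", "content": "Can you introduce OpenVINO?"}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Print tokens to stdout as soon as they are produced.
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
ov_model.generate(input_ids, max_new_tokens=1024, streamer=streamer)
```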