---
license: mit
---

# **Phi-3.5 Instruct OpenVINO INT4 Model**

<b><span style="text-decoration:underline">Note: This is an unofficial version, intended only for testing and development.</span></b>

This is an INT4-quantized OpenVINO version of Microsoft's Phi-3.5-mini-instruct. You can run it with the Intel OpenVINO toolkit through Optimum Intel.
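
To reproduce the export or run the sample below, you need Optimum Intel with its OpenVINO extra (a minimal setup sketch; pin versions as your environment requires):

```bash
pip install "optimum[openvino]"
```

The INT4 weights in this repository were produced with the following `optimum-cli` export command: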

```bash
optimum-cli export openvino \
  --model "microsoft/Phi-3.5-mini-instruct" \
  --task text-generation-with-past \
  --weight-format int4 \
  --group-size 128 \
  --ratio 0.6 \
  --sym \
  --trust-remote-code \
  ./model/phi3.5-instruct/int4
```
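
Here `--weight-format int4` with `--sym` requests symmetric 4-bit weight compression, `--group-size 128` sets the size of the quantization groups, and `--ratio 0.6` keeps about 60% of the weight tensors in INT4 while the remaining 40% stay in INT8 to preserve accuracy.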

## **Sample Code**

```python
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_dir = 'Your Phi-3.5 OpenVINO Path'

# Compile for low-latency, single-stream inference; leave the model cache disabled.
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

# Load the INT4 OpenVINO model; use device='CPU' if no Intel GPU is available.
ov_model = OVModelForCausalLM.from_pretrained(
    model_dir,
    device='GPU',
    ov_config=ov_config,
    config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
    trust_remote_code=True,
)

tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# The prompt already carries Phi-3.5's special tags, so skip the tokenizer's own.
tokenizer_kwargs = {"add_special_tokens": False}

prompt = "<|user|>\nCan you introduce OpenVINO?<|end|>\n<|assistant|>\n"

input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)

answer = ov_model.generate(**input_tokens, max_new_tokens=1024)

print(tok.batch_decode(answer, skip_special_tokens=True)[0])
```
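
Instead of hand-assembling the special tags, you can let the tokenizer's built-in chat template format the conversation and stream tokens as they are generated. This is a sketch using the standard `transformers` APIs `apply_chat_template` and `TextStreamer`, assuming the exported tokenizer ships Phi-3.5's chat template:

```python
from transformers import TextStreamer

# Build the prompt from a message list via the tokenizer's chat template.
messages = [{"role": "user", "content": "Can you introduce OpenVINO?"}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Print tokens to stdout as soon as they are produced.
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
ov_model.generate(input_ids, max_new_tokens=1024, streamer=streamer)
```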