lokinfey committed on
Commit 3edfbbb · verified · 1 Parent(s): 559e6ea

Update README.md

Files changed (1)
  1. README.md +48 -3
README.md CHANGED
@@ -1,3 +1,48 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ # **Phi-4 OpenVINO INT4 Model**
+
+ <b><span style="text-decoration:underline">Note: This is an unofficial version, intended for testing and development only.</span></b>
+
+ This is an INT4-quantized version of Microsoft Phi-4 in OpenVINO format. You can use it with the Intel OpenVINO toolkit.
+
+ ```bash
+
+ optimum-cli export openvino --model ".\Your Phi-4 path" --task text-generation-with-past --weight-format int4 --sym --group-size 128 --ratio 0.6 --trust-remote-code ".\Your output Phi-4 OpenVINO location"
+ ```
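The `--weight-format int4 --sym --group-size 128` flags request symmetric 4-bit weight compression with one scale shared by each group of 128 weights, and `--ratio 0.6` asks for roughly 60% of the layers to be compressed to INT4 while the rest stay at 8 bits. As a rough illustration of the group-wise symmetric idea only, here is a simplified sketch. It is not the actual NNCF/OpenVINO implementation; the function names, the toy weights, and the tiny group size are all hypothetical.

```python
def quantize_group_sym_int4(weights):
    """Quantize one group of floats to signed 4-bit integers with a shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Symmetric scheme: no zero point; the scale maps the largest magnitude to 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_group(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


def quantize_sym_int4(weights, group_size):
    """Split a flat weight list into groups and quantize each group separately."""
    return [
        quantize_group_sym_int4(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


# Hypothetical toy weights; the export command above uses a group size of 128.
weights = [0.7, -0.3, 0.1, 0.0, 14.0, -6.0, 2.0, 4.0]
groups = quantize_sym_int4(weights, group_size=4)
restored = [w for q, s in groups for w in dequantize_group(q, s)]
print(groups[1])  # ([7, -3, 1, 2], 2.0)
```

Because each group carries its own scale, the small weights in the first group are not crushed to zero by the large weights in the second group, which is the motivation for a finite group size.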
+
+ ## **Sample Code**
+
+
+ ```python
+
+ from transformers import AutoConfig, AutoTokenizer
+ from optimum.intel.openvino import OVModelForCausalLM
+
+ model_dir = 'Your Phi-4 OpenVINO Path'
+
+ # Latency-oriented settings with a single inference stream
+ ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}
+
+ # Load the exported model on the GPU device
+ ov_model = OVModelForCausalLM.from_pretrained(
+     model_dir,
+     device='GPU',
+     ov_config=ov_config,
+     config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
+     trust_remote_code=True,
+ )
+
+ tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
+
+ # The prompt already contains the chat markers, so skip automatic special tokens
+ tokenizer_kwargs = {"add_special_tokens": False}
+
+ prompt = "<|im_start|>user<|im_sep|>\nI have $20,000 in my savings account, where I receive a 4% profit per year and payments twice a year. Can you please tell me how long it will take for me to become a millionaire? Also, can you please explain the math step by step as if you were explaining it to an uneducated person?<|im_end|>\n<|im_start|>assistant<|im_sep|>\n"
+
+ input_tokens = tok(prompt, return_tensors="pt", **tokenizer_kwargs)
+
+ answer = ov_model.generate(**input_tokens, max_new_tokens=1024)
+
+ print(tok.batch_decode(answer, skip_special_tokens=True)[0])
+
+ ```
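To judge whether the generated answer is plausible, the question in the sample prompt can be worked out directly. The sketch below assumes that "payments twice a year" means the 4% annual rate is compounded semiannually (2% per half-year) and that no further deposits are made. Both are interpretations of the prompt, and the function name is hypothetical.

```python
import math


def periods_to_target(principal, annual_rate, target, periods_per_year=2):
    """Return the smallest number of compounding periods that reaches the target."""
    rate_per_period = annual_rate / periods_per_year
    # Solve principal * (1 + r) ** n >= target for the smallest integer n.
    return math.ceil(math.log(target / principal) / math.log(1 + rate_per_period))


periods = periods_to_target(20_000, 0.04, 1_000_000)
years = periods / 2
print(periods, years)  # 198 99.0
```

Under these assumptions the balance first exceeds $1,000,000 after 198 half-year periods, that is 99 years, which gives a reference point for checking the model's step-by-step explanation.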