pooja-ganesh committed · verified
Commit 1efec25 · Parent(s): 1320524

Update README.md

Files changed (1):
  1. README.md +33 -3
README.md CHANGED
@@ -16,11 +16,41 @@ tags:

# Meta-Llama-3-8B-awq-g128-int4-asym-fp16-onnx-hybrid
- ## Introduction
- This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from Pile dataset, and applying [onnxruntime-genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) to convert to ONNX.
+ - Quantization Tool: Quark 0.6.0
+ - OGA Model Builder: v0.5.1
+ - Postprocess
- ## Quantization Strategy
- - AWQ / Group 128 / Asymmetric / FP16 activations / INT4 weights
+ - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
+ - Excluded Layers: None
+ ```
+ python3 quantize_quark.py \
+     --model_dir "$model" \
+     --output_dir "$output_dir" \
+     --quant_scheme w_uint4_per_group_asym \
+     --num_calib_data 128 \
+     --quant_algo awq \
+     --dataset pileval_for_awq_benchmark \
+     --seq_len 512 \
+     --model_export quark_safetensors \
+     --data_type float16 \
+     --exclude_layers [] \
+     --custom_mode awq
+ ```
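+
+ For intuition, this scheme stores every group of 128 weights as 4-bit unsigned integers with one scale and zero point per group. A rough NumPy sketch of the quantize/dequantize arithmetic follows (illustration only; Quark's actual AWQ scale search and packed storage differ):
+
+ ```python
+ import numpy as np
+
+ def quantize_uint4_asym(w: np.ndarray, group_size: int = 128):
+     """Per-group asymmetric uint4 quantization of an [out, in] weight matrix."""
+     rows, cols = w.shape
+     g = w.reshape(rows, cols // group_size, group_size)
+     w_min = g.min(axis=-1, keepdims=True)
+     w_max = g.max(axis=-1, keepdims=True)
+     scale = (w_max - w_min) / 15.0                   # uint4 range is 0..15
+     zero = np.clip(np.round(-w_min / scale), 0, 15)  # asymmetric zero point
+     q = np.clip(np.round(g / scale + zero), 0, 15).astype(np.uint8)
+     return q, scale, zero
+
+ def dequantize(q, scale, zero):
+     return ((q.astype(np.float32) - zero) * scale).reshape(q.shape[0], -1)
+
+ w = np.random.randn(32, 256).astype(np.float32)
+ q, s, z = quantize_uint4_asym(w)
+ print("max abs reconstruction error:", np.abs(w - dequantize(q, s, z)).max())
+ ```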
+ - ## OGA Model Builder
+ ```
+ python builder.py \
+     -i <quantized safetensor model dir> \
+     -o <oga model output dir> \
+     -p int4 \
+     -e dml
+ ```
+ - PostProcessed to generate Hybrid Model
+
- ## Quick Start
- For quickstart, refer to AMD [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html) (to be updated)
+ For a quick start, refer to AMD [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html)
+
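+ As a rough sketch of what inference on the exported model looks like with the onnxruntime-genai Python API (names follow the upstream examples and may vary by version; running the hybrid model additionally requires AMD's Ryzen AI runtime):
+
+ ```python
+ import onnxruntime_genai as og
+
+ # Load the directory produced by builder.py / postprocessing.
+ model = og.Model("<oga model output dir>")
+ tokenizer = og.Tokenizer(model)
+ stream = tokenizer.create_stream()
+
+ params = og.GeneratorParams(model)
+ params.set_search_options(max_length=256)
+ params.input_ids = tokenizer.encode("What is AWQ quantization?")
+
+ # Decode token by token, streaming text as it is generated.
+ generator = og.Generator(model, params)
+ while not generator.is_done():
+     generator.compute_logits()
+     generator.generate_next_token()
+     print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
+ ```
+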
+ #### Evaluation scores
+ The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity score measured for a prompt length of 2k is .
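+
+ The exact evaluation harness is not specified here; a typical sliding-window perplexity measurement with the Hugging Face stack looks roughly like this (hypothetical script, shown with the transformers float16 baseline for comparison):
+
+ ```python
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "meta-llama/Meta-Llama-3-8B"  # assumption: the float baseline
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.float16, device_map="auto")
+
+ test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
+ enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
+
+ max_len = stride = 2048  # 2k prompt length, non-overlapping windows
+ nlls, n_tokens = [], 0
+ for begin in range(0, enc.input_ids.size(1) - max_len + 1, stride):
+     ids = enc.input_ids[:, begin:begin + max_len].to(model.device)
+     with torch.no_grad():
+         loss = model(ids, labels=ids).loss  # mean NLL (ignores one-token shift)
+     nlls.append(loss * ids.size(1))
+     n_tokens += ids.size(1)
+
+ print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
+ ```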

#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.