pooja-ganesh committed · verified
Commit 1efec25 · Parent(s): 1320524

Update README.md

Files changed (1):
  1. README.md +33 -3
README.md CHANGED
@@ -16,11 +16,41 @@ tags:

# Meta-Llama-3-8B-awq-g128-int4-asym-fp16-onnx-hybrid
- ## Introduction
- This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from Pile dataset, and applying [onnxruntime-genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) to convert to ONNX.
+ - Quantization Tool: Quark 0.6.0
+ - OGA Model Builder: v0.5.1
+ - Postprocess
- ## Quantization Strategy
- - AWQ / Group 128 / Asymmetric / FP16 activations / INT4 weights
+ - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
+ - Excluded Layers: None
+ ```
+ python3 quantize_quark.py \
+     --model_dir "$model" \
+     --output_dir "$output_dir" \
+     --quant_scheme w_uint4_per_group_asym \
+     --num_calib_data 128 \
+     --quant_algo awq \
+     --dataset pileval_for_awq_benchmark \
+     --seq_len 512 \
+     --model_export quark_safetensors \
+     --data_type float16 \
+     --exclude_layers [] \
+     --custom_mode awq
+ ```
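+
+ For intuition, this scheme stores every group of 128 weights as 4-bit unsigned integers with one scale and zero point per group. A rough NumPy sketch of the quantize/dequantize arithmetic follows (illustration only; Quark's actual AWQ scale search and packed storage differ):
+
+ ```python
+ import numpy as np
+
+ def quantize_uint4_asym(w: np.ndarray, group_size: int = 128):
+     """Per-group asymmetric uint4 quantization of an [out, in] weight matrix."""
+     rows, cols = w.shape
+     g = w.reshape(rows, cols // group_size, group_size)
+     w_min = g.min(axis=-1, keepdims=True)
+     w_max = g.max(axis=-1, keepdims=True)
+     scale = (w_max - w_min) / 15.0                   # uint4 range is 0..15
+     zero = np.clip(np.round(-w_min / scale), 0, 15)  # asymmetric zero point
+     q = np.clip(np.round(g / scale + zero), 0, 15).astype(np.uint8)
+     return q, scale, zero
+
+ def dequantize(q, scale, zero):
+     return ((q.astype(np.float32) - zero) * scale).reshape(q.shape[0], -1)
+
+ w = np.random.randn(32, 256).astype(np.float32)
+ q, s, z = quantize_uint4_asym(w)
+ print("max abs reconstruction error:", np.abs(w - dequantize(q, s, z)).max())
+ ```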
+ - ## OGA Model Builder
+ ```
+ python builder.py \
+     -i <quantized safetensor model dir> \
+     -o <oga model output dir> \
+     -p int4 \
+     -e dml
+ ```
+ - PostProcessed to generate Hybrid Model
+
- ## Quick Start
- For quickstart, refer to AMD [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html) (to be updated)
+ For a quick start, refer to AMD [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html)
+
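+ As a rough sketch of what inference on the exported model looks like with the onnxruntime-genai Python API (names follow the upstream examples and may vary by version; running the hybrid model additionally requires AMD's Ryzen AI runtime):
+
+ ```python
+ import onnxruntime_genai as og
+
+ # Load the directory produced by builder.py / postprocessing.
+ model = og.Model("<oga model output dir>")
+ tokenizer = og.Tokenizer(model)
+ stream = tokenizer.create_stream()
+
+ params = og.GeneratorParams(model)
+ params.set_search_options(max_length=256)
+ params.input_ids = tokenizer.encode("What is AWQ quantization?")
+
+ # Decode token by token, streaming text as it is generated.
+ generator = og.Generator(model, params)
+ while not generator.is_done():
+     generator.compute_logits()
+     generator.generate_next_token()
+     print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
+ ```
+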
+ #### Evaluation scores
+ The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity score measured for a prompt length of 2k is .
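+
+ The exact evaluation harness is not specified here; a typical sliding-window perplexity measurement with the Hugging Face stack looks roughly like this (hypothetical script, shown with the transformers float16 baseline for comparison):
+
+ ```python
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "meta-llama/Meta-Llama-3-8B"  # assumption: the float baseline
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.float16, device_map="auto")
+
+ test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
+ enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
+
+ max_len = stride = 2048  # 2k prompt length, non-overlapping windows
+ nlls, n_tokens = [], 0
+ for begin in range(0, enc.input_ids.size(1) - max_len + 1, stride):
+     ids = enc.input_ids[:, begin:begin + max_len].to(model.device)
+     with torch.no_grad():
+         loss = model(ids, labels=ids).loss  # mean NLL (ignores one-token shift)
+     nlls.append(loss * ids.size(1))
+     n_tokens += ids.size(1)
+
+ print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
+ ```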

#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.