fbaldassarri committed on
Commit
6fa51f2
·
verified ·
1 Parent(s): 4666394

Upload README.md

  1. README.md +91 -3
README.md CHANGED
@@ -1,3 +1,91 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ - de
5
+ - fr
6
+ - it
7
+ - pt
8
+ - hi
9
+ - es
10
+ - th
11
+ license: apache-2.0
12
+ library_name: transformers
13
+ tags:
14
+ - autoround
15
+ - intel
16
+ - gptq
17
+ - woq
18
+ - meta
19
+ - pytorch
20
+ - transformers
21
+ model_name: SmolLM2 1.7B Instruct
22
+ base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
23
+ inference: false
24
+ model_creator: HuggingFaceTB
25
+ pipeline_tag: text-generation
26
+ prompt_template: '{prompt}
27
+ '
28
+ quantized_by: fbaldassarri
29
+ ---
30
+
31
+ ## Model Information
32
+
33
+ Quantized version of [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct), using torch.float32 for quantization tuning.
34
+ - 4 bits (INT4)
35
+ - group size = 128
36
+ - Symmetric quantization
37
+ - Method: AutoRound, weight-only quantization (WOQ)
38
+
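To make the settings above concrete, here is a minimal sketch of symmetric, group-wise INT4 quantization using plain round-to-nearest. This is an illustration only: AutoRound tunes the rounding rather than using simple nearest rounding, and the helper names `quantize_sym_int4` and `dequantize` are hypothetical, not part of any library.

```python
import numpy as np


def quantize_sym_int4(weights: np.ndarray, group_size: int = 128):
    """Round-to-nearest symmetric INT4 quantization with one scale per group.

    Assumes weights.size is a multiple of group_size.
    """
    flat = weights.astype(np.float32).reshape(-1, group_size)
    # Symmetric: no zero point. The scale maps the largest magnitude to 7.
    max_abs = np.abs(flat).max(axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    # Signed 4-bit integers cover the range [-8, 7].
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray, shape) -> np.ndarray:
    """Map the integer codes back to approximate float weights."""
    return (q.astype(np.float32) * scale).reshape(shape)


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, scale = quantize_sym_int4(w, group_size=128)
w_hat = dequantize(q, scale, w.shape)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```

With nearest rounding, the per-weight error is bounded by half the scale of its group, which is why smaller groups generally lose less accuracy at the cost of storing more scales.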
39
+ Fast and memory-efficient: 2-3x speedup, with a slight accuracy drop at W4G128 (4-bit weights, group size 128).
40
+
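As a rough back-of-the-envelope check on the memory claim, the sketch below estimates weight storage at W4G128. The parameter count (a nominal 1.7 billion) and the 16-bit per-group scale are assumptions for illustration; the real checkpoint may keep some layers unquantized and may store extra metadata, so actual file sizes will differ.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Storage in bytes for n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


N_PARAMS = 1_700_000_000  # nominal "1.7B"; the exact count is an assumption
GROUP_SIZE = 128
SCALE_BITS = 16  # assumed: one FP16 scale per group of 128 weights

# 4 bits per weight plus the per-group scale amortized over the group.
bits_per_weight = 4 + SCALE_BITS / GROUP_SIZE

fp32_gb = weight_bytes(N_PARAMS, 32) / 1e9
int4_gb = weight_bytes(N_PARAMS, bits_per_weight) / 1e9
print(f"FP32: {fp32_gb:.2f} GB, W4G128: {int4_gb:.2f} GB")
```

Under these assumptions the quantized weights take roughly one eighth of the FP32 storage.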
41
+ Quantization framework: [Intel AutoRound](https://github.com/intel/auto-round)
42
+
43
+ Note: this INT4 version of SmolLM2-1.7B-Instruct has been quantized for inference on CPU.
44
+
45
+ ## Replication Recipe
46
+
47
+ ### Step 1 Install Requirements
48
+
49
+ I suggest installing the requirements into a dedicated Python virtualenv or conda environment.
50
+
51
+ ```
52
+ python -m pip install <package> --upgrade
53
+ ```
54
+
55
+ - accelerate==1.0.1
56
+ - auto_gptq==0.7.1
57
+ - neural_compressor==3.1
58
+ - torch==2.3.0+cpu
59
+ - torchaudio==2.3.0+cpu
60
+ - torchvision==0.18.0+cpu
61
+ - transformers==4.45.2
62
+
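After installing, it can help to confirm that the environment matches the pins. The following is a small sketch using only the standard library; `check_pins` is a hypothetical helper, the `PINNED` dictionary covers a subset of the list above, and the distribution names are assumed to match what `pip` installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pinned requirements listed above.
PINNED = {
    "accelerate": "1.0.1",
    "auto_gptq": "0.7.1",
    "neural_compressor": "3.1",
    "torch": "2.3.0+cpu",
    "transformers": "4.45.2",
}


def check_pins(pinned: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty output means every listed package is installed at the pinned version.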
63
+ ### Step 2 Build Intel AutoRound wheel from source
64
+
65
+ ```
66
+ python -m pip install git+https://github.com/intel/auto-round.git
67
+ ```
68
+
69
+ ### Step 3 Script for Quantization
70
+
71
+ ```python
72
+ from transformers import AutoModelForCausalLM, AutoTokenizer
73
+ model_name = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
74
+ model = AutoModelForCausalLM.from_pretrained(model_name)
75
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
76
+ from auto_round import AutoRound
77
+ bits, group_size, sym = 4, 128, True
78
+ autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4, bits=bits, group_size=group_size, sym=sym)
79
+ autoround.quantize()
80
+ output_dir = "./AutoRound/HuggingFaceTB_SmolLM2-1.7B-Instruct-auto_round-int4-gs128-sym"
81
+ autoround.save_quantized(output_dir, format='auto_round', inplace=True)
82
+ ```
83
+
84
+ ## License
85
+
86
+ [Apache 2.0 License](https://choosealicense.com/licenses/apache-2.0/)
87
+
88
+ ## Disclaimer
89
+
90
+ This quantized model comes with no warranty. It has been developed for research purposes only.
91
+