adding initial metrics
README.md
CHANGED
````diff
@@ -37,10 +37,19 @@ outputs = model.generate(**(inputs.to('cuda')), max_new_tokens=1000)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+
+## Performance
+| Models              | Mixtral Original | HQQ quantized |
+|---------------------|------------------|---------------|
+| ARC (25-shot)       | 70.22            | 66.47         |
+| TruthfulQA-MC2      | 64.57            | 62.85         |
+| Winogrande (5-shot) | 81.36            | 79.40         |
+
 ----------------------------------------------------------------------------------------------------------------------------------
 </p>
 
 ### Quantization
+
 You can reproduce the model using the following quant configs:
 
 ``` Python
@@ -70,4 +79,4 @@ quant_config['block_sparse_moe.experts.w3'] = experts_params
 model.quantize_model(quant_config=quant_config, compute_dtype=torch.float16);
 model.eval();
 ```
-
+The full code is available on GitHub at https://github.com/mobiusml/hqq/blob/master/examples/hf/mixtral_13GB_example.py
````
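The diff only shows fragments of the quant config (the hunk context exposes the key `block_sparse_moe.experts.w3`). As an illustrative sketch of how such a per-layer config dictionary could be assembled, here is a small helper; the function name `make_quant_config` and the specific `nbits`/`group_size` values are assumptions for illustration, not taken from this commit:

```python
# Hypothetical sketch: building a per-layer quant config dict for a
# Mixtral-style model. Only the key 'block_sparse_moe.experts.w3' comes
# from the diff above; the helper name and parameter values are
# illustrative assumptions.
def make_quant_config(attn_params, experts_params):
    quant_config = {}
    # Attention projections share one set of quantization settings.
    for proj in ('self_attn.q_proj', 'self_attn.k_proj',
                 'self_attn.v_proj', 'self_attn.o_proj'):
        quant_config[proj] = attn_params
    # MoE expert weights (w1/w2/w3) get their own settings, typically
    # lower-bit to shrink the memory footprint of the many experts.
    for w in ('w1', 'w2', 'w3'):
        quant_config[f'block_sparse_moe.experts.{w}'] = experts_params
    return quant_config

cfg = make_quant_config({'nbits': 4, 'group_size': 64},
                        {'nbits': 2, 'group_size': 16})
print(cfg['block_sparse_moe.experts.w3'])  # the key visible in the diff
```

The resulting dictionary is then what a call like `model.quantize_model(quant_config=quant_config, compute_dtype=torch.float16)` in the diff would consume, mapping each layer name to its quantization settings.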