ibnzterrell committed
Commit 4639aa6 · verified · 1 Parent(s): 601bbc4

Update compatibility in README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -28,7 +28,7 @@ base_model:
 
 This model was quantized using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) from FP16 down to INT4 using GEMM kernels, with zero-point quantization and a group size of 128.
 
- Hardware: Intel Xeon CPU E5-2699A v4 @ 2.40GHz, 256GB of RAM, and 2x NVIDIA RTX 3090. I have only tested this with vLLM, but this should work on any platform that supports Llama 3.1 70B Instruct AWQ INT4. The primary limiting factor seems to be whether the platform supports Rotary Positional Embeddings (RoPE).
+ Hardware: Intel Xeon CPU E5-2699A v4 @ 2.40GHz, 256GB of RAM, and 2x NVIDIA RTX 3090. This should work on any platform that supports Llama 3.1 70B Instruct AWQ INT4.
 
 Model usage (inference) information for Transformers, AutoAWQ, Text Generation Inference (TGI), and vLLM, as well as quantization reproduction details, is below.
 
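For context on the quantization settings named in the diff (INT4 weights, GEMM kernels, zero-point quantization, group size 128), a minimal AutoAWQ sketch is below. The base model id and output path are illustrative placeholders, not taken from this commit.

```python
# Minimal AutoAWQ sketch matching the settings named in the README diff:
# INT4 weights, GEMM kernel, zero-point quantization, group size 128.
# The model paths are placeholders, not part of the commit.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder base model id
quant_path = "Llama-3.1-70B-Instruct-AWQ-INT4"    # placeholder output directory

quant_config = {
    "w_bit": 4,           # INT4 weights
    "q_group_size": 128,  # group size 128
    "zero_point": True,   # zero-point quantization
    "version": "GEMM",    # GEMM kernels
}

model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```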
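Since the edited line notes the checkpoint was tested with vLLM, here is a hedged inference sketch for an AWQ INT4 model there. The repository id and `tensor_parallel_size=2` (mirroring the 2x RTX 3090 setup described in the diff) are assumptions, not details from this commit.

```python
# Hedged vLLM inference sketch for an AWQ INT4 checkpoint.
# The repository id and tensor_parallel_size are assumptions, not from the commit.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ibnzterrell/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",  # assumed repo id
    quantization="awq",
    tensor_parallel_size=2,  # split across two GPUs, per the hardware note
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Briefly explain AWQ quantization."], params)
print(outputs[0].outputs[0].text)
```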