Llamacpp Quantizations of gemma-7b

Using llama.cpp release b3583 for quantization.

Original model: https://huggingface.co/google/gemma-7b

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Perplexity (wikitext-2-raw-v1.test) |
| --- | --- | --- | --- |
| gemma-7b.BF16.gguf | BF16 | 17.1 GB | 6.9857 +/- 0.04411 |
| gemma-7b-Q8_0.gguf | Q8_0 | 9.08 GB | 7.0373 +/- 0.04456 |
| gemma-7b-Q6_K.gguf | Q6_K | 7.01 GB | 7.3858 +/- 0.04762 |
| gemma-7b-Q5_K_M.gguf | Q5_K_M | 6.14 GB | 7.4227 +/- 0.04781 |
| gemma-7b-Q5_K_S.gguf | Q5_K_S | 5.98 GB | 7.5232 +/- 0.04857 |
| gemma-7b-Q4_K_M.gguf | Q4_K_M | 5.33 GB | 7.5800 +/- 0.04918 |
| gemma-7b-Q4_K_S.gguf | Q4_K_S | 5.05 GB | 7.9673 +/- 0.05225 |
| gemma-7b-Q3_K_L.gguf | Q3_K_L | 4.71 GB | 7.9586 +/- 0.05186 |
| gemma-7b-Q3_K_M.gguf | Q3_K_M | 4.37 GB | 8.4077 +/- 0.05545 |
| gemma-7b-Q3_K_S.gguf | Q3_K_S | 3.98 GB | 102.6126 +/- 1.62310 |
| gemma-7b-Q2_K.gguf | Q2_K | 3.48 GB | 3970.5385 +/- 102.46527 |
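
To help read the table, the following minimal sketch computes how much each quantization raises perplexity relative to the BF16 file and picks the smallest file that stays within a chosen tolerance. The numbers are copied from the table above; the helper names (`ppl_increase_pct`, `smallest_within`) and the idea of a percentage tolerance are illustrative assumptions, not part of llama.cpp or of this repository.

```python
# (quant type, file size in GB, perplexity) copied from the table above.
QUANTS = [
    ("BF16", 17.1, 6.9857),
    ("Q8_0", 9.08, 7.0373),
    ("Q6_K", 7.01, 7.3858),
    ("Q5_K_M", 6.14, 7.4227),
    ("Q5_K_S", 5.98, 7.5232),
    ("Q4_K_M", 5.33, 7.5800),
    ("Q4_K_S", 5.05, 7.9673),
    ("Q3_K_L", 4.71, 7.9586),
    ("Q3_K_M", 4.37, 8.4077),
    ("Q3_K_S", 3.98, 102.6126),
    ("Q2_K", 3.48, 3970.5385),
]

BASELINE_PPL = 6.9857  # perplexity of the BF16 file, used as the reference


def ppl_increase_pct(ppl, baseline=BASELINE_PPL):
    """Relative perplexity increase over the baseline, in percent."""
    return (ppl / baseline - 1.0) * 100.0


def smallest_within(max_increase_pct):
    """Return the quant type of the smallest file whose perplexity increase
    is at most max_increase_pct, or None if no file qualifies."""
    ok = [
        (size, name)
        for name, size, ppl in QUANTS
        if ppl_increase_pct(ppl) <= max_increase_pct
    ]
    return min(ok)[1] if ok else None


# Print one line per quantization: type, size, relative perplexity increase.
for name, size, ppl in QUANTS:
    print(f"{name:8s} {size:6.2f} GB  {ppl_increase_pct(ppl):+.2f}%")
```

With these numbers, `smallest_within(10.0)` selects Q4_K_M and `smallest_within(15.0)` selects Q3_K_L. Q3_K_L is both smaller than Q4_K_S and has a slightly lower measured perplexity, though the difference is well inside the reported error bars. The Q3_K_S and Q2_K rows are far outside any reasonable tolerance.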

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download fedric95/gemma-7b-GGUF --include "gemma-7b-Q4_K_M.gguf" --local-dir ./
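
If you prefer to stay in Python, the same single-file download can be sketched with `hf_hub_download` from the `huggingface_hub` package installed above. The wrapper names `download_kwargs` and `download_quant` are hypothetical helpers introduced for this example; the repository ID and filename are the ones from the command above.

```python
REPO_ID = "fedric95/gemma-7b-GGUF"  # repository named in the CLI command above


def download_kwargs(filename, local_dir="./"):
    """Build keyword arguments that mirror the CLI's file and --local-dir options."""
    return {"repo_id": REPO_ID, "filename": filename, "local_dir": local_dir}


def download_quant(filename, local_dir="./"):
    """Download one GGUF file and return its local path."""
    # Imported lazily so download_kwargs() works without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**download_kwargs(filename, local_dir))


# Example call, left commented out because it downloads several GB:
# path = download_quant("gemma-7b-Q4_K_M.gguf")
```

Here the exact filename is passed directly, so no `--include` pattern is needed.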

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

huggingface-cli download fedric95/gemma-7b-GGUF --include "gemma-7b-Q8_0.gguf/*" --local-dir gemma-7b-Q8_0

You can either specify a new local-dir (gemma-7b-Q8_0) or download them all in place (./).
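
The sketch below illustrates which files a pattern such as "gemma-7b-Q8_0.gguf/*" would select. It assumes that `--include` patterns behave like shell-style globs, which is approximated here with Python's `fnmatch`. Both file listings are hypothetical: every file in the table above is smaller than 50GB, so the split layout is shown only for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "gemma-7b-Q8_0.gguf/*"  # pattern from the command above


def select_files(repo_files, pattern):
    """Return the files whose path matches the glob pattern."""
    return [f for f in repo_files if fnmatchcase(f, pattern)]


# Hypothetical listing where Q8_0 is stored as a single file.
single_layout = [
    "gemma-7b-Q4_K_M.gguf",
    "gemma-7b-Q8_0.gguf",
]

# Hypothetical listing where Q8_0 has been split into a folder of parts.
split_layout = [
    "gemma-7b-Q4_K_M.gguf",
    "gemma-7b-Q8_0.gguf/gemma-7b-Q8_0-00001-of-00002.gguf",
    "gemma-7b-Q8_0.gguf/gemma-7b-Q8_0-00002-of-00002.gguf",
]

print(select_files(single_layout, PATTERN))  # nothing matches a single file
print(select_files(split_layout, PATTERN))   # both parts inside the folder
```

Under this assumption the folder pattern matches only files inside a folder named after the quantization. For a file stored as a single GGUF, use the plain filename as in the previous command.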

Reproducibility

https://github.com/ggerganov/llama.cpp/discussions/9020#discussioncomment-10335638

GGUF
Model size: 8.54B params
Architecture: gemma
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
