GGUF?

#11
by johnnnna - opened

Please ^-^

Google org

There is a GGUF file provided in the repo.

It's quite large; can we get a quant?

Please @TheBloke

It seems really large for a GGUF file. I have enough memory, but why is it so large? Is it FP16 or something else? Other variants should be provided. I found these, though I haven't tested them:
https://huggingface.co./mlabonne/gemma-2b-GGUF

He also has a 7B version, but the repo seems empty:
https://huggingface.co./mlabonne/gemma-7b-it-GGUF

Download at your own risk of course

Google org

You can run quantize (included in the llama.cpp repo) to get Q8_0 versions. I expect the community will come up with various quantized versions very soon too.

Google org

why is it so large? Is it FP16 etc.?

Yes: it is float32, which makes it roughly twice the size of an FP16 file.
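For a rough sense of scale, here is some back-of-the-envelope arithmetic (a sketch only; the bytes-per-weight figures are approximate averages and real GGUF files carry extra metadata):

# Rough file-size arithmetic for a ~7B-parameter model (approximate figures,
# not measured from the actual repo files).
params = 7e9
bytes_per_weight = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.06, "Q4_K_M": 0.61}
for fmt, bpw in bytes_per_weight.items():
    print(f"{fmt}: ~{params * bpw / 1e9:.0f} GB")

This is why the float32 file looks so oversized next to the usual FP16 or quantized uploads.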

These are the commands to quantize to 8-bit and 4-bit. They assume you have llama.cpp built and installed.

8-bit: quantize gemma-7b.gguf ./gemma-7b-Q8_0.gguf Q8_0
4-bit: quantize gemma-7b.gguf ./gemma-7b-Q4_K_M.gguf Q4_K_M
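If you want to produce several variants in one go, a small wrapper script works too. This is a minimal sketch: it assumes the quantize binary from llama.cpp is on your PATH and that gemma-7b.gguf is in the current directory (newer llama.cpp builds name the binary llama-quantize).

import subprocess

# Hypothetical input/output names; adjust to your local files.
INPUT = "gemma-7b.gguf"
for qtype in ["Q8_0", "Q4_K_M"]:
    output = f"gemma-7b-{qtype}.gguf"
    # Usage: quantize <input.gguf> <output.gguf> <type>
    subprocess.run(["quantize", INPUT, output, qtype], check=True)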

I tried the GGUF from https://huggingface.co./rahuldshetty/gemma-7b-it-gguf-quantized in Ollama, but it crashes! Anyone facing the same issue?

@aptha a dumb question, but are you compiling from the latest Ollama source, including updating its llama.cpp submodule?

@aptha try with an 8-bit quantized version. Ollama crashes if you're out of memory.
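One quick way to sanity-check this before loading (a rough sketch: the GGUF has to fit in available RAM plus some working overhead, the model path is hypothetical, and psutil is a third-party package):

import os
import psutil  # pip install psutil

model_path = "gemma-7b-Q8_0.gguf"  # hypothetical local path
model_gb = os.path.getsize(model_path) / 1e9
free_gb = psutil.virtual_memory().available / 1e9
print(f"model: {model_gb:.1f} GB, free RAM: {free_gb:.1f} GB")
if model_gb > free_gb:
    print("Likely to crash or swap; try a smaller quant such as Q4_K_M.")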

# Assuming this is the LangChain wrapper (the older import path was langchain.llms):
from langchain_community.llms import CTransformers

llm = CTransformers(model="mlabonne/Gemmalpaca-2B-GGUF", model_file="gemmalpaca-2b.Q8_0.gguf", model_type="gemma", gpu_layers=0)

This one doesn't work. Is there a generic way to open GGUF files with CTransformers?
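For loading GGUF files generically, the underlying ctransformers package exposes an AutoModelForCausalLM interface. A minimal sketch (whether a given architecture such as gemma is supported depends on your installed ctransformers version):

from ctransformers import AutoModelForCausalLM

# Loads a GGUF file from a Hugging Face repo (or a local path).
llm = AutoModelForCausalLM.from_pretrained(
    "mlabonne/Gemmalpaca-2B-GGUF",
    model_file="gemmalpaca-2b.Q8_0.gguf",
    model_type="gemma",  # must match an architecture ctransformers knows about
    gpu_layers=0,        # 0 = CPU only
)
print(llm("Hello, "))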

johnnnna changed discussion status to closed
