chenghenry/gemma-2-27b-it-GGUF

chenghenry committed on Jul 5

Commit 0fd5b3b (parent: bd932ad)

Create README.md

Files changed (1)

README.md +40 -0

	@@ -0,0 +1,40 @@

+---
+license: gemma
+library_name: transformers
+base_model: google/gemma-2-27b-it
+---
+## Model
+- Gemma 2 27B Instruction Tuned, quantized to IQ3_M
+- Fits on a single NVIDIA T4 GPU (16 GB)
+## Usage (llama-cli with GPU):
+```sh
+llama-cli -m ./gemma-2-27b-it-IQ3_M.gguf -ngl 100 --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
+```
+## Usage (llama-cli with CPU):
+```sh
+llama-cli -m ./gemma-2-27b-it-IQ3_M.gguf --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
+```
+## Usage (llama-cpp-python via Hugging Face Hub):
+```python
+from llama_cpp import Llama
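+llm = Llama.from_pretrained(  # downloads the GGUF from the Hub; requires huggingface_hub
+    repo_id="chenghenry/gemma-2-27b-it-GGUF",
+    filename="gemma-2-27b-it-IQ3_M.gguf",
+    n_ctx=8192,          # context window in tokens
+    n_batch=2048,        # max prompt tokens processed per batch
+    n_gpu_layers=100,    # exceeds the layer count, so all layers go to the GPU; 0 = CPU-only
+    verbose=False,
+    chat_format="gemma"  # wrap messages in Gemma's chat template
+)
+prompt = "Why is the sky blue?"
+messages = [{"role": "user", "content": prompt}]
+response = llm.create_chat_completion(
+    messages=messages,
+    repeat_penalty=1.0,  # 1.0 disables the repetition penalty
+    temperature=0)       # 0 selects greedy (deterministic) decoding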
+print(response["choices"][0]["message"]["content"])
+```
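The sketch below gives a rough check on the claim that the IQ3_M file fits a single T4. It estimates memory for the weights and the KV cache, and it rests on several assumptions:

* about 27.2B parameters;
* the nominal 3.66 bits per weight that llama.cpp's quantize tool lists for IQ3_M (the actual file mixes tensor types, so its real size differs somewhat);
* Gemma 2 27B's published configuration of 46 layers, 16 key/value heads, and a head dimension of 128;
* an F16 KV cache at the `n_ctx=8192` used in the Python example.

The estimate ignores compute buffers and other runtime overhead.

```python
GIB = 1024 ** 3


def estimate_memory_gib(
    n_params: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    n_ctx: int,
    kv_bytes_per_value: int = 2,  # assumption: F16 KV cache (2 bytes per value)
) -> tuple[float, float]:
    """Return rough (weights, KV cache) sizes in GiB; compute buffers are ignored."""
    weight_bytes = n_params * bits_per_weight / 8
    # Each layer stores one K and one V vector of n_kv_heads * head_dim values per token.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * kv_bytes_per_value * n_ctx
    return weight_bytes / GIB, kv_bytes / GIB


if __name__ == "__main__":
    # Assumed inputs: ~27.2B params, nominal IQ3_M bpw, Gemma 2 27B config, n_ctx=8192.
    weights, kv_cache = estimate_memory_gib(
        n_params=27.2e9,
        bits_per_weight=3.66,
        n_layers=46,
        n_kv_heads=16,
        head_dim=128,
        n_ctx=8192,
    )
    print(f"weights  ~{weights:.1f} GiB")
    print(f"KV cache ~{kv_cache:.2f} GiB")
    print(f"total    ~{weights + kv_cache:.1f} GiB (plus compute buffers)")
```

With these inputs, the weights come to roughly 11.6 GiB and the KV cache to 2.875 GiB, which leaves a thin margin on a 16 GB card. The KV term scales linearly with `n_ctx`, so a smaller context frees memory if loading fails.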
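Instruction-tuned Gemma models expect each conversation turn to be wrapped in `<start_of_turn>` and `<end_of_turn>` markers. The assistant role is named `model`, and there is no separate system role.

* In the Python example, `chat_format="gemma"` applies this format automatically.
* With `llama-cli -p`, the text may reach the model as-is, depending on the llama.cpp version and flags.

The helper `format_gemma_prompt` below is hypothetical: a minimal sketch of that turn format. It assumes the tokenizer adds `<bos>` on its own. It is not llama-cpp-python's internal implementation.

```python
def format_gemma_prompt(messages: list[dict[str, str]]) -> str:
    """Render chat messages in Gemma's turn format (hypothetical helper).

    Assumes the tokenizer adds the <bos> token, so it is not included here.
    """
    # Gemma calls the assistant role "model".
    role_map = {"user": "user", "assistant": "model"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in role_map:
            # Gemma's template has no system role; put such instructions in a user turn.
            raise ValueError(f"unsupported role for Gemma: {role!r}")
        parts.append(f"<start_of_turn>{role_map[role]}\n{message['content']}<end_of_turn>\n")
    # Open a model turn so generation continues as the assistant's reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_gemma_prompt([{"role": "user", "content": "Why is the sky blue?"}]))
```

The returned string contains newlines. Saving it to a file and passing that file to llama-cli with `-f`, instead of quoting it for `-p`, avoids shell-escaping problems.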