This repo provides the aksara_v1 model in GGUF format. The weights are quantized to 4-bit precision, and the model can run inference on a GPU or entirely on CPU.
To run using Python:
- Install llama-cpp-python (the `CMAKE_ARGS` below enables the CUDA backend for GPU offloading; recent llama.cpp builds use `-DGGML_CUDA=on` instead):

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
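If no CUDA-capable GPU is available, a plain install builds the CPU-only backend. A minimal alternative, assuming the default build options:

```bash
# CPU-only build: omit the CUDA CMake flag
pip install llama-cpp-python
```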
- Download the model:

```python
from huggingface_hub import hf_hub_download

model_name = "cropinailab/aksara_v1_GGUF"
model_file = "aksara_v1.Q4_K_M.gguf"

model_path = hf_hub_download(model_name,
                             filename=model_file,
                             token='<YOUR_HF_TOKEN>',  # only needed if the repo is gated or private
                             local_dir='<PATH_TO_SAVE_MODEL>')
```
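Before downloading, it can help to check which quantization variants the repo actually ships. A minimal sketch using `huggingface_hub.list_repo_files` (the repo may host files beyond the Q4_K_M one shown above):

```python
from huggingface_hub import list_repo_files

# List every GGUF file published in the repo
files = list_repo_files("cropinailab/aksara_v1_GGUF", token='<YOUR_HF_TOKEN>')
print([f for f in files if f.endswith(".gguf")])
```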
- Run the model:

```python
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path to the GGUF file downloaded above
    n_ctx=4096,             # max sequence length; longer contexts require substantially more memory
    n_gpu_layers=-1,        # number of layers to offload to GPU: -1 offloads all layers,
                            # set to 0 if no GPU acceleration is available on your system
)

prompt = "What is the recommended NPK dosage for maize varieties?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=512,     # generate up to 512 tokens
    stop=["<|end|>"],   # stop at the end-of-turn token
    echo=True,          # include the prompt in the returned text
)
print(output['choices'][0]['text'])
```
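llama-cpp-python also exposes an OpenAI-style chat API that applies the chat template stored in the GGUF metadata, so the prompt tags above do not have to be written by hand. A minimal sketch, assuming the GGUF file ships a chat template (if it does not, the manual `<|user|>`/`<|assistant|>` format is the safer path):

```python
# Chat-style inference: the library formats the messages with the
# model's embedded chat template before generation
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the recommended NPK dosage for maize varieties?"}
    ],
    max_tokens=512,
)
print(response['choices'][0]['message']['content'])
```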
For a more detailed pipeline built around this model, refer to the following notebook.