---
license: mit
language:
- en
base_model:
- NousResearch/Hermes-3-Llama-3.1-8B
---

# Inference with Your Model

This guide explains how to run inference with your custom model using the Hugging Face `transformers` library.

## Prerequisites

Make sure you have the following dependencies installed:

- Python 3.7+
- PyTorch
- Hugging Face `transformers` library

You can install `transformers` from source at the required revision and the remaining packages with pip:

```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout <commit_id_for_4.47.0.dev0>
pip install .
pip install -q accelerate==0.34.2 bitsandbytes==0.44.1 peft==0.13.1
```
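
Before loading the model, it can help to confirm the environment is set up correctly. A minimal sanity check (illustrative only, using just the packages installed above):

```py
# Quick environment sanity check (illustrative; adjust to your setup)
import torch
import transformers

print("transformers:", transformers.__version__)     # expect a 4.47.0.dev0 build
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # 4-bit loading requires a GPU
```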

```py
# Quantization config: load the weights in 4-bit NF4 with double quantization
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in bfloat16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_quant_type='nf4'               # NormalFloat4 quantization data type
)
```
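
As a rough, back-of-the-envelope illustration (the exact footprint is somewhat higher because of quantization constants and non-quantized layers), 4-bit storage cuts the weight memory to about half a byte per parameter:

```py
# Illustrative estimate only; actual usage also includes quantization
# constants, embeddings, and activation memory
n_params = 8.03e9      # approximate parameter count of an 8B Llama model
bytes_per_param = 0.5  # 4 bits per weight
print(f"~{n_params * bytes_per_param / 1e9:.1f} GB for the quantized weights")  # ~4.0 GB
```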

```py
# Load model & tokenizer
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ahanaas/Hermes-3-Llama-3.1-8B_finetune_prashu"

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
    device_map=0,  # place the whole model on GPU 0
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
```
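
Once loaded, you can check where the model landed and how much memory it actually takes. A minimal sketch, assuming the standard `get_memory_footprint` helper and the `hf_device_map` attribute that `accelerate` sets when a `device_map` is used:

```py
# Inspect device placement and memory usage of the loaded model (illustrative)
print(base_model.hf_device_map)                             # module-to-device mapping
print(f"{base_model.get_memory_footprint() / 1e9:.2f} GB")  # weight memory footprint
```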

```py
# Run a text-generation pipeline with our new model
from transformers import pipeline

system_prompt = ''''''  # fill in your system / character description
prompt = ''''''         # fill in the user message

pipe = pipeline(
    task="text-generation",
    model=base_model,
    tokenizer=tokenizer,
    max_new_tokens=128,   # increase this to allow for longer outputs
    temperature=0.4,      # lower values give more focused, less random outputs
    top_k=50,             # sample only from the 50 most likely tokens
    do_sample=True,       # enable sampling
    return_full_text=True
)

result = pipe(
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    f"<|im_start|>user\n{prompt}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
generated_text = result[0]['generated_text']
print(generated_text)
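
Alternatively, rather than hand-writing the ChatML tags, you can let the tokenizer build the prompt from its chat template. A sketch assuming the model repo ships a `chat_template` (as Hermes-3 models do):

```py
# Build the same ChatML prompt via the tokenizer's chat template
# (assumes the model repo defines a chat_template, as Hermes-3 models do)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
result = pipe(chat_prompt)
print(result[0]['generated_text'])
```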

## Sample output

```py
system_prompt = '''Meet Lila, a 27-year-old interior designer specializing in innovative, eco-friendly spaces. Lila is artistic, empathetic, and detail-oriented, with a strong commitment to sustainability. Having worked on various projects in urban settings, she aims to transform spaces into personalized sanctuaries that reflect individual lifestyles while promoting environmental responsibility. Conversations with her will be deep, insightful, and infused with design jargon that combines aesthetics with practical solutions.
'''

prompt = '''ahh! that interior costs tooo much'''

output = '''Lila, *smiles warmly* I understand your concern, but investing in your living space can significantly impact your well-being and contribute to a greener future. Lets explore ways to create a beautiful, sustainable environment without breaking the bank.
'''
```

## Citation

```tex
@misc{Ahanaas_Hermes-3-Llama-3.1-8B_finetune_prashu,
  author       = {Prasad Chavan},
  title        = {Hermes-3-Llama-3.1-8B\_finetune\_prashu},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co./Ahanaas/Hermes-3-Llama-3.1-8B_finetune_prashu/}},
  note         = {Roleplay Finetuned Model}
}
```
|