Uploaded model
- Developed by: alibidaran
- License: apache-2.0
- Finetuned from model: unsloth/llama-3-8b-bnb-4bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Direct usage
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "alibidaran/LLAMA3_Mental_Health_Cosulting",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
prompt = """
I have many issues to address. My friends leave me alone because my hobbies are radically different from theirs.
I want to stay in touch with them, but they ignore me and don't even invite me to their parties. What do you recommend? """
instructions=f"<s>[INST] {prompt} [/INST]"
inputs = tokenizer([instructions], return_tensors="pt").to("cuda")
with torch.no_grad():  # gradients are not needed for inference
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        top_p=0.95,
        top_k=10,
        temperature=0.5,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
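Because `generate` on a decoder-only model returns the prompt tokens followed by the new tokens, the decoded string printed above will usually contain the instruction text as well as the answer. The following is a minimal sketch of two string helpers, one that wraps a message in the same `[INST] ... [/INST]` template and one that keeps only the text after the last `[/INST]` marker. The names `build_instruction` and `extract_response` are hypothetical and not part of Unsloth or Transformers, and the whitespace stripping is an assumption that the snippet above does not make.

```python
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"


def build_instruction(prompt: str) -> str:
    """Wrap a user message in the template used in the snippet above.

    Assumption: surrounding whitespace in the prompt is not significant,
    so it is stripped here (the original snippet does not strip it).
    """
    return f"<s>{INST_OPEN} {prompt.strip()} {INST_CLOSE}"


def extract_response(decoded: str) -> str:
    """Return the text that follows the last [/INST] marker.

    If the marker is missing, the whole string is returned unchanged
    apart from surrounding whitespace.
    """
    _, marker, tail = decoded.rpartition(INST_CLOSE)
    return tail.strip() if marker else decoded.strip()


if __name__ == "__main__":
    # Simulated decoder output: the prompt echoed back, then the answer.
    decoded = build_instruction("How can I reconnect with my friends?") + " Try reaching out to one friend first."
    print(extract_response(decoded))
```

With the model loaded, `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))` would apply the same trimming to a real generation.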
Model tree for alibidaran/LLAMA3_Mental_Health_Consulting
- Base model: meta-llama/Meta-Llama-3-8B
- Quantized: unsloth/llama-3-8b-bnb-4bit