---
library_name: transformers
tags: []
license: other
license_name: llama3
---

# g-ronimo/llama3-8b-SlimHermes

* `meta-llama/Meta-Llama-3-8B` fine-tuned on the 10k longest samples from `teknium/OpenHermes-2.5`

## Sample Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "g-ronimo/llama3-8b-SlimHermes"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"},
]

# Apply the chat template and move the inputs to the model's device
# (more robust than hard-coding "cuda" when device_map="auto" is used).
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_tokens = model.generate(input_tokens, max_new_tokens=100)
# Keep special tokens so the ChatML-style markers are visible in the output.
output = tokenizer.decode(output_tokens[0], skip_special_tokens=False)

print(output)
```

## Sample Output

```
<|im_start|>system
Talk like a pirate.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
hello there, matey! How be ye doin' today? Arrrr!<|im_end|>
```
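
## Data Selection

For reference, a minimal sketch of how the training subset could be reproduced. This is an illustration, not the exact training script: the field names follow the public `teknium/OpenHermes-2.5` schema, but the length metric (total character count per conversation, rather than e.g. token count) is an assumption.

```python
from datasets import load_dataset

# Load the full OpenHermes-2.5 dataset (ShareGPT-style "conversations" field).
dataset = load_dataset("teknium/OpenHermes-2.5", split="train")

def conversation_length(sample):
    # Approximate sample length as the total character count across all turns;
    # the metric actually used for this model is an assumption.
    return sum(len(turn["value"]) for turn in sample["conversations"])

# Annotate each sample with its length, then keep the 10k longest.
dataset = dataset.map(lambda s: {"length": conversation_length(s)})
slim_hermes = dataset.sort("length", reverse=True).select(range(10_000))
```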