---
language:
- en
license: apache-2.0
datasets:
- tatsu-lab/alpaca
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
---

## lxyuan/llama-3-8b-Instruct-lora-merged

**Model Description**: Finetuned the [Llama-3-8B-Instruct model](https://huggingface.co./unsloth/llama-3-8b-Instruct-bnb-4bit) with [unsloth](https://github.com/unslothai/unsloth) on the [Alpaca dataset](https://huggingface.co./datasets/tatsu-lab/alpaca) for 1,000 steps.

- **Developed by:** lxyuan
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-Instruct-bnb-4bit
- **Finetuned on dataset:** tatsu-lab/alpaca

## Installation

```python
import torch
major_version, minor_version = torch.cuda.get_device_capability()

# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

if major_version >= 8:
    # Newer GPUs: Ampere and Hopper (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Older GPUs: V100, Tesla T4, RTX 20xx
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
```

## Inference example

```python
from transformers import pipeline
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lxyuan/llama-3-8b-Instruct-lora-merged",
    dtype = None,        # auto-detect
    load_in_4bit = True, # default is True
)

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "system", "content": "You are a helpful AI bot that follows instructions to complete tasks."},
    {"role": "user", "content": "Write me 10 sentences that end with 'apple'."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipe(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(outputs[0]["generated_text"])
```

#### Inference Output

```markdown
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI bot that follows instructions to complete tasks.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write me 10 sentences that end with 'apple'.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Here are 10 sentences that end with the word "apple":

1. The farmer grew a juicy red apple.
2. She ate a crunchy green apple.
3. The tree bore a ripe yellow apple.
4. He bit into a sweet Granny Smith apple.
5. The basket was filled with fresh apples.
6. The juice was squeezed from a ripe red apple.
7. She picked a perfect autumn apple.
8. The pie was filled with tender Granny Smith apple.
9. The farmer's market sold a variety of apples.
10. The snack was a crisp, juicy apple.
```

## Training procedure

- [Finetuning notebook](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Lora_finetuning_Llama_3_8b_Instruct_with_Alpaca.ipynb)
- [Original notebook from unsloth](https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing#scrollTo=MKX_XKs_BNZR)
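
For reference, below is a minimal sketch of the finetuning setup described above: unsloth LoRA adapters on the 4-bit base model, trained with TRL's `SFTTrainer` on Alpaca for 1,000 steps. The LoRA and training hyperparameters shown (`r`, `lora_alpha`, batch size, learning rate, and so on) are illustrative assumptions, not the exact values used; see the finetuning notebook above for the actual configuration.

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# Attach LoRA adapters (hyperparameters are illustrative, not the exact ones used)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)

# Alpaca ships a pre-formatted "text" column combining instruction, input, and output
dataset = load_dataset("tatsu-lab/alpaca", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 1000,  # matches the 1,000 steps mentioned above
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()

# Merge the LoRA weights back into the base model and save, producing
# a standalone checkpoint like the one published in this repo
model.save_pretrained_merged("llama-3-8b-Instruct-lora-merged", tokenizer, save_method = "merged_16bit")
```

The final `save_pretrained_merged` call is what makes this a `lora-merged` model: the adapter weights are folded into the base weights, so the result loads like any regular checkpoint rather than requiring a separate adapter.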