---
language:
- en
license: apache-2.0
datasets:
- tatsu-lab/alpaca
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
---
## lxyuan/llama-3-8b-Instruct-lora-merged
**Model Description**: This model was finetuned from the [Llama-3-8B-Instruct model](https://huggingface.co./unsloth/llama-3-8b-Instruct-bnb-4bit) with [unsloth](https://github.com/unslothai/unsloth)
on the [Alpaca dataset](https://huggingface.co./datasets/tatsu-lab/alpaca) for 1,000 steps.
- **Developed by:** lxyuan
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-Instruct-bnb-4bit
- **Finetuned on dataset:** tatsu-lab/alpaca
## Installation
```python
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for newer GPUs (Ampere, Hopper: RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
```
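
After installing, a quick optional sanity check confirms that a CUDA device is visible before loading the model (unsloth expects an NVIDIA GPU):

```python
import torch

print("torch:", torch.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; unsloth requires an NVIDIA GPU.")
```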
## Inference example
```python
from transformers import pipeline
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lxyuan/llama-3-8b-Instruct-lora-merged",
    dtype=None,         # auto-detect dtype
    load_in_4bit=True,  # default is True
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "system", "content": "You are helpful AI bot that follows instruction to complete task."},
    {"role": "user", "content": "Write me 10 sentences that end with 'apple"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
outputs = pipe(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```
#### Inference output
```markdown
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are helpful AI bot that follows instruction to complete task.<|eot_id|><|start_header_id|>user<|end_header_id|>
Write me 10 sentences that end with 'apple<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Here are 10 sentences that end with the word "apple":
1. The farmer grew a juicy red apple.
2. She ate a crunchy green apple.
3. The tree bore a ripe yellow apple.
4. He bit into a sweet Granny Smith apple.
5. The basket was filled with fresh apples.
6. The juice was squeezed from a ripe red apple.
7. She picked a perfect autumn apple.
8. The pie was filled with tender Granny Smith apple.
9. The farmer's market sold a variety of apples.
10. The snack was a crisp, juicy apple.
```
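
If you prefer to skip the pipeline wrapper, the same generation can be done with `model.generate` directly. This is a minimal sketch reusing `prompt`, `terminators`, `model`, and `tokenizer` from the example above:

```python
# Tokenize the chat-formatted prompt and move it to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the tokens generated after the prompt
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```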
## Training procedure
- [Finetuning notebook](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Lora_finetuning_Llama_3_8b_Instruct_with_Alpaca.ipynb)
- [Original Notebook from unsloth](https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing#scrollTo=MKX_XKs_BNZR)
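
For readers who want the shape of the training run without opening the notebooks, the sketch below follows unsloth's standard LoRA recipe on this dataset. The LoRA rank, learning rate, and batch sizes here are illustrative assumptions, not necessarily the exact values used; see the finetuning notebook above for those.

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank/alpha are illustrative, not the exact values used)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing=True,
)

# tatsu-lab/alpaca ships a pre-formatted "text" column
dataset = load_dataset("tatsu-lab/alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=1000,  # the card states 1,000 steps
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA weights back into the base model before uploading;
# unsloth provides save_pretrained_merged for this.
model.save_pretrained_merged("llama-3-8b-Instruct-lora-merged", tokenizer,
                             save_method="merged_16bit")
```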