Uploaded model
- Developed by: waylandzhang
- License: apache-2.0
- Finetuned from model : unsloth/llama-3-8b-bnb-4bit
Teaching purpose model。 这个model只是配合我视频教学目的 :D
QLoRA (4bit)
Params to replicate training
Peft Config
r=8,
target_modules=[
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
],
lora_alpha=16,
lora_dropout=0,
bias="none",
random_state=3407,
use_rslora=False, # Rank stabilized LoRA
loftq_config=None, # LoftQ
Training args
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
gradient_accumulation_steps=4, # set to 4 to avoid issues with GPTQ Quantization
warmup_steps=5,
max_steps=300, # Fine-tune iterations
learning_rate=2e-4,
fp16=not torch.cuda.is_bf16_supported(),
bf16=torch.cuda.is_bf16_supported(),
evaluation_strategy="steps",
prediction_loss_only=True,
eval_accumulation_steps=1,
eval_steps=10,
logging_steps=1,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="cosine", # instead of "linear"
seed=1337,
output_dir="wayland-files/models",
report_to="wandb", # Log report to W&B
Interernce Code
from unsloth import FastLanguageModel
import os
import torch
max_seq_length = 4096 # 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="waylandzhang/Llama-3-8b-Chinese-Novel-4bit-lesson-v0.1",
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
device_map="cuda",
attn_implementation="flash_attention_2"
)
FastLanguageModel.for_inference(model) # 使用unsloth的推理模式可以加快2倍速度
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
inputs = tokenizer(
[
alpaca_prompt.format(
"给你一段话,帮我继续写下去。", # 任务指令
"小明在西安城墙上", # 用户指令
"", # output - 留空以自动生成 / 不留空以填充
)
], return_tensors="pt").to("cuda")
# Opt 1: 文本生成输出
# outputs = model.generate(**inputs, max_new_tokens=500, use_cache=True)
# print(tokenizer.batch_decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True))
# Opt 2: 消息流式输出
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=500)
This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.
- Downloads last month
- 17
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for waylandzhang/Llama-3-8b-Chinese-Novel-4bit-lesson-v0.1
Unable to build the model tree, the base model loops to the model itself. Learn more.