---
base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
---
# Uploaded model
- **Developed by:** saemzzang
- **License:** apache-2.0
- **Finetuned from model:** yanolja/EEVE-Korean-Instruct-10.8B-v1.0

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
LoRA adapter and training configuration:

```python
from unsloth import FastLanguageModel

# `model` and `tokenizer` are assumed to be loaded earlier (not shown) with
# FastLanguageModel.from_pretrained(...) from the base model.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # LoRA rank: any value > 0 works; 8, 16, 32, 64, or 128 are recommended
    lora_alpha=16,  # LoRA alpha (scaling numerator)
    lora_dropout=0.05,  # dropout applied inside the LoRA layers
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],  # modules that receive LoRA adapters
    bias="none",  # do not train bias terms
    # True or "unsloth" enables gradient checkpointing; "unsloth" is claimed to use
    # 30% less VRAM and to allow 2x larger batch sizes for very long contexts.
    use_gradient_checkpointing="unsloth",
    random_state=123,  # random seed
    use_rslora=False,  # rank-stabilized LoRA (disabled here)
    loftq_config=None,  # LoftQ initialization (disabled here)
)

import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# `dataset` (with a "text" column) and `max_seq_length` are assumed to be defined
# earlier (not shown).
tokenizer.padding_side = "right"  # pad sequences on the right

# Configure supervised fine-tuning with SFTTrainer
trainer = SFTTrainer(
    model=model,  # model to train
    tokenizer=tokenizer,  # tokenizer
    train_dataset=dataset,  # training dataset
    eval_dataset=dataset,  # note: evaluation reuses the training set
    dataset_text_field="text",  # name of the text field in the dataset
    max_seq_length=max_seq_length,  # maximum sequence length
    dataset_num_proc=2,  # number of processes used for dataset preprocessing
    packing=False,  # True packs short sequences together, which can make training up to 5x faster
    args=TrainingArguments(
        per_device_train_batch_size=2,  # training batch size per device
        gradient_accumulation_steps=4,  # gradient accumulation steps
        warmup_steps=5,  # number of warmup steps
        num_train_epochs=3,  # number of epochs (ignored, because max_steps > 0)
        max_steps=120,  # total optimizer steps; overrides num_train_epochs
        do_eval=True,
        # Renamed to `eval_strategy` in newer transformers releases. eval_steps is
        # unset, so it defaults to logging_steps: evaluation runs every step.
        evaluation_strategy="steps",
        logging_steps=1,  # log every step
        learning_rate=2e-4,  # peak learning rate
        fp16=not torch.cuda.is_bf16_supported(),  # use fp16 only when bf16 is unsupported
        bf16=torch.cuda.is_bf16_supported(),  # use bf16 when the GPU supports it
        optim="adamw_8bit",  # 8-bit AdamW optimizer
        weight_decay=0.01,  # weight decay
        lr_scheduler_type="cosine",  # learning-rate scheduler type
        seed=123,  # random seed
        output_dir="outputs",  # output directory
    ),
)
```
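The configuration above fixes a few derived quantities that are easy to miss. The plain-Python sketch below computes them, assuming a single GPU and `packing=False` (one dataset example per sequence): the effective batch size, the number of sequences seen in 120 steps, the LoRA scaling factor, and the learning rate under linear warmup plus cosine decay. The helper names (`effective_batch_size`, `lora_scaling`, `cosine_lr`) are hypothetical and not part of Unsloth, PEFT, or transformers; `cosine_lr` is written to follow the formula of transformers' `get_cosine_schedule_with_warmup` with its default half cycle, so treat it as an approximation rather than the library's code.

```python
import math

# Values copied from the get_peft_model / TrainingArguments calls above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
MAX_STEPS = 120
WARMUP_STEPS = 5
PEAK_LR = 2e-4
LORA_R = 8
LORA_ALPHA = 16


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * accum * num_devices


def lora_scaling(alpha: float, r: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def cosine_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    ebs = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM)
    print("effective batch size:", ebs)  # → 8
    print("sequences seen:", ebs * MAX_STEPS)  # → 960
    print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_R))  # → 2.0
    for step in (0, WARMUP_STEPS, 60, MAX_STEPS):
        print(step, cosine_lr(step, PEAK_LR, WARMUP_STEPS, MAX_STEPS))
```

With these settings each optimizer update sees 8 sequences, and 120 updates cover 960 sequences; on more than one GPU, multiply by the device count. Because `max_steps` is set, training stops at step 120 regardless of `num_train_epochs=3`, so whether that amounts to more or fewer than three passes over the data depends on the dataset size, which is not stated here.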