base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - gguf

Uploaded model

  • Developed by: saemzzang
  • License: apache-2.0
  • Finetuned from model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

from unsloth import FastLanguageModel

# `model` is assumed to be already loaded (for example with
# FastLanguageModel.from_pretrained); the loading step is not shown here.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # LoRA rank: any value greater than 0 works; 8, 16, 32, 64, or 128 is recommended.
    lora_alpha=16,  # LoRA alpha value.
    lora_dropout=0.05,  # Dropout applied to the LoRA layers.
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],  # Modules that receive LoRA adapters.
    bias="none",  # Bias handling; "none" trains no bias terms.
    # Use True or "unsloth"; for very long contexts this uses 30% less VRAM and supports 2x larger batch sizes.
    use_gradient_checkpointing="unsloth",
    random_state=123,  # Random seed.
    use_rslora=False,  # Rank-stabilized LoRA is supported; disabled here.
    loftq_config=None,  # LoftQ is supported; not used here.
)
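To make the `r` and `lora_alpha` settings above more concrete, here is a minimal sketch of two quantities they determine: the factor that scales the low-rank update, and the number of trainable parameters added to a single linear layer. The helper names `lora_scaling` and `lora_param_count` are hypothetical, and the 4096 x 4096 layer size is an illustrative assumption, not a value taken from this model's configuration.

```python
import math


def lora_scaling(r: int, lora_alpha: int, use_rslora: bool = False) -> float:
    """Factor that multiplies the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA uses
    lora_alpha / sqrt(r).
    """
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


# r and lora_alpha come from the configuration above.
scaling = lora_scaling(r=8, lora_alpha=16)

# Hypothetical layer size, chosen only for illustration.
extra_params = lora_param_count(d_in=4096, d_out=4096, r=8)

print(f"scaling={scaling}, extra parameters per layer={extra_params}")
```

With `use_rslora=False`, doubling `r` while keeping `lora_alpha` fixed halves the scaling factor, which is one reason the two values are often changed together.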

import torch
from trl import SFTTrainer
from transformers import TrainingArguments

tokenizer.padding_side = "right"  # Pad on the right side.

# Configure training with SFTTrainer.
# `tokenizer`, `dataset`, and `max_seq_length` are assumed to be defined earlier (not shown here).
trainer = SFTTrainer(
    model=model,  # Model to train.
    tokenizer=tokenizer,  # Tokenizer.
    train_dataset=dataset,  # Training dataset.
    eval_dataset=dataset,  # Evaluation dataset (the same data as the training set here).
    dataset_text_field="text",  # Name of the text field in the dataset.
    max_seq_length=max_seq_length,  # Maximum sequence length.
    dataset_num_proc=2,  # Number of processes used for data preprocessing.
    packing=False,  # Packing can make training up to 5x faster for short sequences.
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Training batch size per device.
        gradient_accumulation_steps=4,  # Gradient accumulation steps.
        warmup_steps=5,  # Number of warmup steps.
        num_train_epochs=3,  # Number of training epochs (overridden by max_steps when max_steps > 0).
        max_steps=120,  # Maximum number of training steps.
        do_eval=True,  # Run evaluation during training.
        evaluation_strategy="steps",  # Evaluate at step intervals.
        logging_steps=1,  # Log every step.
        learning_rate=2e-4,  # Learning rate.
        fp16=not torch.cuda.is_bf16_supported(),  # Use fp16 only when bf16 is not supported.
        bf16=torch.cuda.is_bf16_supported(),  # Use bf16 only when it is supported.
        optim="adamw_8bit",  # Optimizer.
        weight_decay=0.01,  # Weight decay.
        lr_scheduler_type="cosine",  # Learning rate scheduler type.
        seed=123,  # Random seed.
        output_dir="outputs",  # Output directory.
    ),
)
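The training arguments above imply an effective batch size and a learning-rate curve that are easy to work out by hand. The sketch below does that arithmetic in plain Python. It assumes a single device, and `cosine_lr` is a hypothetical helper that follows the usual linear-warmup-then-half-cosine shape; treat it as an approximation of the `"cosine"` scheduler rather than the library's exact code.

```python
import math

# Values copied from the TrainingArguments above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 120
WARMUP_STEPS = 5
BASE_LR = 2e-4


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Sequences contributing to each optimizer step (single device assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Sequences consumed over the whole run, since max_steps caps training.
sequences_seen = effective_batch * MAX_STEPS

print(f"effective batch size: {effective_batch}")
print(f"sequences seen in {MAX_STEPS} steps: {sequences_seen}")
for step in (0, WARMUP_STEPS, MAX_STEPS // 2, MAX_STEPS):
    print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
```

Because `max_steps=120` takes precedence over `num_train_epochs=3`, the run covers three full epochs only if the dataset is small enough for 120 optimizer steps to reach them.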